Data Management: Buy vs. Build

Shared by Andrew Armour on July 16, 2020

in Data Management, Business Insights, Data Strategy

The State of Data in the Enterprise

The new reality businesses face is fairly simple: capital expenditures have been frozen and finding inefficiencies in operational expenses has become priority one.

McKinsey research found, even before the pandemic hit, that 92% of organizations thought their business models would require a significant effort in digital transformation to remain competitive.

Given the ongoing economic downturn, “spend” is a scary word. Digital transformation and data growth may take a back seat to bottom-line savings and stamping out redundancies. But that leaves organizations with a difficult question: “How long can we postpone access to the data that can drive our business forward?”

Data, Data Everywhere (and not a drop to drink)

Although there is a lot of data available, not all data is accessible. The key challenges in connecting to data lie in the clerical tasks required to make it usable. Data preparation is the least attractive but most time-consuming part of any data professional’s job, and is often the biggest hindrance to data science innovation.

With the volume of data available, it’s challenging for organizations to determine what data they need, and harder still to locate the sources and datasets. Connecting to different portals for different data, navigating licences, and monitoring for updates are all requirements. Even when that’s done, the data is often not in the right format for analysis.

The next step is transformation: structuring the data and schema to align with your needs. Then you need to share it across your organization while ensuring that the data is both current and valid, and that you can trace its structure and content across updates. If it sounds like a lot to do, it is.
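To make the idea of tracing a dataset's structure across updates concrete, here is a minimal Python sketch. It is an illustration only, not how any particular platform works: the function names, the column names, and the type labels are all hypothetical, and a real pipeline would also need to check the content, not just the schema.

```python
import hashlib
import json


def schema_fingerprint(columns):
    """Return a stable hash of a dataset's column names and types."""
    # Sort the (name, type) pairs so column order alone does not change the hash.
    canonical = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def schema_changes(old, new):
    """List added, removed, and retyped columns between two versions."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }


# Hypothetical schemas for two successive updates of the same dataset.
v1 = {"id": "int", "name": "str", "licence_date": "str"}
v2 = {"id": "int", "name": "str", "licence_date": "date", "region": "str"}

changed = schema_fingerprint(v1) != schema_fingerprint(v2)
diff = schema_changes(v1, v2)
```

Storing the fingerprint alongside each update gives a cheap way to notice that a source has changed shape before the change reaches downstream users; the diff then says what to look at.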

Data Management - Buy or Build?

These kinds of tasks take up serious amounts of time when you’re connecting to anything more than a few sources. This means that while you might be seeing some benefits from using more data in your organization, it’s probably not getting you the results you thought it would. Don’t worry though, you’re in good company – almost 70% of large companies say they’re not truly data-driven.

Eventually, the need for a data management platform will catch up to any organization, and out of that need will come a new question: Build or Buy?

The Challenges of an Internal Build

Companies need to find ways of connecting to data in a repeatable, scalable way. Handling data ad hoc isn’t an option, because every time you need more data, you need more people. That’s not a scalable model. So what does it look like to build your own data processing pipeline?

A diagram of the time, skills, and money needed to build a data management platform


While there are benefits to building in-house, it’s expensive, from both a time and money perspective. Hiring, designing, launching, hosting, warehousing, and maintaining are each massive undertakings.

And, given that data technology is one of the hottest sectors right now, the tools are changing constantly. When data isn’t your core competency and your finger isn’t on the pulse, your solution might be obsolete before it ever sees the light of day.

High Overhead, Low Output

So you don’t need a full end-to-end management platform (yet) – what if you connect to just a few sources? What if the data doesn’t update too often?

Even for (relatively) small quantities of data, it still makes sense to have a central access platform. Auditing your data processes when every case is handled differently becomes a nightmare. It might be easier just to assume it’s making you money because, hey, it’s data – everybody’s using it so it must be good.

Data janitorial tasks take up 80% of a data scientist's time


When your data science teams are bogged down with prep and custodial work that takes up 80% of their time (the right side of the diagram), it shouldn’t come as any surprise that business outcomes (the left side) aren’t where they should be.

Using centralized access allows admins to see redundancies among your existing subscriptions. It also takes away the need for each team to do their own data cleansing in isolation.
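As a rough illustration of what spotting redundant subscriptions might look like once access records sit in one place, consider the short Python sketch below. The team names, dataset names, and the `redundant_subscriptions` helper are all made up for the example.

```python
from collections import defaultdict


def redundant_subscriptions(subscriptions):
    """Return datasets subscribed to by more than one team, with those teams."""
    teams_by_dataset = defaultdict(set)
    for team, datasets in subscriptions.items():
        for dataset in datasets:
            teams_by_dataset[dataset].add(team)
    # Keep only the datasets that at least two teams subscribe to.
    return {
        dataset: sorted(teams)
        for dataset, teams in teams_by_dataset.items()
        if len(teams) > 1
    }


# Hypothetical subscription records, keyed by team.
subscriptions = {
    "marketing": ["business_registry", "census"],
    "risk": ["business_registry", "sanctions_list"],
    "analytics": ["census", "weather"],
}

overlaps = redundant_subscriptions(subscriptions)
```

When every team manages its own sources, nobody holds the combined list that this kind of check depends on, which is the practical argument for a central view.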

Realistically, you’re missing several opportunities: stopping the loss of efficiency; better overall project outcomes using higher quality data; allowing your data scientists to focus on the things you hired them for; and freeing up hours and dollars to flow new data into your organization. Even when it comes to individual datasets and seemingly simple connections, it makes a lot of sense to start with high-quality data from a single source of truth, and an up-front purchase can save you ongoing costs that add up fast.

Tech Debt Has High Interest

Getting stuck in established ways of working is common. Some measure of tech debt is expected, but it’s not always easy to see when it starts costing you more than it’s earning. Innovation happens fast, and legacy systems and methods might be weighing you down.

Seeing where you are now is the first step towards improving, and if it’s hard to audit your systems, you’ve already encountered something that needs fixing (fast).

Data Governance, Quality, and Auditability

So, let’s say your in-house solution really knocks it out of the park on data connections, access, and distribution. What about governance? Security, transparency, and traceability need to be high priorities when you’re working with data. Data governance is an extremely difficult component to build into an external data pipeline, because it requires auditability and monitoring to ensure data integrity.

Finding the Right Solution for Your Organization

There’s no one-size-fits-all answer – every company’s capacities and capabilities are different. Building an external data management solution from scratch takes a significant amount of time, money, and expertise. Even if you’re only connecting to a few sources, there are good reasons to opt for “buy” instead of “build” to make sure you’re getting the most out of every data point.

We’re weathering an economic storm, and it’s crucial for your organization to find ways to optimize productivity and eliminate redundancies. Data is crucial not only for innovation but also for business continuity. The question is not whether you need a data management solution, but how to spend smart and get a return on every penny.



ThinkData offers over 250,000 datasets from more than 75 countries around the world. Browse the Namara Marketplace or request a consultation with one of our data experts to determine a data strategy to help your business succeed.

