For most organizations, the process of becoming more data-driven starts with better understanding and using their own data. But internal data is just the tip of the iceberg. Underneath the surface of the (data) lake is the untapped value of external data, which has given rise to the data marketplace.
The role of external data in your data strategy
External data is a bit of a mystery to some. We define it as data that exists outside the four walls of your business. By using external data in conjunction with internal data, businesses can fuel new insights.
According to IDC, 75% of enterprises in 2021 will use external data sources to strengthen cross-functional and decision-making capabilities. External data provides an opportunity to add new data into your ecosystem, enrich your existing database with public data, and test your models with new sources of information. It provides contour and dimension to your internal data, helps you find new areas for exploration and spot points of weakness before they cause issues. It is both equalizer and enhancer, reducing the risk of ‘data tunnel vision’ caused by internal data that may be too narrow in scope, while simultaneously providing new opportunities for growth.
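As a rough illustration of what "enriching your existing database with public data" can look like in practice, here is a minimal Python sketch. The customer records, the income lookup table, and the `enrich` helper are all hypothetical, invented for this example; real external data would come from a provider or portal rather than a hard-coded dictionary.

```python
# Hypothetical internal records: what the business already knows.
internal_customers = [
    {"customer_id": 1, "zip_code": "10001", "annual_spend": 1200.0},
    {"customer_id": 2, "zip_code": "94105", "annual_spend": 450.0},
    {"customer_id": 3, "zip_code": "60601", "annual_spend": 980.0},
]

# Hypothetical external dataset: median income keyed by ZIP code.
# Note that it does not cover every ZIP code in the internal data.
external_income_by_zip = {
    "10001": 85000,
    "94105": 120000,
}


def enrich(records, lookup, key, new_field):
    """Return copies of `records` with `new_field` filled from `lookup`.

    Records whose key is missing from the external lookup get None,
    so gaps in external coverage stay visible instead of being hidden.
    """
    enriched = []
    for record in records:
        row = dict(record)  # copy so the internal data is left untouched
        row[new_field] = lookup.get(record[key])
        enriched.append(row)
    return enriched


enriched_customers = enrich(
    internal_customers, external_income_by_zip, "zip_code", "median_income"
)
for row in enriched_customers:
    print(row)
```

The join key (`zip_code` here) is the important design choice: external data only adds dimension to internal data when the two share an attribute they can be matched on.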
The increased value of external data has created a market that caters to both data buyers and sellers. Organizations are not only eager to find new sources of data that they can use, but they are also increasingly aware that their own data has value. From these conditions, the concept of a data marketplace has emerged.
The data marketplace, defined
As its name suggests, a data marketplace is a storefront for data: a place where one party can go to discover data from any number of sources. Given the highly distributed nature of consumable data (open data from government portals, unstructured data from websites, third-party data from data providers), consumers need a central repository from which they can browse the entire universe of external data.
This might seem like a given: in order to get and use data, you need to be able to find it. Surprisingly, however, the difficulty of finding data is often overlooked. Why? At least part of the problem is that the exponential increase of data, especially public data, creates the illusion of availability. If there’s data everywhere, the logic goes, it’s easy to find.
The reality is that while anyone can find data, finding the right data is hard. Without categorization, metadata, and source management, looking for data online is like trying to find a specific book in a library with no cataloging system, no shelves, and no book covers.
A data marketplace consolidates this data into a central repository and includes valuable metadata so that it’s easier to index and find. That’s the buy side. From a seller’s perspective, the data marketplace is a central commons through which they can provide their data and connect to purchasers through a secure platform.
As the value of data continues to increase, the rise of the data marketplace incentivizes data sharing. In fact, Gartner predicts that by 2022, 35% of large organizations will be either buyers or sellers of data through an online data marketplace.
The larger the market becomes, the more people want to be a part of it because it’s a new revenue stream. The data comes from various sources ranging from governments and NGOs, to special interest groups, to major corporations.
History of the data marketplace
Data vendors and public data have been around since the start of the web, but the past 10 years have seen meteoric growth in the amount of data available. From the emergence of open data in the mid-2000s to the advent of the cloud marketplace, each stage of this development has moved toward more streamlined discovery and use.
While discoverability and use have become the primary requirements for a data marketplace, there is little else that’s agreed upon as standard practice. Considerations like whether data in a marketplace should be raw or standardized, whether its organization should be organic or curated, and how users will connect to the data are all open to interpretation, and the variety in data marketplaces highlights this identity crisis.
From completely open forums like Data.gov and the EU open data portal, which provide free access to millions of unstandardized datasets, to highly curated cloud marketplaces like the Snowflake Data Exchange, there is an open debate at the heart of data marketplaces that hasn’t been settled: what’s the best way to get data into the hands of consumers?
For some, the answer to this question is about open access. data.world, launched in 2016, takes a community approach to the data marketplace, letting anyone add datasets to a public catalog. By catering to an open community of users with a range of needs, from civic hackers to data scientists, data.world has become a kind of YouTube for data consumers, letting anyone upload their data and seeing what rises to the top.
Others have taken a more rigid approach. For cloud providers like Snowflake, a data marketplace is an opportunity for users to exchange data through the platform. Once the pipes have been set up between databases, users may connect to each other's data with limited overhead. By incentivizing their users to access each other's data through a central commons, the Snowflake Data Exchange increases platform adoption and provides a benefit to consumers who migrate their data to a Snowflake environment. It's a clever redesign of the traditional data marketplace that eliminates the usual headache of integration (provided you're already using their platform).
Here are a few of the major players who provide publicly accessible data marketplaces:
Benefits of a data marketplace for your data catalog
Discover and access new external data
The importance of including external data in your data strategy is well-established. But with the amount of data out there, it’s hard to navigate through datasets and find the one you need. This is where the data marketplace has served a key function: it centralizes the repository of datasets so organizations can discover and access external data from a common location.
By maintaining and indexing source metadata, the marketplace provides an important degree of information management, letting users search not only by source or keyword but also by attributes within the data, column titles, or data size. But discoverability is only one part of the equation. Finding data is an annoyance, but once it’s found, connecting to it may in fact be the larger inconvenience.
Take Google Dataset Search, for example. Google made waves a few years ago when they announced that they’d indexed the entire world of public data and made it searchable, just like their flagship search engine. The problem with this approach, however, is that indexing data may put data at your fingertips, but it doesn’t render it functionally usable; it’s still just a piece of information that exists on the web. “I know how to find that” is not as valuable as “I know how to use that.”
At its core, a data marketplace primarily enables discovery. But after discovery, it’s critical that the marketplace at least points towards a solution that will help you ingest the underlying data.
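To make the idea of searching indexed metadata more concrete, here is a minimal Python sketch of a catalog that can be filtered by keyword, column title, or data size. It is only an illustration: the `DatasetEntry` fields, the `search` function, and the catalog entries are hypothetical and do not reflect how any particular marketplace stores or queries its metadata.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetEntry:
    """Hypothetical metadata record for one dataset in a catalog."""
    name: str
    source: str
    keywords: List[str]
    columns: List[str]
    size_mb: float


def search(
    catalog: List[DatasetEntry],
    keyword: Optional[str] = None,
    column: Optional[str] = None,
    max_size_mb: Optional[float] = None,
) -> List[DatasetEntry]:
    """Return entries matching every filter that was supplied."""
    results = []
    for entry in catalog:
        if keyword is not None and keyword.lower() not in (
            k.lower() for k in entry.keywords
        ):
            continue
        if column is not None and column.lower() not in (
            c.lower() for c in entry.columns
        ):
            continue
        if max_size_mb is not None and entry.size_mb > max_size_mb:
            continue
        results.append(entry)
    return results


# Hypothetical catalog entries, invented for this example.
CATALOG = [
    DatasetEntry("city_weather_daily", "example-weather-provider",
                 ["weather", "climate"], ["date", "zip_code", "temp_c"], 120.0),
    DatasetEntry("census_income_by_zip", "example-government-portal",
                 ["census", "income"], ["zip_code", "median_income"], 15.5),
    DatasetEntry("retail_foot_traffic", "example-data-vendor",
                 ["retail", "mobility"], ["store_id", "date", "visits"], 2048.0),
]

# Find small datasets that can be joined on a ZIP code column.
matches = search(CATALOG, column="zip_code", max_size_mb=50)
print([entry.name for entry in matches])
```

Note that a search like this only answers "I know how to find that." Each entry would still need a connection method (an API endpoint, a download link, or a database share) before the data is usable.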
Monetizing your own data
As more businesses demand external data, its value in the market increases, which has enabled organizations to turn their datasets into a new revenue stream. The data marketplace creates a secure platform to monetize and distribute the data products that organizations create.
Data volumes are growing exponentially, and virtually every business produces valuable data that can be monetized. Unfortunately, for most companies that data is sitting idle in internal databases. In fact, Forrester reports that on average, 60-73% of all data within organizations goes unused. Much of this data can be monetized via a data marketplace.
Easier data discovery
The data marketplace also enables easier data discovery by creating a central place for all of your potential data sources. In addition, an API-enabled data marketplace of trusted data shortens your time to insight.
Before the data marketplace, data vendors would sell their datasets directly to users as flat files. Users derived little value from this because they needed to handle and manage the data themselves. The data marketplace provides not only the dataset but also the fine-tuning that transforms raw data into ready-to-use products.
The data marketplace will continue to grow with the industry, and maybe even outpace it (we predict data discovery will become a very prominent theme in the next 5 years). As data providers generate more data, and the desire to become more data-driven increases in every organization, the value of a platform to facilitate this exchange grows.
There is a need for frictionless access to curated data that can be integrated into a data catalog. The future of the data marketplace is not just to sell data, but to offer intelligence, insights, and a competitive advantage.