3 min read

Namara Ingest vs. a Standard ETL

Matt Pollack January 15, 2020 5:06:24 PM EST

Data Ingest Tech Comparison Data Catalog

One of the greatest challenges facing ETL engineers is:

How can we deal with data variety efficiently?

There’s no clear answer, and seemingly no end to the work – every day, new cases need unique handling as the world of available data grows. This is especially true with open data. A variety of machine and manual data entry systems serve opinionated data, gathered over decades through different means, with no standard way to present it. Entangled pipes going in every direction

Using data from anywhere means handling the variety that comes with it – is your organization ready?

Cellular data formats, like .xlsx, are as blank of a canvas as it gets. These formats have no common standard enforced on them, and that causes problems – a defined standard is a prerequisite for automation. Generic ETL solutions provide tools which combine to solve problems case-by-case, but when the goal is to gather and update thousands of datasets per day, it’s impossible to maintain. Can common ETL pipelines handle variety at scale?

Here’s how it might be handled:

A typical ETL

The first step is connecting to sources, each with their own communication methods and nuances. If you’re lucky, your solution has mappers built to connect to the resources these portals provide. Even so, these resources often exist in chaotic and problematic environments. In fact, the average open data network is prohibitively slow, and doesn’t provide random access, a feature that most solutions rely on for performance optimization. Due to this, each file needs to be downloaded before processing.

Handling file types is the first thing solved by any ETL, so even if the datasets are in different formats, the data should be flowing in smoothly. However, problems arise when archives contain multiple file type options for the same dataset. Now, selection needs to be handled for each case so data isn’t brought in twice or lost. If file selection doesn’t go perfectly, there needs to be a method for mistakes to be corrected. typical ETL workflow – consecutive, time-consuming, and messy

typical ETL workflow – consecutive, time-consuming, and messy

Where are you now?

You’ve now spent hours wrestling with one new source before it’s even usable in your ecosystem. Moreover, we haven’t even discussed headers, footers, guessing column data types, and malformed cells – each of these obstacles is a major time penalty. The advertised performance of the other ETL solution no longer seems very honest.

A typical Atypical ETL

Ingest is ThinkData's solution to the variety problem. It understands how data portals work, and it does so without affecting the output data rate – zero overhead, zero penalty.

Ingest is able to operate faster than industry standards and allows data teams to operate faster, too, even when faced with the variety found in open and public data portals. It understands when archives contain duplicate data, allows for specific file selection within archives, and can easily handle datasets with custom headers, footers, and other artifacts which need special consideration.

Namara ELT/ELR workflow – concurrent jobs, faster workflows, and simple operation

Ingest operates on a streaming download with zero overhead – the data is handled as quickly as it’s served out.

Another boon to your efficiency is that our pipeline doesn’t require a developer to fix the workflow if something goes wrong. Ingest makes manual intervention simple, declarative, and centrally located, and all of these features work together automatically. There’s no costly downtime for every new dataset, and no need for an entire team of people every time you want to bring data into Namara and your organization. What lands in your data warehouse is a polished, ready-to-use product.

Comparison between Namara Ingest and typical ETL solutions

Both solutions are capable of achieving the same end but there are some major differences in the processes:

The generic solution has some appealing features, but requires substantial configuration on every operation, and requires constant management by an experienced team. Ingest, on the other hand, was purpose-built to handle variety, and allows you to tap into data that otherwise wouldn’t make business sense. With the ability to achieve what the generic ETL tool accomplishes and more, Ingest provides workflow benefits that have never before been offered.

The future of Ingest

We’re incredibly proud of where we’ve come, and plenty of companies in the data technology space have taken notice of our progress. Our team is working diligently to expand the capabilities of Ingest with regard to both the variety of data we can handle and the size of the datasets themselves – creating faster, more robust, battle-hardened tools for the enterprise.

Want to learn more about Ingester?

Request a consultation with one of our data experts to learn about how ThinkData’s tech can advance your projects.

A futuristic loom weaving a digital fabric with the text:

4 min read

Data in 2024 — How Should You Prepare?

October 12, 2023 3:22:56 PM EDT

January is closer than it seems, and it's time to start planning (that's not just a note to myself about buying gifts before December 23rd)....

Featured image with a warning sign over a cargo ship and the text: How to better leverage data in risk management

4 min read

How to better leverage data for risk management and crisis response

September 8, 2023 1:06:16 PM EDT

It’s becoming increasingly difficult to manage risk in a global climate that’s growing in complexity. On the other side of this coin, however, is an...

Business Insights Supply Chain Data Use Case

3 min read

Next-Gen Data Catalogs: Eckerson TechVent Summary

July 18, 2023 9:50:46 AM EDT

Wondering what's up with data catalogs lately? We were part of a group of data and governance experts at a virtual event hosted by Eckerson Group to...

Business Insights Data Catalog

Data Catalog Platform

Featured Resource

Calculate your ROI

By Use Case

Featured Solution

See data's value

Data Products

Data Services

Featured Solution

Deliver insights

Company

Resources