Data Quality for External Data Feeds
Almost every company relies on external data, whether it's purchase orders, stock prices, shipping records, weather readings, or other information used for analytics or operations.
However much your company uses external data now, it will probably rely on it even more in the future. About 92% of data scientists think their companies should do even more with external data. So if your data teams have anything to say about it, you'll be working with more external data, probably in a broader range of formats.
The challenge with external data is one of control: you don’t control the schema or values of the data coming in. Partners, customers, and even other data teams in your organization might change schemas (for example, renaming columns) without telling you. Or they might send you data this week that looks like nothing you’ve ever seen before.
Your data team would probably like to know about these changes before they affect business results, such as a BI report built on skewed analysis or business transactions that end up costing your company a lot of money.
It's hard to reverse decisions once they've been made. It's much easier, especially with Data Culpa, to sanity-check the data behind those decisions before they're made.
Sanity-Check External Data Before Processing It
You don't have to be caught off guard by changes in external data. With a simple call from your code, you can pass your external data feeds directly to Data Culpa Validator for quick analysis before you import the data in full.
Validator automatically detects whether fields have gone missing or changed drastically, and whether entire payloads are suddenly duplicates, which indicates stale data upstream. Validator can inspect any type of data, including JSON files, CSV files, and data streams accessed through APIs.
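To make the three kinds of checks described above more concrete, here is a minimal, hypothetical sketch in plain Python. It is not Data Culpa Validator's API or algorithm. The function names `check_batch` and `payload_fingerprint`, the baseline-mean comparison, and the 50% threshold are all illustrative assumptions. It flags fields that no record contains (for example, a renamed column), numeric fields whose batch mean drifts far from a baseline, and payloads that exactly repeat an earlier one.

```python
import hashlib
import json
from statistics import mean
from typing import Any

Record = dict[str, Any]


def payload_fingerprint(records: list[Record]) -> str:
    """SHA-256 of the batch with dict keys sorted, so a re-sent identical batch hashes the same."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_batch(
    records: list[Record],
    expected_fields: set[str],
    baseline_means: dict[str, float],
    seen_fingerprints: set[str],
    max_relative_change: float = 0.5,  # hypothetical threshold: flag shifts above 50%
) -> list[str]:
    """Return problems found in one incoming batch; an empty list means it looks sane."""
    problems: list[str] = []

    # Schema drift: expected fields that no record contains, and fields nobody expected.
    present: set[str] = set().union(*(r.keys() for r in records))
    for field in sorted(expected_fields - present):
        problems.append(f"missing field: {field}")
    for field in sorted(present - expected_fields):
        problems.append(f"unexpected field: {field}")

    # Drastic value change: compare each numeric field's batch mean to its baseline.
    for field, baseline in baseline_means.items():
        values = [
            r[field]
            for r in records
            if isinstance(r.get(field), (int, float)) and not isinstance(r.get(field), bool)
        ]
        if values and baseline != 0:
            change = abs(mean(values) - baseline) / abs(baseline)
            if change > max_relative_change:
                problems.append(f"{field} mean shifted {change:.0%} from baseline")

    # Stale data: the whole payload is an exact repeat of one seen before.
    fingerprint = payload_fingerprint(records)
    if fingerprint in seen_fingerprints:
        problems.append("duplicate payload (possible stale data upstream)")
    seen_fingerprints.add(fingerprint)

    return problems


if __name__ == "__main__":
    expected = {"sku", "qty", "price"}
    baseline = {"price": 10.0}
    seen: set[str] = set()

    batch = [{"sku": "A1", "qty": 3, "price": 9.5}, {"sku": "B2", "qty": 1, "price": 10.5}]
    print(check_batch(batch, expected, baseline, seen))  # []
    print(check_batch(batch, expected, baseline, seen))  # ['duplicate payload (possible stale data upstream)']

    renamed = [{"sku": "A1", "quantity": 3, "price": 40.0}]
    print(check_batch(renamed, expected, baseline, seen))
    # ['missing field: qty', 'unexpected field: quantity', 'price mean shifted 300% from baseline']
```

The idea is to run a check like this on each incoming batch and hold back the full import whenever the returned list is non-empty. A real tool would learn baselines from history rather than take them as hand-written parameters.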
Read our blog post about why external data is more important than ever for data teams.