Data Quality for Azure Data Lakes
Integrate with Pipelines or Process Output Downstream
Invoke Data Culpa Validator where you import data as well as where you process it downstream. See how imported data affects results as it’s processed downstream and used for analytics, business transactions, or any other data use case.
Share Responsibility with Data Owners and Data Stewards
Tell the file’s owner that something has gone sideways with the files they are depositing in the data lake before someone downstream has to figure out what’s going on.
Establish Gold Standards for Data and Measure Data Imports Against Them
Without requiring your team to invest long hours manually writing unit tests, Validator automatically models data flows and derives a gold standard for expected data profiles.
When new data arrives in the data lake, Validator automatically compares it to the gold standard and raises detailed alerts if significant anomalies have appeared.
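As a rough illustration of the idea (not Validator’s actual implementation), comparing a new batch against a gold standard can be sketched in Python. The profile fields, the `build_profile` and `compare_to_gold` helpers, and the 20% tolerance are all assumptions made for this example.

```python
from statistics import mean

def build_profile(rows, column):
    """Summarize one column: share of missing values and mean of the present ones."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(present) / len(values) if values else 0.0,
        "mean": mean(present) if present else None,
    }

def compare_to_gold(gold, new, tolerance=0.2):
    """Return readable alerts where the new profile drifts from the gold profile."""
    alerts = []
    if new["null_rate"] - gold["null_rate"] > tolerance:
        alerts.append(
            f"null rate rose from {gold['null_rate']:.0%} to {new['null_rate']:.0%}"
        )
    if gold["mean"] is not None and new["mean"] is not None:
        baseline = abs(gold["mean"]) or 1.0  # avoid dividing by zero
        if abs(new["mean"] - gold["mean"]) / baseline > tolerance:
            alerts.append(
                f"mean shifted from {gold['mean']:.2f} to {new['mean']:.2f}"
            )
    return alerts

# Hypothetical data: earlier imports define the gold standard, then a new batch arrives.
history = [{"price": 10.0}, {"price": 12.0}, {"price": 11.0}, {"price": 11.0}]
arrival = [{"price": 30.0}, {"price": None}, {"price": 34.0}, {"price": None}]

gold = build_profile(history, "price")
alerts = compare_to_gold(gold, build_profile(arrival, "price"))
for alert in alerts:
    print(alert)
```

In this sketch the gold standard is derived from past data rather than written by hand, which is the point of the approach: nobody has to author a unit test for the `price` column.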
Reduce MTTD and MTTR
Reduce both Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) for data projects.
When you’re consuming data from a third party, your output is only as good as your input. And when you don’t control the input, it’s difficult to know how things are going. Too often, data teams only discover problems when their data project “customer” notices an anomaly.
Validator helps you trace problems to specific records or fields at specific points in time, so you can quickly zero in on the root cause.
Data Culpa Validator helps you find missing columns and changes in formats or values before your end users question the results of your work.
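To make the kinds of problems mentioned here concrete, the following Python sketch flags missing columns and changed value formats between two batches. It is a simplified stand-in, not Data Culpa’s API: `infer_type`, `schema_of`, and `schema_changes` are hypothetical helpers, and values are assumed to arrive as strings, as they would from a CSV file.

```python
def infer_type(value):
    """Classify a raw string value as 'empty', 'int', 'float', or 'text'."""
    if value is None or value == "":
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            continue
    return "text"

def schema_of(rows):
    """Map each column name to the set of value types seen in it."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, set()).add(infer_type(value))
    return schema

def schema_changes(expected, observed):
    """List columns that disappeared or started carrying new value types."""
    changes = []
    for column in sorted(expected):
        if column not in observed:
            changes.append(f"missing column: {column}")
            continue
        new_types = observed[column] - expected[column] - {"empty"}
        if new_types:
            changes.append(f"new value types in {column}: {sorted(new_types)}")
    return changes

# Hypothetical batches: the second one drops "zip" and changes the "amount" format.
last_week = [{"id": "1", "zip": "02139", "amount": "9.50"}]
today = [{"id": "2", "amount": "USD 9.50"}]

changes = schema_changes(schema_of(last_week), schema_of(today))
for change in changes:
    print(change)
```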
Data Watchpoints for Specific Stages in a Pipeline
Data Culpa enables you to set up Watchpoints that monitor data flows in a specific pipeline location or data repository over time.
Data Flows Reduce False Alerts
You can link Watchpoints into Flows, which might correspond to a pipeline elsewhere or to a set of pipelines. Flows enable you to reduce false alerts by only firing when the output of data processing has been impacted, while also monitoring inputs that drive that output.
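One way to picture how a Flow can suppress false alerts is the Python sketch below. The `Watchpoint` and `Flow` classes are hypothetical models written for this example and do not reflect Data Culpa’s actual object model; the sketch assumes an alert should fire only when the output Watchpoint shows anomalies, with anomalous inputs reported as likely causes.

```python
from dataclasses import dataclass, field

@dataclass
class Watchpoint:
    """A monitored location and the anomalies currently observed there."""
    name: str
    anomalies: list = field(default_factory=list)

@dataclass
class Flow:
    """Input Watchpoints linked to the output Watchpoint they feed."""
    inputs: list
    output: Watchpoint

    def alert(self):
        """Return an alert only if the output is affected; otherwise None."""
        if not self.output.anomalies:
            return None
        return {
            "output": self.output.name,
            "issues": list(self.output.anomalies),
            "suspect_inputs": [wp.name for wp in self.inputs if wp.anomalies],
        }

vendor_feed = Watchpoint("vendor_feed", ["new column 'fax'"])
crm_export = Watchpoint("crm_export")
report = Watchpoint("weekly_report")
flow = Flow(inputs=[vendor_feed, crm_export], output=report)

# The input changed, but the output is unaffected, so nothing fires.
quiet = flow.alert()

# Once the output is impacted, the alert names the input that likely caused it.
report.anomalies.append("row count dropped")
fired = flow.alert()
print(quiet, fired)
```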
Configurable to Meet Your Needs
We believe “the best UI is no UI,” so we have built our solution to stay out of the way, behind the scenes, except when data quality issues need to be addressed.
You can configure Watchpoint alarms from the GUI or from a YAML file sourced in GitHub. With numerous controls for frequency and severity, you can tune Data Culpa’s behavior for what you care about and how and when you want to hear from it.
One of the biggest challenges with data quality is that it is contextual. One user doesn’t care about zip codes. Another doesn’t care about phone numbers. Data Culpa enables your users to configure self-service alerts and warnings based on their needs for their data processing.
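The contextual nature of alerts can be illustrated with a small Python sketch. The preference format, the severity names, and the `alerts_for` helper are invented for this example; Data Culpa’s real configuration options may look quite different.

```python
# Hypothetical severity scale, lowest to highest.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}

def alerts_for(user_prefs, alerts):
    """Keep alerts on fields the user watches, at or above their minimum severity."""
    floor = SEVERITY[user_prefs["min_severity"]]
    return [
        alert
        for alert in alerts
        if alert["field"] in user_prefs["fields"]
        and SEVERITY[alert["severity"]] >= floor
    ]

alerts = [
    {"field": "zip_code", "severity": "warning"},
    {"field": "phone", "severity": "critical"},
    {"field": "phone", "severity": "info"},
]

# One team cares only about zip codes, another only about phone numbers.
shipping_team = {"fields": {"zip_code"}, "min_severity": "warning"}
support_team = {"fields": {"phone"}, "min_severity": "warning"}

shipping_alerts = alerts_for(shipping_team, alerts)
support_alerts = alerts_for(support_team, alerts)
print(shipping_alerts, support_alerts)
```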
Support for Major Data Formats
Data Culpa supports structured, semi-structured, and unstructured data.
Data Culpa Validator supports CSV and JSON out of the box and warns you when passed data isn’t parsable. Maybe you have CSVs with extra delimiters or screwy quotes in your data lake. Perhaps someone wrote a JSON dumper that breaks on a new record. Data Culpa helps you zero in on errors like these right away.
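For a sense of what these parse errors look like in practice, here is a minimal Python sketch using only the standard `csv` and `json` modules. It is not how Validator works internally; `check_csv` and `check_json_lines` are hypothetical helpers, and the JSON check assumes one record per line.

```python
import csv
import io
import json

def check_csv(text):
    """Report rows whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    for row in reader:
        if row and len(row) != len(header):
            problems.append(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
            )
    return problems

def check_json_lines(text):
    """Report lines that are not valid JSON, assuming one record per line."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: {err.msg}")
    return problems

# A CSV with an extra delimiter and a JSON dump that broke mid-record.
csv_problems = check_csv("id,name\n1,Ada\n2,Grace,extra\n")
json_problems = check_json_lines('{"id": 1}\n{"id": 2,\n')
print(csv_problems)
print(json_problems)
```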