Data Quality Intelligence for Modern Data Infrastructures
Your business depends on data for transactions and analytics. But data values and schemas can shift unexpectedly, jeopardizing transaction integrity and analytical results.
Data Culpa catches errors before they steer your business off course.
We help teams working with Machine Learning frameworks and new data platforms such as Snowflake ensure that they’re working with data they can trust.
Discover Schema and Data Changes Before Processing
Get alerts about changes before they affect data engineering results
The typical enterprise is working with data from at least 400 data sources. Those data sources are feeding data into your data lakes and data pipelines. When schemas or data values in that incoming data change, you need to know right away, not a week later when you discover your BI or ML results are off. Data Culpa detects these changes automatically without requiring you to code countless unit tests. Discover changes at a glance, so you can take the best course of action in time.
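To make the idea of a schema change concrete, here is a minimal sketch in plain Python. It is not Data Culpa's API or its detection method; the function names `infer_schema` and `diff_schemas` and the sample records are hypothetical, chosen only to show what "a field was added, removed, or changed type" can look like between two batches.

```python
def infer_schema(records):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def diff_schemas(baseline, incoming):
    """Report fields that were added, removed, or whose observed types changed."""
    shared = set(baseline) & set(incoming)
    return {
        "added": sorted(set(incoming) - set(baseline)),
        "removed": sorted(set(baseline) - set(incoming)),
        "retyped": sorted(f for f in shared if baseline[f] != incoming[f]),
    }


# Hypothetical batches: "sku" disappears, "currency" appears,
# and "price" silently switches from a number to a string.
yesterday = [{"id": 1, "price": 9.99, "sku": "A-1"}]
today = [{"id": 2, "price": "9.99", "currency": "USD"}]

changes = diff_schemas(infer_schema(yesterday), infer_schema(today))
print(changes)
```

A hand-rolled check like this covers only field names and types in one feed. Doing the same across hundreds of sources, and for value distributions as well, is the kind of work the product is meant to take off your plate.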
Get the Alerts You Care About
Contextual data quality insights
Even when using the same data source, your needs change from pipeline to pipeline, or from ML model to ML model. When you’re building across data feeds to pursue multiple goals, data quality is contextual: what is important to one data engineer, data scientist, or data steward might not be to another. Data Culpa lets each data stakeholder define alerts for the data and pipelines they care about. The result: more signal and less noise.
Data Quality Time Machine
Track data consistency over time
When things go wrong in production, the Mean Time To Resolve (MTTR) can be long: changes in a schema, in record throughput, or in data values can be tricky to diagnose. Sure, you can hack together some kind of comparison engine, but the best time to do that was three months ago.
With Data Culpa, you can simply send data that you’re ingesting with asynchronous processing and catch up on changes when they become apparent. Data Culpa can help reduce “time to detect” by continuously monitoring for sudden shifts in addition to helping you resolve problems once they are found.
Sometimes you’ll need to load or reload a bunch of historical data. Data Culpa lets you tag your initial ingest of data or data “replays” with a timeshift to keep time context and understand data changes.
Need to compare this week’s data to last month’s or last month’s to last year’s? Data Culpa makes these comparisons fast and easy.
The Gold Standard
When you’re importing data as part of a repeatable process, it’s important to know whether the data’s schema and distribution of values match your expectations.
Using Validator, you can automatically compare an imported data set to another data set that you’ve identified as the “gold standard.” Validator alerts you right away if the new schema or values are out of line with the model you’re expecting. If the profile of incoming data doesn’t match your gold standard, you can take appropriate action before ingesting the data and possibly jeopardizing business results.
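As a rough illustration of a gold-standard comparison, the sketch below checks a new batch against a reference batch in plain Python. It does not use Validator or reflect how Validator works internally. The helper names, the `amount` column, the sample values, and the three-standard-deviation threshold are all assumptions made for the example.

```python
import statistics


def column_names(rows):
    """Collect every field name that appears in a list of dict records."""
    names = set()
    for row in rows:
        names.update(row)
    return names


def check_against_gold(gold_rows, new_rows, column, max_shift=3.0):
    """Return a list of problems found; an empty list means the batch matches."""
    problems = []
    gold_cols, new_cols = column_names(gold_rows), column_names(new_rows)
    if gold_cols != new_cols:
        problems.append(f"schema differs: {sorted(gold_cols ^ new_cols)}")
    if column not in gold_cols or column not in new_cols:
        return problems  # nothing to compare for this column

    gold_vals = [r[column] for r in gold_rows if column in r]
    new_vals = [r[column] for r in new_rows if column in r]
    gold_mean = statistics.fmean(gold_vals)
    gold_sd = statistics.pstdev(gold_vals)
    shift = abs(statistics.fmean(new_vals) - gold_mean)
    # Assumed rule: flag the batch when its mean is more than
    # max_shift gold standard deviations away from the gold mean.
    if shift > max_shift * gold_sd:
        problems.append(f"mean of {column!r} shifted by {shift:.2f}")
    return problems


# Hypothetical data: the second batch reports cents instead of dollars.
gold = [{"amount": 10.0}, {"amount": 12.0}, {"amount": 11.0}]
good_batch = [{"amount": 11.5}, {"amount": 10.5}]
bad_batch = [{"amount": 1100.0}, {"amount": 1050.0}]

print(check_against_gold(gold, good_batch, "amount"))
print(check_against_gold(gold, bad_batch, "amount"))
```

Comparing a single mean is deliberately crude. A real profile would look at more of the distribution, such as null rates, ranges, and category frequencies, but the decision point is the same: inspect the report before the batch is ingested.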
Import CSV and JSON from continuously running data processes using our open source Python API.
“No code” integrations are available for Snowflake, MongoDB, and Azure Data Lakes. Our connectors are open source, and more are on the way.
Compare production vs test environments with ongoing statistical differencing in a modern web UI.
Call Data Culpa Validator from any data pipeline, including programs written in Python, Java, or Ruby. Validator detects changes in structured, semi-structured, and unstructured data.
Validator works with popular data platforms, including: