Catch Data Quality Problems Before They Jeopardize Data Pipelines and Databases
Data Culpa gives data scientists and data engineers near-real-time alerts about changes in data schemas and data values that could jeopardize the integrity of data pipelines and databases. Using Data Culpa Validator, data teams can:
- Discover problems before they compromise analytical results and ETL pipelines.
- Reduce Mean Time to Repair (MTTR) from days to minutes.
- Avoid having to rerun pipelines or AI analyses because of poor data quality.
- Spend more time analyzing data and less time cleaning and fixing it.
How It Works
Data Culpa Validator builds metadata models for all the pipeline stages where Validator is invoked. Using these models, Validator detects changes and performs comparisons upon request.
When Validator detects a change that matters, it sends notifications through standard channels such as Slack, email, or SMS (for integration with PagerDuty and other alerting services).
Data Culpa Validator is available as a SaaS service or as a Linux package for on-premises deployment.
Call Data Culpa Validator from any data pipeline, including programs written in Python, Java, or Ruby. Validator detects changes in structured, semi-structured, and unstructured data.
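The call pattern from a pipeline might look like the minimal Python sketch below. The class and method names here (`ValidatorClient`, `queue_record`, `commit`) are illustrative stand-ins, not the actual Data Culpa client API:

```python
# Hypothetical sketch of instrumenting a pipeline stage with a
# Validator-style client. This stand-in class only mimics the pattern:
# queue records as they flow through a stage, then commit the batch
# so the service can model their schema and values.

class ValidatorClient:
    """Stand-in for a Data Culpa-style client library (names assumed)."""

    def __init__(self, pipeline_name, stage):
        self.pipeline_name = pipeline_name
        self.stage = stage
        self._queued = []

    def queue_record(self, record):
        # Buffer one record (a dict here; real clients may accept
        # structured, semi-structured, or unstructured payloads).
        self._queued.append(record)

    def commit(self):
        # In a real deployment this would send the batch to the
        # Validator service for metadata modeling and change detection.
        sent = len(self._queued)
        self._queued = []
        return sent


# Instrument a stage: queue each record as it passes through.
client = ValidatorClient(pipeline_name="orders-etl", stage="post-transform")
for record in [{"order_id": 1, "total": 19.99}, {"order_id": 2, "total": 5.00}]:
    client.queue_record(record)
count = client.commit()
```

The same pattern would apply at any stage of a pipeline, regardless of the language the pipeline is written in.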
Validator works with popular data platforms.
Got Big Data? Get the Big Picture
Using advanced techniques for analysis, storage, and visualization, Data Culpa detects data problems that other data monitoring techniques miss. (Patents pending.)
Data Culpa Validator provides:
- Continuous Data Pipeline Monitoring
Data changes continuously and data schemas and business objectives change over time. Validator monitors data pipelines continuously so you can detect important changes, whether data is coming from a third-party data feed, a data lake from the team across the hall, or business applications in a data center.
- Hierarchical Views of Data
The real world isn’t flat, and chances are your data model isn’t either. Describing the real world with flat structures and CSV files limits accuracy and potential insights. By supporting hierarchical data models, Validator lets your data evolve faster.
- Diagnostics for Zooming in from Bar Charts to Individual Records
With Validator, you can drill into visualizations to quickly understand why a pipeline is behaving the way it is, without having to compose manual queries. Quickly discover why data is missing, when a new field appeared, or when a new value appeared in a category.
- Insights into Data in Transit and Data in Situ
Validator provides a single, consistent platform for examining data in pipelines as well as data in databases and data lakes.
- Open Source Client Libraries
Validator’s open source client libraries make it easy for you to add new connectors whether you’re running Validator in the cloud or on premises in an enterprise data center.
- Support for DataOps
Validator stores pipeline configuration data in YAML files that can be kept in source control systems and deployed as part of your organization’s DevOps practices.
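As an illustration of version-controlled pipeline configuration, a file might look like the sketch below. Every key and value here is hypothetical, not Data Culpa's actual YAML schema:

```yaml
# Hypothetical Validator pipeline configuration (illustrative only;
# the real schema may differ).
pipeline: orders-etl
stages:
  - name: post-ingest
    alert_on:
      - schema_change
      - null_rate_increase
  - name: post-transform
    alert_on:
      - value_distribution_shift
notify:
  slack: "#data-quality"
```

Because the configuration lives in a plain-text file, it can be reviewed, diffed, and rolled back like any other code artifact.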