What We Do

Data Culpa

Catch Data Quality Problems Before They Jeopardize Data Pipelines and Databases

Data Culpa gives data scientists and data engineers near-real-time alerts about changes in data schemas and data values that could jeopardize the integrity of data pipelines and databases. Using Data Culpa Validator, data teams can:

Discover problems before they compromise analytical results and ETL pipelines.

Reduce Mean Time to Repair (MTTR) from days to minutes.

Avoid having to rerun pipelines or AI analyses because of poor data quality.

Spend more time analyzing data and less time cleaning and fixing it.

How It Works

Data Culpa Validator builds metadata models for all the pipeline stages where Validator is invoked. Using these models, Validator detects changes and performs comparisons upon request.

When Validator detects a change that matters, it sends notifications via standard alerting mechanisms, such as Slack, email, or SMS (for integration with PagerDuty or other alerting services).

Data Culpa Validator is available as a SaaS service or as a Linux package for on-premises deployment.

Validator Diagram

Call Data Culpa Validator from any data pipeline, including programs written in Python, Java, or Ruby. Validator detects changes in structured, semi-structured, and unstructured data.
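
As a sketch of this pattern, a pipeline stage can report each batch of records to a validator, which compares the batch's schema against what it has seen before and raises an alert on drift. The class below is a simplified, hypothetical stand-in written for illustration; it is not Data Culpa's actual client API.

```python
# Illustrative sketch only -- ValidatorStub is a toy stand-in for a
# Data Culpa Validator client, showing the general reporting pattern.

class ValidatorStub:
    """Tracks the fields seen in each batch and flags schema changes."""

    def __init__(self, pipeline_name):
        self.pipeline_name = pipeline_name
        self.known_fields = None  # schema baseline observed so far
        self.alerts = []

    def record_batch(self, records):
        fields = set()
        for rec in records:
            fields.update(rec.keys())
        if self.known_fields is None:
            self.known_fields = fields          # first batch sets the baseline
        elif fields != self.known_fields:
            # Schema drift: note which fields appeared or disappeared.
            self.alerts.append({
                "pipeline": self.pipeline_name,
                "added": sorted(fields - self.known_fields),
                "missing": sorted(self.known_fields - fields),
            })
            self.known_fields = fields          # adopt the new schema

# A pipeline stage reports its output batches:
v = ValidatorStub("orders-etl")
v.record_batch([{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}])
v.record_batch([{"id": 3, "amount": 7.2}])  # drift: "total" became "amount"
print(v.alerts[0]["added"])    # ['amount']
print(v.alerts[0]["missing"])  # ['total']
```

In a real deployment, the alert step would hand off to the notification channels described above (Slack, email, SMS) rather than accumulating alerts in memory.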

Validator works with popular data platforms, including:

  • AWS
  • Azure
  • Apache Spark
  • Google Cloud
  • MongoDB
  • MySQL
  • PostgreSQL
  • Snowflake

Got Big Data? Get the Big Picture

Using advanced techniques for analysis, storage, and visualization, Data Culpa detects data problems that other data monitoring techniques miss. (Patents pending.)

Data Culpa Validator provides:

  • Continuous Data Pipeline Monitoring
    Data changes continuously and data schemas and business objectives change over time. Validator monitors data pipelines continuously so you can detect important changes, whether data is coming from a third-party data feed, a data lake from the team across the hall, or business applications in a data center.
  • Hierarchical Views of Data
    The real world isn’t flat, and chances are your data model isn’t either. Describing the real world with flat structures and CSV files limits accuracy and potential insights. By supporting hierarchical data models, Validator lets your data models evolve faster.
  • Diagnostics for Zooming in from Bar Charts to Individual Records
    With Validator, you can drill into visualizations to quickly understand why a pipeline is behaving the way it is, without having to compose manual queries. Quickly discover why data is missing, when a new field appeared, or when a new value appeared in a category.
  • Insights into Data in Transit and Data in Situ
    Validator provides a single, consistent platform for examining data in pipelines as well as data in databases and data lakes.
  • Open Source Client Libraries
    Validator’s open source client libraries make it easy for you to add new connectors whether you’re running Validator in the cloud or on premises in an enterprise data center.
  • Support for DataOps
    Validator stores pipeline configuration data in YAML files that can be kept in source control systems and deployed as part of your organization’s DevOps practices.
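
A configuration kept this way might look like the fragment below. The field names here are purely illustrative assumptions, not Validator's actual configuration schema; the point is that a plain YAML file can be reviewed, versioned, and deployed like any other code artifact.

```yaml
# Hypothetical example -- keys are illustrative, not Validator's real schema.
pipeline:
  name: orders-etl
  stages:
    - extract
    - transform
    - load
alerts:
  slack_channel: "#data-quality"
  email: data-team@example.com
```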

Try Validator

Contact us to arrange a free trial.