Data Observability for Data Pipelines

Data Culpa Validator gives data teams the observability they need to monitor data quality and trends in data pipelines and repositories.

The Validator dashboard makes it easy to see:

Alerts about important changes in the pipeline or repository

Cohesion: a measure of the consistency of data in this pipeline or repository

Data volumes

Data schemas

Data values

Validator provides an open API that integrates with any modern data pipeline orchestration tooling or with ad hoc scripts. We also provide a Python client, which you can install with `pip install dataculpa-client`; other language bindings are on our roadmap. Get in touch with us if you need something else.

Our flexible, asynchronous approach lets you monitor from wherever in your pipeline makes the most sense for your needs: at ingest, at output, or somewhere in between.

You can also track intermediate results and tag sets of records with two kinds of metadata. The first is foundational metadata that enables comparison across environments: a given Validator watchpoint can carry different environment names, version tags, or pipeline stage tags. The second is metadata attached to a specific group of records that you’re sending to Validator, such as code state, debug notes, or other information that you might want later when reviewing or analyzing Data Culpa output.
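
The two kinds of metadata described above can be pictured with a small Python sketch. This is only an illustration of the idea, not the `dataculpa-client` API: the class names (`WatchpointTags`, `RecordBatch`), the field names, the `to_payload` helper, and all sample values are hypothetical and were chosen for this example.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class WatchpointTags:
    """Foundational metadata (hypothetical shape): where the records were observed."""
    watchpoint: str
    environment: str
    stage: str
    version: str


@dataclass
class RecordBatch:
    """A group of records plus free-form metadata about that specific group."""
    tags: WatchpointTags
    records: list[dict[str, Any]]
    batch_metadata: dict[str, Any] = field(default_factory=dict)


def to_payload(batch: RecordBatch) -> dict[str, Any]:
    """Flatten a batch into a plain dict that could be serialized as JSON."""
    return {
        "tags": asdict(batch.tags),
        "batch_metadata": dict(batch.batch_metadata),
        "records": list(batch.records),
    }


# Made-up sample values, for illustration only.
batch = RecordBatch(
    tags=WatchpointTags("orders-ingest", environment="staging", stage="ingest", version="1.4.2"),
    records=[{"order_id": 1, "total": 19.99}, {"order_id": 2, "total": 5.00}],
    batch_metadata={"git_commit": "abc1234", "note": "backfill after schema change"},
)
payload = to_payload(batch)
```

Keeping the watchpoint tags separate from the per-batch notes mirrors the distinction in the text: the tags are what you would compare across environments or stages, while the batch metadata is context you might want later when reviewing one particular set of records.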

How it Works

Use a no-code or low-code Data Culpa connector to connect Validator to your pipeline, database, data warehouse, or data lake.

Data Culpa reports details about changes in record volume, data values, and more. Teams can set up Data Culpa to report on just the data they’re responsible for.

Based on Data Culpa’s reporting, processing code can decide whether or not to proceed. Data teams might decide to investigate anomalies before processing them.

The pipeline sends output data to Data Culpa for additional monitoring and intelligence.
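
As a rough illustration of the step where processing code decides whether to proceed, the sketch below gates a pipeline run on a report. Everything here is an assumption made for the example: the `QualityReport` fields, the `should_proceed` function, and the 25% volume threshold are hypothetical and do not describe Data Culpa’s actual report format or API.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Hypothetical summary of what a monitoring report might contain."""
    record_count: int
    baseline_record_count: int
    missing_fields: list[str] = field(default_factory=list)


def should_proceed(report: QualityReport, max_volume_drift: float = 0.25) -> tuple[bool, list[str]]:
    """Return (ok, reasons); ok is False when any check finds an anomaly."""
    reasons: list[str] = []

    # Compare the current record volume against the baseline, if there is one.
    if report.baseline_record_count > 0:
        drift = abs(report.record_count - report.baseline_record_count) / report.baseline_record_count
        if drift > max_volume_drift:
            reasons.append(f"record volume drifted {drift:.0%} from baseline")

    # Treat fields that disappeared from the schema as a reason to stop.
    if report.missing_fields:
        reasons.append("missing fields: " + ", ".join(sorted(report.missing_fields)))

    return (not reasons, reasons)


ok, reasons = should_proceed(QualityReport(record_count=600, baseline_record_count=1000))
if not ok:
    # A real pipeline might pause here so the team can investigate first.
    print("Holding pipeline:", "; ".join(reasons))
```

Returning the reasons alongside the decision lets the calling code either halt automatically or pass the anomalies to the data team for investigation before processing continues.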