Five Neat Tricks with Data Culpa Alerts

by | Apr 8, 2022 | Data Culpa

Every data team needs to keep an eye on its data. At the same time, no data team wants to be deluged with alerts. 

Ideally, alerts should direct your attention to the most important changes taking place in your data. If they call attention to every change, you’ll quickly be buried in so many alerts you’ll never work your way through them. But if alerts call attention to just the three or five most important changes that have occurred today, then you have a way to methodically stay on top of what’s happening with your data. 

Watchpoints and Alerts

Here at Data Culpa, we call the place in a data pipeline where you monitor data a watchpoint. Using Data Culpa Validator, our data monitoring platform, you can insert a watchpoint in a data pipeline or in a data repository without writing any code at all. Just use one of our connectors for BigQuery, Snowflake, or whatever data platform you’re using, and configure the watchpoint with your credentials and a few settings. Setup takes just a couple of minutes. 

Once a watchpoint is set up, Validator will monitor your data’s performance over time, checking for inconsistencies and raising alerts when important changes occur.

Few data teams have an engineer who can spend the day watching dashboards, so alerts are probably how most teams will interact with any data monitoring tool, and Validator is no exception. That means getting alerts right is critical if a data monitoring tool is going to be useful to your team.

Here are some of the ways we’ve engineered alerts in Validator to give data teams the critical information they need, whatever situation they’re facing.

Historical Alerts for a New Pipeline Under Management

Let’s say you’ve got a data pipeline your team has been running for months or years. You decide to start monitoring the pipeline with Data Culpa Validator. Obviously, you expect to get alerts on your pipeline’s performance going forward. But once Validator has analyzed your pipeline’s past performance, it can also show you the alerts you would have received along the way.

This is useful for a variety of reasons, not least because you can better understand an alert you receive today if you know that the same alert would have appeared on certain occasions in the past.

By the way, we show the past alerts without flooding your Slack channel with them. Speaking of which…

Alert Throttling for Slack Notifications

We give you the option of limiting how many alerts you receive per hour per watchpoint in Slack. That way, if something goes wrong with your data, you can be notified without having to scroll through screenfuls of alerts. 
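To make the idea concrete, here is a minimal Python sketch of per-watchpoint throttling over a rolling one-hour window. It is an illustration only, not Validator's implementation: the `AlertThrottle` class, its `allow` method, and the watchpoint names are hypothetical.

```python
from collections import defaultdict, deque


class AlertThrottle:
    """Hypothetical limiter: at most `max_per_hour` notifications per
    watchpoint within any rolling window of `window_seconds`."""

    def __init__(self, max_per_hour, window_seconds=3600):
        self.max_per_hour = max_per_hour
        self.window_seconds = window_seconds
        # watchpoint name -> timestamps (in seconds) of notifications sent
        self._sent = defaultdict(deque)

    def allow(self, watchpoint, now):
        """Return True if a notification may be sent at time `now`."""
        sent = self._sent[watchpoint]
        # Forget notifications that have aged out of the window.
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        if len(sent) < self.max_per_hour:
            sent.append(now)
            return True
        return False


throttle = AlertThrottle(max_per_hour=2)
print(throttle.allow("orders", now=0))      # first alert this hour
print(throttle.allow("orders", now=10))     # second alert this hour
print(throttle.allow("orders", now=20))     # over the limit, suppressed
print(throttle.allow("shipments", now=20))  # other watchpoints are unaffected
```

In this sketch a suppressed alert is simply not sent to the channel; it could still be recorded and shown elsewhere.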

Statistically Meaningful Alerts

Alerting a data team that something has changed might or might not be useful, depending on what the change is. That’s why our recommended configuration raises alerts when zeroes, nulls, or value distributions change by two standard deviations. We find that applying this statistical threshold to changes yields more meaningful results for analysis.
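As a rough illustration of a two-standard-deviation check, the Python sketch below compares today's value of a metric (for example, the fraction of nulls in a column) with its recent history. The function name `is_significant` and the sample numbers are made up for this example, and the calculation is a generic one rather than Validator's own.

```python
from statistics import mean, stdev


def is_significant(history, today, threshold=2.0):
    """Return True if `today` lies at least `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # too little history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all stands out.
        return today != mu
    return abs(today - mu) / sigma >= threshold


# Hypothetical daily null rates for one column.
null_rates = [0.010, 0.012, 0.011, 0.009, 0.010]
print(is_significant(null_rates, 0.011))  # within normal variation
print(is_significant(null_rates, 0.050))  # far outside normal variation
```

The same check can be applied to the count of zeroes or to a summary of a column's value distribution, such as its mean.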

Generating a Baseline for Pipeline Performance

When you start monitoring a new pipeline, how can you tell if the variations you’re seeing are significant? If Day 2 differs from Day 1, should you care?

To solve this problem, we offer Demo Mode with alerts. In Demo Mode, we build a pipeline model by duplicating your first day’s data and using it as placeholder data, generating a conjectural history and raising alerts accordingly for the first few days. All these alerts are labeled as Demo alerts, so later, you’ll know that they’re based on extrapolated data.

But they can be useful immediately. If Days 2 and 3 of your pipeline vary significantly from Day 1, you might want to know that. Alternatively, if Days 2 and 3 turn out to be more representative of your pipeline’s performance in the long term, you can simply disregard the alerts labeled “Demo” and pay attention to any new alerts generated after your pipeline’s first few days.
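The Python sketch below shows one way the idea could work: copy the first day's metrics to stand in for a history, compare a new day against that placeholder baseline, and label any resulting alert as Demo. Everything here is an assumption made for illustration, including the `demo_alerts` function, the metric names, the seven placeholder days, and the 20% relative tolerance. A relative tolerance is used because a history made of identical copies has no spread, so a standard-deviation test would not apply.

```python
def demo_alerts(first_day, new_day, tolerance=0.2, placeholder_days=7):
    """Compare `new_day` metrics with a placeholder history built from
    copies of `first_day`, returning alerts labeled as Demo."""
    # Conjectural history: the first day's metrics, duplicated.
    history = [dict(first_day) for _ in range(placeholder_days)]
    alerts = []
    for metric in first_day:
        if metric not in new_day:
            continue  # nothing to compare for this metric
        value = new_day[metric]
        baseline = sum(day[metric] for day in history) / len(history)
        if baseline == 0:
            changed = value != 0
        else:
            changed = abs(value - baseline) / abs(baseline) > tolerance
        if changed:
            alerts.append(
                f"[Demo] {metric} changed from {baseline:g} to {value:g}"
            )
    return alerts


day_1 = {"row_count": 1000, "null_rate": 0.01}
day_2 = {"row_count": 1500, "null_rate": 0.01}
for alert in demo_alerts(day_1, day_2):
    print(alert)
```

Once enough real days have accumulated, the placeholder history can be dropped and the Demo label no longer applied.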

Try Data Culpa Validator

Want to see Data Culpa Validator alerts in action with your own data? Contact us and set up a free trial.
