Five Neat Tricks with Data Culpa Alerts
Every data team needs to keep an eye on its data. At the same time, no data team wants to be deluged with alerts.
Ideally, alerts should direct your attention to the most important changes taking place in your data. If they call attention to every change, you’ll quickly be buried in so many alerts you’ll never work your way through them. But if alerts call attention to just the three or five most important changes that have occurred today, then you have a way to methodically stay on top of what’s happening with your data.
Watchpoints and Alerts
Here at Data Culpa, we call the place in a data pipeline where you monitor data a watchpoint. Using Data Culpa Validator, our data monitoring platform, you can insert a watchpoint in a data pipeline or in a data repository without writing any code at all. Just use one of our connectors for BigQuery, Snowflake, or whatever data platform you’re using, and configure the watchpoint with your credentials and a few settings. Setup takes just a couple of minutes.
Once a watchpoint is set up, Validator will monitor your data’s performance over time, checking for inconsistencies and raising alerts when important changes occur.
Because few data teams have an engineer who can spend the day watching dashboards, alerts are probably how most data teams will interact with any data monitoring tool. That’s true of Validator, too. Getting alerts right is critical if Validator, or any other data monitoring tool, is to be useful to your team.
Here are just some ways we’ve engineered alerts in Validator to give data teams the critical information they need for whatever situation they are confronting.
#1: Historical Alerts for a New Pipeline Under Management
Let’s say you’ve got a data table your team has been running for months or years. You decide to start monitoring the data with Data Culpa Validator. Obviously, you expect to get alerts going forward. But Validator can also show you the alerts that you would have received in the past.
This is useful for a variety of reasons, not least that you can better understand the alerts you receive today if you know that similar alerts appeared on certain occasions in the past.
By the way, we show the past alerts without flooding your Slack channel with them. Speaking of which . . .
#2: Alert Roll-ups for Slack Notifications
We summarize your alerts in Slack when a cluster of alerts occurs. Often when things go wrong, there is more than one issue, and spamming Slack with dozens of messages isn’t useful. Our summaries consolidate the alerts to the crux of the issue, letting you quickly see the kinds of changes that have happened. In addition, we give you the option of limiting how many alerts you receive per hour per watchpoint in Slack. That way, if something goes wrong and continues to go wrong with your data, you can be notified without having to scroll through screenfuls of alerts.
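To make the two ideas concrete, here is a minimal Python sketch of a roll-up and a per-hour, per-watchpoint limit. It is an illustration only, not Validator’s implementation: the `roll_up` function, the `HourlyLimiter` class, the `kind` field, and the sample alert kinds are all hypothetical names chosen for this example.

```python
from collections import Counter


def roll_up(alerts):
    """Summarize a cluster of alerts as one message, counting alerts per kind."""
    counts = Counter(alert["kind"] for alert in alerts)
    parts = [f"{n} {kind}" for kind, n in sorted(counts.items())]
    return f"{len(alerts)} alerts: " + ", ".join(parts)


class HourlyLimiter:
    """Allow at most `limit` notifications per watchpoint per clock hour."""

    def __init__(self, limit):
        self.limit = limit
        self.sent = Counter()  # (watchpoint, hour bucket) -> notifications sent

    def allow(self, watchpoint, timestamp):
        # timestamp is seconds since the epoch; integer division buckets it by hour.
        key = (watchpoint, int(timestamp // 3600))
        if self.sent[key] >= self.limit:
            return False
        self.sent[key] += 1
        return True


# Hypothetical cluster of alerts raised by one watchpoint.
cluster = [
    {"kind": "null spike"},
    {"kind": "schema change"},
    {"kind": "null spike"},
]
summary = roll_up(cluster)

limiter = HourlyLimiter(limit=2)
if limiter.allow("orders", timestamp=0):
    print(summary)
```

Keying the counter on the watchpoint and the hour bucket together means a noisy watchpoint cannot use up the allowance of a quiet one.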
#3: Statistically Meaningful Alerts
Alerting a data team that something has changed might or might not be useful, depending on what the change is. That’s why our recommended configuration raises alerts when distributions, zeroes, nulls, and so on change by two deviations. We use a mix of median absolute deviations and standard deviations combined with unsupervised and self-supervised techniques to drive our analysis of “change.” We have found these techniques to yield more meaningful results for analysis versus other approaches.
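As a rough illustration of what “two deviations” can mean when measured with the median absolute deviation (MAD), here is a small Python sketch. It shows the general statistical technique only, not Validator’s actual analysis, which also involves standard deviations and learned models. The function names, the threshold default, and the sample null counts are assumptions made for the example.

```python
import statistics


def mad_score(history, value):
    """Distance of `value` from the median of `history`, in scaled MAD units."""
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history])
    if mad == 0:
        # No spread in the history: any difference at all is infinitely unusual.
        return 0.0 if value == med else float("inf")
    # 1.4826 scales the MAD so it is comparable to a standard deviation
    # when the data are roughly normally distributed.
    return abs(value - med) / (1.4826 * mad)


def is_significant(history, value, threshold=2.0):
    """True when `value` lies more than `threshold` deviations from the median."""
    return mad_score(history, value) > threshold


# Hypothetical daily null counts for one column.
null_counts = [12, 15, 11, 14, 13, 12, 16]

print(is_significant(null_counts, 14))  # an ordinary day
print(is_significant(null_counts, 40))  # a sharp jump in nulls
```

Because the median and the MAD are barely moved by a single extreme day, one past outlier does not inflate the baseline the way it would with a mean and standard deviation.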
#4: Generating a Demo Baseline for a New Data Pipeline
When you start monitoring a new pipeline, how can you tell if the variations you’re seeing are significant? If Day 2 differs from Day 1, should you care?
To guide data teams monitoring new pipelines, we offer Demo Mode with alerts. When you activate Demo Mode, Validator automatically builds a pipeline model by duplicating your first day’s data and using it as placeholder data going several days back. In effect, Validator creates a conjectural history so you can begin tracking changes to your data. Validator labels alerts about changes to conjectural history as Demo alerts, so later, you’ll know that they’re based on extrapolated data.
But they can be useful immediately. If Days 2 and 3 of your pipeline vary significantly from Day 1, you might want to know that. Alternatively, if Days 2 and 3 turn out to be more representative of your pipeline’s performance in the long term, you can simply disregard the alerts labeled “Demo” and pay attention to any new alerts generated after your pipeline’s first few days.
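The idea of a conjectural history can be sketched in a few lines of Python. This is a simplified illustration, not how Validator builds its pipeline model: the function names, the `demo` flag, the `days_back` default of seven, and the sample record and date are all assumptions made for the example.

```python
import copy
from datetime import date, timedelta


def build_demo_baseline(first_day, first_day_data, days_back=7):
    """Copy the first day's records onto each of the preceding `days_back` days."""
    baseline = {}
    for offset in range(1, days_back + 1):
        baseline[first_day - timedelta(days=offset)] = {
            "records": copy.deepcopy(first_day_data),
            "demo": True,  # placeholder day, not real history
        }
    baseline[first_day] = {"records": first_day_data, "demo": False}
    return baseline


def label_alert(message, baseline):
    """Prefix an alert with '[Demo]' if any day in its baseline is placeholder data."""
    is_demo = any(day["demo"] for day in baseline.values())
    return f"[Demo] {message}" if is_demo else message


# Hypothetical first day of a new pipeline.
baseline = build_demo_baseline(date(2022, 3, 1), [{"amount": 10}], days_back=3)
print(label_alert("row count changed", baseline))
```

Marking each placeholder day explicitly is what lets the label be dropped later, once enough real days have replaced the copies.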
#5: API-Powered Alert Operations
Does your organization live in GitOps? You can query, verify, and update alert configurations with our API to ensure consistency across watchpoints, or drive change control management from GitHub or other repositories of configuration. You can also query our API for active alert information to drive your own downstream business processes with the intelligence of Data Culpa Validator.
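As a sketch of the consistency check described here, the Python below compares a desired alert configuration (for example, one kept in a Git repository) with the configuration of each watchpoint. How the configurations are fetched is deliberately left out, because this post does not describe the API’s endpoints. The `find_drift` function, the setting names, and the watchpoint names are hypothetical.

```python
def find_drift(desired, actual_by_watchpoint):
    """Return {watchpoint: {setting: (actual, desired)}} for settings that differ."""
    drift = {}
    for name, actual in actual_by_watchpoint.items():
        diffs = {
            key: (actual.get(key), want)
            for key, want in desired.items()
            if actual.get(key) != want
        }
        if diffs:
            drift[name] = diffs
    return drift


# Hypothetical desired settings, as they might be stored in version control.
desired = {"deviations": 2, "max_slack_alerts_per_hour": 5}

# Hypothetical settings as reported for each watchpoint.
actual = {
    "orders": {"deviations": 2, "max_slack_alerts_per_hour": 5},
    "payments": {"deviations": 3, "max_slack_alerts_per_hour": 5},
}

print(find_drift(desired, actual))
```

A check like this could run in a CI job, failing the build or opening a pull request whenever a watchpoint drifts from the configuration under change control.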
Try Data Culpa Validator
Want to see Data Culpa Validator alerts in action with your own data? Contact us and set up a free trial.