ABOUT THIS BLOG
In this blog, we’ll discuss all things data quality and data observability. We’ll present interviews with leading data scientists, and we’ll discuss our own work building data observability solutions for data engineers and data scientists working in a broad range of industries.
Data Contracts and Data Products Are Not Enough: Why Data Teams Need Data Contexts
When we founded Data Culpa, one of the first claims we made about the data engineering market was that data monitoring would tie into the testing of data. We believed then — and we still believe — that companies would treat data as a product in the same way they treat...
Effective Data Monitoring Requires a Relative Baseline
Recently I was talking to a customer who had used a competitor’s product for data monitoring. It sounded like the product had the users specify parameters about the data; e.g., “this column should never be null.” This is all well and good, except that the product then...
Data Culpa Receives First Issued Patent for Systems and Methods for Monitoring Data Quality in Data Pipelines and Databases
BOSTON, MA. September 13, 2022 – Data Culpa, Inc., a provider of high-scale, low-touch data observability solutions for data engineering projects of all kinds, today announced that the US Patent & Trademark Office has issued patent #11,429,614 to Data Culpa for...
MongoDB’s Unique Challenges for Data Quality and Application Development
There’s lots to love about MongoDB. Compared to traditional databases, MongoDB makes developing and standing up new applications fast, thanks to the implicit schema of the collections. Need to add a new field? No big deal, just start using it. Great! The hierarchical...
Data Meshes and the Challenges They Create for Data Quality Monitoring
There’s a common ingest pattern we see across customers. Data lands in some file or object storage, e.g., S3, as JSON, Parquet, CSV, or a similar format, and then is ingested into Snowflake, potentially into two tables: one for a raw load and one or more that transform the data in some...
The Data Quality Hierarchy of Needs
Just as Maslow identified a hierarchy of needs for people, data teams have a hierarchy of needs, beginning with data freshness; including volumes, schemas, and values; and culminating with lineage. In this blog post, which was published in the Data Science area of the...
Try Data Culpa’s Rapid Proof-of-Concept Test and Get Two Months of Data Quality Analysis in Just Three Days
If you’re responsible for data pipelines or any other aspect of your company’s data, you’re busy. You might want to try out new data services, but you probably don’t want to get bogged down in lengthy vendor negotiations or complicated configurations for trials....
Data Culpa Validator: Instant Data Monitoring Without Time-Consuming Customizations
A question we often get asked at Data Culpa is, “If you guys are going to monitor our data, isn’t that going to take a lot of custom engineering to set up?” The answer is, “No.” Data Culpa Validator can be set up in less than an hour. No lengthy professional services...
Five Neat Tricks with Data Culpa Alerts
Every data team needs to keep an eye on its data. At the same time, no data team wants to be deluged with alerts. Ideally, alerts should direct your attention to the most important changes taking place in your data. If they call attention to every change, you’ll...
Validating Data for Pipelines with Data Culpa
Consistent pipeline behavior is critical for any data process. You can use Data Culpa Validator to ensure consistent operation of pipelines as well as data at rest in databases, data lakes, and data warehouses. This introduction shows you how to validate your results...
Using Timeshift in Data Culpa Validator
One of the best features we offer customers who are just getting started with Data Culpa is our "timeshift" feature. Timeshift lets us extract a point in time for a row or a document and have Validator evaluate the data contained within that row or document as if it...
Where Most Data Observability Solutions Fall Short
Observability is the analysis of a system based on its outputs. By analyzing the outputs of a system at various points, it should be possible to infer the internal state of the system and to diagnose problems the system is experiencing. This sounds useful for data...
Introducing Data Culpa Validator
What’s your data doing? Can you tell? Data teams tell us they need better visibility into data pipelines, integrations, repositories, and data lakes. You can hand-code a bunch of unit tests to check for known boundary cases. But coverage will always be limited. And...
Why You Need a Data Quality Strategy for External Data
A company’s data is its most valuable asset — that's a commonplace observation in board rooms and IT labs today. Every organization recognizes the importance of its data. Somewhere in the mission statement for most digital transformation projects, there’s probably a...
Data Quality in Healthcare: An Interview with Shawn Stapleton, PhD, Part 2
As a leader in healthcare innovation at Philips Healthcare, Shawn Stapleton manages a portfolio of technology solutions that have immediate impact for hospitals around the world. Shawn works with a network of clinical partners to identify key needs and to develop...
Data Science in Healthcare: An Interview with Shawn Stapleton, PhD, Part 1
As a leader in healthcare innovation at Philips Healthcare, Shawn Stapleton manages a portfolio of technology solutions that have immediate impact for hospitals around the world. Shawn works with a network of clinical partners to identify key needs and to develop...
Accounting for Bias in Data Analytics: An Interview with Lauren S. Moores, PhD, Part 3
Lauren S. Moores, a.k.a. “the Data Queen,” is Head of Data Sciences Innovation for FL65 and Invaio Sciences, two companies launched by Flagship Pioneering, a venture capital and private equity firm dedicated to creating breakthroughs in human health and sustainability...
Data Coverage and Other Pillars of Data Science: An Interview with Gordon Wong, Part 2
Gordon Wong has been working with data since the early ’90s. His accomplishments include building Fitbit’s data analytics platform from the ground up and serving as HubSpot’s VP of Business Intelligence. He has also worked or consulted for leading brands such as...
Different Roles in Data Science: An Interview with Lauren S. Moores, PhD, Part 2
Lauren S. Moores, a.k.a. “the Data Queen,” is Head of Data Sciences Innovation for FL65 and Invaio Sciences, two companies launched by Flagship Pioneering, a venture capital and private equity firm dedicated to creating breakthroughs in human health and sustainability...
Continuous Analysis, Continuous Insights, and Data Quality: An Interview with Gordon Wong, Part 1
Gordon Wong has been working with data since the early ’90s. His accomplishments include building Fitbit’s data analytics platform from the ground up and serving as HubSpot’s VP of Business Intelligence. He has also worked or consulted for leading brands such as...
Building Data Science Systems: An Interview with Lauren S. Moores, PhD, Part 1
Lauren S. Moores, a.k.a. “the Data Queen,” is Head of Data Sciences Innovation for FL65 and Invaio Sciences, two companies launched by Flagship Pioneering, a venture capital and private equity firm dedicated to creating breakthroughs in human health and sustainability...
Data Quality Problems and How to Fix Them: An Interview with Michal Klos, Part 2
Data engineering teams need to be able to find and fix data quality problems quickly. In part 2 of our interview with Michal Klos, Michal discusses the challenges of discovering and troubleshooting data quality problems in pipelines. Michal is Sr. Director of...
Data Quality Challenges for Data Scientists: An Interview with Michal Klos, Part 1
Data quality is an ongoing challenge for data scientists and data engineers. In this interview, Michal Klos shares the insights he's gleaned from a long career building data pipelines and ensuring they deliver high-quality results. Michal is Sr. Director of Engineering...
A Blog about Data Quality
At Data Culpa, we're building solutions to help data engineers and data scientists catch data quality problems before they jeopardize data pipeline results. Nearly every business today is becoming more data-centric. Data not only drives operations and decision making;...