Data Quality in Healthcare: An Interview with Shawn Stapleton, PhD, Part 2

As a leader in healthcare innovation at Philips Healthcare, Shawn Stapleton manages a portfolio of technology solutions that have immediate impact for hospitals around the world. Shawn works with a network of clinical partners to identify key needs and to develop novel solutions that improve the quality and efficiency of clinical care delivery. Shawn sits on several steering committees within Philips to develop and implement innovation strategies across multiple businesses, as well as with multiple granting agencies to advise on national research initiatives in healthcare. As Director of Innovation, Shawn strives to put research into practice, bring ideas to market, and work with worldwide leaders to deploy the next generation of healthcare innovations.

Shawn has a PhD in Medical Biophysics from the University of Toronto and a B.S. with honors in Physics and Computer Science from the University of Victoria. 

Shawn is an investor in Data Culpa, Inc. and a member of the company’s Board of Advisors.

Read Part 1 of our interview below.

Data Quality and the Lack of Standardized Data Tools in Healthcare

Data Culpa:  At Data Culpa, our big theme is data quality. I’m wondering if a data team working with an ad hoc assortment of tools as you’ve described creates any data quality challenges. Does a lack of standardization pose any risks for the consistency or quality of data?

Shawn Stapleton:  There are many layers of complexity in this question. I’ll take you through how healthcare data is generated and how it flows from source to data scientist to shed light on the data quality challenges along the way.

Healthcare data is largely produced to drive the business of healthcare delivery. We tend to think that healthcare data primarily consists of clinical information: lab results, genomic information, radiology images and reports, pathology reports, and so on.

Healthcare data is largely produced to drive the business of healthcare delivery.

However, there’s an entire operational machinery to drive a patient examination and the generation of a clinical result. Ordering, exam scheduling, staff scheduling, examination workflow, and billing are all examples of operational healthcare data.

Today this data is stored in a collection of hospital information systems. The main system we tend to discuss is the electronic medical record (EMR) system. In reality, information is also stored in lab information systems, radiology information systems, billing systems, scheduling systems, and more. Some of these systems are department-dependent or are duplicated across departments. Rarely are these systems static. Systems are being upgraded, added to, and migrated over time. The integration, support, and maintenance of these systems result in data quality issues.
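To make this concrete, here is a minimal Python sketch of the kind of cross-system linking problem Shawn describes. It is our illustration, not his: the file names and fields (mrn, dob, exam_id) are hypothetical, not drawn from any real hospital system.

```python
# Hypothetical sketch: link an EMR extract to a radiology information
# system (RIS) extract and flag records that fail to line up.
# File names and field names are illustrative, not from any real system.
import pandas as pd

emr = pd.read_csv("emr_extract.csv")   # columns: mrn, dob, ...
ris = pd.read_csv("ris_extract.csv")   # columns: mrn, dob, exam_id, ...

# Outer join on the medical record number so unmatched rows survive the merge.
linked = emr.merge(ris, on="mrn", how="outer",
                   suffixes=("_emr", "_ris"), indicator=True)

# Records present in only one system are linking failures.
unmatched = linked[linked["_merge"] != "both"]

# Records that matched on MRN but disagree on date of birth are likely
# identity or data-entry problems.
matched = linked[linked["_merge"] == "both"]
dob_conflicts = matched[matched["dob_emr"] != matched["dob_ris"]]

print(f"{len(unmatched)} unmatched records, {len(dob_conflicts)} DOB conflicts")
```

In a real integration, each flagged record becomes a question that only the source systems, and the people who run them, can answer.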

Nowadays healthcare data is largely digital. But this wasn’t always the case. Back in the day, hospitals, clinics, and private practices managed their business using paper records. The switch to digital is actually quite recent, starting in the ’80s and only becoming widely adopted after the passage of the HITECH Act, part of the American Recovery and Reinvestment Act (ARRA) of 2009, which introduced the concept of meaningful use. The switch to digital meant that paper-based records needed to be digitized to maintain continuity of patient care. This has been an ongoing and challenging process, fraught with data quality issues.

Nowadays healthcare data is largely digital.

So what we have in healthcare today is a collection of operational and clinical data stored across multiple systems, with varying data quality. This has profound implications for data science and machine learning. Getting access to data means working closely with hospital IT to query multiple systems. When you run into data linking issues, data quality issues, or missing data, you need to go back to hospital IT to understand the problems and/or get new data. Hospital IT is not in the business of machine learning; they’re in the business of supporting hospital operations. So, you can imagine this process is time consuming. Sometimes you settle for the data you get and do your best to remove data quality issues, with little understanding of whether the data you removed was in fact important and representative.
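That last caveat is worth illustrating. The hedged sketch below (the file and column names are our own, hypothetical choices) compares the records dropped by a naive cleaning step against the records kept, which is one quick way to ask whether the discarded data was representative:

```python
# Hypothetical sketch: check whether rows dropped during cleaning differ
# systematically from the rows kept. File and column names are illustrative.
import pandas as pd

df = pd.read_csv("hospital_extract.csv")   # columns include: age, exam_type

# A typical naive cleaning step: drop any row with a missing value.
kept = df.dropna()
dropped = df[~df.index.isin(kept.index)]

# Compare the two cohorts. Large gaps suggest the data was not missing at
# random, so the cleaned set may no longer represent the patient population.
print("kept mean age:   ", kept["age"].mean())
print("dropped mean age:", dropped["age"].mean())
print("kept exam mix:\n", kept["exam_type"].value_counts(normalize=True))
print("dropped exam mix:\n", dropped["exam_type"].value_counts(normalize=True))
```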

So what typically happens for data science projects in healthcare is the collection of heterogeneous data with limited metadata providing context about how the data was collected and how it is changing over time. The data was likely integrated from multiple healthcare systems, each with its own data quality challenges. The evolution of healthcare systems and devices results in data drift and an increased potential for new data quality issues to arise. Without that context, data scientists rely on subject matter experts who have been around long enough to know the data and the context in which it was generated.

What typically happens for data science projects in healthcare is the collection of heterogeneous data with limited metadata providing context about how the data was collected and how it is changing over time.
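As one illustration of how that drift might be caught, here is a small sketch, again ours rather than Shawn's, that applies a two-sample Kolmogorov-Smirnov test to a numeric field in a historical baseline and a recent batch. The file and column names are hypothetical:

```python
# Hypothetical sketch: flag drift in a numeric field between a historical
# baseline and a recent batch using a two-sample Kolmogorov-Smirnov test.
import pandas as pd
from scipy.stats import ks_2samp

baseline = pd.read_csv("exams_2019.csv")["scan_duration_min"].dropna()
recent = pd.read_csv("exams_2021.csv")["scan_duration_min"].dropna()

stat, p_value = ks_2samp(baseline, recent)

# A small p-value means the samples are unlikely to come from the same
# distribution: a signal to investigate, not proof of a problem.
if p_value < 0.01:
    print(f"Possible drift (KS statistic {stat:.3f}, p = {p_value:.2g})")
else:
    print("No significant drift detected")
```

In practice a team would track categorical mixes, missingness rates, and schema changes as well, since drift rarely shows up in a single numeric column.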

Data Culpa:  Do you see a movement underway to standardize tools and processes to make these changes more likely? For example, let’s say a hospital group decides they want to improve radiology outcomes for their patients, and they think that applying AI might be able to do that. By now, they probably recognize that they need to address data quality problems like data drift, and they need to standardize tools. Do you see any sort of top-down initiative looking at addressing this as an issue of enterprise architecture?

Shawn Stapleton:  Absolutely. I mean, it’s top-down and bottom-up and sideways; it’s coming from every side. There’s a lot of work being done here, but I don’t believe there are any solid production tools to accomplish this yet. The real challenge in my mind is understanding data quality across an intertwined collection of systems that evolve over time. There’s substantial value in providing insights into data quality issues that are connected across the data sources used in healthcare and the processing pipelines used for machine learning.

TO BE CONTINUED.

End of Part 2. Read Part 1 of our interview below.

Data Science in Healthcare: An Interview with Shawn Stapleton, PhD, Part 1


Building Actionable Analytics for Healthcare Data

Data Culpa: Can you tell us about your current position and how you got started in data science?

Shawn Stapleton:  I’m the Director of Innovation at Philips, developing innovative solutions for radiology departments. My team’s focus is largely on building actionable analytics and AI on top of hospital big data. I manage an innovation portfolio to drive continuous improvement in the quality and efficiency of radiology services. This includes everything from managing a patient’s order, to scheduling and executing an examination, to helping radiologists interpret the images, to ensuring that appropriate follow-up clinical care based on radiologists’ recommendations is communicated and managed appropriately.

My team is really on the forefront of this area. We work really closely with hospitals around the country – in fact, around the world. Our mission is to develop and prove out novel solutions that have broad impact using technologies like artificial intelligence, simulations, statistics, business intelligence – just a whole gamut of tools. Some solutions are solely data aggregation and retrospective business intelligence, and some are focused on predictive algorithms. Through collaborations with hospital IT and healthcare workers, we validate our solutions through proof of concept studies, and then scale our solution to multiple hospitals as a product.

Through collaborations with hospital IT and healthcare workers, we validate our solutions through proof of concept studies, and then scale our solution to multiple hospitals as a product.

Using Machine Learning to Model Drug Interactions

How did I get here? It was a long journey. I started off studying computer science and physics in my undergraduate years. It was clear that software development would be useful in my “toolbox” of skills, and I primarily focused on the application of physics in healthcare.

For my PhD, I specifically focused on understanding where cancer drugs go in the body and what that means for cancer treatment response. I essentially integrated disparate data sources to figure out where a cancer drug goes after injection into a patient. I worked with a multi-disciplinary group of scientists to establish whether the drug could be reliably tracked with imaging, build mathematical models to understand how a patient’s biology affects where the drug goes, and use machine learning to predict how much drug would be delivered to the cancer, with the ultimate goal of providing more context on response to that drug.

This was the origin of my data science career, although I was saddled with the need to collect and manage my own data; I didn’t have access to a database of already collected data. I continued this work as a post-doc at Harvard Medical School, at the Center for Systems Biology located in Massachusetts General Hospital, essentially continuing to increase the types of data I was collecting to add more specificity to the modeling of drug transport in the body and the prediction of drug delivery to cancer.

At the end of my post-doc, I had an urge to move away from the research bench. Collecting my own data was getting tedious! I wanted to work with real healthcare patient data. So I joined Optum, which is a division of UnitedHealth Group. At Optum, I focused on both educating and leading a team of data scientists to develop machine-learning-based tools that could improve healthcare. We focused on operational delivery of healthcare and on building clinical decision support tools.

The Challenge of Hiring and Educating Data Scientists

As I mentioned, a large portion of my role was to educate junior data scientists. At that time, it wasn’t easy to hire data scientists. Moreover, no one was really sure what a data scientist role really meant, so how do you hire into it?

To me a data scientist was fundamentally a scientist, which resonated with the leadership at Optum. I tried to instill in my team the ability to develop a solution to a complex problem using solid scientific processes. Machine learning, statistics, etc. were just methodologies and tools used to gain insight into the data. I was then recruited by Philips Research as a senior data scientist, focusing on AI applied to radiology.

To me a data scientist was fundamentally a scientist . . . . I tried to instill in my team the ability to develop a solution to a complex problem using solid scientific processes.

The hot topic was, and still is, building AI algorithms to support radiologists in their task of identifying and classifying abnormalities in medical images. I progressed through Philips, being promoted to principal scientist and then to my current role as Director of Innovation.

The Evolution of Data Science Toolkits

Data Culpa:  Is there a standard sort of data science tool kit that you’re working with?

Shawn Stapleton:  The evolution of data science toolkits is absolutely fascinating. Tools have matured remarkably from when I started. I started in MATLAB, which had limited machine learning libraries. You could build a neural network in MATLAB, but it wasn’t turnkey. I had no idea how to make it work, and I also didn’t have enough data to do anything remarkable. TensorFlow didn’t exist, and scikit-learn certainly wasn’t popular amongst my healthcare scientist colleagues.

Today, there are numerous open-source and proprietary/commercial tools for machine learning. So what tools are my team and I working with? Typically the “tool du jour” and/or the tools they have the most experience with. It’s less coordinated than one might expect. From an innovation standpoint, it’s great not to be limited by conforming to a specific toolset. I like the mindset of “use what you like.”

However, from a product development standpoint, this mindset is a nightmare. During my academic career, using whatever tool I wanted never posed any problems, since sharing code or data wasn’t commonplace and the goal wasn’t to build a product. However, in a company it can be hard to take innovations built using multiple different tools and transfer them into a unified product, particularly in the highly regulated healthcare environment.

From an innovation standpoint, it’s great not to be limited by conforming to a specific toolset. I like the mindset of “use what you like.” However, from a product development standpoint, this mindset is a nightmare.

We use tools like Docker to help manage this challenge, but there’s still complexity in the flow of data across microservices. We use tools like MLflow to help track machine learning experiments. However, the management of the data needed to build algorithms isn’t straightforward; it’s always a struggle.
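For readers who haven't used MLflow, a minimal tracking run looks roughly like the sketch below. The experiment, parameter, and metric names are our illustrative choices, not a description of Philips's actual setup:

```python
# Minimal MLflow tracking sketch; all names and values are illustrative.
import mlflow

mlflow.set_experiment("radiology-triage")   # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 1e-4)
    mlflow.log_param("train_set_version", "2021-03-extract")
    # ... train and evaluate the model here ...
    mlflow.log_metric("val_auc", 0.87)
    mlflow.log_artifact("model_card.md")    # attach a supporting file
```

Notice that the training data itself is reduced to a version string; as Shawn says, tracking the experiment is the easy part, and managing the data behind it is the struggle.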

A Lack of Tools for AI and Machine Learning

There is a lack of tools to manage the data pipeline for machine learning in healthcare. Healthcare data is stored by the hospital across multiple disparate systems, and it’s not generally available to researchers or companies for building algorithms. Moreover, gathering a high-quality healthcare data set isn’t straightforward. I need to collaborate with hospitals’ healthcare workers and IT to access, collate, and create a data set that’s sufficient to build AI algorithms.

The tools to deploy AI algorithms add an additional challenge. There is not a straightforward method to deploy AI in healthcare, but companies like Philips are making huge strides to tackle this problem. Once an algorithm is deployed, there is still a major challenge keeping the algorithm “fresh.”

Healthcare data is constantly evolving, and algorithms need to be updated to reflect these changes. Until recently, there haven’t been any tools that could manage and monitor the flow of data into AI algorithms to ensure they are behaving as expected. In my opinion this is a very cool topic. It’s why I’m so excited to be working with Data Culpa now.
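One lightweight way to monitor the data flowing into a deployed algorithm is to validate each incoming record against the ranges and categories the model saw in training before running inference. The sketch below assumes hypothetical field names and bounds:

```python
# Hypothetical sketch: validate incoming records against the ranges and
# categories seen in training before they reach a deployed model.
EXPECTED = {
    "age": (0, 120),                       # plausible numeric range
    "modality": {"CT", "MR", "XR", "US"},  # categories seen in training
}

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks OK."""
    problems = []
    lo, hi = EXPECTED["age"]
    age = record.get("age")
    if age is None or not (lo <= age <= hi):
        problems.append(f"age missing or out of range: {age!r}")
    if record.get("modality") not in EXPECTED["modality"]:
        problems.append(f"unseen modality: {record.get('modality')!r}")
    return problems

# Usage: route clean records to the model, quarantine the rest for review.
issues = validate({"age": 57, "modality": "PT"})
if issues:
    print("Quarantined:", issues)
```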

End of Part 1.
