ABOUT THIS BLOG

In this blog, we’ll discuss all things data quality and data observability. We’ll present interviews with leading data scientists, and we’ll discuss our own work building data observability solutions for data engineers and data scientists working in a broad range of industries.

Data Science in Healthcare: An Interview with Shawn Stapleton, PhD, Part 1

Feb 3, 2021 | Data Culpa

As a leader in healthcare innovation at Philips Healthcare, Shawn Stapleton manages a portfolio of technology solutions that have immediate impact for hospitals around the world. Shawn works with a network of clinical partners to identify key needs and to develop novel solutions that improve the quality and efficiency of clinical care delivery. Shawn sits on several steering committees within Philips to develop and implement innovation strategies across multiple businesses, as well as with multiple granting agencies to advise on national research initiatives in healthcare. As Director of Innovation, Shawn strives to put research into practice, bring ideas to market, and work with worldwide leaders to deploy the next generation of healthcare innovations.

Shawn has a PhD in Medical Biophysics from the University of Toronto and a B.S. with honors in Physics and Computer Science from the University of Victoria. 

Shawn is an investor in Data Culpa, Inc. and a member of the company’s Board of Advisors.

Building Actionable Analytics for Healthcare Data

Data Culpa: Can you tell us about your current position and how you got started in data science?

Shawn Stapleton:  I’m the Director of Innovation at Philips, developing innovative solutions for radiology departments. My team’s focus is largely on building actionable analytics and AI on top of hospital big data. I manage an innovation portfolio to drive continuous improvement in the quality and efficiency of radiology service delivery. This includes everything from managing a patient’s order, to scheduling and executing an examination, to supporting radiologists’ interpretation of the images, to ensuring that appropriate follow-up clinical care based on radiologists’ recommendations is communicated and managed appropriately.

My team is really on the forefront of this area. We work really closely with hospitals around the country – in fact, around the world. Our mission is to develop and prove out novel solutions that have broad impact using technologies like artificial intelligence, simulations, statistics, business intelligence – just a whole gamut of tools. Some solutions are solely data aggregation and retrospective business intelligence, and some are focused on predictive algorithms. Through collaborations with hospital IT and healthcare workers, we validate our solutions through proof of concept studies, and then scale our solution to multiple hospitals as a product.

Through collaborations with hospital IT and healthcare workers, we validate our solutions through proof of concept studies, and then scale our solution to multiple hospitals as a product.

Using Machine Learning to Model Drug Interactions

How did I get here? It was a long journey. I started off studying computer science and physics in my undergraduate years. It was clear that software development would be useful in my “toolbox” of skills, and I primarily focused on the application of physics in healthcare.

For my PhD, I specifically focused on understanding where cancer drugs go in the body and what that means for cancer treatment response. I essentially integrated disparate data sources to figure out where a cancer drug goes in the body after injection into a patient. I worked with a multi-disciplinary group of scientists to establish whether the drug could be reliably tracked with imaging, to build mathematical models of how a patient’s biology affects where the drug goes, and to use machine learning to predict how much drug would be delivered to the cancer, with the ultimate goal of providing more context on response to that drug.

This was the origin of my data science career, although I was saddled with the need to collect and manage my own data; I didn’t have access to a database of already collected data. I continued this work as a post-doc at Harvard Medical School, at the Center for Systems Biology located in Massachusetts General Hospital, essentially continuing to expand the types of data I was collecting to add more specificity to the modeling of drug transport in the body and the prediction of drug delivery to cancer.

At the end of my post-doc, I had an urge to move away from the research bench. Collecting my own data was getting tedious! I wanted to work with real healthcare patient data. So I joined Optum, which is a division of UnitedHealth Group. At Optum, I focused on both educating and leading a team of data scientists to develop machine-learning-based tools that could improve healthcare. We focused on operational delivery of healthcare and on building clinical decision support tools.

The Challenge of Hiring and Educating Data Scientists

As I mentioned, a large portion of my role was to educate junior data scientists. At that time, it wasn’t easy to hire data scientists. Moreover, no one was really sure what the data scientist role meant, so how do you hire into it?

To me a data scientist was fundamentally a scientist, which resonated with the leadership at Optum. I tried to instill in my team the ability to develop a solution to a complex problem using solid scientific processes. Machine learning, statistics, etc. were just methodologies and tools used to gain insight into the data. I was then recruited by Philips Research as a senior data scientist, focusing on AI applied to radiology.

To me a data scientist was fundamentally a scientist . . . . I tried to instill in my team the ability to develop a solution to a complex problem using solid scientific processes.

The hot topic was, and still is, building AI algorithms to support radiologists in their task of identifying and classifying abnormalities in medical images. I progressed through Philips, being promoted to principal scientist and then to my current role as Director of Innovation.

The Evolution of Data Science Toolkits

Data Culpa:  Is there a standard sort of data science tool kit that you’re working with?

Shawn Stapleton:  The evolution of data science toolkits is absolutely fascinating. Tools have matured remarkably from when I started. I started in MATLAB, which had limited machine learning libraries. You could build a neural network in MATLAB, but it wasn’t turnkey. I had no idea how to make it work, and I also didn’t have enough data to do anything remarkable. TensorFlow didn’t exist, and scikit-learn certainly wasn’t popular amongst my healthcare scientist colleagues.

Today, there are numerous open-source and proprietary/commercial tools for machine learning. So what tools are my team and I working with? Typically the “tool du jour” and/or the tools they have the most experience with. It’s less coordinated than one might expect. From an innovation standpoint, it’s great not to be limited by conforming to a specific toolset. I like the mindset of “use what you like.”

However, from a product development standpoint, this mindset is a nightmare. During my academic career, using whatever tool I wanted never posed any problems, since sharing code or data wasn’t commonplace and the goal wasn’t to build a product. However, in a company it can be hard to take innovations built using multiple different tools and transfer them into a unified product, particularly in the highly regulated healthcare environment.

From an innovation standpoint, it’s great not to be limited by conforming to a specific toolset. I like the mindset of “use what you like.” However, from a product development standpoint, this mindset is a nightmare.

We use tools like Docker to help manage this challenge, but there’s still complexity in the flow of data across microservices. We use tools like MLflow to help track machine learning experiments. However, the management of data needed to build algorithms isn’t straightforward; it’s always a struggle.
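To make that concrete, here is a minimal sketch of what experiment tracking with MLflow can look like; the experiment name, parameters, and metric values below are hypothetical placeholders for illustration, not code from Shawn’s team.

```python
# Minimal MLflow tracking sketch. The experiment name, parameters,
# and metric values are hypothetical placeholders.
import mlflow

mlflow.set_experiment("radiology-followup-model")  # hypothetical experiment name

with mlflow.start_run():
    # Record the configuration used for this training run.
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_param("batch_size", 32)

    # ... model training would happen here ...

    # Record evaluation results so runs can be compared later in the MLflow UI.
    mlflow.log_metric("val_auc", 0.91)
```

Each run’s parameters and metrics are stored by MLflow so experiments can be compared and reproduced, which is the kind of bookkeeping that, as Shawn notes, remains a struggle for the data itself.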

A Lack of Tools for AI and Machine Learning

There is a lack of tools to manage the data pipeline for machine learning in healthcare. Healthcare data is stored by the hospital across multiple disparate systems, and it’s not generally available to researchers or companies for building algorithms. Moreover, gathering a high-quality healthcare data set isn’t straightforward. I need to collaborate with hospitals’ healthcare workers and IT to access, collate, and create a data set that’s sufficient to build AI algorithms.

The tools to deploy AI algorithms add an additional challenge. There is not a straightforward method to deploy AI in healthcare, but companies like Philips are making huge strides to tackle this problem. Once an algorithm is deployed, there is still a major challenge keeping the algorithm “fresh.”

Healthcare data is constantly evolving, and algorithms need to be updated to reflect these changes. Until recently, there haven’t been any tools that could manage and monitor the flow of data into AI algorithms to ensure they are behaving as expected. In my opinion this is a very cool topic. It’s why I’m so excited to be working with Data Culpa now.
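As a rough illustration of the kind of monitoring Shawn describes, the sketch below compares a new batch of model inputs against a training-time baseline and flags columns whose distributions appear to have shifted; the column selection, statistical test, and threshold are illustrative assumptions, not a description of Data Culpa’s product.

```python
# Hypothetical sketch of monitoring data flowing into a deployed model:
# flag numeric columns whose distribution has drifted from the training baseline.
import pandas as pd
from scipy.stats import ks_2samp

def check_drift(baseline: pd.DataFrame, new_batch: pd.DataFrame,
                threshold: float = 0.01) -> dict:
    """Return {column: p-value} for numeric columns that appear to have drifted."""
    drifted = {}
    for col in baseline.select_dtypes("number").columns:
        statistic, p_value = ks_2samp(baseline[col], new_batch[col])
        if p_value < threshold:  # a low p-value suggests a distribution shift
            drifted[col] = p_value
    return drifted
```

In practice, checks like this would run continuously on production data so that changes in the inputs are caught before they silently degrade an algorithm’s behavior.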

End of Part 1.
