As competition in the digital health space intensifies and failures become more common, earning regulatory approval and clinical acceptance of new products is more important than ever. Digital clinical trials can show regulators and clinicians how a digital health product can improve the lives of patients, and these trials need access to high-quality, real-world data (RWD).
Fortunately, digital health companies aren’t starting from scratch. Many have created or acquired unstructured data assets that they can use to develop useful RWD. When assessing the state of your unstructured data, it’s important to consider four variables.
Data sample adequacy: When algorithm designers at digital health companies use inadequate sample sizes, they risk building a product that isn’t relevant to the group of healthcare consumers they’re targeting. It’s especially difficult for algorithm designers to predict how their models in development will perform if they’re tested on data samples that are insufficient. As we saw with Watson for Oncology, overestimating the performance of a digital health tool based on inadequate data can have negative consequences for patients.
Data quality and bias: Duplicative, incomplete or inconsistent data make it more difficult to build the products that patients need. If you have data from many diverse sources, consider whether it’s truly feasible to shoehorn all of this data into a rigid database. It’s also important to keep an eye out for medical coding errors and inconsistencies, especially as different providers take different approaches to defining certain medical events.
Interoperability: It can be difficult to harmonize and make judgments about a data set if it was collected from multiple health systems. Interoperability is a key driver of success for providers—and with regulators, patient advocacy groups and others increasing their focus on it, it’s possible interoperability will be less of an obstacle in the future.
Data governance: Health systems don’t take a uniform approach to privacy and security, which affects the transfer, storage, use, publication and retention of data.
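To make the data sample adequacy variable above concrete, here is a minimal sketch of one common textbook heuristic: the minimum sample size needed to estimate a proportion (for example, a model’s sensitivity) within a chosen margin of error. The function name and default values are assumptions for illustration, not a method prescribed here, and a real study would size its samples with a statistician.

```python
import math
from statistics import NormalDist


def min_sample_size(expected_proportion, margin_of_error, confidence=0.95):
    """Minimum n to estimate a proportion within +/- margin_of_error.

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / e^2.
    Illustrative helper only; it ignores finite-population corrections.
    """
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * expected_proportion * (1 - expected_proportion) / margin_of_error ** 2
    return math.ceil(n)


# Worst case (p = 0.5), +/- 5 percentage points, 95% confidence.
required = min_sample_size(0.5, 0.05)
```

Comparing the required size with the number of records actually available for each target subgroup gives an early signal of whether a sample is likely to be too small.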
Team members who are close to the data in question can find it difficult to assess its usefulness and adequacy objectively. We’ve found it’s often helpful to have someone with an outside perspective ask questions that others may not consider or may hesitate to raise for fear of knocking a plan off course.
About 80% of RWD is unstructured (free-form text that is difficult to use without significant processing is one example), but much of it can still offer valuable insights into the critical contexts of the patient journey. Fortunately, a number of strategies and tactics make it possible to generate high-quality RWD from unstructured data, and because they are repeatable, they let you build sustainable policies and practices around them. We’ve seen these six steps offer direction for digital health companies seeking a path forward:
Step 1: Deploy early data quality checks
The first step in the RWD generation life cycle—the process of turning unstructured data into something more useful—is to ensure all data values are recorded, processed and stored in a way that allows for accurate reporting, interpretation and verification. Data quality checks should be detailed in the study protocol and adequately validated to ensure the RWD generation process aligns with its intended purpose.
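As a rough illustration of what an early data quality check might look like, the sketch below flags incomplete and duplicate records before they enter the rest of the pipeline. The field names and the choice of checks are hypothetical; a real study protocol would define and validate its own.

```python
from collections import Counter

# Hypothetical required fields; a real protocol would define its own.
REQUIRED_FIELDS = ("patient_id", "encounter_date", "diagnosis_code")


def run_quality_checks(records, required_fields=REQUIRED_FIELDS):
    """Report incomplete rows and duplicated key combinations."""
    # A row is incomplete if any required field is missing or empty.
    incomplete_rows = [
        index
        for index, record in enumerate(records)
        if any(record.get(field) in (None, "") for field in required_fields)
    ]
    # Rows sharing the same values for all required fields count as duplicates.
    key_counts = Counter(
        tuple(record.get(field) for field in required_fields) for record in records
    )
    duplicate_keys = [key for key, count in key_counts.items() if count > 1]
    return {
        "total": len(records),
        "incomplete_rows": incomplete_rows,
        "duplicate_keys": duplicate_keys,
    }


sample_records = [
    {"patient_id": "p1", "encounter_date": "2021-03-01", "diagnosis_code": "E11.9"},
    {"patient_id": "p1", "encounter_date": "2021-03-01", "diagnosis_code": "E11.9"},
    {"patient_id": "p2", "encounter_date": "", "diagnosis_code": "I10"},
]
report = run_quality_checks(sample_records)
```

Running checks like these at ingestion, and logging the report alongside the data, makes it easier to verify later that reported values match what was originally recorded.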
Step 2: Focus on global standards
Adhering to internationally recognized data harmonization standards for data types, representations, delivery and schemas is critical to ensuring a product fulfills clinical needs and aligns with clinical practices in all the regions where it will be used. Keeping these standards at the forefront should also help when your product is reviewed by the U.S. Food and Drug Administration, assessed under the European Union Medical Device Regulation or evaluated by other regulators.
Step 3: Leverage natural language processing
Natural language processing (NLP) tools enable unstructured text mining and terminology recognition. Integrating NLP into the RWD life cycle can provide continuous data enrichment—not to mention immediate and future algorithmic improvements through iteration and learning. While deploying NLP at scale is challenging, efforts today will allow NLP to be an even more effective tool in the future.
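The simplest form of terminology recognition is a dictionary lookup over free text. The sketch below is a deliberately small stand-in for a real NLP tool: the lexicon is a hypothetical four-entry mapping from surface terms to illustrative diagnosis codes, and it does not handle negation (“no hypertension”), misspellings or context, all of which production clinical NLP needs to address.

```python
import re

# Hypothetical mini-lexicon mapping surface terms to illustrative codes.
LEXICON = {
    "type 2 diabetes": "E11.9",
    "t2dm": "E11.9",
    "hypertension": "I10",
    "high blood pressure": "I10",
}

# Longest terms first so multi-word phrases win over shorter overlaps.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(term) for term in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_terms(note):
    """Return recognized terms with their code and character offset."""
    return [
        {
            "text": match.group(0),
            "code": LEXICON[match.group(0).lower()],
            "start": match.start(),
        }
        for match in _PATTERN.finditer(note)
    ]


mentions = extract_terms("Pt with T2DM and high blood pressure.")
```

Keeping the character offset with each match lets a reviewer trace every structured value back to the sentence it came from, which supports the iteration and learning described above.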
Step 4: Pursue a flexible data platform
As many digital health companies learned during the COVID-19 pandemic, enterprise data warehouses are time-consuming to implement, inflexible and sometimes filter out data that’s needed later. That’s why we recommend using a flexible data platform based on Fast Healthcare Interoperability Resources (FHIR) standards, along with spec-compliant application programming interfaces around it. One example is “data provenance”: a scalable pipeline that takes in data from various sources and scenarios without losing sight of where each record originated, and that can support multiple solutions. A flexible data platform makes it possible, for example, to aggregate raw regulated and unstructured data quickly and efficiently into a data lake, allowing for mining and direct analysis with minimal effort. It could also make it possible to ingest and store patient data in a standard way, without extensive development or configuration.
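One way to picture provenance-preserving ingestion is to store every raw payload untouched, next to a small record of where it came from. The sketch below is an assumption-laden illustration: the wrapper function and source-system name are hypothetical, and the provenance dictionary is only loosely shaped like a FHIR Provenance resource (target, recorded, agent). It is not validated against the FHIR specification, which a real platform would need to do.

```python
import hashlib
import json
from datetime import datetime, timezone


def wrap_with_provenance(raw_payload, source_system, resource_type, resource_id,
                         recorded=None):
    """Bundle a raw payload with a simplified, FHIR-style provenance record."""
    recorded = recorded or datetime.now(timezone.utc).isoformat()
    # A checksum of the raw payload lets later steps detect silent changes.
    checksum = hashlib.sha256(
        json.dumps(raw_payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    provenance = {
        "resourceType": "Provenance",
        "target": [{"reference": f"{resource_type}/{resource_id}"}],
        "recorded": recorded,
        "agent": [{"who": {"display": source_system}}],
    }
    return {"raw": raw_payload, "raw_sha256": checksum, "provenance": provenance}


entry = wrap_with_provenance(
    {"note": "Pt with T2DM and high blood pressure."},
    source_system="clinic-a-ehr",  # hypothetical source name
    resource_type="Observation",
    resource_id="obs-1",
    recorded="2021-03-01T00:00:00+00:00",
)
```

Because the raw payload is kept as received, nothing is filtered out at ingestion, and downstream solutions can each derive what they need while still pointing back to the same origin.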
Step 5: Emphasize diversity, equity and inclusion
A lack of diversity in clinical trials is a recognized problem, but RWD presents opportunities to expand representation from historically underrepresented groups. Insights from RWD can help redraw the uneven map of medical knowledge to include groups that have been systemically excluded from clinical trials around the world. The first step to improving diversity, equity and inclusion efforts is to be mindful of potential biases when reviewing the insights produced by RWD.
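Being mindful of bias can start with a simple comparison of who is in the data against who should be. The sketch below compares each group’s share of a data set with a reference share and flags groups that fall short by more than a tolerance. The attribute name, group labels, reference shares and tolerance are all placeholders; real reference figures would come from census or disease-prevalence data for the target population.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Flag groups whose observed share trails the reference share."""
    counts = Counter(record.get(attribute, "unknown") for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        # Only under-representation beyond the tolerance is flagged.
        if expected - observed > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps


# Placeholder data: 8 records from group A, 2 from group B.
cohort = [{"group": "A"}] * 8 + [{"group": "B"}] * 2
gaps = representation_gaps(cohort, "group", {"A": 0.6, "B": 0.4})
```

A flagged gap does not by itself prove an insight is biased, but it indicates where conclusions drawn from the RWD deserve extra scrutiny.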
Step 6: Look beyond existing regulations
Complying with existing regulations is critical, but RWD will require the existing regulatory framework to accommodate new use cases and risks. Digital health companies should look beyond existing regulations and think about how they can be good stewards of patient data. Because regulators assess risks, public oversight of the RWD life cycle will benefit patients, providers and digital health companies alike.
View Figure 1 to learn more about how to extract value from unstructured data.
With unstructured data continuing to become more readily available, using it to produce meaningful RWD will not be a competitive advantage for long—it will become an imperative. As digital health companies move through this new era of clinical trials, it’s more important than ever to thoughtfully consider how to best leverage both regulated and unstructured data.