
MDM redefined: Generative AI ignites potential

By Saket Bhanawat, Himanshu Shandilya, and Akash Singh

April 18, 2024 | Article | 8-minute read


What if a company worked to organize all its data, but the processes to manage and keep that data current were rigid, manual and resource-intensive? Chances are the CDIO would view this as a major bottleneck to scaling the company’s intelligence and would seek to address it.


With generative AI, we see great potential to move past what’s been holding data teams back. With new thinking, they can break away from the manual and resource-intensive processes of master data management (MDM) and transform the core aspects of it so that each step is more automated, dynamic and context rich.


As these practices evolve, CDIOs need to understand both the conceptual and the practical changes they bring in order to lead the company’s shift toward a more modern MDM approach.

Why generative AI? The connected data ecosystem

Traditional MDM thrives on clean, organized data. But as the pharma industry increasingly emphasizes the importance of connecting data from diverse sources, MDM needs to adapt.


Increasingly, businesses need:

  • A highly adaptive way to manage more complexity and nuance in the ever-changing data landscape.
  • Methods to accommodate domain-specific business requirements from commercial, pharma R&D, supply chain and manufacturing, finance, operations and more.
  • Agility to implement MDM solutions that are dynamic and can adapt to changing business needs, such as expansion into markets with unique data challenges and broader ambitions to scale AI.

Generative AI offers these capabilities by giving data stewards automation for the tasks that machines can do and rich context for the domain-specific decisions that often need high-touch manual interventions.

Generative AI adds context and self-learning to MDM

In the sections that follow, we’ll cover how generative AI helps specific MDM components move from being rules-based and rigid to being more automated, dynamic and context rich.


Figure 1 shows how generative AI takes MDM to the next level. While traditional MDM offers core benefits, generative AI makes MDM more automated, dynamic and context rich.

Data quality: Context aware, with real-time insights


MDM prioritizes identifying data quality problems before they cause issues.


Traditional data quality tools help by finding and filtering out low-quality data. For example, they might validate if a healthcare provider’s address matches a third-party source, but that’s where the validation stops. These checks, on their own, can’t truly identify if the address is correct.


Generative AI techniques, including retrieval-augmented generation (RAG), can perform more intelligent quality checks that go beyond third-party validation by also taking the data’s context into account. This helps analysts grasp the meaning of the data and the information surrounding it, so they can pinpoint the correct address with minimal manual effort.


Standardization: Context aware, with real-time insights


Traditional data standardization relies on rules and manual effort, often missing data variations. A company’s predefined rules, for example, can struggle with standardizing global addresses, names and phone formats.


Generative AI tackles these challenges by understanding data in its context. It can intelligently handle diverse formats, like recognizing various address layouts and cultural naming conventions. For example, generative AI can identify “John Smith” and “Smith, John” as the same person or it can easily standardize U.S. and European addresses.
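For contrast, here is a minimal sketch of what a predefined rule for the name example might look like. The function name `normalize_name` and the single "Last, First" rule are illustrative assumptions, not part of any particular MDM tool. The rule handles the comma case but has no way to know which word is the family name when the comma is missing, and that gap is where context-aware standardization is meant to help.

```python
def normalize_name(raw):
    """Hypothetical rule: exactly one comma means the input is 'Last, First'."""
    cleaned = " ".join(raw.split())  # collapse repeated whitespace
    if cleaned.count(",") == 1:
        last, first = (part.strip() for part in cleaned.split(","))
        cleaned = f"{first} {last}"
    return cleaned.title()


# Both spellings from the example collapse to one standardized form.
same_person = normalize_name("Smith, John") == normalize_name("John Smith")

# Limitation: with no comma, the rule cannot tell whether the family name
# comes first, so the value passes through unchanged.
unresolved = normalize_name("Wang Wei")
```

Every new naming convention needs another hand-written branch in a function like this, which is the maintenance burden the article describes.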


Adding AI-aided enrichment to the standardization process can help build a 360-degree view of any healthcare provider’s capabilities by finding and adding information about affiliations, clinical trials and investigators, FDA debarment actions and more.


Reference data management: Ideal for automation


Traditionally, reference data management (RDM) has involved mapping values from various sources to a central list using manual configuration. For example, a healthcare provider’s specialty might be listed as “Onc.” in a customer relationship management system, while a third-party data source might use “Oncology.”
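As a rough illustration of the manual configuration described above, the sketch below uses a hand-maintained crosswalk. The table contents and the name `map_specialty` are hypothetical. Any source value that is missing from the table returns `None` and would land in a steward’s queue.

```python
# Hypothetical crosswalk from source-specific values to one canonical value.
SPECIALTY_CROSSWALK = {
    "onc.": "Oncology",
    "onc": "Oncology",
    "oncology": "Oncology",
    "cardiology": "Cardiology",
}


def map_specialty(raw_value):
    """Return the canonical specialty, or None if no mapping is configured."""
    key = raw_value.strip().lower()
    return SPECIALTY_CROSSWALK.get(key)


crm_value = map_specialty("Onc.")             # value from the CRM system
third_party_value = map_specialty("Oncology")  # value from a third-party source
unmapped_value = map_specialty("Hem/Onc")      # not configured, needs a steward
```

The lookup itself is trivial. The cost lies in keeping the table complete as new sources and codes arrive.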


Getting accurate reference data requires a blend of specialized tools and the meticulous efforts of data stewards. And while these methods are reliable, they struggle with scalability and agility.


Too often data stewards, responsible for manual curation and reconciliation, face growing data volumes and complexity. This makes the process labor-intensive, prone to error and ill-suited for evolving business needs.


Generative AI promises a paradigm shift in RDM, automating tasks and boosting intelligence. It can autonomously map new data, identify relationships between codes and fix inconsistencies, all with minimal human oversight. Your company’s RDM tools are likely to embed these features into their existing software, but it’s also possible to use generative AI for custom mapping where needed.


Matching: Rich context, adaptability and greater accuracy


Traditional matching relies primarily on rules-based methods that identify and consolidate duplicate master records using multiple algorithms. These combine deterministic and fuzzy matching logic with a blend of heuristic, linguistic, phonetic and empirical methods.


While these methods are effective, they often require extensive manual tuning and analysts can struggle with variations in data, multi-language matching and matching on descriptive fields.


Two generative AI techniques, embeddings and prompt engineering, can revolutionize data matching by capturing and understanding the nuances and context of data.

  • Embeddings represent complex data in a high-dimensional space, making it easier to find similarities or differences between entities that traditional algorithms might miss. This method can be loosely compared to tokenization in traditional tools, which identifies match candidates that are then evaluated with match rules.
  • Prompt engineering can be used to create instructions for LLMs to evaluate duplicates from matching candidates. This method can loosely be compared to match rules in a company’s current MDM solution; however, it enables more intuitive, intelligent and flexible matching criteria, going beyond rigid rules-based systems.
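The two steps above can be sketched roughly as follows. This is a toy under stated assumptions: `toy_embedding` uses character trigram counts as a stand-in for a learned embedding model, the 0.5 threshold is arbitrary, and `build_match_prompt` only assembles the instruction text without calling any LLM. All function names and sample records are hypothetical.

```python
import math
from collections import Counter


def toy_embedding(text, n=3):
    """Stand-in for a learned embedding: counts of character n-grams."""
    normalized = " ".join(text.lower().replace(",", " ").split())
    padded = f" {normalized} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_candidates(query, records, threshold=0.5):
    """Step 1: keep records whose vectors are close to the query's."""
    q = toy_embedding(query)
    scored = [(r, cosine_similarity(q, toy_embedding(r))) for r in records]
    kept = [(r, s) for r, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def build_match_prompt(record_a, record_b):
    """Step 2: instructions an LLM could use to judge one candidate pair."""
    return (
        "You are assisting a data steward with master data matching.\n"
        f"Record A: {record_a}\n"
        f"Record B: {record_b}\n"
        "Do these records describe the same real-world entity? "
        "Answer MATCH, NO_MATCH or UNSURE, then give a one-sentence reason."
    )


records = ["Smith, John, 12 Main Street", "Mary Jones, 4 Oak Ave"]
candidates = find_candidates("John Smith, 12 Main St", records)
prompts = [build_match_prompt("John Smith, 12 Main St", r) for r, _ in candidates]
```

Splitting the work this way keeps the expensive LLM judgment limited to a short list of plausible pairs. A production setup would swap the trigram vectors for a real embedding model and tune the threshold on labeled matches.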

With this more advanced approach to data matching, generative AI can dynamically adapt to new data formats, languages and even cultural differences in naming conventions without the need for constant manual adjustment. It can improve match accuracy, reduce false positives and false negatives, and uncover relationships in the data that previously went unnoticed.


Survivorship: Sophistication, intelligence and adaptability


Data survivorship refers to the process of determining which data attributes should be prioritized and retained in the master record when identical entities from different sources are merged.


In traditional MDM systems, this process often follows predetermined rules, which may not always lead to the optimal combination of data attributes, especially in complex or rapidly changing data environments. For example, the abbreviated name of a customer might survive based on a predefined rule.
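A minimal sketch of such a predetermined rule, assuming a simple source-priority ranking, is shown below. The source names, the ranking and the sample records are all hypothetical. The rule keeps the abbreviated CRM name only because the CRM ranks first, even though another source holds the fuller name.

```python
# Hypothetical ranking: the first source with a value wins.
SOURCE_PRIORITY = ["crm", "third_party", "web_form"]


def survive_attribute(candidates, attribute):
    """Pick the value from the highest-priority source that has one."""
    for source in SOURCE_PRIORITY:
        for record in candidates:
            if record["source"] == source and record.get(attribute):
                return record[attribute]
    return None


records = [
    {"source": "third_party", "name": "Saint Mary's Hospital"},
    {"source": "crm", "name": "St. Mary's Hosp."},
]
golden_name = survive_attribute(records, "name")
```

The rule is predictable and easy to audit, but it never looks at the values themselves, so it cannot prefer the more complete name.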


Instead, generative AI can use the data’s context to identify the best name and carry it into the master record.


Moreover, generative AI can learn from historical data survivorship decisions, continuously improving its ability to choose the best attributes for the master record. This not only ensures that the most relevant and high-quality data is retained but also significantly streamlines the survivorship process, reducing the reliance on manual intervention and speeding up the time to achieve a golden record.


Ultimately, generative AI should reduce the stewardship queue as data cleansing and enrichment processes are automated further.


Expect MDM tools to incorporate generative AI outputs and RAG for survivorship and for fetching organization data, helping stewards retain the most accurate information.


Stewardship: A shift toward strategic data management


Data stewardship blends people, process and technology to fix data gaps and resolve data issues that need human intervention.


Generative AI and automation can empower data stewards to analyze massive data sets and pinpoint inconsistencies, duplicates and errors at record speeds.


The benefits are twofold: generative AI significantly reduces workloads for stewards, and it drastically improves data quality by ensuring information is reliable, up-to-date and consistent.


Specific examples of how data stewards can benefit include:

  • Automated duplicate resolution: AI can automatically identify and resolve potential duplicate entries, freeing up stewards’ time for more complex tasks.
  • Data change request validation: AI can analyze data change requests and verify their validity against public information, streamlining the approval process.
  • Fulfillment of data inquiries: AI-powered chatbots can handle routine data inquiries, giving business users timely responses.
  • Rigorous compliance verification: AI can continuously monitor data for compliance with regulations, reducing the risk of errors and fines.
  • Enforcement of data policies: AI can further assist by continually scanning the data landscape for violations and automatically applying corrective actions where possible.

The automation of some tasks will, no doubt, reduce reliance on manual stewardship. However, it also opens the possibility that the role can shift from data steward toward data strategist, where people spend more of their time on governance frameworks, policy enforcement, data quality and using AI insights for decisions.

Advance to the next level with generative AI practices

MDM practices are always evolving and the CDIO can start by facilitating a culture where teams can assess how to integrate generative AI into their MDM workflows.


Beyond the core areas above, several other applications can take your use of generative AI in MDM past the basics. This list is by no means exhaustive, but you can use it to decide where to explore next.


What to try


Clinical study mastering: Experiment with using generative AI to master clinical study data across sources. AI-assisted mastering can improve accuracy and efficiency, particularly when analyzing lengthy study descriptions and resolving matches on standard identifiers such as National Clinical Trial (NCT) numbers.


Stewardship tasks for customer and compliance data: Generative AI is suited for automating data change requests (DCRs), data validation, potential duplicate resolution and extracting valuable insights for customer management and regulatory compliance.


Validation of customer contact information: AI bots can be trained to continuously validate critical customer information using trusted sources from the web.


Data enrichment for healthcare organizations (HCOs): Generative AI can gather HCO type, department and affiliation information and then use publicly available data to improve granularity, providing a new level of rich context around HCO insights.


Data enrichment for healthcare providers (HCPs): Here, generative AI can be trained to read anything that’s public, including authored publications, customer feedback, latest affiliations, debarments, specialties, clinical trials, social media activity and more.


If you’d like more information on how to get started, ask your ZS team or contact us.
