Exposing invisible risk in clinical trials with PTRS and in silico modeling
Sam Dowd and William Chaplin co-authored this piece.
Key takeaways:
- Most development failures stem from invisible risk, defined as uncertainty that is unmeasured, untraceable and therefore unmanaged.
- PTRS (probability of technical and regulatory success) only becomes useful when it is driver-based and continuously updated. Static headline numbers obscure the risks that drive program success or failure.
- In silico modeling and model-informed drug development (MIDD) create value by making risk visible earlier, enabling better, earlier decisions on population, exposure, endpoints and comparators.
What is PTRS, and why is it helpful in drug development?
Drug development is a capital-intensive sequence of largely irreversible bets made under uncertainty. The real danger isn’t uncertainty itself. It’s when uncertainty is left unmeasured, unaccounted for or untraceable—and therefore hard to govern.
We call this invisible risk. Risk isn’t bad, but invisible risk can be extremely detrimental.
In most organizations, the biggest development misses come from assumptions that were never fully vetted and thereby never converted to quantified uncertainty. When risk is visible and the assumptions beneath it transparent, leadership can invest more confidently in programs where risk is reducible and stop or redesign programs buoyed by unsubstantiated assumptions.
What is invisible risk?
Uncertainty that is unmeasured, unaccounted for or untraceable—and therefore hard to govern.
PTRS is one of the most powerful tools in an organization’s arsenal to measure uncertainty, offering R&D leadership a common metric to express “How likely is this program to win?” and to allocate time, talent and capital accordingly. To work, PTRS should be treated as a living forecast: Early in development, the uncertainty band is wide, and the job of evidence generation is either to narrow it or to reveal early that a program rests on shaky assumptions.
When PTRS is credible and traceable, it enables three high-value decisions:
- Portfolio rebalancing: Making the organization’s risk appetite explicit by choosing the right mix of stable programs and higher-upside bets, rather than concentrating risk or indexing too heavily on safe, low-value programs.
- Portfolio optimization: Directing investment toward the programs where the next increment of evidence is most likely to change the outcome, either by increasing the likelihood of success or by revealing early that a program should be redesigned or stopped.
- Protocol optimization: Using the dominant PTRS drivers to design future studies to be decisive, rather than simply “running the next phase.”
In this sense, PTRS is the shared decision language that allows value-creating tradeoffs to happen faster and with less ambiguity.
How is PTRS typically estimated—and where does it fall short?
Most organizations build PTRS using historical benchmarks, based on how often programs progress from phase to phase and ultimately reach market. Teams then adjust these baseline estimates for program-specific factors.
The challenge is that these benchmarks are rarely transparent or pathway-aware. They aggregate multiple drivers—including chemistry, manufacturing and controls (CMC) and strategic discontinuations—and often fail to reflect what a program must actually prove to support a specific indication or label.
As a result, PTRS can appear precise while obscuring the assumptions and risks that matter most.
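To illustrate the difference between a headline number and a driver-based estimate, here is a minimal Python sketch. It assumes four hypothetical drivers, treats them as independent and describes each with a Beta distribution whose parameters are placeholders rather than benchmarks. The function names (`simulate_ptrs`, `summarize`, `dominant_driver`) are ours, introduced only for illustration.

```python
import random
import statistics

# Hypothetical drivers with Beta(alpha, beta) beliefs about each one's
# probability of success. All values are illustrative placeholders.
DRIVERS = {
    "safety_tolerability": (18, 2),
    "exposure_adequacy": (12, 4),
    "clinical_benefit": (6, 6),
    "regulatory_cmc": (16, 3),
}


def simulate_ptrs(drivers, n_draws=20000, seed=0):
    """Sample each driver's success probability and multiply the samples.

    Assumes the drivers are independent, which is a simplification.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p = 1.0
        for alpha, beta in drivers.values():
            p *= rng.betavariate(alpha, beta)
        draws.append(p)
    return draws


def summarize(draws):
    """Return the median PTRS and a central 80% uncertainty band."""
    ordered = sorted(draws)
    n = len(ordered)
    return {
        "p10": ordered[int(0.10 * n)],
        "median": statistics.median(ordered),
        "p90": ordered[int(0.90 * n)],
    }


def dominant_driver(drivers):
    """Name the driver with the lowest mean probability of success."""
    return min(drivers, key=lambda name: drivers[name][0] / sum(drivers[name]))


if __name__ == "__main__":
    summary = summarize(simulate_ptrs(DRIVERS))
    print({k: round(v, 2) for k, v in summary.items()})
    print("Dominant risk driver:", dominant_driver(DRIVERS))
```

Reporting the band and the dominant driver alongside the median keeps the estimate traceable: A reader can see which assumption carries the most risk and how much the number could move as evidence accrues.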
In silico modeling and MIDD help PTRS reveal the risks that matter most
But PTRS can amplify invisible risk when it becomes a single headline number detached from its drivers and with no uncertainty range. The number may be directionally right while obscuring what actually matters: Which risks are dominant, how those risks change by phase and what evidence would materially change the outlook. When this happens, programs may “feel” healthier than they are, governance becomes reactive rather than proactive, and late-stage failures feel like surprises even when the ingredients for failure were hidden in plain sight.
This is where in silico modeling and MIDD matter: Used well, they make assumptions explicit, quantify uncertainty and enable evidence-backed course corrections so decisions reflect both what’s known and unknown. Neglecting these methodologies—or untethering them from PTRS—can reinforce a deceptive assessment of uncertainty and leave the organization exposed to invisible risk.
The purpose of this meta-analysis is to identify where in silico and MIDD have the highest leverage—by first identifying the drivers that most heavily influence trial success or failure and are reasonably sensitive to earlier modeling, simulation and targeted evidence generation.
The goal is to answer two clear questions: Which risks dominate right now? And which decisions would reduce them fastest?
FIGURE: Directional prevalence of primary risk drivers by phase
As expected, early clinical-stage risk skews toward safety and toxicology risks and pharmacological feasibility. In phases 2 and 3, uncertainty of clinical benefit dominates. And at submission, approval-readiness risks, particularly around CMC and labeling and regulatory alignment, dominate—even for programs with positive trials.
The 4 most important decisions shaping success in clinical trials
The meta-analysis above allows us to create an in silico opportunity map revealing the uncertainties that (1) most frequently influence phase 2 and 3 outcomes and (2) are most sensitive to earlier modeling, simulation and targeted evidence generation.
Phase 2 is the industry’s most consistent and well-documented bottleneck, with large cross-industry benchmarking showing success rates below 30%. To demonstrate how organizations can use in silico modeling to make earlier development decisions, we’ve chosen to focus on the efficacy and exposure driver—because it’s the dominant failure driver between phases 2 and 3 and because this transition marks where investment balloons and decisions become much harder to reverse.
The implication isn’t to wait until phase 2 to address efficacy risk. It’s to pull phase 2 and 3 efficacy risk into preclinical and phase 1 using early pharmacokinetic/pharmacodynamic (PK/PD) modeling, biomarkers and simulation to stress-test the small handful of decisions that most often break programs.
In practice, using in silico to expose invisible risk within PTRS means getting four decisions right:
- Population selection
- Exposure and dose selection
- Endpoint selection and timing
- Comparator assumptions
1. Whom to study
The most common late-stage failure pattern is a mismatch between mechanism and population. This can occur when a therapy is tested in patients where the mechanism of action can’t meaningfully move outcomes, where heterogeneity dilutes the signal below detectability or where all-comers enrollment compresses effect size and increases variance.
Exposing hidden risk in population selection
- Response and heterogeneity forecasting: Applying Bayesian subgroup and mixture modeling to phase 1 PK/PD, biomarkers and prior trial or real-world data (RWD) to forecast response and variability under different population definitions. This shows teams where signals become concentrated and where they become diluted.
- Inclusion and exclusion stress testing: Using trial simulation and real-world prevalence and recruitment data to evaluate trade-offs between effect size, variability, recruitability and generalizability. This shows teams where tighter criteria improve signal and where they undermine feasibility.
- Baseline risk and event-rate simulations: Using recent control arms and RWD to simulate power and readout risk across realistic event-rate ranges. This shows teams where assumptions hold up and where they break down.
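As a rough illustration of the baseline event-rate simulation described above, the sketch below estimates statistical power across a range of control-arm event rates instead of a single assumed value. It uses a normal approximation for a two-sided two-proportion test. The event rates, relative risk reduction and sample size are hypothetical inputs, and the function names are ours.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_proportions(p_control, relative_risk_reduction, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with an unpooled standard error and
    ignores the negligible chance of significance in the wrong direction.
    """
    p_treatment = p_control * (1.0 - relative_risk_reduction)
    delta = p_control - p_treatment
    se = sqrt(
        p_control * (1.0 - p_control) / n_per_arm
        + p_treatment * (1.0 - p_treatment) / n_per_arm
    )
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(delta / se - z_crit)


def power_by_event_rate(event_rates, relative_risk_reduction, n_per_arm):
    """Map each candidate control event rate to its approximate power."""
    return {
        rate: power_two_proportions(rate, relative_risk_reduction, n_per_arm)
        for rate in event_rates
    }


if __name__ == "__main__":
    # Hypothetical scenario: 25% relative risk reduction, 1,000 patients per arm.
    for rate, power in power_by_event_rate([0.10, 0.15, 0.20, 0.30], 0.25, 1000).items():
        print(f"control event rate {rate:.2f}: power {power:.2f}")
```

With the relative effect and sample size held fixed, power falls as the control event rate falls, which is why a trial sized on an optimistic event rate can read out inconclusive even when the drug performs as expected.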
2. What exposure to target
Another common contributor to attrition between phases 2 and 3 is exposure misalignment. This happens when a product fails to achieve adequate, durable exposure in enough patients or when the exposure needed for efficacy is not tolerable for patients. Reasons include inadequate exposure due to variability, bioavailability or adherence; drug-drug interactions (DDI) or special population risks; and dosing duration or frequency that is insufficient to translate into clinical benefit, among others.
Exposing hidden risk in exposure and dose selection
- Exposure distribution prediction: Using phase 1 PK/PD, biomarkers and prior trial or RWD to predict exposure and variability across patient populations. This shows teams where target exposure is consistently achieved and where variability or adherence limits it.
- Dose-exposure-response simulation: Using early PK, PD and clinical data to predict clinical response across candidate regimens. This shows teams where clinically meaningful benefit is achievable and where it is unlikely to emerge.
- Living tolerability and exposure envelope modeling: Using safety and exposure data to model the range of tolerable and achievable exposure as evidence accrues. This shows where efficacy and tolerability align and where the therapeutic window constrains dosing.
- Adherence and DDI scenario testing: Using real-world adherence patterns and concomitant medication data to model how real-world conditions affect exposure duration.
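A simple version of exposure distribution prediction can be sketched as follows. The example assumes that exposure across patients is log-normally distributed, a common simplification, and that imperfect adherence scales exposure proportionally, which is a crude assumption of ours. The median exposure, coefficient of variation and target are hypothetical values, and `fraction_above_target` is an illustrative name.

```python
from math import log, sqrt
from statistics import NormalDist


def fraction_above_target(median_exposure, cv, target, adherence=1.0):
    """Share of patients whose exposure reaches the target.

    Assumes log-normal between-patient variability.
    cv: coefficient of variation (0.4 means 40%), must be positive.
    adherence: fraction of doses taken, applied as a simple multiplier
               on exposure (an illustrative assumption).
    """
    # For a log-normal variable, CV^2 = exp(sigma^2) - 1.
    sigma = sqrt(log(1.0 + cv ** 2))
    mu = log(median_exposure * adherence)
    return 1.0 - NormalDist(mu, sigma).cdf(log(target))


if __name__ == "__main__":
    # Hypothetical regimen: median exposure 150 units, 40% CV, target of 100 units.
    for adherence in (1.0, 0.8, 0.6):
        share = fraction_above_target(150, 0.4, 100, adherence=adherence)
        print(f"adherence {adherence:.0%}: {share:.0%} of patients at or above target")
```

Even this simplified view makes the trade-off visible: A regimen that looks adequate at the median can leave a meaningful share of patients below target once variability and real-world adherence are layered in.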
3. Endpoint selection and timing
A program can be biologically effective and still read out “negative” if the endpoint or follow-up window can’t capture the effect. In practice, many apparent efficacy failures are detectability failures: The study design isn’t sensitive enough to register the change the drug can plausibly produce. This can occur when endpoints are insufficiently sensitive, when follow-up is too short relative to the effect trajectory, or when variability, dropout or missing data obscure the signal. These limitations are often only fully understood once a study is already underway.
Exposing hidden risk in endpoint selection and timing
- Endpoint and trial duration simulation: Using historical trials and RWD to simulate endpoint performance based on expected variance, dropout and rescue therapy.
- Endpoint discovery: Pairing natural history data with biomarkers and clinical outcomes to link biomarkers to disease progression and identify earlier, higher-signal or intermediate endpoints that are more likely to yield a decisive readout.
- Trajectory modeling: Using natural history and biomarker trajectories to predict whether therapeutic effect is likely to emerge early or late—and where longer follow-up is required.
- Design sensitivity analyses: Using trial simulation modeling to quantify the impact of dropout, rescue therapy and measurement variability. This shows where study design may shift the probability of a decisive readout.
4. Which comparator assumptions must be durable
Phase 3 trials don’t fail only because treatments underperform. They also fail because the control arm outperforms expectations or standard of care evolves midprogram, compressing the margin needed to be successful. Common causes include placebo or standard-of-care response that is higher than assumed, changes in standard of care that materially alter the landscape, and event rates or control responses that are treated as point estimates rather than realistic ranges.
Exposing hidden risk in comparator assumptions
- Comparator sensitivity analysis: Simulating trials to quantify how control-performance shifts change the required margin and probability of success.
- Control benchmarking and uncertainty ranges: Using model-based meta-analyses of control arms and real-world evidence to build distributions for control performance across geographies.
- Standard-of-care scenario planning: Using competitor timelines and guideline and adoption trends to model plausible shifts over the trial timeline and define what success looks like.
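The sketch below illustrates the difference between treating control response as a point estimate and treating it as a range. It simulates two-arm trials with a binary responder endpoint and compares the probability of success when the control response rate is fixed with the probability when it is drawn from a distribution. The response rates, sample size and Beta parameters are hypothetical, and the function names are ours.

```python
import random
from math import sqrt
from statistics import NormalDist

# One-sided test at the 2.5% level (treatment better than control).
Z_CRIT = NormalDist().inv_cdf(0.975)


def simulate_trial(p_control, p_treatment, n_per_arm, rng):
    """Simulate one trial; return True if treatment beats control significantly."""
    responders_c = sum(rng.random() < p_control for _ in range(n_per_arm))
    responders_t = sum(rng.random() < p_treatment for _ in range(n_per_arm))
    pooled = (responders_c + responders_t) / (2 * n_per_arm)
    se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    if se == 0:
        return False
    z = (responders_t - responders_c) / n_per_arm / se
    return z > Z_CRIT


def probability_of_success(p_treatment, n_per_arm, control_sampler, n_sims=2000, seed=0):
    """Estimate the share of simulated trials that succeed.

    control_sampler: function taking a random.Random instance and returning
    the control response rate to use for one simulated trial.
    """
    rng = random.Random(seed)
    wins = sum(
        simulate_trial(control_sampler(rng), p_treatment, n_per_arm, rng)
        for _ in range(n_sims)
    )
    return wins / n_sims


if __name__ == "__main__":
    # Point estimate: control response assumed to be exactly 30%.
    point = probability_of_success(0.45, 200, lambda rng: 0.30)
    # Range: control response uncertain, Beta(30, 70) with mean 30%.
    ranged = probability_of_success(0.45, 200, lambda rng: rng.betavariate(30, 70))
    print(f"point-estimate control: {point:.2f}")
    print(f"control as a range:     {ranged:.2f}")
```

In this hypothetical setup, acknowledging uncertainty in control performance lowers the estimated probability of success, because trials where the control arm outperforms lose more power than trials where it underperforms gain.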
PTRS in clinical trials as a governance tool, not just another metric
Modeling can meaningfully improve PTRS only if it changes at least one of the following: (1) the decisions that drive the underlying probability of success or (2) the confidence in the assumptions that drive the forecast, allowing earlier, cleaner governance actions.
In practice, the impact of modeling is not always easy to isolate. Modeling often improves confidence and prevents misallocated investment. But if PTRS is not driver-based and routinely updated, those gains rarely translate into a clean “PTRS delta”—or even a measurable shift in confidence. Impact is better tracked through traceable changes to specific PTRS drivers and uncertainty bounds and then paired with earlier governance actions tied to those updates.
This means treating PTRS as a shared decision artifact with transparency. For each program, leadership and teams should be able to answer the following questions:
- Why is PTRS at this level?
- What assumptions underpin it?
- What evidence supports those assumptions?
- How uncertain are we?
- What would change the estimate?
When answers are clear, leaders can act to mitigate risk by modifying programs or generating targeted evidence. PTRS is valuable only when leaders use it to turn a quantified risk into a specific change to the program that reduces uncertainty before the next irreversible investment.