Chapter 11 — Case Study 5 — Predictive Stability with Monte Carlo

Monte Carlo simulation can be used to predict future stability points using historical stability data from previous batches.
This approach is especially valuable for APIs and finished products where shelf-life decisions are critical.

🎯 Scenario

We have stability data from 4 historical batches (e.g., API assay at 0, 3, 6, 9, 12 months).

We want to predict whether a new batch — for which only early stability points (0–3–6 months) are available —
will remain within specification up to the time covered by the historical data (here: 12 months).

Predictions at later time points (e.g., 18–24 months) can still be generated by the same model,
but these are extrapolations beyond the observed domain and therefore carry much higher uncertainty.
Unless supported by additional evidence (longer historical data, accelerated stability, or a validated kinetic model),
such long-term forecasts should be treated as illustrative “what-if” scenarios, not as regulatory conclusions.

🧮 Monte Carlo Approach — Step by Step

📦 1. Slope comparability check (historical data)

We start with real numbers for 4 batches (replace with your own).
Assay (%) at 0, 3, 6, 9, 12 months:

Time (m)	B1	B2	B3	B4
0	100.0	99.8	100.5	99.9
3	99.3	99.2	99.6	99.0
6	98.8	98.7	99.1	98.6
9	98.1	98.3	98.5	98.0
12	97.7	97.8	98.1	97.5

library(dplyr); library(tidyr); library(ggplot2)

# 1) Create dataset
time_points <- c(0,3,6,9,12)
hist_df <- tribble(
  ~Batch, ~Time, ~Assay,
  "B1",0,100.0,"B1",3,99.3,"B1",6,98.8,"B1",9,98.1,"B1",12,97.7,
  "B2",0,99.8, "B2",3,99.2,"B2",6,98.7,"B2",9,98.3,"B2",12,97.8,
  "B3",0,100.5,"B3",3,99.6,"B3",6,99.1,"B3",9,98.5,"B3",12,98.1,
  "B4",0,99.9, "B4",3,99.0,"B4",6,98.6,"B4",9,98.0,"B4",12,97.5)

# 2) Fit models
fit_no_int <- lm(Assay ~ Time + Batch, data=hist_df)
fit_int    <- lm(Assay ~ Time * Batch, data=hist_df)
anova(fit_no_int, fit_int)  # p > 0.05 -> no evidence of slope differences

# 3) Visual overlay
ggplot(hist_df, aes(Time, Assay, color=Batch)) +
  geom_hline(yintercept = 95, linetype = 2, color = "red") +
  geom_point(size=2) +
  geom_smooth(method="lm", se=FALSE) +
  labs(title="Stability data by batch (historical)",
       subtitle="Linear fits per batch; check slope comparability",
       y="Assay (%)") +
  theme_minimal()

Case Study 5 – Slope comparability check (historical data)

Figure 11.1 – Stability data for 4 historical batches with linear fits. Each colored line represents a batch; the dashed line is the lower specification limit (LSL). Visual overlay suggests broadly similar degradation slopes.

11.1 – Checking slope comparability among historical batches

Before developing a predictive stability model, it is essential to verify whether the degradation rate (slope) is comparable across historical batches. If the slopes are statistically similar, the data can be pooled to build a single degradation model; if they differ, each batch (or group of batches) must be modelled separately or a mixed-effects approach should be considered. This preliminary step, recommended in ICH Q1E (“Evaluation of Stability Data”), ensures that the predicted shelf life is scientifically and statistically justified.

11.1.1 – Concluding remarks

This slope-comparability test provides the statistical justification for pooling or separating batches before predicting stability. When the p-value is greater than 0.05, as in our example, pooling is acceptable and leads to a more robust and parsimonious model for shelf-life estimation. If, instead, the test indicates significant differences, separate or hierarchical models should be fitted. Explicitly documenting this decision path (test performed, result, consequence) strengthens the transparency of the stability evaluation for regulatory submissions.

💻 2. Add the new batch (partial data)

Model comparison	Residual df	RSS	Df	Sum Sq	F	Pr(>F)
Model 1: Assay ~ Time + Batch (common slope)	15	0.228
Model 2: Assay ~ Time * Batch (different slopes)	12	0.162	3	0.066	1.63	0.235

Time (m)	NEW
0	100.1
3	99.4
6	98.8

new_df <- tribble(
  ~Batch,~Time,~Assay,
  "NEW",0,100.1,
  "NEW",3,99.4,
  "NEW",6,98.8)

all_df <- bind_rows(hist_df, new_df)

ggplot(all_df, aes(Time, Assay, color=Batch)) +
  geom_point(size=2) +
  geom_smooth(method="lm", se=FALSE) +
  geom_hline(yintercept=95, linetype=2) +
  labs(title="Historical batches + NEW batch",
       subtitle="Early stability points of new batch added",
       y="Assay (%)") +
  theme_minimal()

Case Study 5 – Historical + new batch overlay

Figure 11.2 – Historical batches (B1–B4) with linear fits (coloured lines) overlaid with the early stability points of the new batch (pink). The dashed horizontal line marks the lower specification limit (LSL = 95%). At this stage no modelling of the new batch is performed; its early points are only plotted against the historical trends to visually assess whether its behavior so far is consistent with the established degradation slopes before moving to predictions.

3️⃣ Fit pooled model on historical data and quantify parameter uncertainty via bootstrap

In this section we first fit the pooled degradation model using the historical batches only, because these provide the information needed to estimate the common degradation rate.
We then use bootstrap resampling of the historical data, a Monte Carlo method, to quantify the uncertainty in the model parameters (intercept and slope).
Finally, we apply these bootstrapped parameter sets to the early measurements of the new batch in order to project its assay values at future time points (12, 18 and 24 months).

## ---------------------------------------------------------------------------
## This script performs predictive stability for a new batch using a
## Monte Carlo / bootstrap approach:
##
## 1) It bootstraps the common degradation slope from the historical
##    batches. For each bootstrap draw, it re-estimates the intercept
##    (alpha*) of the NEW batch by anchoring to its early data points
##    (0–3–6 months). This ensures that predictions are aligned with
##    the actual starting level of the NEW batch.
##
## 2) It incorporates the residual standard deviation (σ) from the
##    pooled model to reflect observational variability. As a result,
##    the predicted distributions and risks represent future
##    measurements, not just the mean degradation path.
##
## The outputs (probabilities below LSL, predictive quantiles and
## violin/box plots) provide a quantitative risk assessment of the
## new batch at future time points (12, 18 and 24 months).
## ---------------------------------------------------------------------------

## ---- 3) POOLED + SIMPLE BOOTSTRAP + NEW-BATCH ANCHOR + OBSERVATIONAL σ ----
## Fixes implemented:
## - Ensure Time is numeric (in case it was a factor).
## - After boot(), set explicit column names on boot_res$t so we can index "slope".
## - Anchor predictions to NEW by re-estimating the intercept per draw.
## - Compute observational risk using residual SD (σ) from the pooled model.

library(boot)
library(dplyr)
library(ggplot2)

set.seed(123)
LSL <- 95
future_time <- c(12, 18, 24)
B <- 2000  # bootstrap draws

# --- Safety: ensure numeric Time ---
hist_df <- hist_df %>% mutate(Time = as.numeric(Time))
new_df  <- new_df  %>% mutate(Time = as.numeric(Time))

# 1) Pooled model on historical data (used for σ and as baseline)
fit_pooled <- lm(Assay ~ Time, data = hist_df)
sigma_hat  <- sqrt(sum(residuals(fit_pooled)^2) / df.residual(fit_pooled))  # residual SD (observational)

# 2) Nonparametric bootstrap of coefficients (simple and robust)
boot_fun <- function(d, idx){
  dd <- d[idx, ]
  cf <- coef(lm(Assay ~ Time, data = dd))
  c(intercept = unname(cf[1]), slope = unname(cf[2]))
}
boot_res  <- boot(hist_df, boot_fun, R = B)
coef_mat  <- as.matrix(boot_res$t)
# Give explicit names (boot may drop them)
if (is.null(colnames(coef_mat)) || ncol(coef_mat) != 2) {
  colnames(coef_mat) <- c("intercept", "slope")
}

beta_star <- coef_mat[, "slope"]  # bootstrapped slope b*
# NOTE: we do not use the historical intercept from bootstrap; we anchor to NEW instead.

# 3) Anchor to NEW (re-estimate intercept per draw using NEW 0–3–6)
t_new <- new_df$Time
y_new <- new_df$Assay
alpha_new_star <- sapply(beta_star, function(b) mean(y_new - b * t_new))

# 4) Mean predictions at future times (no measurement error)
mu_mat <- sapply(future_time, function(tt) alpha_new_star + beta_star * tt)
colnames(mu_mat) <- paste0("t", future_time, "m")

# 5) OBSERVATIONAL distributions (future measurements) by adding σ
#    (a) Observational risk P(N(mu, σ^2) < LSL) via closed form
p_out_obs <- sapply(seq_along(future_time), function(j){
  mean(pnorm(LSL, mean = mu_mat[, j], sd = sigma_hat))
})
names(p_out_obs) <- colnames(mu_mat)

#    (b) Observational predictive quantiles by simulating future measurements
obs_mat <- mu_mat + matrix(
  rnorm(B * length(future_time), 0, sigma_hat),
  nrow = B, ncol = length(future_time), byrow = FALSE
)
colnames(obs_mat) <- colnames(mu_mat)
q_obs <- apply(obs_mat, 2, quantile, probs = c(0.05, 0.5, 0.95), na.rm = TRUE)

# 6) OUTPUT for tables (recommended for GMP: risk on FUTURE MEASUREMENT)
p_out     <- round(p_out_obs, 3)
quantiles <- round(q_obs, 2)

print(p_out)
print(quantiles)

# 7) PLOT: observational predictive distributions (consistent with p_out and quantiles)
pred_df <- data.frame(
  value  = c(obs_mat),
  tlabel = factor(rep(colnames(obs_mat), each = nrow(obs_mat)),
                  levels = colnames(obs_mat))
)

ggplot(pred_df, aes(tlabel, value)) +
  geom_violin(trim = FALSE, alpha = 0.5) +
  geom_boxplot(width = 0.15, outlier.size = 0.7) +
  geom_hline(yintercept = LSL, linetype = 2, color = "red") +
  labs(title    = "Predictive distributions for the new batch",
       subtitle = "Violin + boxplot at 12/18/24 months; red dashed line = LSL (observational)",
       x = "Future time point", y = "Predicted Assay (%)") +
  theme_minimal()

Figure 11.3 – Predictive distributions at 12, 18 and 24 months (pooled model with bootstrap; observational distributions including residual variability σ).

The tables below summarize the predictive outputs from the bootstrap simulation.
They provide a quantitative risk assessment that can be directly used in GMP decision-making (e.g., shelf-life assignment, accelerated release).

Table 11.2 – Estimated probability of falling below the LSL

Future time point	P(Assay < LSL)
12 months	0.0 %
18 months	0.0 %
24 months	9.6 %

Shows the estimated risk of falling below the lower specification limit at each future time point (observational risk, including σ). Values reported as 0.0% are below the reporting precision with the current simulation settings.

Table 11.3 – Predictive quantiles of Assay (%) for the new batch

Quantile	12 months	18 months	24 months
5%	97.30	96.08	94.85
50% (median)	97.75	96.64	95.52
95%	98.22	97.15	96.14

Predictive interval (5–95%) shows the spread of expected values; the median is the central prediction.

Interpretation (practical use):
Taken together, the probability and quantile tables show not only the expected values but also the uncertainty around them.
At 12 and 18 months the predicted risk of falling below specification is essentially zero, whereas at 24 months it rises to about 10%.
These figures can be used directly to support shelf-life assignment and to justify regulatory decisions with quantitative evidence.

Presenting both probability of non-compliance and predictive intervals helps regulators and QA to see not only point estimates but the underlying risk distribution.

Model diagnostics (residual plots, QQ-plot, and a curvature check via a Time^2 term) did not indicate material deviations from linearity or normal-error assumptions over 0–12 months. If heteroscedasticity were evident, a WLS/GLS fit or a residual (wild) bootstrap would be adopted.

📌 GMP Value

This case study illustrates how a Monte Carlo approach (here implemented via bootstrap resampling of the pooled degradation model)
provides a quantitative, documented risk assessment of future stability points without waiting for full real-time data.

This quantitative framework can also be extended to sampling and OC curve analysis.