In this chapter we systematically analyze the 10,000 simulated assay results generated in Chapter 5.
We use descriptive statistics, probability of out-of-specification (OOS) results, visual tools, and capability indices.
Together, these methods link Monte Carlo outputs to structured GMP decision-making.
This allows us to assess compliance with specifications, estimate process capability, and evaluate risks.
Key metrics to summarize the simulation output:
R Example:
summary(Assay)
sd(Assay)
IQR(Assay)
quantile(Assay, probs = c(0.05, 0.95))
95% CI for the Mean
t.test(Assay)$conf.int
Summary of simulated assay results (N = 10,000 simulations):
Statistic | Value |
---|---|
Sample Size (N) | 10,000 |
Minimum | 93.80 |
1st Quartile (Q1) | 98.48 |
Median | 99.49 |
Mean | 99.49 |
3rd Quartile (Q3) | 100.52 |
Maximum | 105.51 |
Standard Deviation (SD) | 1.50 |
Interquartile Range (IQR) | 2.04 |
5th Percentile | 97.03 |
95th Percentile | 101.97 |
95% CI for Mean | [99.46 ; 99.52] |
These results confirm that the simulated assay distribution is centered close to the target (β 100%),
with moderate variability (SD β 1.5%).
The specification limits (98β102%) correspond approximately to Β±1.0Ο and Β±1.7Ο from the mean,
which explains the relatively high probability of out-of-specification results observed later.
The main GMP-related question is often: βWhat is the probability that a batch is OOS?β
R Example:
p_out <- mean(Assay < 98 | Assay > 102)
p_out
If p_out
is small (e.g., < 0.1%), the process is considered highly capable.
In our simulation, however, the estimated probability of an out-of-specification result is:
p_out = 0.2125 (β 21%)
This is a relatively high probability, indicating that about 1 in 5 simulated units fall outside the specification range (98β102%).
Such a high OOS rate is consistent with the low capability index (Cpk β 0.33) calculated later.
Confidence Interval (95%) for p_out
For a more robust assessment, we can estimate a 95% confidence interval for p_out
using the prop.test
function, which provides a score-based CI (Wilson interval).
Alternatively, confidence intervals for p_out
can also be estimated using bootstrap resampling.
This method is particularly useful when analyzing real experimental data, where the distribution is not well described by simple binomial assumptions.
In Monte Carlo simulations, however, the number of simulated data points can be arbitrarily large, so the binomial approximation is usually sufficient.
N <- length(Assay)
x <- sum(Assay < 98 | Assay > 102)
prop.test(x, N)$conf.int
β οΈ Important note on interpretation of OOS confidence intervals
There is a fundamental difference between two contexts:
p_out <- mean(Assay < 98 | Assay > 102)
With large simulated samples (e.g., 100,000), this estimate is already very precise. Binomial confidence intervals are not required here, because the probability is derived directly from the model.
prop.test()
in R, which implements the Wilson score-based CI by default.Example:
N <- length(Assay) # number of observations
x <- sum(Assay < 98 | Assay > 102) # number of OOS
prop.test(x, N)$conf.int
π In practice: - For Monte Carlo outputs, the simulation itself provides a precise estimate. - For real GMP data, binomial confidence intervals (ClopperβPearson, Wilson) or bootstrap resampling are necessary to quantify uncertainty from limited sample size.
Graphs make interpretation easier:
Histogram for the overall shape.
Boxplot to detect skewness and outliers.
Empirical cumulative distribution function (ECDF) to directly read probabilities.
R Example:
hist(Assay,
main = "Distribution of Simulated Assay (%)",
xlab = "Assay %",
col = "lightblue",
border = "white")
abline(v = c(98, 102), col = "red", lwd = 2, lty = 2)
boxplot(Assay, horizontal = TRUE,
main = "Assay Distribution",
col = "lightgreen")
abline(v = c(98, 102), col = "red", lwd = 2, lty = 2)
plot(ecdf(Assay),
main = "Empirical CDF of Assay",
xlab = "Assay %",
ylab = "Cumulative probability")
abline(v = c(98, 102), col = "red", lwd = 2, lty = 2)
Figure 6.1 β Histogram of simulated assay values with specification limits (98β102%).
Figure 6.2 β Boxplot of simulated assay values with specification limits (98β102%).
Figure 6.3 β Empirical CDF of simulated assay values with specification limits (98β102%).
For normally distributed data:
\[Cpk = \min \left( \frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma} \right)\]Cpk = min( (USL - ΞΌ)/(3*Ο), (ΞΌ - LSL)/(3*Ο) )
Note: the closed-form Cpk formula assumes approximate normality and a symmetric distribution.
For non-normal data, consider transformations or percentile-based capability indices, which Monte Carlo can estimate directly.
R Example:
mean_assay <- mean(Assay)
sd_assay <- sd(Assay)
USL <- 102
LSL <- 98
Cpk <- min((USL - mean_assay) / (3 * sd_assay),
(mean_assay - LSL) / (3 * sd_assay))
Cpk
Result:
For the simulated assay data, the capability index is:
Cpk = 0.33
This value is far below the usual GMP threshold (Cpk β₯ 1.33),
confirming that the process, as simulated, has insufficient capability
to consistently remain within the 98β102% specification limits.
The low Cpk is coherent with the relatively high probability of OOS (β 21%).
Visual tools and numerical metrics, together, provide the clearest and most reliable basis for decision-making.
π This structured analysis bridges raw simulation output with regulatory decision-making, preparing the ground for the real-world pharmaceutical case study in the next chapter.
β Previous: A Complete Simulation in R | β² back to top | Next β Pharmaceutical Case Study