Chapter 6 – Analysis of Results

In this chapter we systematically analyze the 10,000 simulated assay results generated in Chapter 5.
We use descriptive statistics, probability of out-of-specification (OOS) results, visual tools, and capability indices.
Together, these methods link Monte Carlo outputs to structured GMP decision-making. This allows us to assess compliance with specifications, estimate process capability, and evaluate risks.


πŸ“Š 1. Descriptive Statistics

Key metrics to summarize the simulation output:

R Example:

summary(Assay)
sd(Assay)
IQR(Assay)
quantile(Assay, probs = c(0.05, 0.95))

95% CI for the Mean

t.test(Assay)$conf.int

Summary of simulated assay results (N = 10,000 simulations):

Statistic Value
Sample Size (N) 10,000
Minimum 93.80
1st Quartile (Q1) 98.48
Median 99.49
Mean 99.49
3rd Quartile (Q3) 100.52
Maximum 105.51
Standard Deviation (SD) 1.50
Interquartile Range (IQR) 2.04
5th Percentile 97.03
95th Percentile 101.97
95% CI for Mean [99.46 ; 99.52]

These results confirm that the simulated assay distribution is centered close to the target (β‰ˆ 100%),
with moderate variability (SD β‰ˆ 1.5%).
The specification limits (98–102%) correspond approximately to Β±1.0Οƒ and Β±1.7Οƒ from the mean,
which explains the relatively high probability of out-of-specification results observed later.


πŸ“‰ 2. Probability of Out-of-Spec (OOS)

The main GMP-related question is often: β€œWhat is the probability that a batch is OOS?”

R Example:

p_out <- mean(Assay < 98 | Assay > 102)
p_out

If p_out is small (e.g., < 0.1%), the process is considered highly capable. In our simulation, however, the estimated probability of an out-of-specification result is:

p_out = 0.2125 (β‰ˆ 21%)

This is a relatively high probability, indicating that about 1 in 5 simulated units fall outside the specification range (98–102%).
Such a high OOS rate is consistent with the low capability index (Cpk β‰ˆ 0.33) calculated later.

Confidence Interval (95%) for p_out

For a more robust assessment, we can estimate a 95% confidence interval for p_out using the prop.test function, which provides a score-based CI (Wilson interval). Alternatively, confidence intervals for p_out can also be estimated using bootstrap resampling.
This method is particularly useful when analyzing real experimental data, where the distribution is not well described by simple binomial assumptions.
In Monte Carlo simulations, however, the number of simulated data points can be arbitrarily large, so the binomial approximation is usually sufficient.

N <- length(Assay)
x <- sum(Assay < 98 | Assay > 102)
prop.test(x, N)$conf.int

⚠️ Important note on interpretation of OOS confidence intervals

There is a fundamental difference between two contexts:

Example:

N <- length(Assay)               # number of observations
x <- sum(Assay < 98 | Assay > 102)  # number of OOS
prop.test(x, N)$conf.int

πŸ‘‰ In practice: - For Monte Carlo outputs, the simulation itself provides a precise estimate. - For real GMP data, binomial confidence intervals (Clopper–Pearson, Wilson) or bootstrap resampling are necessary to quantify uncertainty from limited sample size.


πŸ“ˆ 3. Visualizing the Distribution

Graphs make interpretation easier:

R Example:

hist(Assay,
     main = "Distribution of Simulated Assay (%)",
     xlab = "Assay %",
     col = "lightblue",
     border = "white")
abline(v = c(98, 102), col = "red", lwd = 2, lty = 2)

boxplot(Assay, horizontal = TRUE,
        main = "Assay Distribution",
        col = "lightgreen")
abline(v = c(98, 102), col = "red", lwd = 2, lty = 2)

plot(ecdf(Assay),
     main = "Empirical CDF of Assay",
     xlab = "Assay %",
     ylab = "Cumulative probability")
abline(v = c(98, 102), col = "red", lwd = 2, lty = 2)

Analysis Histogram
Figure 6.1 – Histogram of simulated assay values with specification limits (98–102%).

Analysis Boxplot
Figure 6.2 – Boxplot of simulated assay values with specification limits (98–102%).

Analysis ECDF
Figure 6.3 – Empirical CDF of simulated assay values with specification limits (98–102%).


πŸ“ 4. Capability Indices

For normally distributed data:

\[Cpk = \min \left( \frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma} \right)\]

Cpk = min( (USL - ΞΌ)/(3*Οƒ), (ΞΌ - LSL)/(3*Οƒ) )

Note: the closed-form Cpk formula assumes approximate normality and a symmetric distribution.
For non-normal data, consider transformations or percentile-based capability indices, which Monte Carlo can estimate directly.

R Example:

mean_assay <- mean(Assay)
sd_assay   <- sd(Assay)
USL <- 102
LSL <- 98

Cpk <- min((USL - mean_assay) / (3 * sd_assay),
           (mean_assay - LSL) / (3 * sd_assay))
Cpk

Result:
For the simulated assay data, the capability index is:

Cpk = 0.33

This value is far below the usual GMP threshold (Cpk β‰₯ 1.33),
confirming that the process, as simulated, has insufficient capability
to consistently remain within the 98–102% specification limits.
The low Cpk is coherent with the relatively high probability of OOS (β‰ˆ 21%).


πŸ’Š GMP Interpretation

Visual tools and numerical metrics, together, provide the clearest and most reliable basis for decision-making.

πŸ‘‰ This structured analysis bridges raw simulation output with regulatory decision-making, preparing the ground for the real-world pharmaceutical case study in the next chapter.


← Previous: A Complete Simulation in R | β–² back to top | Next β†’ Pharmaceutical Case Study