Monte Carlo simulation can be used to build empirical Operating Characteristic (OC) curves for sampling plans when ISO 3951-1 (variables) is not applicable due to non-normal data, and ISO 2859-1 (attributes) would explode the sample size (e.g., with Ac = 0).
This case study shows how to leverage historical, non-normal measurement data (continuous CQA) to design a reduced yet defensible sampling plan.
📦 Background Box — Why a skewed distribution?
In real GMP operations, fill-weight data are often close to normal (CLT).
Here, we deliberately use a left-skewed model to illustrate a case where ISO 3951-1 is not applicable.
This lets us show how empirical OC curves (via Monte Carlo) can justify reduced sampling while remaining transparent for QA/regulators.
We deliberately avoid sub-LSL values in the illustrative distribution; this prevents any implicit endorsement of past OOS lots and keeps the example fully GxP-neutral.
Solution: Build an empirical OC from the actual historical shape (skewness included) and choose n to hit practical risk targets.
Note. For attributes plans with Ac = 0, the OC curve depends only on the true defective rate
$p = P(X < \mathrm{LSL})$ and on $n$ — i.e. $P_a(p) = (1 - p)^n$.
The historical distribution is therefore used to contextualize $p$, not to calibrate $P_a(p)$ beyond its definition.
Historical shape (illustrative).
We generate a left-skewed distribution on [LSL, U] via a scaled Beta with all values ≥ LSL.
Purpose: a clean picture of where the true defective rate $p = P(X < \mathrm{LSL})$ would come from.
OC setup.
We fix uniform axes (x: 0–16% defectives; y: 0–0.40) and build a fine grid of (p) values.
For each $p$ and plan $(n, \mathrm{Ac}=0)$, we estimate the empirical acceptance probability $P_a(p)$ by Monte Carlo.
Y-axis zoom set to $P_a$ = 0–0.40 to focus on consumer-risk regions.
Theory overlay.
For $A_c = 0$, acceptance means “zero defectives” in the sample, so:
The binomial curve is drawn together with the empirical dots (they coincide up to Monte Carlo noise).
✅ The code below is fully reproducible.
Replace the distribution parameters (e.g.,alpha_hat,beta_hat,LSL,U) with those estimated from your historical data to tailor the OC to your process.
# ================================================================
# Case Study 6 — Historical shape (ggplot2) + OC curves (base R)
# - One-sided sampling (Ac = 0), uniform axes across OC figures
# - Legend with AQL / LTPD / LQ20 values included
# - Four separate OC plots: n = 32, 50, 80, 125
# ================================================================
suppressPackageStartupMessages(library(ggplot2))
set.seed(123)
# -----------------------------
# 0) Setup (one-sided LSL case)
# -----------------------------
LSL <- 98.0
U <- 102.0 # only for the illustrative historical distribution
# A left-skewed Beta(alpha, beta) on [LSL, U] when alpha > beta:
alpha_hat <- 8
beta_hat <- 2.6
# Generator for the illustrative “historical” data:
# by construction, ALL values are >= LSL (no OOS).
r_hist <- function(n) LSL + (U - LSL) * rbeta(n, alpha_hat, beta_hat)
# -----------------------------
# 1) Historical distribution
# -----------------------------
hist_x <- r_hist(100000)
stopifnot(all(hist_x >= LSL)) # defensive check
p_hist <- ggplot(data.frame(x = hist_x), aes(x)) +
geom_histogram(aes(y = after_stat(density)),
bins = 120, fill = "#a6cee3", color = "#1f78b4") +
geom_density(linewidth = 1.1, color = "#1f78b4") +
geom_vline(xintercept = LSL, linetype = 2, linewidth = 1) +
labs(title = "Historical distribution (left-skewed ≥ LSL)",
subtitle = sprintf("Support [%g, %g] — P(X<LSL) = 0 (by construction)", LSL, U),
x = "Fill weight (mg)", y = "Density") +
theme_minimal(base_size = 13) +
theme(plot.title = element_text(face = "bold"),
panel.grid.minor = element_blank())
print(p_hist)
# ggsave("fig12_1_historical.png", p_hist, width = 10, height = 3.6, dpi = 150)
# -------------------------------------
# 2) OC settings and helper functions
# -------------------------------------
# Uniform axes across all OC plots (for visual comparability)
xlim_percent <- c(0, 16) # x: 0–16% defectives
ylim_common <- c(0, 0.40) # y: Pa 0.00–0.40
# Grid of true defect rates p (in fraction). We will plot in percent.
p_grid <- seq(0, xlim_percent[2] / 100, by = 0.002)
# Monte Carlo reps for empirical Pa; increase if you want smoother dots
reps <- 20000
# Empirical acceptance probability at a given p
Pa_empirical <- function(n, Ac, p, reps = 20000) {
# For Ac = 0, accept if K = 0 with K ~ Bin(n, p).
mean(rbinom(reps, size = n, prob = p) <= Ac)
}
# One OC figure (base R): empirical (points) + binomial theory (dashed line)
plot_OC_base <- function(n, Ac = 0L) {
# Curves
Pa_emp <- sapply(p_grid, function(p) Pa_empirical(n, Ac, p, reps))
Pa_the <- pbinom(Ac, n, p_grid) # For Ac=0, equals (1 - p)^n
# Risk points (closed form for Ac = 0)
p_AQL <- 1 - 0.95^(1 / n) # Pa ~ 0.95
p_LTPD <- 1 - 0.10^(1 / n) # Pa ~ 0.10
p_LQ20 <- 1 - 0.20^(1 / n) # Pa ~ 0.20
# Plot
par(mar = c(4.5, 4.8, 3.5, 1))
plot(p_grid * 100, Pa_emp,
type = "p", pch = 16, cex = 0.7, col = "#1f78b4",
xlim = xlim_percent, ylim = ylim_common,
xlab = "True % Defective (p)", ylab = "Acceptance Probability Pa",
main = sprintf("Simulated OC curve — n=%d, Ac=%d", n, Ac))
lines(p_grid * 100, Pa_the, lty = 2, lwd = 2, col = "#000000") # Binomial theory
# Vertical markers (thicker, color-coded)
abline(v = 100 * p_AQL, lty = 3, lwd = 2, col = "#1f78b4") # AQL (blue)
abline(v = 100 * p_LTPD, lty = 3, lwd = 2, col = "#ff7f00") # LTPD (orange)
abline(v = 100 * p_LQ20, lty = 3, lwd = 2, col = "#33a02c") # LQ20 (green)
# Single legend with numeric values
legend("topright",
legend = c("Empirical (MC)",
"Binomial theory",
sprintf("AQL \u2248 %.3f%%", 100 * p_AQL),
sprintf("LTPD \u2248 %.2f%%", 100 * p_LTPD),
sprintf("LQ20 \u2248 %.2f%%", 100 * p_LQ20)),
col = c("#1f78b4", "#000000", "#1f78b4", "#ff7f00", "#33a02c"),
pch = c(16, NA, NA, NA, NA),
lty = c(NA, 2, 3, 3, 3),
lwd = c(NA, 2, 2, 2, 2),
pt.cex = 1.0,
bty = "n")
# Return the three key percentages (for tables)
invisible(c(AQL = 100 * p_AQL, LTPD = 100 * p_LTPD, LQ20 = 100 * p_LQ20))
}
# --------------------------------------
# 3) Four separate OC plots (not a panel)
# --------------------------------------
risk_n32 <- plot_OC_base(32, Ac = 0L)
risk_n50 <- plot_OC_base(50, Ac = 0L)
risk_n80 <- plot_OC_base(80, Ac = 0L)
risk_n125 <- plot_OC_base(125, Ac = 0L)
# Optional PNG export:
# png("fig12_2_OC_n32.png", 1400, 500, res = 130); plot_OC_base(32); dev.off()
# png("fig12_3_OC_n50.png", 1400, 500, res = 130); plot_OC_base(50); dev.off()
# png("fig12_4_OC_n80.png", 1400, 500, res = 130); plot_OC_base(80); dev.off()
# png("fig12_5_OC_n125.png",1400, 500, res = 130); plot_OC_base(125); dev.off()
# --------------------------------------
# 4) Summary tables (percent + counts/1e6)
# --------------------------------------
# First, assemble a compact table with AQL, LTPD, LQ20 by plan:
risk_table <- rbind(
`n=32, Ac=0` = round(risk_n32, 3),
`n=50, Ac=0` = round(risk_n50, 3),
`n=80, Ac=0` = round(risk_n80, 3),
`n=125, Ac=0` = round(risk_n125, 3)
)
# Then build a data.frame with “clean” column names (no spaces)
risk_df <- data.frame(
Plan = rownames(risk_table),
AQL_pct = risk_table[, "AQL"],
LTPD_pct = risk_table[, "LTPD"],
LQ20_pct = risk_table[, "LQ20"]
)
# Counts per 1,000,000 units (helps with interpretability on slides)
risk_df$AQL_cnt_1e6 <- round(risk_df$AQL_pct / 100 * 1e6)
risk_df$LTPD_cnt_1e6 <- round(risk_df$LTPD_pct / 100 * 1e6)
risk_df$LQ20_cnt_1e6 <- round(risk_df$LQ20_pct / 100 * 1e6)
print(risk_df, row.names = FALSE)
Table 12.1 – Summary of AQL, LTPD and LQ20 values from Monte Carlo simulation (percent and counts per 1,000,000 units).
| Plan | AQL (%) | LTPD (%) | LQ20 (%) | AQL (count/1e6) | LTPD (count/1e6) | LQ20 (count/1e6) |
|---|---|---|---|---|---|---|
| n=32, Ac=0 | 0.160 | 6.943 | 4.905 | 1,600 | 69,430 | 49,050 |
| n=50, Ac=0 | 0.103 | 4.501 | 3.168 | 1,030 | 45,010 | 31,680 |
| n=80, Ac=0 | 0.064 | 2.837 | 1.992 | 640 | 28,370 | 19,920 |
| n=125, Ac=0 | 0.041 | 1.825 | 1.279 | 410 | 18,250 | 12,790 |
Reading tip. As (n) increases, all three thresholds shift left (smaller (p)), reflecting a lower acceptance at the same defect level (i.e., stronger consumer protection).
Figure 12.1 – Historical fill-weight distribution (illustrative).
The left-skewed Beta model ensures all values are ≥ LSL; no OOS values are simulated.
This clarifies why OC curves are interpreted against the true defective rate \(p = P(X<\mathrm{LSL})\).
Figure 12.2 – Simulated OC curve (n = 32, Ac = 0).
Blue points show Monte Carlo estimates; the dashed black line is the binomial theory.
Vertical markers identify AQL (≈0.95), LTPD (≈0.10) and LQ20 (≈0.20) acceptance probabilities.
Figure 12.3 – Simulated OC curve (n = 50, Ac = 0).
As n increases, the OC becomes steeper, reducing producer’s risk and improving consumer protection.
Figure 12.4 – Simulated OC curve (n = 80, Ac = 0).
Larger n increases discriminating power, causing AQL/LTPD/LQ20 to shift left (lower p).
Figure 12.5 – Simulated OC curve (n = 125, Ac = 0).
The plan offers the strongest consumer protection but at the cost of the largest sample size.
Replace the distribution parameters with estimates from your historical CQA.
Keep Ac = 0 (or set Ac = 1 if justified).
Re-run the simulation to estimate the empirical OC and compare plans.
This approach respects the real data shape (non-normal), so producer’s and consumer’s risks are calibrated on reality.
It provides a transparent, auditable justification for reduced sampling.
Combine with pilot confirmation runs and CPV monitoring to maintain long-term risk control.
In routine use, confirm with pilot lots and monitor via CPV to maintain targeted $(P_a$) levels.
When ISO 3951-1 assumptions fail, empirical Monte Carlo OC curves provide a data-grounded alternative.
They can justify smaller yet statistically defensible sampling plans (e.g., n = 50 instead of 125).
Reading AQL/LTPD/LQ20 directly from simulation ensures traceable risk management aligned with GMP expectations.
| ← Previous: Case Study 5 — Predictive Stability with Monte Carlo | ▲ Back to top | Next: Case Study 7 — Microbiological Counts → |