This article provides a comprehensive guide to Verification, Validation, and Uncertainty Quantification (VVUQ) in computational modeling for biomedical research and drug development. It covers foundational concepts, practical methodologies, and optimization strategies essential for building model credibility. Readers will learn to apply VVUQ frameworks like ASME V&V 40 and integrate AI/ML tools to enhance decision-making, satisfy regulatory standards, and accelerate the delivery of new therapies to patients.
Verification, Validation, and Uncertainty Quantification (VVUQ) represents a systematic framework for establishing confidence in computational models by ensuring their mathematical correctness, physical accuracy, and statistical reliability. As computational modeling and simulation (CM&S) increasingly replace physical testing across engineering and biomedical sectors, VVUQ provides the essential methodology for assessing model credibility [1]. This trifecta approach has become particularly critical in fields such as medical device development and pharmaceutical research, where regulatory agencies now accept in silico evidence as part of marketing authorization submissions [2].
The fundamental definitions of VVUQ's components are clearly established in technical standards. Verification is "the process of determining that a computational model accurately represents the underlying mathematical model and its solution," essentially answering "Are we solving the equations correctly?" [3]. Validation determines "the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model," answering "Are we solving the correct equations?" [3]. Uncertainty Quantification (UQ) is "the science of quantifying, characterizing, tracing, and managing uncertainty in computational and real world systems" [4].
The proper relationship between these components follows a logical sequence: verification must precede validation, which in turn provides context for uncertainty quantification [3]. This sequential approach separates implementation errors (verification) from model formulation shortcomings (validation) while systematically accounting for variabilities and uncertainties that affect predictive confidence [2] [4].
The VVUQ framework operates as an integrated system where each component addresses distinct aspects of model credibility. Verification ensures the numerical implementation correctly solves the mathematical formalism, validation assesses how well computational predictions match experimental observations of the real world, and uncertainty quantification characterizes the reliability of model predictions given inherent variabilities in inputs, parameters, and model form [4] [3].
This framework has evolved from quality management principles in computational fluid dynamics and solid mechanics, gradually expanding to encompass complex biological systems and computational biomechanics [3]. The American Society of Mechanical Engineers (ASME) has played a pivotal role in standardizing VVUQ terminology and methodologies through publications such as VVUQ 1-2022, which establishes consistent terminology across computational modeling and simulation applications [1].
Table: The Three Components of VVUQ
| Component | Core Question | Focus | Key Activities |
|---|---|---|---|
| Verification | "Are we solving the equations correctly?" | Mathematics and code implementation [3] | Code verification, calculation verification, convergence studies [5] [4] |
| Validation | "Are we solving the right equations?" | Physical accuracy and real-world representation [3] | Comparison with experimental data, validation metrics, credibility assessment [2] [5] |
| Uncertainty Quantification | "How reliable are our predictions given uncertainties?" | Reliability and confidence bounds [6] [4] | Identifying uncertainty sources, propagation analysis, sensitivity analysis [4] |
Verification consists of two subordinate processes: code verification and calculation verification. Code verification ensures the computational algorithms correctly implement the mathematical model, typically through comparison with analytical solutions or manufactured problems [3]. Calculation verification focuses on estimating numerical errors introduced by discretization, iteration, and round-off, often assessed through mesh convergence studies [5] [3].
Validation constitutes an evidence-generating process that compares computational outputs with experimental data from the physical system being modeled [3]. This process is always context-dependent, as a model may be adequately validated for one intended use but insufficient for another. The ASME V&V 40 standard emphasizes that validation activities must be informed by the model's "context of use" (COU) and the potential risk associated with an incorrect prediction [2].
Uncertainty Quantification formally characterizes how uncertainties in inputs, parameters, and model form affect the quantity of interest. UQ distinguishes between aleatoric uncertainty (inherent variability irreducible by more data) and epistemic uncertainty (reducible through better information or knowledge) [4]. For computational models, key uncertainty sources include uncertain inputs, model form limitations, computational approximations, and physical testing variability [4].
Code verification methodologies ensure that the mathematical model is correctly implemented in software. The most rigorous approach employs comparison with analytical solutions for simplified problems with known exact answers [3]. When analytical solutions are unavailable for complex systems, the method of manufactured solutions provides an alternative by constructing an arbitrary solution function, deriving corresponding source terms, and verifying that the code reproduces the manufactured solution [3].
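The method of manufactured solutions can be illustrated with a minimal sketch. Assuming a 1D Poisson problem -u'' = f on [0, 1] with homogeneous boundary conditions (a hypothetical stand-in for a real solver), we manufacture u(x) = sin(πx), derive the source term f = π²sin(πx), and confirm that the finite-difference solution converges to the manufactured solution at the scheme's theoretical second order:

```python
import numpy as np

def solve_poisson(f, n):
    """Solve -u'' = f on [0, 1] with u(0) = u(1) = 0 using
    second-order central differences on n interior points."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)
    # Tridiagonal system: (-u[i-1] + 2u[i] - u[i+1]) / h^2 = f(x[i])
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    return x, np.linalg.solve(A, f(x))

# Manufactured solution: pick u(x) = sin(pi x), derive f = -u'' = pi^2 sin(pi x)
u_exact = lambda x: np.sin(np.pi * x)
f = lambda x: np.pi**2 * np.sin(np.pi * x)

# Solve on successively halved meshes (h = 1/20, 1/40, 1/80)
errors = []
for n in (19, 39, 79):
    x, u = solve_poisson(f, n)
    errors.append(np.max(np.abs(u - u_exact(x))))

# Observed order of accuracy: error should shrink ~4x per halving
orders = [np.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(orders)
```

An observed order near the theoretical value of 2 is the verification evidence; a lower observed order would signal an implementation error in the discretization.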
Software Quality Engineering (SQE) practices provide the foundation for reliable code verification through systematic code review, debugging, and version control [6]. These processes are particularly critical for in silico trials and medical device applications, where regulatory acceptance requires demonstrated software reliability [2] [6].
Calculation verification estimates numerical accuracy in specific simulations, primarily addressing discretization errors. The standard methodology involves mesh convergence studies, where successive mesh refinements demonstrate asymptotic approach to a continuum solution [3]. A common acceptance criterion requires that further mesh refinement changes the solution output by less than an established threshold (e.g., <5%) [3].
Table: Calculation Verification Methods for Discretization Error Estimation
| Method | Procedure | Application Context | Acceptance Criteria |
|---|---|---|---|
| Grid Convergence Index | Systematic refinement of spatial/temporal discretization [3] | Finite element, finite volume, finite difference methods | Solution change < 5% with refinement [3] |
| Iterative Convergence | Monitoring solution evolution with iteration count [5] | Problems solved through iterative methods | Residual reduction to specified tolerance [5] |
| Time Step Convergence | Progressive reduction of time step size [5] | Transient, dynamic simulations | Insensitive response with further reduction [5] |
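The <5% acceptance criterion in the table above reduces to a simple check between successive mesh levels. As a sketch with hypothetical peak-stress outputs (the values and the 5% default are illustrative, not prescriptive):

```python
def refinement_converged(coarse, fine, tol=0.05):
    """Accept a mesh level if the quantity of interest changes by less
    than `tol` (default 5%) under refinement, relative to the fine value."""
    return abs(fine - coarse) / abs(fine) < tol

# Hypothetical peak-stress outputs (MPa) from three successive meshes
outputs = [112.0, 101.2, 97.8]

# Check each refinement step: coarse->medium, medium->fine
checks = [refinement_converged(a, b) for a, b in zip(outputs, outputs[1:])]
print(checks)  # first refinement still changes the answer >5%; second passes
```

Here the coarse-to-medium step fails the criterion while the medium-to-fine step passes, so the medium mesh would be the coarsest acceptable discretization for this quantity of interest.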
For complex biomechanical systems, verification must address multi-physics interactions and nonlinear material behaviors. Ionescu et al. provide an exemplar case where they verified a transversely isotropic hyperelastic constitutive model implementation against an analytical solution for equibiaxial stretch, achieving stress predictions within 3% of the theoretical values [3].
Validation requires carefully designed physical experiments that provide high-quality data for comparing with computational predictions. These experiments must capture the essential physics relevant to the model's context of use while providing comprehensive documentation of boundary conditions, initial conditions, and material properties [3]. Validation experiments differ from traditional research experiments through their specific design for computational comparison, requiring rigorous characterization of experimental uncertainties [5].
The validation process follows a structured workflow: (1) define the context of use and quantities of interest; (2) design experiments that isolate these quantities; (3) execute experiments with comprehensive uncertainty characterization; (4) perform corresponding simulations; (5) compare results using appropriate validation metrics; and (6) assess credibility relative to predefined acceptability thresholds [2] [5].
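Steps (5) and (6) of this workflow can be sketched for a scalar quantity of interest. The example below (hypothetical strain values, a 15% acceptance threshold, and a normal-approximation confidence interval are all illustrative assumptions) compares one simulated value against repeated experiments:

```python
import numpy as np

def validation_check(sim, exp_mean, exp_std, n_exp, rel_tol=0.15):
    """Compare a simulated quantity of interest against the mean of
    repeated experiments. Reports the relative error, whether the
    simulation falls inside a ~95% confidence interval on the mean,
    and whether it meets a predefined acceptability threshold."""
    rel_error = abs(sim - exp_mean) / abs(exp_mean)
    ci_half = 1.96 * exp_std / np.sqrt(n_exp)  # normal approximation
    within_ci = abs(sim - exp_mean) <= ci_half
    return rel_error, within_ci, rel_error <= rel_tol

# Hypothetical peak-strain QOI: one simulation vs. 10 repeated bench tests
rel_err, in_ci, passes = validation_check(sim=0.042, exp_mean=0.046,
                                          exp_std=0.004, n_exp=10)
print(f"relative error = {rel_err:.1%}, within CI: {in_ci}, passes: {passes}")
```

Note that the two checks can disagree: a prediction can meet a relative-error threshold yet fall outside a tight confidence interval on the experimental mean, which is why the acceptability criterion must be fixed in advance as part of the context of use.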
Validation metrics provide quantitative measures of agreement between computational results and experimental data. These range from simple difference measures for scalar quantities to multivariate metrics for field comparisons [5]. The ASME VVUQ 20.1-2024 standard specifically addresses "Multivariate Metric for Validation," providing methodologies for comparing complex data patterns [1].
For regulatory applications, the ASME V&V 40-2018 standard introduces a risk-informed credibility framework that determines the required level of validation evidence based on model influence (how much the decision relies on the model) and decision consequence (potential impact of an incorrect prediction) [2]. This framework ensures validation rigor is proportionate to the model's role in decision-making, with higher-stakes applications requiring more extensive validation evidence.
Uncertainty in computational modeling arises from multiple sources, broadly categorized as aleatoric (irreducible randomness) and epistemic (reducible knowledge limitations) [4]. Aleatoric uncertainty includes inherent variabilities in material properties, operating conditions, and manufacturing tolerances, while epistemic uncertainty encompasses model form approximations, parameter estimation errors, and numerical approximations [4].
Table: Classification of Uncertainty Sources in Computational Modeling
| Uncertainty Category | Specific Sources | Representation Methods | Reduction Strategies |
|---|---|---|---|
| Aleatoric (Irreducible) | Natural material variability, environmental fluctuations, operational differences [4] | Probability distributions, random processes | Cannot be reduced; must be characterized [4] |
| Epistemic (Reducible) | Model form assumptions, simplified physics, unknown parameters [4] | Interval analysis, probability boxes, Bayesian methods | Improved models, additional data, expert knowledge [4] |
| Parametric | Imperfectly known material properties, boundary conditions [4] | Probability distributions, intervals | Experimental calibration, parameter estimation [3] |
| Numerical | Discretization error, iterative convergence, round-off [4] [3] | Error estimates, convergence studies | Mesh refinement, higher-order methods [3] |
Uncertainty quantification employs both non-probabilistic and probabilistic frameworks. Non-probabilistic methods include interval analysis and fuzzy sets, while probabilistic approaches dominate engineering applications through probability distributions and random field representations [4]. The UQ workflow typically involves: (1) identifying and classifying uncertainty sources; (2) quantifying input uncertainties; (3) propagating uncertainties through the computational model; (4) analyzing output uncertainties; and (5) performing sensitivity analysis to identify dominant uncertainty contributors [4].
Uncertainty propagation methods include sampling approaches (e.g., Monte Carlo, Latin Hypercube), expansion methods (e.g., polynomial chaos), and surrogate-based techniques [4]. Monte Carlo methods remain the gold standard for accuracy but often prove computationally prohibitive for large-scale models, motivating advanced surrogate modeling techniques that approximate complex system responses with computationally efficient models [4].
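Direct Monte Carlo propagation can be sketched with a cheap analytical stand-in for an expensive model. The cantilever-deflection formula, the input distributions, and the sample size below are illustrative assumptions, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)

def beam_tip_deflection(E, F, L=1.0, I=1e-8):
    """Cantilever tip deflection delta = F L^3 / (3 E I) — a cheap
    stand-in for an expensive computational model."""
    return F * L**3 / (3.0 * E * I)

# Quantify input uncertainty (hypothetical distributions)
E = rng.normal(200e9, 10e9, 10_000)   # Young's modulus [Pa], ~5% spread
F = rng.normal(100.0, 5.0, 10_000)    # applied load [N], ~5% spread

# Propagate by direct Monte Carlo sampling
delta = beam_tip_deflection(E, F)

# Summarize output uncertainty with a mean and a 95% interval
print(f"mean = {delta.mean()*1e3:.2f} mm, "
      f"95% interval = [{np.percentile(delta, 2.5)*1e3:.2f}, "
      f"{np.percentile(delta, 97.5)*1e3:.2f}] mm")
```

For a model costing hours per run, the 10,000 evaluations above would be replaced by a surrogate (e.g., a polynomial chaos expansion or Gaussian process) trained on a much smaller design of experiments, which is exactly the motivation stated above.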
Implementing comprehensive VVUQ requires both conceptual frameworks and practical tools. The researcher's toolkit includes standardized protocols, software solutions, and reference materials essential for executing rigorous VVUQ processes.
Table: Essential VVUQ Resources for Researchers
| Resource Category | Specific Tools/Standards | Application Context | Key Functions |
|---|---|---|---|
| Technical Standards | ASME VVUQ 1-2022 (Terminology) [1] | All computational modeling | Standardized definitions and concepts |
| | ASME V&V 10-2019 (Solid Mechanics) [1] | Structural analysis, biomechanics | Verification and validation protocols |
| | ASME V&V 20-2009 (CFD and Heat Transfer) [1] | Fluid dynamics, thermal analysis | Specific methodologies for CFD |
| | ASME V&V 40-2018 (Medical Devices) [1] [2] | Medical technology, in silico trials | Risk-informed credibility assessment |
| Software Capabilities | SmartUQ [4] | General engineering systems | Design of experiments, calibration, UQ |
| | Custom UQ Tools [4] | Discipline-specific applications | Uncertainty propagation, sensitivity analysis |
| Experimental Protocols | Validation Experiment Design [5] [3] | Physical testing for validation | Controlled experiments with uncertainty characterization |
| | Mesh Convergence Studies [3] | Numerical simulation | Discretization error estimation |
VVUQ methodologies have found particularly critical applications in medical domains where computational predictions inform safety and efficacy decisions. For medical devices, the ASME V&V 40 standard provides a risk-based framework for assessing model credibility, with implementation examples including fatigue analysis of tibial tray components [1] [2]. In pharmaceutical development, the Comprehensive in vitro Proarrhythmia Assay (CiPA) initiative employs cardiac electrophysiology models with high-throughput in vitro screening for drug safety assessment, requiring rigorous VVUQ for regulatory acceptance [2].
Digital twins represent an emerging frontier for VVUQ application, particularly in precision medicine where patient-specific models require continuous updating with real-time data [6]. Unlike traditional models, digital twins introduce unique VVUQ challenges related to frequent model updates and bidirectional physical-virtual information flow, necessitating dynamic validation approaches and continuous uncertainty monitoring [6].
The VVUQ trifecta represents an indispensable framework for establishing credibility in computational modeling research. By systematically addressing mathematical implementation (verification), physical accuracy (validation), and statistical reliability (uncertainty quantification), this methodology enables researchers to build confidence in their computational predictions. The rigorous application of VVUQ principles has become particularly crucial as computational models increasingly support high-consequence decisions in medical device regulation, pharmaceutical development, and personalized medicine.
The continuing evolution of VVUQ standards and methodologies reflects the growing sophistication of computational modeling across scientific disciplines. As digital twins and other advanced simulation technologies emerge, VVUQ frameworks must adapt to address new challenges in model updating, real-time validation, and dynamic uncertainty quantification. For computational modeling to fulfill its potential as a reliable tool for scientific discovery and engineering innovation, the principled application of verification, validation, and uncertainty quantification remains essential.
The adoption of Verification, Validation, and Uncertainty Quantification (VVUQ) represents a paradigm shift in modern drug development and regulatory submissions. Computational models have progressively moved from traditional engineering disciplines to critical applications in cell, tissue, and organ biomechanics, enabling unprecedented capabilities in predicting drug effects and medical device performance [7]. These models provide quantitative simulations of living systems that can yield stress and strain data across entire biological continua, offering insights where physical measurements are difficult or impossible to obtain [7]. The fundamental premise of VVUQ lies in establishing model credibility through a systematic framework that ensures mathematical implementation accuracy (verification), physical representation correctness (validation), and comprehensive error assessment (uncertainty quantification).
The regulatory landscape for medical products has evolved to recognize the value of computational modeling, with agencies like the FDA providing structured pathways for model submission and evaluation through programs such as the Q-Submission Program [8]. This program offers mechanisms for sponsors to obtain FDA feedback on computational models included in Investigational Device Exemption (IDE) applications, Premarket Approval (PMA) applications, and other regulatory submissions [8]. The growing acceptance of modeling and simulation in regulatory decision-making underscores why VVUQ has become non-negotiable—it provides the essential evidence base demonstrating that computer models yield results with sufficient accuracy for their intended use in pharmaceutical development and regulatory evaluation.
The VVUQ framework comprises three interconnected processes that together establish confidence in computational model predictions:
Verification: The process of determining that a model implementation accurately represents the conceptual description and solution to the mathematical model. In essence, verification addresses "solving the equations right" by ensuring that the computational algorithms correctly implement the intended mathematical model and that numerical solutions are obtained with sufficient accuracy [7] [9].
Validation: The process of assessing how well the computational model represents the underlying physical reality by comparing computational predictions with experimental data. Validation addresses "solving the right equations" by evaluating the modeling error through systematic comparison with gold-standard experimental measurements [7].
Uncertainty Quantification: The process of characterizing and assessing uncertainties in model inputs, parameters, and predictions, typically through statistical methods that propagate known sources of variability and error through the computational model to determine their impact on results [10].
Understanding error and accuracy is fundamental to VVUQ implementation. Error represents the difference between a simulated or experimental value and the true value, while accuracy describes the closeness of agreement between a simulation/experimental value and its true value [7]. Errors in computational modeling can be categorized as:
Numerical errors: Result from computational solution techniques and include discretization error, incomplete grid convergence, and computer round-off errors [7].
Modeling errors: Arise from assumptions and approximations in the mathematical representation of the physical problem, including geometry simplifications, boundary condition idealizations, material property estimations, and governing equation approximations [7].
It is crucial to distinguish between error (a known or potential deficiency) and uncertainty (a potential deficiency that may arise from lack of knowledge or inherent variability) [7]. The required level of accuracy for a particular model depends on its intended use in the drug development or regulatory process [7].
Verification ensures the computational model correctly solves the mathematical equations governing the physical system. The following table outlines key verification methodologies:
Table 1: Model Verification Methods and Protocols
| Method Category | Specific Techniques | Application Context | Acceptance Criteria |
|---|---|---|---|
| Code Verification | Method of Manufactured Solutions, Comparison with Analytical Solutions | Software development phase | Relative error < 1-5% for key outputs |
| Solution Verification | Grid Convergence Index (GCI), Richardson Extrapolation | Discrete model solutions | GCI < 3-5% for quantities of interest |
| Numerical Error Assessment | Residual evaluation, Iterative convergence monitoring | All simulation types | Residual reduction > 3-5 orders of magnitude |
Implementation of these verification protocols follows a structured approach. For code verification, the Method of Manufactured Solutions (MMS) involves assuming a solution function, substituting it into the governing equations to compute analytic source terms, solving the equations numerically with these source terms, and comparing the numerical solution with the assumed analytic solution [7]. For solution verification, grid convergence studies require performing simulations on three or more systematically refined meshes, calculating the apparent order of convergence, and applying the Grid Convergence Index to estimate discretization error [7].
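The three-mesh GCI procedure described above can be sketched directly. The quantity-of-interest values below are hypothetical; the refinement ratio r = 2 and safety factor of 1.25 follow common practice for systematic refinement studies:

```python
import math

def gci_three_mesh(f_coarse, f_medium, f_fine, r=2.0, Fs=1.25):
    """Grid Convergence Index from three systematically refined meshes
    with constant refinement ratio r, via Richardson extrapolation."""
    # Apparent (observed) order of convergence
    p = math.log(abs((f_coarse - f_medium) / (f_medium - f_fine))) / math.log(r)
    # Richardson-extrapolated, approximately mesh-independent value
    f_exact = f_fine + (f_fine - f_medium) / (r**p - 1)
    # GCI on the fine mesh, with safety factor Fs
    gci_fine = Fs * abs((f_fine - f_medium) / f_fine) / (r**p - 1)
    return p, f_exact, gci_fine

# Hypothetical QOI (e.g., peak von Mises stress, MPa) on three meshes
p, f_ext, gci = gci_three_mesh(104.0, 101.0, 100.25, r=2.0)
print(f"order p = {p:.2f}, extrapolated = {f_ext:.2f}, GCI = {gci:.2%}")
```

In this constructed example the observed order is 2, the extrapolated value is 100.0 MPa, and the fine-mesh GCI is well under the 3-5% threshold cited in Table 1, so the discretization would be judged adequate for this quantity of interest.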
Validation assesses how accurately the computational model represents physical reality through comparison with experimental data. The validation process requires carefully designed experimental protocols that mirror computational model scenarios:
Table 2: Model Validation Experimental Protocols
| Protocol Component | Description | Implementation Example |
|---|---|---|
| Validation Experiments | Specifically designed tests for model comparison | Bi-axial tissue testing with digital image correlation |
| Comparison Metrics | Quantitative measures for computational-experimental agreement | Strain field comparison, statistical confidence intervals |
| Accuracy Assessment | Evaluation of difference between prediction and measurement | ≤15% error for key output metrics (e.g., peak stress) |
A comprehensive validation protocol begins with identifying quantities of interest that are both clinically relevant and experimentally measurable. Validation experiments should be designed to provide detailed boundary conditions and material property data, not just outcome measurements [7]. For example, in validating a coronary stent model, validation experiments would measure not only arterial strain but also precise pressure boundary conditions and stent deployment parameters. Comparison between computational results and experimental data should use both global metrics (e.g., overall deformation, natural frequency) and local metrics (e.g., strain distributions, stress concentrations) with appropriate statistical confidence intervals [7].
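The global-versus-local distinction can be made concrete for full-field data. As a sketch (the synthetic strain fields, 5% scale error, and offset below are fabricated purely for illustration), a normalized RMSE serves as the global metric and the worst pointwise mismatch as the local one:

```python
import numpy as np

def field_metrics(sim_field, exp_field):
    """Global and local agreement metrics for a full-field comparison
    (e.g., simulated strain vs. DIC maps on a common grid)."""
    diff = sim_field - exp_field
    span = exp_field.max() - exp_field.min()
    nrmse = np.sqrt(np.mean(diff**2)) / span   # global metric
    max_local = np.max(np.abs(diff)) / span    # worst local mismatch
    return nrmse, max_local

# Hypothetical 2D strain fields on a common 50x50 grid
x = np.linspace(0, 1, 50)
X, Y = np.meshgrid(x, x)
exp_map = 0.01 * np.sin(np.pi * X) * np.sin(np.pi * Y)
sim_map = exp_map * 1.05 + 0.0002   # 5% scale error plus a small offset
nrmse, worst = field_metrics(sim_map, exp_map)
print(f"NRMSE = {nrmse:.1%}, worst local = {worst:.1%}")
```

A model can look acceptable on the global metric while the worst local mismatch exceeds the threshold at a stress concentration, which is precisely why both kinds of metric are recommended above.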
Uncertainty Quantification (UQ) systematically accounts for variability and limited knowledge in computational models:
Parameter Uncertainty: Characterized through probability distributions for uncertain input parameters (e.g., material properties, boundary conditions) often derived from experimental measurements [7].
Sensitivity Analysis: Determines how variations in model inputs affect outputs, typically using methods like Monte Carlo simulation, Latin Hypercube sampling, or polynomial chaos expansions [10].
Model Form Uncertainty: Assesses errors introduced by mathematical simplifications of physical processes, often evaluated through comparison of multiple model forms against experimental data [7].
Uncertainty quantification follows a structured process of identifying uncertain parameters, characterizing their variability (through literature review or targeted experiments), propagating uncertainties through the computational model using sampling techniques, and analyzing the resulting uncertainties in model predictions [10]. For regulatory submissions, uncertainty quantification should demonstrate that model predictions remain within acceptable bounds despite known sources of variability.
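The "identify, characterize, propagate, analyze" sequence can be sketched with a cheap screening-level sensitivity analysis. The toy response function, parameter names, and ranges below are invented for illustration; squared correlation is only a first-pass measure suitable for near-linear responses (variance-based Sobol indices would be the more rigorous choice):

```python
import numpy as np

rng = np.random.default_rng(42)

def model(k_el, k_cl, dose):
    """Toy pharmacokinetic-style response — a stand-in for the real model."""
    return dose * np.exp(-k_el) / (1.0 + k_cl)

# Characterize input variability (hypothetical ranges/distributions)
n = 20_000
k_el = rng.uniform(0.1, 0.5, n)     # elimination-like rate constant
k_cl = rng.uniform(0.05, 0.15, n)   # clearance-like parameter
dose = rng.normal(100.0, 2.0, n)    # administered dose

# Propagate, then rank inputs by squared correlation with the output
y = model(k_el, k_cl, dose)
for name, xs in [("k_el", k_el), ("k_cl", k_cl), ("dose", dose)]:
    r2 = np.corrcoef(xs, y)[0, 1] ** 2
    print(f"{name}: R^2 = {r2:.2f}")
```

The ranking identifies which parameter's uncertainty dominates the output, directing further characterization effort (targeted experiments, literature review) to where it most reduces predictive uncertainty.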
Diagram: VVUQ Workflow in Computational Modeling
Successful implementation of VVUQ requires specific computational and experimental resources. The following table details essential components of the VVUQ toolkit for drug development applications:
Table 3: Essential Tools and Reagents for VVUQ Implementation
| Tool Category | Specific Tools/Reagents | Function in VVUQ Process |
|---|---|---|
| Computational Modeling Platforms | Finite Element Software (e.g., FEBio, Abaqus), CFD Solvers | Implementation and solution of mathematical models |
| Verification Tools | Code Verification Test Suites, Mesh Generation Tools | Assessment of numerical solution accuracy |
| Validation Experimental Systems | Bioreactors, Mechanical Testers, Imaging Systems | Generation of gold-standard experimental data |
| Biological Reagents | Engineered Tissues, Cell Cultures, Biomarkers | Experimental models for biological validation |
| Uncertainty Quantification Libraries | UQ Toolkits (e.g., DAKOTA, SciPy.stats), Sensitivity Analysis Packages | Statistical analysis of parameter variability |
| Data Management Systems | Laboratory Information Management Systems (LIMS), Electronic Lab Notebooks | Tracking experimental metadata and provenance |
Each tool category plays a distinct role in the VVUQ process. Computational modeling platforms provide the environment for implementing mathematical models of biological systems, while verification tools help establish that these implementations are error-free [7]. Validation experimental systems and biological reagents enable generation of high-quality experimental data for model validation, with particular attention to simulating in vivo conditions [7]. Uncertainty quantification libraries facilitate statistical analysis of parameter variability and its impact on model predictions [10]. Finally, data management systems ensure proper documentation and traceability throughout the VVUQ process, which is critical for regulatory submissions [11].
The regulatory environment for computational modeling in drug development has evolved significantly, with explicit recognition of modeling and simulation in regulatory decision-making. The FDA's Q-Submission Program provides formal mechanisms for sponsors to obtain feedback on computational models included in regulatory submissions [8]. This program encompasses:
Pre-Submission (Pre-Sub) Meetings: Allow sponsors to discuss proposed VVUQ approaches for models used in support of regulatory applications.
Informal Meetings: Provide opportunities for early feedback on computational modeling strategies.
Submission Issue Requests: Mechanisms for addressing specific questions during the review process [8].
The FDA differentiates between filing issues (deficiencies that render an application unreviewable) and review issues (complex judgments requiring in-depth assessment) [12]. Inadequate VVUQ documentation can lead to filing issues, resulting in refusal-to-file letters that stop the review process before substantive evaluation [12]. This distinction underscores why comprehensive VVUQ is essential—it addresses potential filing issues related to model credibility.
Successful regulatory submissions incorporating computational models must include detailed VVUQ documentation:
Model Description: Complete specification of governing equations, constitutive laws, boundary conditions, and initial conditions with scientific rationale for all modeling assumptions [7].
Verification Evidence: Documentation of code verification, solution verification, and numerical error estimation with acceptance criteria and results [7].
Validation Evidence: Comprehensive comparison with experimental data, including description of validation experiments, comparison metrics, quantitative assessment of agreement, and discussion of discrepancies [7].
Uncertainty Quantification: Characterization of parameter uncertainties, sensitivity analysis results, and assessment of how uncertainties impact model predictions relevant to the regulatory decision [10].
Context of Use Statement: Clear specification of the intended use of the model and the domain over which it has been validated, including any limitations on extrapolation beyond directly validated conditions [7] [8].
Documentation should enable regulatory reviewers to independently assess model credibility for the proposed context of use. This requires transparent reporting of all VVUQ activities, including both confirmatory results and identified limitations [8].
Verification, Validation, and Uncertainty Quantification have become non-negotiable components of modern drug development and regulatory submissions due to their critical role in establishing model credibility. The framework provides a systematic approach to demonstrate that computational models are implemented correctly (verification), represent physical reality adequately (validation), and have quantified error bounds appropriate for regulatory decision-making (uncertainty quantification). As computational models assume increasingly important roles in drug development—from predicting pharmacokinetics to optimizing clinical trial design—rigorous VVUQ practices provide the essential foundation for regulatory acceptance. Implementation of comprehensive VVUQ protocols, coupled with early engagement with regulatory agencies through programs like the Q-Submission Program, represents a strategic imperative for modern drug development organizations seeking to leverage computational modeling while maintaining regulatory compliance.
In computational medicine, the journey of a model from a research concept to a tool that informs critical decisions in drug development or patient care is complex. For researchers and drug development professionals, navigating this path requires a deep understanding of three pivotal concepts: Context of Use (COU), Model Credibility, and Fit-for-Purpose (FFP). These interconnected terms form the foundation for establishing trust in computational models and simulations, ensuring they are appropriately developed, evaluated, and applied. Framed within a broader thesis on verification and validation (V&V) in computational modeling research, this guide provides an in-depth technical exploration of these core tenets. V&V processes are the empirical and analytical engines that generate the evidence needed to demonstrate that a model is both technically sound (verification) and scientifically relevant (validation) for a specific purpose, thereby establishing its credibility within a defined COU [13].
Context of Use (COU) is a precise statement that defines the specific role, scope, and application of a computational model in addressing a particular question. It describes what will be modeled, how the model outputs will be used, and any additional information used alongside the model to answer the question of interest [14] [15]. The COU is the cornerstone for all subsequent credibility assessments.
Model Credibility refers to the trust in the predictive capability of a computational model for a specific COU [14] [16]. This trust is not absolute but is established through the collection of evidence, the rigor of which is determined by the model's risk.
Fit-for-Purpose (FFP) is a strategic principle dictating that the development, evaluation, and application of a model must be closely aligned with the specific Question of Interest (QOI) and COU [17]. An FFP approach ensures that the model's complexity, the quality of input data, and the thoroughness of V&V activities are proportionate to the model's intended use, avoiding both oversimplification and unjustified complexity [17].
Table: Core Terminology in Computational Modeling
| Term | Definition | Primary Reference |
|---|---|---|
| Context of Use (COU) | A statement defining the specific role and scope of a computational model used to address a question of interest. | [14] |
| Model Credibility | Trust, established through evidence, in the predictive capability of a computational model for a specific context of use. | [14] |
| Fit-for-Purpose (FFP) | A principle ensuring model development and evaluation are aligned with the question of interest and context of use. | [17] |
| Verification | The process of determining that a computational model accurately represents the underlying mathematical model and its solution. | [13] |
| Validation | The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses. | [14] |
| Question of Interest (QOI) | The specific question, decision, or concern that is being addressed by the computational model and other evidence. | [14] |
Verification and Validation are the fundamental processes that provide the evidence required to establish model credibility.
Verification answers the question, "Are we building the model right?" It ensures that the computational model is implemented correctly in software and that numerical solutions are accurate [13]. This involves both code verification (testing the software implementation for errors) and calculation verification (quantifying numerical error in the computed results).
Validation answers the question, "Are we building the right model?" It determines how accurately the model represents reality [14]. This is achieved by comparing model predictions with independent experimental or clinical data (comparator data) not used in model development [14] [16].
The following diagram illustrates the workflow for establishing model credibility, from defining the need to assessing the resulting evidence, highlighting the roles of V&V.
The required level of model credibility is not one-size-fits-all; it is determined through a risk-informed assessment. This risk is a function of two factors: model influence (the weight the model output carries in the decision) and decision consequence (the severity of an adverse outcome if the decision is wrong) [13] [15]:
Table: Model Risk Assessment Matrix (Adapted from FDA Guidance and ASME VV-40)
| | Low Decision Consequence | Medium Decision Consequence | High Decision Consequence |
|---|---|---|---|
| High Model Influence | Medium Risk | High Risk | High Risk |
| Medium Model Influence | Low Risk | Medium Risk | High Risk |
| Low Model Influence | Low Risk | Low Risk | Medium Risk |
This risk assessment directly drives the rigor and extent of V&V activities needed. A high-risk model, such as one used as the primary evidence to waive a clinical trial for a life-saving drug, demands far more extensive credibility evidence than a low-risk model used for internal research and development decisions [14] [13].
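The qualitative matrix above lends itself to a simple lookup. The sketch below is purely illustrative (it is not normative text from any standard) and encodes the adapted matrix as a Python mapping:

```python
# Illustrative encoding of the adapted model-risk matrix above.
# Keys are (model influence, decision consequence); values are model risk.
MODEL_RISK = {
    ("high", "low"): "medium",  ("high", "medium"): "high",  ("high", "high"): "high",
    ("medium", "low"): "low",   ("medium", "medium"): "medium", ("medium", "high"): "high",
    ("low", "low"): "low",      ("low", "medium"): "low",    ("low", "high"): "medium",
}

def model_risk(influence: str, consequence: str) -> str:
    """Return the qualitative model risk for a given influence/consequence pair."""
    return MODEL_RISK[(influence.lower(), consequence.lower())]
```

Such an encoding can help document the risk rationale in a credibility plan, though the actual determination should follow the full risk-assessment process of ASME VV-40.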
The evidence generated to establish credibility can be categorized. The U.S. Food and Drug Administration (FDA) guidance outlines eight categories of credibility evidence, which provide a practical framework for planning and reporting [15].
Table: Categories of Credibility Evidence for Computational Models
| Category | Type of Evidence | Description and Methodological Purpose |
|---|---|---|
| 1 | Code Verification | Demonstrates the software implementation accurately reflects the underlying mathematical model. Method: Unit and integration testing, software quality assurance. |
| 2 | Model Calibration | Assessment of the model's fit against the data used to estimate its parameters. Note: This alone is insufficient for validation. |
| 3 | Bench Test Validation | Compares model predictions with data from controlled in vitro or bench-top experiments. |
| 4 | In Vivo Validation | Compares model predictions with data from animal studies (in vivo). |
| 5 | Population-based Validation | Compares population-level predictions (e.g., average response) with a clinical dataset, without individual-level comparisons. |
| 6 | Emergent Model Behaviour | Evidence that the model reproduces known real-world phenomena that were not explicitly built into it. |
| 7 | Model Plausibility | Rationale supporting the choice of governing equations, assumptions, and input parameters based on established scientific principles. |
| 8 | Calculation Verification & UQ | Quantifies numerical solution accuracy and uncertainty for the specific simulations run to address the COU. |
To illustrate the application of these concepts, consider a hypothetical experiment for a drug development project.
Experimental and Credibility Protocol:
Table: Essential Components for Credibility Assessment
| Item / Reagent | Function in the Credibility Process |
|---|---|
| ASME VV-40:2018 Standard | Provides the foundational risk-informed framework for planning and assessing credibility activities [14] [13]. |
| FDA Guidance on CM&S | Offers a regulatory perspective and a nine-step process for assessing credibility in medical device submissions, extending ASME VV-40 concepts [16] [15]. |
| Software Verification Suite | A collection of unit and integration tests used for code verification to ensure the software is free of procedural errors [13]. |
| Comparator Datasets | High-quality experimental or clinical data (in vitro, in vivo, clinical) used as a benchmark for model validation [14]. |
| Uncertainty Quantification (UQ) Tools | Software and methodologies (e.g., sensitivity analysis, Monte Carlo simulation) to quantify numerical, parameter, and model form uncertainties [13]. |
The adoption of computational modeling in biomedical research and development hinges on a rigorous and systematic approach to building trust. The triad of Context of Use, Model Credibility, and the Fit-for-Purpose principle provides the conceptual framework for this endeavor. This framework is operationalized through the disciplined processes of Verification and Validation, the rigor of which is strategically calibrated by a risk-informed assessment. For researchers and drug development professionals, mastering these concepts is not merely an academic exercise. It is a critical competency that enables the development of credible, impactful models, facilitates clearer communication with regulators, and ultimately accelerates the delivery of safe and effective therapies to patients.
Verification, Validation, and Uncertainty Quantification (VVUQ) constitute a rigorous framework for establishing the credibility of computational models used in scientific research and industrial applications. Within Model-Informed Drug Development (MIDD), these processes provide the foundational evidence that models are trustworthy for informing critical decisions about drug safety, efficacy, and dosing. Verification addresses the question "Are we building the model right?" by ensuring the computational implementation accurately represents the underlying mathematical model [18]. Validation addresses "Are we building the right model?" by determining how well the computational model represents the real-world biological system [18]. Uncertainty Quantification characterizes uncertainties in model inputs, parameters, and structure, and evaluates how these propagate to affect model outputs and predictions [18]. The adoption of robust VVUQ practices delivers substantial efficiency gains and risk mitigation in drug development, though it remains underutilized in many modeling communities [19].
Table: Core Components of VVUQ in MIDD
| Component | Definition | Key Question Answered | Primary Focus |
|---|---|---|---|
| Verification | Process of determining if a computational model is an accurate implementation of the underlying mathematical model [20] | "Are we building the model right?" [9] | Code and calculation accuracy [20] |
| Validation | Process of determining the extent to which a computational model is an accurate representation of the real-world system [20] | "Are we building the right model?" [9] | Agreement with experimental data [18] |
| Uncertainty Quantification | Process of characterizing uncertainties in model inputs and computing resultant uncertainty in model outputs [20] | "How reliable are the predictions?" | Predictive reliability and confidence [18] |
Verification encompasses two primary activities: code verification and calculation verification. Code verification tests for potential software bugs and implementation errors through methods such as unit testing, regression testing, and comparison with analytical solutions [20]. For complex physiological models in MIDD, this may involve verifying that ordinary differential equation solvers for pharmacokinetic models accurately compute drug concentration time courses. Calculation verification estimates numerical errors arising from spatial or temporal discretization [20]. In a whole-organ model simulation, this would involve demonstrating that numerical errors from mesh discretization are below an acceptable threshold for the context of use.
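To make code verification concrete, the sketch below checks a hand-written Runge-Kutta integrator for a hypothetical one-compartment PK model, dC/dt = -kC, against its analytical solution C(t) = C0·exp(-kt), and confirms that the observed order of accuracy matches the scheme's formal fourth order. All parameter values are illustrative:

```python
import numpy as np

def simulate(k, c0, t_end, n_steps):
    """Integrate dC/dt = -k*C with classical RK4; return the final concentration."""
    dt = t_end / n_steps
    def f(y):
        return -k * y
    c = c0
    for _ in range(n_steps):
        k1 = f(c)
        k2 = f(c + 0.5 * dt * k1)
        k3 = f(c + 0.5 * dt * k2)
        k4 = f(c + dt * k3)
        c += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return c

k, c0, t_end = 0.3, 100.0, 10.0          # hypothetical rate constant and dose
exact = c0 * np.exp(-k * t_end)          # analytical solution at t_end
err_coarse = abs(simulate(k, c0, t_end, 50) - exact)
err_fine = abs(simulate(k, c0, t_end, 100) - exact)

# RK4 is formally 4th order: halving dt should cut the error by roughly 2**4.
observed_order = np.log2(err_coarse / err_fine)
```

An observed order near 4 is evidence that the integrator is implemented correctly; a lower observed order would flag a coding or consistency error.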
A robust verification protocol includes software quality assurance practices (version control, unit and regression testing, documentation), comparison of numerical output against analytical solutions where available, and systematic estimation of discretization error for the specific simulations of interest [20].
Validation establishes the model's accuracy in representing reality by comparing model predictions to experimental data not used in model development. The ASME V&V40 Standard provides a risk-informed framework for validation activities, where the extent of validation required depends on the model's context of use (COU) and the decision's risk level [20]. For patient-specific models (PSMs) used in MIDD, special considerations include inter- and intra-user variability, multi-patient error estimation, and clinical validation across diverse populations [20].
Key validation factors include the relevance of the comparator data to the context of use, the degree of agreement between predictions and independent observations, and, for patient-specific models, inter- and intra-user variability and performance across diverse patient populations [20].
UQ systematically analyzes how uncertainties from multiple sources affect model predictions. In healthcare applications, sources of uncertainty include intrinsic variability (e.g., time-dependent changes in patient physiology), extrinsic variability (e.g., patient-specific genetics and lifestyle), measurement error, and model discrepancy [18]. UQ methodologies include sensitivity analysis, which identifies the parameters contributing most to output uncertainty, and uncertainty propagation techniques such as Monte Carlo simulation, which translate input uncertainties into confidence statements about model outputs [18].
Table: Sources of Uncertainty in MIDD Models
| Uncertainty Category | Specific Sources | Impact on MIDD Applications |
|---|---|---|
| Data-Related (Aleatoric) | Intrinsic variability (e.g., circadian blood pressure changes) [18] | Affects predictability of drug exposure and response time courses |
| | Extrinsic variability (e.g., patient genetics, physiology) [18] | Impacts virtual population generation and subgroup analysis |
| | Measurement error (e.g., analytical assay precision) [18] | Affects parameter estimation from preclinical and clinical data |
| Model-Related (Epistemic) | Model discrepancy (e.g., omitted genetics or disease interactions) [18] | Limits model applicability to specific patient subgroups or conditions |
| | Structural uncertainty (e.g., model assumptions) [18] | Affects extrapolation beyond studied conditions |
| | Initial/boundary conditions [18] | Impacts simulation of specific physiological or clinical scenarios |
| Coupling-Related | Geometry uncertainty (e.g., organ segmentation from medical images) [18] | Affects patient-specific dosimetry or biomechanical simulations |
| | Scale transition uncertainty (e.g., cellular to tissue level) [18] | Impacts multiscale models linking cellular pharmacology to organ-level effects |
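As a minimal sketch of propagating an aleatoric source from the table above, the following assumes a hypothetical one-compartment relationship, AUC = dose / CL, with lognormal inter-patient variability in clearance; all numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

dose = 100.0                       # mg (assumed)
cl_typical, cl_cv = 5.0, 0.3       # L/h typical clearance, 30% CV (assumed)

# Lognormal sampling preserves the positivity of the physiological parameter;
# the mean-correction term keeps the arithmetic mean at cl_typical.
sigma = np.sqrt(np.log(1 + cl_cv**2))
cl_samples = cl_typical * np.exp(rng.normal(-0.5 * sigma**2, sigma, size=10_000))

# Propagate the input variability through the model to the exposure metric.
auc = dose / cl_samples
lo, hi = np.percentile(auc, [2.5, 97.5])   # 95% prediction interval for AUC
```

The resulting interval characterizes, rather than reduces, the inter-patient variability: collecting more samples narrows the estimate of the interval, not the interval itself, which is the defining property of aleatoric uncertainty.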
The implementation of VVUQ follows a structured workflow that integrates these components throughout the model development and application lifecycle. The ASME V&V40 Standard provides a framework for credibility assessment that begins with defining the question of interest and context of use, followed by a risk assessment to determine the necessary level of VVUQ activities [20].
VVUQ Workflow in MIDD
For patient-specific models, which form the basis of digital twins in healthcare, additional considerations include managing inter- and intra-user variability, implementing multi-patient and "every-patient" error estimation, and addressing uncertainty in personalized versus non-personalized inputs [20]. The maturity of VVUQ practices in cardiac electrophysiological modeling provides an exemplar for other MIDD applications, demonstrating how complex multiscale models can be evaluated for credibility [20].
Successful implementation of VVUQ requires specific computational tools and methodologies tailored to MIDD applications. This toolkit enables researchers to execute the rigorous evaluation processes necessary for model credibility.
Table: Essential VVUQ Research Reagents and Computational Tools
| Tool/Resource | Function in VVUQ | Application in MIDD |
|---|---|---|
| Software Quality Assurance (SQA) Framework | Ensures code reliability through version control, systematic testing, and documentation [20] | Maintains integrity of model codebase throughout drug development lifecycle |
| Numerical Code Verification Tools | Tests computational implementation against analytical solutions [20] | Verifies differential equation solvers in PK/PD and systems pharmacology models |
| Mesh Generation/Refinement Tools | Supports calculation verification for spatial discretization [20] | Enables geometry-based simulations (e.g., organ-level distribution, tissue penetration) |
| Sensitivity Analysis Algorithms | Identifies parameters contributing most to output uncertainty [18] | Prioritizes experimental efforts for parameter refinement in quantitative systems pharmacology |
| Uncertainty Propagation Methods | Quantifies how input uncertainties affect model outputs [18] | Establishes confidence intervals for model-informed dose selection and trial designs |
| Model Validation Databases | Provides experimental data for comparison with model predictions [20] | Enables validation of drug exposure-response relationships across populations |
| Credibility Assessment Framework | Guides evaluation of model fitness for purpose [20] | Supports regulatory submissions through standardized credibility evidence |
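To illustrate the sensitivity-analysis entry in the table above, the sketch below computes local (one-at-a-time) relative sensitivities for a hypothetical one-compartment concentration model; the model form, parameter names, and values are assumptions for illustration only:

```python
import numpy as np

def conc(params, t=4.0, dose=100.0):
    """Hypothetical one-compartment concentration at time t (h) for an IV dose."""
    cl, v = params["CL"], params["V"]
    return (dose / v) * np.exp(-(cl / v) * t)

def local_sensitivities(params, rel_step=1e-4):
    """Relative sensitivity d(ln y)/d(ln p) of the output to each parameter."""
    base = conc(params)
    sens = {}
    for name in params:
        bumped = dict(params)
        bumped[name] = params[name] * (1 + rel_step)   # perturb one parameter
        sens[name] = (conc(bumped) - base) / (base * rel_step)
    return sens

sens = local_sensitivities({"CL": 5.0, "V": 50.0})
```

For this model the sensitivities can be checked analytically (d ln y/d ln CL = -CL·t/V = -0.4 and d ln y/d ln V = -1 + CL·t/V = -0.6), which is itself a small code-verification exercise for the sensitivity routine.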
Cardiac electrophysiological (EP) modeling represents a mature application of VVUQ with direct relevance to drug safety assessment. These models simulate electrical activity from cellular to organ levels, requiring integration of disparate data sources and solving complex multiscale models [20]. In MIDD, cardiac EP models help assess drug-induced arrhythmia risk (e.g., Torsades de Pointes). The VVUQ process for these models spans verification of the multiscale numerical solvers, validation against cellular-, tissue-, and organ-level experimental data, and quantification of uncertainty in model parameters and patient-specific inputs [20].
The ASME V&V40 Standard provides a risk-informed framework for establishing model credibility that is increasingly recognized in regulatory submissions [20]. This approach links the required level of VVUQ evidence to the impact of the decision the model supports. For MIDD applications, this means the rigor of verification, validation, and uncertainty quantification activities must be commensurate with the model's influence on the regulatory decision and the consequence of that decision being wrong.
Risk-Based Credibility Assessment
VVUQ provides the methodological foundation for establishing credibility of computational models in MIDD, enabling more reliable prediction of drug behavior in humans. As modeling sophistication increases with the development of patient-specific models and digital twins, robust VVUQ practices become increasingly critical [20] [21]. Future advancements will likely include standardized VVUQ protocols for specific MIDD applications, increased automation of VVUQ processes, and broader adoption of uncertainty quantification to characterize both variability and ignorance in drug development predictions [18]. By systematically implementing VVUQ, drug developers can enhance model reliability, regulatory acceptance, and ultimately, the quality of decisions that bring safe and effective medicines to patients.
In computational modeling research, the processes of verification and validation (V&V) are fundamental to establishing model credibility. Verification addresses the question "Are we solving the equations correctly?" by ensuring the computational model accurately represents the underlying mathematical description. Validation asks "Are we solving the correct equations?" by determining whether the model accurately represents real-world phenomena [1]. Within this V&V framework, Uncertainty Quantification (UQ) plays a critical role in assessing how variations in numerical and physical parameters affect simulation outcomes, forming the complete paradigm of Verification, Validation, and Uncertainty Quantification (VVUQ) [1].
A crucial aspect of UQ involves distinguishing between two fundamental types of uncertainty: aleatory and epistemic. Properly identifying and classifying these sources of error is essential for guiding model improvement, directing resource allocation, and honestly communicating the reliability of computational predictions to stakeholders in fields like drug development where decisions carry significant consequences [22] [23].
Aleatory uncertainty (also known as stochastic, irreducible, or variability uncertainty) stems from the inherent randomness and natural variability of physical systems or observation processes [22] [23]. This type of uncertainty is characterized by its irreducible nature; collecting additional data may better characterize the variability but cannot eliminate it [23].
Examples in computational modeling include inter-patient variability in drug metabolism, stochastic disease progression, and turbulent fluctuations in fluid systems.
Epistemic uncertainty (also known as systematic, reducible, or model uncertainty) arises from incomplete knowledge or information about the system being modeled [22] [23] [24]. This uncertainty is fundamentally reducible in principle through improved models, additional data collection, or enhanced measurements [23] [25].
Examples in computational modeling include uncertainty in metabolic pathway parameters, simplification of biological processes in the model structure, and uncertainty in boundary condition specification.
Table 1: Fundamental Characteristics of Aleatory and Epistemic Uncertainty
| Characteristic | Aleatory Uncertainty | Epistemic Uncertainty |
|---|---|---|
| Origin | Inherent system variability | Incomplete knowledge |
| Reducibility | Irreducible | Reducible |
| Representation | Probability theory | Probability + alternative theories |
| Data Impact | More data characterizes but doesn't reduce | More data potentially reduces |
| Common Descriptors | Randomness, variability, stochasticity | Ignorance, approximation, simplification |
Table 2: Manifestations in Computational Modeling Contexts
| Modeling Context | Aleatory Manifestations | Epistemic Manifestations |
|---|---|---|
| Pharmacokinetics | Inter-patient variability in drug metabolism | Uncertainty in metabolic pathway parameters |
| Clinical Outcomes | Stochastic disease progression | Model simplification of biological processes |
| Material Science | Natural imperfections in crystalline structures | Uncertainty in constitutive model selection |
| Fluid Dynamics | Turbulent fluctuations | Uncertainty in boundary condition specification |
The following diagram illustrates the systematic process for identifying and classifying uncertainty sources within a computational modeling framework:
Purpose: Estimate epistemic uncertainty in deep learning models used for computational modeling [22] [23].
Detailed Methodology: Train a neural network with dropout layers as usual; at inference time, keep dropout active and perform many stochastic forward passes for each input; then summarize the resulting distribution of predictions (e.g., its mean and variance).
Interpretation: Higher variance across forward passes indicates greater epistemic uncertainty, suggesting the model encounters regions of input space where it has limited knowledge [23].
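A minimal numpy-only sketch of this protocol follows; the toy network weights are random stand-ins for a trained model, so only the mechanics (dropout kept active at inference, repeated forward passes, spread of outputs) are meaningful:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in weights for a "trained" one-hidden-layer network (hypothetical).
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1))

def mc_dropout_predict(x, n_passes=200, p_drop=0.5):
    """Run repeated stochastic forward passes with dropout active at inference."""
    preds = []
    for _ in range(n_passes):
        h = np.maximum(x @ W1, 0.0)             # ReLU hidden layer
        mask = rng.random(h.shape) > p_drop     # dropout stays ON at test time
        h = h * mask / (1 - p_drop)             # inverted-dropout scaling
        preds.append((h @ W2).item())
    preds = np.asarray(preds)
    return preds.mean(), preds.std()            # std acts as the epistemic signal

mean, std = mc_dropout_predict(np.array([[0.5]]))
```

In a real application the weights would come from a trained model (e.g., in PyTorch or TensorFlow) and the same mechanics apply: the per-input standard deviation across passes flags regions where the model's knowledge is limited.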
Purpose: Quantify total predictive uncertainty by combining multiple models [22] [23].
Detailed Methodology: Train multiple models with different random initializations and, optionally, different resamples of the training data; at prediction time, query every ensemble member and aggregate the outputs, using the inter-model spread as the uncertainty estimate.
Interpretation: Disagreement between models (high inter-model variance) indicates epistemic uncertainty, while consistent disagreement with ground truth across all models suggests aleatoric uncertainty [22].
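The following toy regression sketch illustrates the ensemble protocol and its interpretation: bootstrap-trained members agree inside the training range and diverge outside it, where epistemic uncertainty dominates. The data-generating function and settings are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy observations of a hypothetical ground-truth function on [0, 1].
x = rng.uniform(0, 1, 80)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)

def fit_member(seed):
    """Fit one ensemble member (degree-5 polynomial) on a bootstrap resample."""
    r = np.random.default_rng(seed)
    idx = r.integers(0, x.size, x.size)
    return np.polyfit(x[idx], y[idx], deg=5)

ensemble = [fit_member(s) for s in range(20)]

def predict(x_new):
    """Ensemble mean and inter-model spread (epistemic signal) at x_new."""
    preds = np.array([np.polyval(c, x_new) for c in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

# Epistemic uncertainty should grow outside the training range [0, 1].
_, std_in = predict(np.array([0.5]))
_, std_out = predict(np.array([1.5]))
```

The same pattern applies when the members are neural networks or mechanistic models calibrated to resampled datasets: inter-model disagreement is the epistemic component, while residual scatter of data around the ensemble mean reflects aleatoric noise.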
Purpose: Implement principled uncertainty quantification through probabilistic modeling [22].
Detailed Methodology: Place prior distributions over model parameters; condition on observed data to obtain a posterior distribution, typically via Markov chain Monte Carlo or variational inference; then form the posterior predictive distribution by averaging predictions over posterior parameter samples.
Interpretation: The spread of the predictive distribution naturally incorporates both types of uncertainty, with the ability to decompose them through analytical methods [22].
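A minimal self-contained sketch of Bayesian parameter inference is given below, using a grid-based posterior for a single hypothetical elimination-rate parameter instead of MCMC so that no probabilistic-programming library is needed; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: noisy observations of y(t) = exp(-k*t) with k_true = 0.4.
k_true, noise_sd = 0.4, 0.05
t_obs = np.array([1.0, 2.0, 4.0, 8.0])
y_obs = np.exp(-k_true * t_obs) + rng.normal(0, noise_sd, t_obs.size)

# Flat prior over a grid of candidate rate constants.
k_grid = np.linspace(0.01, 1.0, 500)
log_lik = np.array([
    -0.5 * np.sum((y_obs - np.exp(-k * t_obs)) ** 2) / noise_sd**2
    for k in k_grid
])
posterior = np.exp(log_lik - log_lik.max())   # subtract max for numerical stability
posterior /= posterior.sum()

# Posterior summaries: the spread in k quantifies epistemic (parameter) uncertainty.
k_mean = np.sum(k_grid * posterior)
k_sd = np.sqrt(np.sum((k_grid - k_mean) ** 2 * posterior))
```

The posterior spread in k is epistemic (it shrinks as more observations are added), while the observation noise term remains as the aleatoric floor in any posterior predictive simulation.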
The following diagram illustrates the complete workflow for quantifying and decomposing uncertainty in computational models:
Table 3: Essential Computational Tools for Uncertainty Analysis
| Tool/Reagent | Function | Application Context |
|---|---|---|
| Monte Carlo Simulation Engine | Propagates input uncertainties through model | General computational models with parameter uncertainty |
| Markov Chain Monte Carlo (MCMC) | Samples from complex posterior distributions | Bayesian parameter estimation and model calibration |
| Gaussian Process Regression | Creates surrogate models with built-in uncertainty | Emulation of computationally expensive simulations |
| Bayesian Neural Network Framework | Implements probabilistic deep learning | High-dimensional models with complex uncertainty structures |
| Conformal Prediction Library | Provides distribution-free prediction intervals | Model-agnostic uncertainty quantification with coverage guarantees |
| Ensemble Modeling Toolkit | Manages multiple model training and prediction | Committee-based uncertainty estimation |
| Sobol Sequence Generator | Implements quasi-random sampling | Efficient exploration of high-dimensional parameter spaces |
| Statistical Emulator | Creates fast approximations of complex models | Uncertainty propagation in computationally intensive models |
Table 4: Strategic Approaches for Managing Different Uncertainty Types
| Uncertainty Type | Characterization Methods | Reduction Strategies | Acceptance Measures |
|---|---|---|---|
| Aleatory | Statistical analysis of residuals, Variogram analysis, Noise decomposition | Improved measurement precision, Stratification of populations, Covariate inclusion | Uncertainty propagation, Robust design, Safety margins |
| Epistemic | Sensitivity analysis, Model discrepancy assessment, Bayesian model averaging | Additional targeted experiments, Model structure improvement, Domain expansion | Model averaging, Multiple model comparison, Conservative bounding |
In practical applications, aleatory and epistemic uncertainties are often intertwined, requiring sophisticated decomposition approaches [25]. For computational models in drug development, this distinction becomes particularly important when deciding whether additional data collection can reduce prediction uncertainty, when allocating resources for model refinement, and when communicating the reliability of model-based evidence to stakeholders.
Recent research demonstrates that linear probes trained on internal activations of large models can effectively distinguish epistemic from aleatoric uncertainty, even when evaluated on unseen data domains [25]. This suggests that neural representations may natively encode information about the nature of uncertainty, providing promising avenues for automated uncertainty classification.
Beyond parameter uncertainty, model uncertainty represents a significant epistemic challenge in computational modeling. As classified in recent literature, this encompasses uncertainty in the model's structural form, in the selection among competing candidate models, and in the simplifying assumptions embedded in each candidate [24].
Addressing these uncertainties requires techniques such as Bayesian Model Averaging (BMA) and Frequentist Model Averaging (FMA), which propagate model selection uncertainty into predictive uncertainty, providing more honest assessments of predictive reliability [24].
The rigorous distinction between aleatory and epistemic uncertainty provides a critical foundation for credible computational modeling in scientific research and drug development. By implementing the methodologies and frameworks presented in this guide, researchers can not only quantify these uncertainty sources but also develop targeted strategies for uncertainty reduction where possible and appropriate uncertainty communication where inevitable. This systematic approach to uncertainty classification strengthens the verification and validation process, ultimately supporting more reliable scientific inferences and safer, more effective therapeutic developments.
Verification and Validation (V&V) are fundamental pillars of credible computational modeling research. Within a broader VVUQ (Verification, Validation, and Uncertainty Quantification) framework, they serve distinct but complementary purposes. Verification is the process of determining that a computational model accurately represents the underlying mathematical model and its solution ("solving the equations right"). Validation is the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model ("solving the right equations") [26]. This guide focuses exclusively on the first pillar: code and solution verification, providing researchers and drug development professionals with the methodologies to ensure their software solves equations correctly.
Verification is subdivided into two critical activities:
The following workflow outlines the typical stages of a verification process, from the foundational mathematical model to a final, credible solution.
The objective of code verification is to find and eliminate errors in the source code (e.g., mistakes in logic, syntax, or algorithm implementation). A key methodology is the Method of Manufactured Solutions (MMS).
The MMS is a rigorous technique for code verification that does not rely on pre-existing analytical solutions to the governing equations [26].
Detailed Methodology: Choose (manufacture) an analytical solution; substitute it into the governing equations to derive a corresponding source term; run the code with this source term and matching boundary and initial conditions; and compare the numerical output against the manufactured solution, confirming that the error decreases at the scheme's formal order of accuracy as the grid is refined [26].
The logical sequence of MMS, from creating a known solution to verifying the code's performance against it, is outlined below.
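The MMS sequence can be sketched for a second-order finite-difference solver of the 1D Poisson problem -u'' = f: manufacturing u(x) = sin(πx) yields the source term f = π²·sin(πx), and the observed order of accuracy should approach the scheme's formal order of 2 under grid refinement:

```python
import numpy as np

def solve(n):
    """Solve -u'' = f on [0,1] with u(0)=u(1)=0 using n interior points;
    return the max-norm error against the manufactured solution."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)
    f = np.pi**2 * np.sin(np.pi * x)        # source term derived from u = sin(pi x)
    # Standard second-order central-difference operator for -u''.
    A = (np.diag(np.full(n, 2.0))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    u = np.linalg.solve(A, f)
    return np.max(np.abs(u - np.sin(np.pi * x)))

e_coarse, e_fine = solve(40), solve(80)
observed_order = np.log2(e_coarse / e_fine)  # should approach 2 for this scheme
```

If a sign error or off-by-one bug were introduced into the stencil, the observed order would collapse (or the error would fail to shrink at all), which is precisely the failure mode MMS is designed to expose.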
Once the code is verified, solution verification is performed for each simulation to quantify numerical errors, primarily discretization error.
Discretization error arises from approximating continuous PDEs with algebraic equations on a discrete grid. The following table summarizes common methods for estimating this error [26].
Table 1: Methods for Estimating Discretization Error in Solution Verification
| Method | Brief Description | Key Function | Pros & Cons |
|---|---|---|---|
| Richardson Extrapolation | Uses solutions on two or more systematically refined grids to estimate the exact solution and error on the finest grid. | Provides an error estimate and an improved solution estimate. | Pro: High-fidelity estimate. Con: Requires multiple, systematic grid refinements; can be costly. |
| Grid Convergence Index (GCI) | A standardized methodology based on Richardson Extrapolation that provides a consistent error band. | Reports a conservative confidence interval for the discretization error. | Pro: Allows for cross-study comparison; built-in safety factor. Con: Same cost as Richardson Extrapolation. |
| Residual Methods | Uses the local truncation error (the residual when the numerical solution is inserted into the PDE) as an error estimator. | Guides adaptive mesh refinement (AMR). | Pro: Can be computed from a single solution. Con: May not be as accurate as Richardson-based methods. |
A grid convergence study is the primary experimental protocol for solution verification.
Detailed Methodology: Solve the same problem on at least three systematically refined grids; compute the observed order of convergence from the change in the quantity of interest between successive grids; apply Richardson extrapolation to estimate the grid-independent solution; and report a Grid Convergence Index (GCI) as a conservative error band on the finest-grid result.
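The arithmetic of a grid convergence study can be sketched as follows; the three grid solutions are invented numbers chosen to converge at second order, and the safety factor of 1.25 follows common GCI practice for three-grid studies:

```python
import numpy as np

def grid_convergence(f_coarse, f_medium, f_fine, r=2.0, safety=1.25):
    """Observed order p, Richardson-extrapolated value, and fine-grid GCI
    for a scalar quantity computed on three grids with refinement ratio r."""
    p = np.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / np.log(r)
    f_exact_est = f_fine + (f_fine - f_medium) / (r**p - 1)   # Richardson extrapolation
    gci_fine = safety * abs((f_fine - f_medium) / f_fine) / (r**p - 1)
    return p, f_exact_est, gci_fine

# Hypothetical quantity of interest converging at 2nd order toward 1.0:
p, f_est, gci = grid_convergence(1.16, 1.04, 1.01)
```

Here the observed order comes out at exactly 2, the extrapolated value at 1.0, and the GCI gives a conservative relative error band on the finest-grid solution; in practice one should also verify the grids are in the asymptotic range before trusting these estimates.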
The following table details key "research reagents" – the software tools and standards – essential for performing rigorous code and solution verification [28] [29] [30].
Table 2: Key Research Reagent Solutions for Verification
| Tool / Standard Category | Specific Tool / Standard | Function in Verification |
|---|---|---|
| Verification Standards | ASME V&V 10-2019, ASME VVUQ 1-2022, NASA STD 7009 | Provide standardized procedures, terminology, and methodologies for performing and reporting verification activities. Essential for ensuring consistency and credibility, especially in regulated industries [26]. |
| Multiphysics Simulation Platforms | ANSYS, COMSOL Multiphysics | Provide built-in tools for mesh refinement studies and some error estimation. Their solvers undergo rigorous code verification, allowing users to focus on solution verification and application [30]. |
| Mathematical & Scripting Environments | MATLAB & Simulink, Python (NumPy/SciPy) | Enable the implementation and automation of verification protocols, such as running MMS or processing results from grid convergence studies. Offer high flexibility for custom analysis [30]. |
| Open-Source Simulation Environments | OpenModelica | An open-source tool for equation-based modeling. Its transparency allows for direct inspection and verification of implementation, making it valuable for academic research and method development [30]. |
| Advanced Formal Methods Tools | VMCAI-related Research Tools (e.g., for Model Checking, Abstract Interpretation) | While more common in computer science, these tools (often presented at conferences like VMCAI) provide formal, mathematical proof of certain software properties, representing the highest level of code verification for critical systems [29]. |
Code and solution verification are non-negotiable steps in establishing the credibility of computational simulations. Code verification, through methods like the Method of Manufactured Solutions, ensures that the software implementation is free of errors. Solution verification, primarily via grid convergence studies, quantifies the numerical errors in a specific computation. By systematically applying the protocols and tools detailed in this guide, researchers and scientists in drug development and other fields can provide evidence that their software is indeed "solving the equations correctly," thereby creating a solid foundation for subsequent model validation and decision-making.
Verification and Validation (V&V) are fundamental processes for establishing the credibility and reliability of computational models used in scientific research and drug development. Despite their complementary nature, they address distinct questions: Verification is a mathematical exercise that answers "Are we solving the equations correctly?" by checking for programming errors and numerical accuracy, while Validation is a scientific exercise that answers "Are we solving the correct equations?" by assessing how well the computational model represents physical reality [31] [7]. This distinction is crucial for researchers developing in silico models, as a model can be perfectly verified yet remain invalid for its intended purpose if it does not accurately capture the underlying biology [7].
The role of V&V has expanded significantly as high-quality computational modeling becomes available in more application areas [32]. In drug development, where modeling and simulation has progressed from a scientific nicety to a regulatory necessity, rigorous V&V provides the evidence base that allows researchers, clinicians, and regulators to trust model predictions [33]. The U.S. Food and Drug Administration's Project Optimus initiative and the 21st Century Cures Act explicitly recognize the importance of these in silico tools for improving drug development efficiency [34] [33]. This guide provides a comprehensive framework for designing effective validation experiments that bridge bench research and virtual patient applications.
Verification consists of two complementary activities: code verification addresses coding mistakes and checks the consistency of discretization techniques, while solution verification estimates the numerical uncertainty of solutions when exact answers are unknown [31]. Code verification ensures the mathematical model is implemented correctly in software, whereas solution verification quantifies the numerical error in computed results [32].
Validation quantifies modeling errors by comparing computational predictions to experimental data, accounting for uncertainties in both computational and experimental results [31]. Unlike verification, validation is not a binary pass/fail determination but rather a process of assessing the degree to which a model is an accurate representation of the real world from the perspective of its intended uses [7]. The required level of validation rigor should be commensurate with the importance and needs of the application and decision context [32].
Table 1: Key Definitions in Verification and Validation
| Term | Definition | Primary Question |
|---|---|---|
| Verification | Process of determining that a model implementation accurately represents the conceptual description and solution | "Are we solving the equations right?" |
| Validation | Process of comparing computational predictions to experimental data to assess modeling error | "Are we solving the right equations?" |
| Uncertainty Quantification | Process of characterizing and quantifying uncertainties in model inputs and their propagation to outputs | "What is the potential range of errors in our predictions?" |
| Code Verification | Checking for programming errors and consistency of discretization techniques | "Is the code implemented correctly?" |
| Solution Verification | Estimating numerical uncertainty in solutions where exact answers are unknown | "What is the numerical error in this specific solution?" |
Understanding error and uncertainty is essential for effective V&V. Error represents the difference between a simulated or experimental value and the truth, while accuracy describes the closeness of agreement between a simulation/experimental value and its true value [7]. Errors in computational modeling can be categorized as numerical errors (e.g., discretization and round-off), implementation (programming) errors, and modeling errors arising from the chosen mathematical representation of the physical system [7].
Uncertainty represents potential deficiencies that may or may not be present during modeling, arising from either lack of knowledge about the physical system or inherent variation in material properties [7]. The V&V process must account for both errors and uncertainties to establish model credibility.
The first step in virtual patient cohort creation involves constructing a mathematical model with the appropriate level of detail for the intended clinical question. Model design exists on a spectrum from mechanistic models that incorporate detailed biological processes to phenomenological models that capture general system behavior with fewer parameters [34]. For virtual patient applications, intermediate-sized fit-for-purpose models that incorporate mechanistic details for critical system components while using phenomenological representations for less important elements often represent the optimal approach [34].
Model development considerations can be divided into pharmacokinetic (PK) and pharmacodynamic (PD) components. PK models describe drug concentration over time, typically using compartment-based approaches to capture drug movement through the body. PD models predict treatment safety and efficacy, representing the greater challenge due to the complexity of biological systems and limited human data [34]. The choice of model structure must be tailored to the aims of the specific in silico clinical trial; model selection techniques such as information criteria and testing predictions on held-out data help identify the most parsimonious model that describes the available experimental data [34].
After model construction, parameterization involves determining which parameters will be fixed at literature values and which will vary across virtual patients to represent population heterogeneity [34]. For a model with ( r = m + n ) parameters, ( m ) parameters will be fixed (vector ( \mathbf{q} = [q_1, \ldots, q_m] )) while ( n ) parameters will vary per virtual patient (vector ( \mathbf{p} = [p_1, \ldots, p_n] )), with the ( i^{\text{th}} ) virtual patient represented by variable parameters ( \mathbf{p}_i ) [34].
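As an illustrative sketch of this parameterization scheme (all parameter names, counts, values, and ranges below are hypothetical, and uniform sampling is just one possible choice), a virtual cohort can be generated by fixing the shared parameters q and drawing each patient's variable parameters p_i from plausible ranges:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical split: m = 2 fixed parameters (q), n = 3 variable parameters (p)
q = np.array([0.5, 1.2])  # fixed at literature values for every virtual patient

# Plausible ranges for the variable parameters (illustrative values only)
p_lower = np.array([0.1, 0.5, 2.0])
p_upper = np.array([1.0, 2.5, 8.0])

def sample_virtual_patients(n_patients: int) -> np.ndarray:
    """Draw each patient's variable parameter vector p_i uniformly
    within its plausible range; row i represents virtual patient i."""
    u = rng.uniform(size=(n_patients, p_lower.size))
    return p_lower + u * (p_upper - p_lower)

cohort = sample_virtual_patients(1000)  # 1000 virtual patients, 3 varying parameters each
```

In practice the sampling distribution would itself be informed by population data, and implausible parameter combinations would be filtered out against clinical constraints.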
Sensitivity analysis and identifiability analysis play crucial roles in selecting virtual patient characteristics. Sensitivity analysis quantifies how changes in model inputs affect outputs, while identifiability analysis determines what can be inferred about model parameters given available data [34]. These analyses help researchers understand which parameters significantly influence model predictions and whether these parameters can be reliably estimated from available data, preventing situations where models validated against population averages fail when applied to individual patients due to identifiability issues [34].
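A minimal local (one-at-a-time) sensitivity sketch follows; the model function and its nominal values are toy stand-ins, not a real PK system:

```python
import numpy as np

def model(p):
    # Toy stand-in for a PK output (e.g. a steady-state concentration);
    # a real analysis would call the cohort's PK/PD model here.
    dose, clearance, volume = p
    return dose / (clearance * volume)

def oat_sensitivity(p_nominal, rel_step=0.01):
    """Local one-at-a-time sensitivity: relative change in output
    per relative change in each input (dimensionless)."""
    p_nominal = np.asarray(p_nominal, dtype=float)
    y0 = model(p_nominal)
    sens = np.empty(p_nominal.size)
    for i in range(p_nominal.size):
        p = p_nominal.copy()
        p[i] *= 1.0 + rel_step  # perturb one parameter, hold the rest fixed
        sens[i] = ((model(p) - y0) / y0) / rel_step
    return sens

s = oat_sensitivity([100.0, 5.0, 40.0])
# dose enters linearly, clearance and volume inversely, so the
# normalized sensitivities are close to [1, -1, -1]
```

Global methods (e.g. Sobol indices) extend this idea across the whole parameter space rather than around one nominal point.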
Effective validation follows a hierarchical approach, beginning with validation of individual model components and progressing to whole-system validation [32]. This hierarchical verification and validation process improves efficiency by identifying errors at the simplest level before progressing to more complex integrations [32]. The validation process should encompass multiple levels, progressing from individual components through subsystem interactions to the fully integrated system.
This hierarchical approach is particularly valuable for complex biological models, where validating the entire system at once makes error identification and correction difficult. When leveraging previous V&V results, exercise caution as these results are specific to particular quantities of interest (QOIs) in particular settings, and transferring them to new QOIs and settings can be difficult to justify [32].
A critical concept in validation is the domain of applicability—the region in the problem space where a validation assessment is judged to apply [32]. This domain includes features or descriptors that characterize the problem space, with each validation experiment mapping to a point in this multi-dimensional space. The predictive application for a new problem can be assessed based on its position relative to previous validation points [32].
However, defining the domain of applicability presents challenges. Omitting important features may make dissimilar problems appear similar, while including all potentially relevant features creates a high-dimensional space where any new prediction appears extrapolative [32]. In practice, subject-matter expertise must inform judgments about the relevance of previous validation studies to new predictive applications [32].
Table 2: Validation Experiment Design Considerations
| Design Aspect | Considerations | Best Practices |
|---|---|---|
| Experimental Data | - Accuracy and precision of measurements- Representative conditions- Uncertainty quantification | - Use data with documented uncertainty estimates- Ensure experimental conditions cover intended use domain- Include appropriate controls |
| Comparison Metrics | - Quantitative vs. qualitative assessment- Selection of validation metrics- Statistical significance testing | - Define quantitative validation metrics before experiments- Establish acceptance criteria for model accuracy- Use statistical tests to assess agreement |
| Uncertainty Propagation | - Input parameter uncertainty- Experimental measurement error- Model form uncertainty | - Propagate input uncertainties through model- Account for experimental error in comparisons- Distinguish between different uncertainty sources |
Table 3: Essential Research Reagents and Computational Tools
| Category | Specific Tools/Reagents | Function in Validation |
|---|---|---|
| Computational Tools | - PARNASSOS CFD Code [31]- Monte Carlo N-Particle Transport Code [32]- Whole-body PK/PD simulators [33] | - Implement mathematical models- Perform uncertainty quantification- Simulate virtual patient populations |
| Experimental Systems | - In vitro tissue/organ models- Animal disease models- Clinical biomarker assays | - Generate validation data for model components- Provide whole-system response data- Establish human-relevant parameters |
| Analytical Frameworks | - Method of Manufactured Solutions [32]- Goal-oriented a posteriori error estimates [32]- Sensitivity analysis techniques [34] | - Code verification- Solution verification- Parameter importance ranking |
A comprehensive validation protocol for virtual patient models should incorporate these key elements:
Define Quantities of Interest (QOIs): Clearly specify the QOIs for validation, as different QOIs will be affected differently by various error sources [32]. QOIs should be relevant to the decision context and measurable in both computational and experimental settings.
Establish Acceptance Criteria: Define quantitative criteria for acceptable agreement between model predictions and experimental data, based on the intended model use [7]. For example, criteria might require that model predictions fall within experimental uncertainty bounds for a specified percentage of validation points.
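One way to operationalize such a criterion is to compute the fraction of validation points whose predictions fall within the experimental uncertainty band; the data and the 80% threshold below are purely illustrative:

```python
import numpy as np

def fraction_within_bounds(pred, exp_mean, exp_uncert):
    """Fraction of validation points where the model prediction falls
    inside the experimental uncertainty band (mean +/- uncertainty)."""
    pred, exp_mean, exp_uncert = map(np.asarray, (pred, exp_mean, exp_uncert))
    inside = np.abs(pred - exp_mean) <= exp_uncert
    return inside.mean()

# Illustrative data: five validation points, normalized outputs
pred = [1.02, 0.97, 1.10, 0.96, 1.01]
exp_mean = [1.00, 1.00, 1.00, 1.00, 1.00]
exp_uncert = [0.05, 0.05, 0.05, 0.05, 0.05]

coverage = fraction_within_bounds(pred, exp_mean, exp_uncert)
meets_criterion = coverage >= 0.8  # acceptance threshold fixed before testing
```

The threshold itself should be justified by the intended use and documented before validation data are examined.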
Design Validation Experiments: Create experiments that test the model across its intended domain of applicability, with particular attention to potential extrapolation regions [32]. The experimental design should efficiently sample the input parameter space while focusing on regions most relevant to predictive applications.
Execute Hierarchical Validation: Begin with component-level validation and progress to system-level validation, documenting agreement at each level [32]. This approach isolates errors to specific model components and builds confidence in the integrated model.
Quantify Uncertainties: Characterize and propagate uncertainties from both computational and experimental sources, including numerical errors, parameter uncertainties, input uncertainties, and experimental measurement errors [32] [7].
Document Validation Results: Thoroughly document the validation process, including experimental conditions, measurement uncertainties, comparison metrics, and results relative to acceptance criteria [7]. This documentation provides the evidence base for model credibility.
Virtual patient models enable in silico clinical trials that explore patient heterogeneity and its impact on therapeutic outcomes [34]. These virtual trials can address questions about inter-patient variability in treatment response, patient stratification to identify responders versus non-responders, and assessment of potential drug combinations or alternative treatment regimens [34]. This approach serves as a bridge between standard-of-care approaches designed around the "average patient" and fully personalized therapy [34].
Virtual patient technology is particularly valuable for vulnerable populations where clinical trials present practical, ethical, or legal challenges, including pediatric patients, pregnant women, oncology patients, and those with rare diseases [33]. For example, virtual pregnancy models can simulate drug exposure in the mother, fetus, and placenta while accounting for physiological changes during each trimester, enabling dosage adjustments without ethical concerns of testing in actual pregnant women [33].
Regulatory agencies including the FDA, European Medicines Agency, and Japanese Pharmaceuticals and Medical Devices Agency now use modeling and simulation to evaluate new drug submissions [33]. The FDA's Center for Drug Evaluation and Research employs these tools to "predict clinical outcomes, inform clinical trial designs, support evidence of effectiveness, optimize dosing, predict product safety, and evaluate potential adverse event mechanisms" [33].
The future of virtual patient models points toward precision dosing in clinical care, where models incorporating a patient's unique genetic makeup and biomarkers determine optimal drug doses for individual patients [33]. This approach is already being used for complex cases including patients who have undergone bariatric surgery, received transplants, or suffer from psychiatric disorders, complex infections, or rare genetic diseases [33]. Looking further ahead, the field is moving toward personal avatars for each patient, enabling testing of health interventions before actual administration [33].
Designing effective validation experiments for virtual patient models requires rigorous application of verification and validation principles throughout the model development lifecycle. By following a systematic approach that includes hierarchical validation, uncertainty quantification, and careful documentation, researchers can create virtual patient models that reliably predict clinical outcomes. As these computational approaches become increasingly integrated into drug development and clinical care, robust validation practices will ensure that virtual patient technologies fulfill their potential to transform therapeutic development and personalized medicine.
In computational modeling research, Verification and Validation (V&V) form the cornerstone of establishing model credibility. Verification addresses "solving the equations right" by ensuring the mathematical equations are implemented correctly, while validation addresses "solving the right equations" by assessing how accurately the model represents real-world phenomena [7] [1]. Within this V&V framework, Uncertainty Quantification (UQ) has emerged as a critical third component, completing what is known as the VVUQ paradigm (Verification, Validation, and Uncertainty Quantification) [1].
UQ systematically accounts for variability and errors in computational predictions, transforming qualitative estimates into quantifiable confidence measures. This technical guide explores two fundamental approaches to UQ: the long-established Monte Carlo methods and the increasingly influential Bayesian inference techniques. While Monte Carlo methods use random sampling to propagate uncertainties, Bayesian methods provide a probabilistic framework for updating beliefs based on new evidence. Understanding these methods is essential for researchers across scientific domains, from drug development professionals needing to assess compound efficacy to engineers predicting system reliability under uncertain conditions.
The growing importance of UQ is reflected in formal standards developed by organizations like ASME, which now provide comprehensive guidelines for VVUQ implementation across various engineering and scientific disciplines [1]. This guide provides both theoretical foundations and practical methodologies for implementing these crucial uncertainty quantification techniques.
In computational modeling, error represents a known discrepancy between a simulation/experimental value and its true value, while uncertainty represents a potential deficiency due to lack of knowledge [7]. The key distinction is that errors are generally correctable once identified, whereas uncertainties are inherent and must be characterized and propagated through the model.
Uncertainties in computational modeling are broadly categorized into aleatory uncertainty, which arises from inherent variability in the physical system, and epistemic uncertainty, which arises from a lack of knowledge and can in principle be reduced by gathering more information [7].
The primary goal of UQ is to quantify how these different sources of uncertainty affect model predictions, enabling informed decision-making with understood risk levels [7] [1].
The integrated Verification, Validation, and Uncertainty Quantification framework provides a systematic approach to establishing model credibility, combining code and solution verification, validation against experimental data, and the characterization and propagation of uncertainties [1].
This framework ensures that models produce results with sufficient accuracy for their intended use while providing clear metrics for assessing reliability.
Table 1: Classification of Errors and Uncertainties in Computational Modeling
| Category | Definition | Examples | Mitigation Approaches |
|---|---|---|---|
| Numerical Errors | Errors from computational techniques | Discretization error, iterative convergence error, round-off error | Mesh refinement, convergence studies [7] |
| Modeling Errors | Assumptions in mathematical representation | Simplified physics, approximate constitutive models, geometry idealization | Model calibration, multi-physics approaches [7] |
| Parameter Uncertainty | Uncertainty in input parameters | Material properties, boundary conditions, initial conditions | Probabilistic modeling, Bayesian inference [35] |
| Experimental Uncertainty | Variability in validation data | Measurement noise, calibration errors, human factors | Repeated testing, statistical analysis [7] |
Monte Carlo (MC) methods represent a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results [36]. The underlying concept uses randomness to solve problems that might be deterministic in principle, making them particularly valuable for uncertainty propagation in complex systems where analytical solutions are intractable.
The name originates from the Monte Carlo Casino in Monaco, inspired by the gambling habits of mathematician Stanisław Ulam's uncle [36]. The method gained prominence during the Manhattan Project for simulating nuclear reactions and has since become fundamental to computational science, engineering, and finance.
The core MC algorithm follows a consistent pattern [36]: define a domain of possible inputs, generate inputs randomly from a probability distribution over that domain, perform a deterministic computation on each input, and aggregate the results.
For estimating an unknown expected value μ of a random variable, the basic MC implementation is straightforward [36]: draw n independent samples ( X_1, \ldots, X_n ) of the random variable and compute their sample mean

[ m = \frac{1}{n} \sum_{i=1}^{n} X_i ]

Where m represents the sample mean approximating μ. The key strength of this approach is that, by the law of large numbers, the empirical mean converges to the true expected value as the number of samples increases.
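A minimal sketch of this basic estimator in Python (the uniform example is illustrative):

```python
import random

def mc_expectation(f, sampler, n):
    """Basic Monte Carlo: the sample mean m of f(X) over n random
    draws approximates mu = E[f(X)] by the law of large numbers."""
    total = 0.0
    for _ in range(n):
        total += f(sampler())
    return total / n

random.seed(0)
# Example: estimate E[X^2] for X ~ Uniform(0, 1); the true value is 1/3
m = mc_expectation(lambda x: x * x, random.random, 100_000)
```

The estimator is non-intrusive: the simulation `f` is treated as a black box, which is why MC applies unchanged to arbitrarily complex models.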
A critical consideration in MC methods is determining the number of samples required for a desired accuracy. For a chosen error tolerance ε and confidence level corresponding to z-score z, the required sample size n can be estimated as [36]:
[ n \geq s^2 z^2 / \epsilon^2 ]
Where s² is the sample variance. When simulation results are bounded between a and b, a more specific formula applies [36]:
[ n \geq 2(b-a)^2 \ln(2/(1-(\delta/100)))/\epsilon^2 ]
For example, with δ=99% confidence, ( n \geq 10.6(b-a)^2/\epsilon^2 ).
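The bounded-output formula can be wrapped in a small helper (function and argument names are illustrative); with the values below it reproduces the ≈10.6(b−a)²/ε² factor from the worked example:

```python
import math

def mc_sample_size(b_minus_a: float, eps: float, delta_pct: float) -> int:
    """Samples needed so the MC estimate of a result bounded on an
    interval of width (b - a) lies within eps of the true mean with
    delta_pct percent confidence, per the bound quoted above."""
    alpha = 1.0 - delta_pct / 100.0          # allowed failure probability
    n = 2.0 * b_minus_a**2 * math.log(2.0 / alpha) / eps**2
    return math.ceil(n)

# delta = 99% confidence gives n >= 10.6 (b-a)^2 / eps^2,
# e.g. roughly 1.06e5 samples for (b - a) = 1 and eps = 0.01
n = mc_sample_size(b_minus_a=1.0, eps=0.01, delta_pct=99.0)
```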
Figure 1: Monte Carlo Method Workflow - The iterative process of random sampling and aggregation
Despite its conceptual simplicity, the MC method can be computationally expensive, often requiring many samples to achieve acceptable accuracy [36] [35]. This limitation has led to sophisticated parallelization approaches, particularly in cloud computing environments [35].
The MapReduce paradigm has been successfully adapted for MC parallelization [35]: the map phase computes independent simulation realizations in parallel across workers, and the reduce phase aggregates the partial results into the final estimate.
This approach leverages the embarrassingly parallel nature of MC algorithms, where each realization can be computed independently, making it ideal for distributed computing environments [36] [35]. Cloud computing offers significant advantages for MC simulations due to its theoretically infinite scalability and pay-per-use model [35].
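The map/reduce pattern can be sketched serially in Python; here each map task is an independent, separately seeded batch of realizations (estimating π by rejection is just a stand-in for a real simulation), and a distributed framework would replace the built-in `map`:

```python
import random
from functools import reduce

def map_task(args):
    """Map phase: one independent batch of MC realizations.
    In a cluster, each batch would run on a separate worker."""
    seed, n = args
    rng = random.Random(seed)  # per-batch RNG: no shared state between workers
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return hits, n

def reduce_task(a, b):
    """Reduce phase: aggregate partial tallies from all workers."""
    return a[0] + b[0], a[1] + b[1]

# Embarrassingly parallel: batches differ only in their seeds
batches = [(seed, 50_000) for seed in range(8)]
partials = map(map_task, batches)      # swap in a distributed map here
hits, n = reduce(reduce_task, partials)
pi_estimate = 4.0 * hits / n           # converges to pi as n grows
```

Because the batches are fully independent, the speedup scales near-linearly with the number of workers, which is what makes pay-per-use cloud execution attractive.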
Table 2: Monte Carlo Applications Across Disciplines
| Field | Application | Key Input Uncertainties | Measured Outputs |
|---|---|---|---|
| Finance [37] | Economic policy prediction | GDP growth, inflation rates, market volatility | Risk assessment, policy outcomes |
| Engineering [35] | Structural dynamics | Material properties, loading conditions | Failure probabilities, stress distributions |
| Drug Development | Molecular dynamics [38] | Force field parameters, atomic coordinates | Binding affinities, molecular volumes |
| Computational Physics [36] | Nuclear reactor safety | Cross-sections, decay constants, thermal properties | Failure risk, temperature profiles |
Bayesian methods provide a probabilistic framework for uncertainty quantification that integrates prior knowledge with observed data. Unlike frequentist approaches that treat parameters as fixed, Bayesian methods treat parameters as random variables with probability distributions that represent uncertainty in their values [37].
The foundation of Bayesian inference is Bayes' theorem:
[ P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)} ]
Where:
- ( P(\theta|D) ) is the posterior distribution of the parameters ( \theta ) given the observed data ( D )
- ( P(D|\theta) ) is the likelihood of the data given the parameters
- ( P(\theta) ) is the prior distribution, encoding beliefs before the data are seen
- ( P(D) ) is the evidence (marginal likelihood), which normalizes the posterior
This theorem provides a mathematical mechanism for updating beliefs about parameters as new data becomes available [37] [38].
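For a simple concrete instance, the beta-binomial conjugate pair makes this update available in closed form (an illustrative sketch, not drawn from the cited studies):

```python
# Beta-binomial conjugate update: with a Beta(a, b) prior on a success
# probability theta and k successes in n Bernoulli trials, Bayes' theorem
# gives the posterior Beta(a + k, b + n - k) in closed form.
def beta_binomial_update(a: float, b: float, k: int, n: int):
    return a + k, b + (n - k)

# Start from a uniform prior Beta(1, 1); observe 7 successes in 10 trials
a_post, b_post = beta_binomial_update(1.0, 1.0, k=7, n=10)
posterior_mean = a_post / (a_post + b_post)  # 8 / 12, about 0.667
```

Conjugate pairs like this are the exception; for most models the posterior has no closed form, which is precisely why the MCMC methods below are needed.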
MCMC methods combine Monte Carlo sampling with Bayesian inference to generate samples from complex posterior distributions [37]. The most common implementation is the Metropolis-Hastings algorithm: starting from an initial parameter value, a candidate is proposed from a proposal distribution, an acceptance probability is computed from the ratio of posterior densities (adjusted for any asymmetry in the proposal), and the candidate is accepted or rejected accordingly, with the current state recorded at each iteration.
After sufficient iterations, the Markov chain converges to the target posterior distribution [37].
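A minimal random-walk Metropolis sketch targeting a standard normal posterior (step size and sample counts are illustrative; real applications would also discard a burn-in period and check convergence diagnostics):

```python
import math
import random

def metropolis_hastings(log_post, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis: propose x' ~ Normal(x, step), accept with
    probability min(1, post(x') / post(x)); otherwise keep x. The chain's
    stationary distribution is the target posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)
        lp_new = log_post(x_new)
        if math.log(rng.random()) < lp_new - lp:  # acceptance test in log space
            x, lp = x_new, lp_new
        samples.append(x)  # rejected proposals repeat the current state
    return samples

# Target: standard normal posterior, log p(x) = -x^2 / 2 up to a constant
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
mean = sum(chain) / len(chain)  # should approach the posterior mean, 0
```

Because the proposal here is symmetric, the Hastings correction term cancels and only the posterior ratio remains.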
Bayesian Hierarchical Models (BHMs) extend basic Bayesian approaches by incorporating multiple levels of variability, making them particularly suitable for complex systems [37]. In economic policy prediction, for example, BHMs integrate various macroeconomic indicators while accounting for uncertainty at different model levels [37].
Bayesian Hierarchical Models (BHMs) extend basic Bayesian approaches by incorporating multiple levels of variability, making them particularly suitable for complex systems [37]. In economic policy prediction, for example, BHMs integrate various macroeconomic indicators while accounting for uncertainty at different model levels [37].
A typical three-level hierarchical structure includes a data level (the likelihood of the observations), a parameter level (priors on the parameters governing the data), and a hyperparameter level (hyperpriors on the parameters of those priors).
This structure allows for more flexible modeling of complex relationships while naturally propagating uncertainty across model levels [37].
Figure 2: Bayesian Inference Workflow - The iterative process of posterior estimation using MCMC
A recent study demonstrated Bayesian inference for parameterizing three-point water models, quantifying uncertainty in force field parameters by sampling their posterior distributions against experimental reference data [38].
The study revealed inherent limitations of three-point water models, demonstrating how Bayesian UQ can identify model structural deficiencies rather than just parameter uncertainties [38].
Research in economic forecasting implemented a BHM with MCMC to quantify policy prediction uncertainty, integrating multiple macroeconomic indicators while accounting for uncertainty at each level of the model [37].
A systematic approach to model validation incorporates Bayesian updates with rejection criteria, updating confidence in the model as validation data accumulate and rejecting it when predictions fall outside predefined acceptance thresholds [39].
This approach directly links validation decisions to the specific intended use of the model, providing a rigorous framework for assessing model credibility [39].
Table 3: Comparison of Uncertainty Quantification Methods
| Characteristic | Monte Carlo Methods | Bayesian Methods |
|---|---|---|
| Theoretical Basis | Law of large numbers, frequentist statistics | Bayesian probability theory |
| Computational Demand | High (many samples required) | High (MCMC convergence) |
| Prior Information | Not directly incorporated | Explicitly incorporated via prior distributions |
| Output | Point estimates with confidence intervals | Posterior probability distributions |
| Implementation Complexity | Low to moderate | Moderate to high |
| Parallelization Potential | High (embarrassingly parallel) | Moderate (sequential MCMC) |
| Key Strengths | Non-intrusive, easy implementation | Uncertainty in parameters, hierarchical modeling |
Table 4: Essential Computational Tools for Uncertainty Quantification
| Tool/Category | Function | Example Applications |
|---|---|---|
| MCMC Samplers | Generate samples from posterior distributions | Parameter estimation, hierarchical modeling [37] |
| Parallel Computing Frameworks | Distribute computational workload across multiple processors | Cloud-based Monte Carlo simulations [35] |
| Bayesian Hierarchical Modeling Software | Implement multi-level statistical models | Economic policy prediction, biological systems [37] |
| Surrogate Models | Approximate complex systems with computationally efficient models | Reduced-order modeling, sensitivity analysis [39] |
| Convergence Diagnostics | Assess MCMC algorithm convergence | Gelman-Rubin statistic, trace plots, autocorrelation [38] |
| Uncertainty Visualization Tools | Communicate uncertainty in intuitive formats | Probability boxes, predictive intervals, violin plots [39] |
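Of the convergence diagnostics listed in the table above, the Gelman-Rubin statistic (potential scale reduction factor) can be sketched in a few lines; the split-chain refinements used in modern samplers are omitted for brevity:

```python
import numpy as np

def gelman_rubin(chains: np.ndarray) -> float:
    """Potential scale reduction factor R-hat for an (m_chains, n_draws)
    array; values near 1 indicate the chains have mixed."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 2000))          # four chains sampling the same target
r_hat = gelman_rubin(mixed)                 # close to 1: consistent chains

split = mixed + np.arange(4)[:, None]       # artificially separated chains
r_hat_bad = gelman_rubin(split)             # well above 1: not converged
```

A common rule of thumb is to require R-hat below roughly 1.01–1.1 before trusting posterior summaries, alongside trace plots and autocorrelation checks.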
Uncertainty quantification represents an essential component of credible computational modeling, complementing verification and validation in the comprehensive VVUQ framework. Both Monte Carlo and Bayesian methods offer powerful, complementary approaches to quantifying uncertainty—Monte Carlo through extensive random sampling and Bayesian methods through probabilistic inference that incorporates prior knowledge.
The choice between these methodologies depends on specific application requirements: Monte Carlo methods offer straightforward implementation and excellent parallelization characteristics, while Bayesian approaches provide more sophisticated uncertainty characterization, particularly for parameter estimation and hierarchical modeling. As computational resources continue to grow and UQ methodologies mature, integrating these approaches will further enhance our ability to make reliable predictions with quantified uncertainty across scientific domains, from drug development to economic forecasting and beyond.
The ongoing standardization of VVUQ practices through organizations like ASME signals the growing recognition that uncertainty quantification is not merely a technical refinement but a fundamental requirement for responsible computational modeling in research and decision-making contexts [1].
Verification and Validation (V&V) are fundamental processes in computational modeling that ensure the reliability and trustworthiness of model predictions. Verification addresses the question "Are we solving the equations correctly?" by ensuring that the computational model accurately represents the underlying mathematical model and that the equations are solved without significant numerical errors [14] [40]. Validation, in contrast, answers "Are we solving the correct equations?" by determining how accurately the computational model represents the real-world system it is intended to simulate [14] [40]. The American Society of Mechanical Engineers (ASME) V&V 40-2018 standard provides a risk-informed framework for assessing the credibility of computational models, particularly for medical devices and related applications [41]. This standard does not prescribe specific activities but instead offers a flexible framework for determining the rigor of evidence needed to establish model credibility based on the model's intended use and the risk associated with its potential failure [40].
The ASME V&V 40 framework consists of a structured, iterative process for establishing model credibility. The framework's core strength lies in its risk-informed approach, where the level of evidence required is commensurate with the model's decision-making influence and potential consequences of an incorrect prediction [40]. The following diagram illustrates the key steps and their relationships in the credibility assessment process:
Figure 1: ASME V&V 40 Credibility Assessment Process Flow
The first step involves precisely defining the Question of Interest—the specific question, decision, or concern that the modeling aims to address [14]. This is followed by defining the Context of Use (COU), which details how the model will be used to address this question, including the specific role, scope, operating conditions, and quantities of interest for the computational model [14] [40]. The COU statement should explicitly describe any additional data sources that will inform the question alongside the model outputs. Ambiguity in defining the COU can lead to reluctance in accepting modeling results or protracted dialogues between model developers and regulators regarding credibility requirements [14].
Model risk is determined by evaluating two key factors: Model Influence (the contribution of the computational model relative to other evidence in decision-making) and Decision Consequence (the significance of an adverse outcome resulting from an incorrect decision) [14] [40]. The following table illustrates how these factors combine to determine overall model risk:
Table 1: Model Risk Assessment Matrix
| Decision Consequence | Low Model Influence | Medium Model Influence | High Model Influence |
|---|---|---|---|
| Low | Low Risk | Low Risk | Medium Risk |
| Medium | Low Risk | Medium Risk | High Risk |
| High | Medium Risk | High Risk | High Risk |
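The matrix above can be expressed as a simple lookup; the mapping below mirrors Table 1 (the function name and dictionary representation are illustrative, not part of the standard):

```python
# Model risk lookup reflecting Table 1: risk grows with both
# decision consequence and model influence.
RISK_MATRIX = {
    ("low",    "low"):    "low",
    ("low",    "medium"): "low",
    ("low",    "high"):   "medium",
    ("medium", "low"):    "low",
    ("medium", "medium"): "medium",
    ("medium", "high"):   "high",
    ("high",   "low"):    "medium",
    ("high",   "medium"): "high",
    ("high",   "high"):   "high",
}

def model_risk(decision_consequence: str, model_influence: str) -> str:
    """Return the overall model risk for a (consequence, influence) pair."""
    return RISK_MATRIX[(decision_consequence.lower(), model_influence.lower())]

# VAD case from the blood pump example: high consequence, high influence
assert model_risk("high", "high") == "high"
```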
Based on the determined model risk, specific credibility goals and activities are established. These activities are categorized into verification, validation, and applicability assessments, which are further divided into 13 credibility factors [14]. The rigor of assessment for each factor should be commensurate with the model risk. For example, a high-risk model would require more stringent acceptance criteria and comprehensive validation activities compared to a low-risk model [40].
The final step involves evaluating the collected evidence against the pre-defined credibility goals to determine if the model is sufficiently credible for its COU [40]. This assessment should be performed by a team with adequate knowledge of the computational model, available evidence, and model requirements. Comprehensive documentation of the entire process, including rationale for credibility goals and summary of findings, is essential for regulatory submissions and scientific transparency [40].
The V&V 40 standard organizes credibility activities into three main categories: Verification, Validation, and Applicability, with detailed factors under each category [14]. The following table summarizes these factors and provides examples of relevant activities:
Table 2: Credibility Factors and Activities in ASME V&V 40
| Activity Category | Credibility Factor | Description & Example Activities |
|---|---|---|
| Verification | Software Quality Assurance | Ensuring proper software installation, licensing, and version control [14] |
| | Numerical Code Verification | Checking for coding errors and confirming algorithm implementation [14] |
| | Discretization Error | Assessing solution accuracy related to spatial and temporal discretization [14] |
| | Numerical Solver Error | Evaluating iterative convergence errors and round-off errors [14] |
| | Use Error | Ensuring correct model setup and input by trained users [14] |
| Validation | Model Form | Assessing appropriateness of governing equations, constitutive relationships, and boundary conditions [14] |
| | Model Inputs | Evaluating uncertainty and appropriateness of input parameters and data [14] |
| | Test Samples | Ensuring test articles used for validation are representative of actual devices [14] |
| | Test Conditions | Confirming test conditions represent the COU operating environment [14] |
| | Equivalency of Input Parameters | Ensuring consistent inputs between validation and COU simulations [14] |
| | Output Comparison | Quantitatively comparing model predictions to experimental data using predefined acceptance criteria [14] |
| Applicability | Relevance of Quantities of Interest | Ensuring validated outputs align with COU prediction needs [14] |
| | Relevance of Validation Activities | Assessing how well validation evidence supports the specific COU [14] |
Objective: To ensure the computational model accurately represents the underlying mathematical model and is solved correctly [14] [40].
Methodology: Typical activities include code verification using techniques such as the Method of Manufactured Solutions, and solution verification through systematic mesh refinement and convergence studies to quantify discretization error [14].
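As one concrete piece of solution verification, the observed order of convergence can be estimated from results on three systematically refined grids; this is a minimal sketch using synthetic data in place of real solver output:

```python
import math

def observed_order(f_coarse, f_medium, f_fine, r=2.0):
    """Observed order of convergence p from a quantity of interest
    computed on three grids with constant refinement ratio r."""
    return math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(r)

# Synthetic check: a quantity converging as f(h) = f_exact + C * h^2
f_exact, C = 1.0, 0.3
f = [f_exact + C * h ** 2 for h in (0.4, 0.2, 0.1)]  # halving h each time

p = observed_order(*f)  # recovers a value close to the formal order, 2
```

Agreement between the observed and formal orders supports the claim that the discretization behaves as intended; disagreement flags coding errors or under-resolved solutions.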
Objective: To determine how accurately the computational model represents the real world through comparison with experimental data [14] [40].
Methodology: Validation activities typically compare model predictions quantitatively against experimental data using predefined acceptance criteria, with test samples and test conditions chosen to represent the context of use and with uncertainties characterized in both the computational and experimental results [14] [40].
A compelling application of the V&V 40 framework involves computational fluid dynamics (CFD) modeling of a generic centrifugal blood pump for hemolysis prediction [40]. This case study demonstrates how the same computational model requires different levels of credibility evidence based on two distinct Contexts of Use:
Table 3: Risk Assessment for Blood Pump with Different Contexts of Use
| Context of Use Element | COU 1: Cardiopulmonary Bypass | COU 2: Ventricular Assist Device |
|---|---|---|
| Question of Interest | Are hemolysis levels acceptable for short-term CPB use? | Are hemolysis levels acceptable for short-term VAD use? |
| Device Classification | Class II | Class III |
| Decision Consequence | Low (non-life-threatening outcome) | High (life-threatening outcome) |
| Model Influence | Medium (supplementary evidence) | High (primary evidence) |
| Overall Model Risk | Low | High |
| Required Credibility Evidence | Basic mesh verification, comparison to literature data | Comprehensive V&V, rigorous experimental validation with strict acceptance criteria |
For the CPB application (lower risk), basic verification activities and comparison to literature data may provide sufficient credibility. However, for the VAD application (higher risk), comprehensive experimental validation using particle image velocimetry and in vitro hemolysis testing with strict acceptance criteria would be necessary [40].
The Bologna Biomechanical Computed Tomography (BBCT) solution provides another practical application, where the V&V 40 framework was implemented to demonstrate model credibility for predicting femur fracture risk [42]. The model received a "medium" risk classification, requiring substantial verification and validation activities. Key credibility activities included numerical verification of the computational models, validation against biomechanical testing of cadaveric specimens, and quantification and propagation of uncertainties through the modeling process [42].
This implementation demonstrated that following the structured V&V 40 approach provided a clear pathway for regulatory qualification of the in silico methodology [42].
Successful implementation of the V&V 40 framework requires specific computational and experimental resources. The following table outlines essential components of the research toolkit for credibility assessment:
Table 4: Essential Research Toolkit for V&V 40 Implementation
| Tool/Resource Category | Specific Examples | Function in Credibility Assessment |
|---|---|---|
| Computational Modeling Software | ANSYS CFX, Finite Element Analysis packages [40] | Provides the simulation environment for computational model implementation and solution |
| Verification Tools | Mesh generation software, convergence analysis utilities [42] [40] | Enables discretization error quantification and numerical accuracy assessment |
| Experimental Validation Apparatus | Particle Image Velocimetry (PIV) systems, in vitro hemolysis test loops [40] | Generates high-quality comparator data for model validation under controlled conditions |
| Biomechanical Testing Equipment | Material testing systems, cadaveric specimen testing fixtures [42] | Provides experimental data for validation of structural and biomechanical models |
| Uncertainty Quantification Framework | Statistical analysis tools, sensitivity analysis algorithms [42] | Facilitates characterization and propagation of uncertainties through the modeling process |
| Documentation System | Electronic lab notebooks, version control systems [40] | Ensures comprehensive tracking of modeling decisions, parameter values, and validation results |
The ASME V&V 40 standard provides a critical framework for establishing credibility of computational models through a risk-informed approach that aligns evidence requirements with decision consequence and model influence. By implementing this structured methodology—from precise definition of the Context of Use through rigorous verification, validation, and applicability assessment—researchers and drug development professionals can generate trustworthy computational evidence suitable for regulatory evaluation. The framework's flexibility allows application across diverse domains, from medical device evaluation to pharmaceutical development, promoting broader acceptance of in silico methodologies while ensuring scientific rigor and patient safety.
Verification, Validation, and Uncertainty Quantification (VVUQ) provides a critical framework for establishing confidence in computational models used across scientific and engineering disciplines. Within computational biomedicine, VVUQ ensures that models serving drug development and clinical decision-making are robust, reliable, and fit-for-purpose [43]. The American Society of Mechanical Engineers (ASME) defines these pillars as follows: Verification addresses "Are we building the model right?" by ensuring the computational model accurately represents the underlying mathematical model and its solution. Validation asks "Are we building the right model?" by assessing how accurately the model replicates real-world phenomena. Uncertainty Quantification (UQ) characterizes and propagates the impact of uncertainties in model inputs, parameters, and structure on the model's outputs [43] [9]. As modeling and simulation become increasingly essential for biomedical innovation, rigorous VVUQ processes provide the necessary foundation for credible predictions that can inform high-stakes decisions in therapeutic development and regulatory evaluation [44].
The following diagram illustrates the fundamental VVUQ workflow in computational modeling, showing how verification, validation, and uncertainty quantification interact to build model credibility for decision-making.
The implementation of VVUQ requires distinct methodologies for each component. Verification encompasses both code verification—ensuring the numerical algorithms are implemented correctly without programming errors—and solution verification—estimating the numerical error in computed solutions [43]. Common techniques include the method of manufactured solutions and discretization error estimation. Validation employs systematic comparison with experimental data, using quantitative validation metrics such as the area metric for statistical comparisons or waveform metrics for time-series data [43]. Uncertainty Quantification employs both probabilistic approaches (e.g., Monte Carlo methods, Bayesian inference) and non-probabilistic approaches (e.g., intervals, fuzzy sets) to characterize and propagate uncertainties from various sources, including parameter uncertainty, model form uncertainty, and experimental noise [43].
The biomedical modeling community has developed several standards and guidance documents to ensure VVUQ rigor. The ASME VVUQ 10 standard provides guidance on verification and validation in solid mechanics, while ASME VVUQ 20 addresses validation in computational fluid dynamics and heat transfer [43]. For medical devices, the ASME VVUQ 40 standard specifically addresses assessing credibility of computational models through verification, validation, and uncertainty quantification [44]. Regulatory agencies increasingly recognize the value of VVUQ, with the FDA incorporating model credibility assessments into their review processes [44]. The Center for Reproducible Biomedical Modeling (CRBM) and FAIR principles (Findable, Accessible, Interoperable, and Reusable) further support community efforts to enhance model transparency and reproducibility [44].
PBPK modeling uses systems of differential equations to predict how drugs are absorbed, distributed, metabolized, and excreted in humans and animals by combining drug-specific and physiology-specific information [45]. These models are particularly valuable for simulating the impact of intrinsic factors (e.g., genetics, disease, organ impairment) and extrinsic factors (e.g., formulation, food effects, drug-drug interactions) on drug pharmacokinetics, enabling extrapolation from healthy adults to broader populations [45]. The construction of virtual populations that reflect physiological and pathophysiological characteristics of specific patient subgroups serves as critical input for PBPK models, enabling prediction of drug exposure, efficacy, and safety in populations where clinical data may be limited [45].
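As a concrete (if drastically simplified) building block of such models, the sketch below implements a single-compartment oral-absorption model (the Bateman equation) rather than a full multi-organ PBPK system; the parameter values are illustrative, not drug-specific:

```python
import numpy as np

def one_compartment_oral(t, dose_mg, F, ka, ke, V_L):
    """Plasma concentration (mg/L) for a one-compartment model with
    first-order oral absorption (the Bateman equation). A single-
    compartment sketch, not a whole-body PBPK model."""
    return (F * dose_mg * ka) / (V_L * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

# Illustrative (not drug-specific) parameters: 100 mg oral dose
t = np.linspace(0, 24, 481)  # hours
c = one_compartment_oral(t, dose_mg=100, F=0.9, ka=1.2, ke=0.15, V_L=40)

cmax = c.max()                                    # peak concentration
tmax = t[c.argmax()]                              # time of peak
auc = np.sum((c[1:] + c[:-1]) / 2 * np.diff(t))   # AUC(0-24h), trapezoidal rule
```

A real PBPK model replaces this single compartment with coupled ODEs for each organ, but the derived quantities (Cmax, Tmax, AUC) are computed the same way.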
Table 1: Virtual Populations in PBPK Modeling for Various Clinical Scenarios
| Category | Population | Key Physiological Parameters | Exemplar Drugs Modeled |
|---|---|---|---|
| Pediatrics | Term neonates to adolescents | Age-dependent organ maturation, enzyme expression | Midazolam, caffeine, carbamazepine, theophylline [45] |
| Geriatrics | Elderly (65-100 years) | Age-dependent decreases in hepatic and renal function | Morphine, furosemide, simvastatin [45] |
| Pregnancy | Pregnant women and fetus | Pregnancy-induced physiological changes | Cefazolin, cefuroxime, cefradine [45] |
| Organ Impairment | Renal impairment (mild, moderate, severe) | Reduced glomerular filtration rate, altered clearance | Oseltamivir, sitagliptin, rosiglitazone [45] |
| Disease States | Obesity (adults and children) | Altered body composition, organ size, blood flows | Midazolam, caffeine, acetaminophen, clindamycin [45] |
Implementing rigorous VVUQ for PBPK models requires systematic protocols. Verification involves checking the mathematical implementation through mass balance verification and ensuring numerical solver accuracy. Validation typically follows a stepwise approach, beginning with validation against rich pharmacokinetic data from healthy volunteers, then progressing to special populations [45]. Validation metrics include comparison of predicted versus observed AUC, C~max~, and other PK parameters, with acceptability often determined by whether predictions fall within two-fold of observed values [45]. Uncertainty Quantification involves characterizing parameter uncertainty through techniques like Markov Chain Monte Carlo and Bayesian inference, then propagating these uncertainties to output predictions [44]. For models intended to support regulatory decisions, establishing a credibility framework following ASME VVUQ 40 is increasingly recommended [44].
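The two-fold acceptability criterion described above is straightforward to encode; the sketch below uses hypothetical predicted and observed values purely for illustration:

```python
def within_fold(predicted, observed, fold=2.0):
    """True when predicted/observed falls inside [1/fold, fold],
    the common two-fold acceptance window for PK predictions."""
    ratio = predicted / observed
    return (1.0 / fold) <= ratio <= fold

# Hypothetical predicted vs. observed PK parameters
auc_ok = within_fold(predicted=18.4, observed=12.1)   # ratio ~1.52: accepted
cmax_ok = within_fold(predicted=0.9, observed=2.1)    # ratio ~0.43: rejected
```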
QSP modeling integrates mechanistic knowledge of biological pathways, drug targets, and pharmacokinetic-pharmacodynamic relationships to quantitatively predict drug efficacy and toxicity [44]. Unlike traditional pharmacokinetic/pharmacodynamic models, QSP models explicitly represent biological networks and their interactions with therapeutics, capturing emergent behaviors that arise from interactions across multiple biological scales [44]. A key strength of QSP is its ability to integrate diverse data types, from molecular-level measurements (e.g., protein expression, metabolic profiles) to cellular responses (e.g., proliferation, apoptosis) and ultimately to tissue- and organ-level phenotypes [44]. This multi-scale integration makes QSP particularly valuable for predicting drug efficacy and toxicity, which are emergent properties arising from complex interactions across biological scales [44].
The VVUQ process for QSP models presents unique challenges due to their complexity and multi-scale nature. Verification focuses on ensuring mathematical consistency across scales and checking the implementation of complex biological networks. Validation often employs a multi-level approach, comparing model predictions against data at different biological hierarchies—from in vitro cellular responses to in vivo physiological outcomes [44]. Uncertainty Quantification in QSP must address both parameter uncertainty and model structure uncertainty, often using techniques like global sensitivity analysis to identify key uncertainty contributors [44]. A promising approach is the integration of machine learning with QSP, where ML helps address data gaps and improves individual-level predictions while QSP provides biological grounding and mechanistic interpretability [44].
Table 2: Research Reagent Solutions for QSP and In Silico Trial Modeling
| Tool/Category | Specific Examples | Function/Purpose |
|---|---|---|
| QSP Platforms | jinkō platform (Nova) | Disease modeling and clinical trial simulation [46] |
| Virtual Population Generators | Conditional Tabular Generative Adversarial Networks (CTGANs) | Generate synthetic patient cohorts with realistic characteristics [47] |
| PBPK Software | Industry-standard PBPK platforms | Simulate drug disposition in diverse populations [45] |
| AI/Analytics | PandaOmics, Natural Language Processing (NLP) | Target identification, patient stratification from EHRs [48] |
| UQ Methodologies | Monte Carlo methods, Bayesian inference, Predictive Capability Metrics | Quantify and propagate uncertainties in model predictions [43] [49] |
In silico clinical trials (ISCTs) use computer simulations to predict the outcomes of clinical trials by combining trial design elements with disease, drug, and population models [46] [47]. The core methodology involves creating virtual patient cohorts that reflect the demographic, physiological, genetic, and clinical characteristics of target populations, then simulating their response to interventions using validated QSP or PBPK models [46]. These approaches are particularly valuable for addressing challenges in rare disease research, where small patient populations and ethical constraints limit traditional trial feasibility [48]. ISCTs can optimize trial designs through virtual arms, synthetic control groups, and exploration of different dosing regimens, potentially reducing the number of patients required in actual trials and accelerating therapeutic development [48] [47].
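A minimal sketch of the virtual-cohort step follows; the distributions, allometric clearance scaling, and exposure target window are all illustrative assumptions, not taken from any cited model:

```python
import numpy as np

rng = np.random.default_rng(0)

def virtual_cohort(n):
    """Sample a toy virtual population: body weight (kg) and an
    allometrically scaled, log-normally varying clearance (L/h).
    Distributions are illustrative, not fitted to real data."""
    weight = np.clip(rng.normal(70, 12, n), 40, 120)
    clearance = 5.0 * (weight / 70) ** 0.75 * rng.lognormal(0.0, 0.25, n)
    return weight, clearance

def mean_steady_state(dose_mg_per_day, clearance):
    """Average steady-state concentration: daily dose / (24 h * CL)."""
    return dose_mg_per_day / (24 * clearance)

weight, cl = virtual_cohort(5000)
css = mean_steady_state(100, cl)

# Fraction of the virtual cohort inside a hypothetical 0.5-1.5 mg/L target window
fraction_in_window = float(np.mean((css >= 0.5) & (css <= 1.5)))
```

In a full ISCT, each virtual patient's exposure would feed a validated QSP or PBPK response model rather than a closed-form steady-state formula.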
The following workflow illustrates the typical process for developing and executing an in silico clinical trial, from model development to clinical prediction.
A landmark demonstration of ISCT validation comes from Nova's successful prediction of the Phase III FLAURA2 trial results for non-small cell lung cancer [46]. Researchers used only publicly available information from Phase I/II trials and the Phase III protocol to inform their QSP model of EGFR-mutant lung cancer and simulate 5,000 virtual patients [46]. The simulation accurately predicted the hazard ratio and time-to-progression benefits of the osimertinib combination therapy, with the predicted hazard ratio of 0.602 closely matching the actual trial result of 0.62 [46]. This prospective prediction, completed in approximately three weeks compared to the actual trial duration of 33 months, demonstrates how ISCTs can potentially accelerate and de-risk drug development [46].
Establishing credibility for ISCTs requires particularly rigorous VVUQ protocols due to their direct implications for clinical decision-making. Verification ensures proper implementation of the trial simulation infrastructure, including patient enrollment algorithms, treatment assignment, and endpoint assessment. Validation follows a stepwise process, beginning with face validation by domain experts, then comparison with historical data, and ideally culminating in prospective prediction of actual clinical trial outcomes [46]. Uncertainty Quantification must address both model uncertainty (in the disease and drug action models) and population uncertainty (in how well the virtual cohort represents the real patient population) [47]. For regulatory acceptance, establishing credibility evidence through standards such as ASME VVUQ 40 is essential, with the level of evidence commensurate with the model's impact on decision-making [44].
Despite methodological advances, several cross-cutting challenges persist in implementing VVUQ for biomedical models. Semantic interoperability remains a barrier, with a lack of shared ontologies and metadata linking experimental outputs to computational model variables [48]. Calibration and validation workflows need strengthening, as quantitative measurements are not routinely used to calibrate mechanistic models, nor are model predictions prospectively tested in experimental systems [48]. There is also a need for better integration of qualitative system features (e.g., bistability, switch-like behaviors) with quantitative precision to ensure models capture essential biological dynamics [44]. Community-driven initiatives such as the Computational Modeling in Biology Network (COMBINE) and adherence to FAIR data principles are actively addressing these challenges [44].
The field of VVUQ for biomedical models is rapidly evolving, with several emerging trends shaping its future. There is growing emphasis on credibility assessment frameworks that provide structured approaches for evaluating model reliability for specific contexts of use [43] [44]. The integration of machine learning with mechanistic modeling continues to advance, offering opportunities to enhance predictive accuracy while maintaining biological interpretability [44]. Community efforts are increasingly focused on model reproducibility and transparency, including model sharing, standardized reporting, and open-source tool development [44]. For successful implementation, researchers should prioritize proactive and cautious adaptation of literature models rather than developing entirely new models, following a "learn and confirm" paradigm that critically assesses biological assumptions, pathway representations, and parameter estimation methods before application to new contexts [44].
As computational models assume increasingly prominent roles in biomedical research and development, robust VVUQ practices will be essential for building trust and ensuring these powerful tools deliver on their promise to accelerate therapeutic innovation and improve patient care.
In computational modeling research, Verification and Validation (V&V) are fundamental processes for establishing model credibility and reliability. Verification addresses the question, "Are we building the model correctly?"—a process of ensuring the computational model accurately represents the developer's conceptual description and specifications. Validation addresses the question, "Are we building the right model?"—determining how accurately the computational model represents the real-world system it intends to simulate [9]. Within mission-critical industries such as aerospace, medical devices, and drug development, formal V&V activities can account for more than 40% of total project effort due to the increasing complexity and safety-integrity requirements of embedded components and systems [50].
The cost of remedying errors grows exponentially throughout the development lifecycle: studies indicate that fixing an error in the final phases can cost as much as 100 times more than fixing it in the early stages of development [50]. This underscores the critical importance of robust V&V practices, particularly in drug development where model predictions directly impact research directions, clinical trials, and ultimately, patient outcomes.
Table: Core Concepts in Verification and Validation
| Term | Definition | Key Question | Primary Focus |
|---|---|---|---|
| Verification | Process of determining that a computational model accurately represents the developer's conceptual description and specifications [9]. | "Am I building the model right?" [9] | Solving equations correctly; numerical accuracy; code correctness |
| Validation | Process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model [9]. | "Am I building the right model?" [9] | Model fidelity to physical reality; experimental comparison |
| Uncertainty Quantification (UQ) | The process of quantifying uncertainties in mathematical models, computational solutions, and experimental data [28]. | "How confident are we in the predictions?" | Identifying, characterizing, and propagating sources of error and uncertainty |
| Solution Verification | Assessment of the numerical accuracy of computed solutions, including iterative and discretization errors [51]. | "How accurate is this specific solution?" | Numerical error estimation; convergence analysis |
| Code Verification | Process of ensuring computational algorithms are implemented correctly in software [51]. | "Is the software bug-free?" | Code correctness; algorithm implementation |
Figure 1: Core V&V Process Flow and Relationship to Modeling
Poor documentation represents one of the most critical failures in V&V processes. Without clearly defined requirements as a starting point, teams lack a definitive benchmark against which to verify and validate their models [50]. This leads to incomplete testing, overlooked design problems, and potential non-compliance with industry standards and safety-critical issues. The fundamental problem emerges when development proceeds without a precise understanding of what the model should accomplish and what constitutes acceptable performance.
How to Avoid: Establish a rigorous requirements management process that documents both functional and non-functional model requirements. Maintain version-controlled documentation that traces requirements to specific verification tests and validation benchmarks. Implement a formal change control process to ensure documentation remains synchronized with model evolution.
Many organizations make the critical error of not using V&V at early stages of model development, treating it as a final checkpoint rather than an integral part of the development lifecycle [50]. This delayed approach allows errors to propagate and become embedded in the model architecture, dramatically increasing remediation costs. The sequential nature of traditional approaches like the V-model exacerbates this problem by deferring testing until late stages, resulting in late detection of defects that require expensive rework [52].
How to Avoid: Implement V&V activities from the earliest conceptual phases through the complete model lifecycle. Apply techniques such as dependability analysis, safety analysis, certification, and qualification support during initial development to detect defects before they are introduced into the system. Adopt iterative development approaches that validate conceptual models against simplified test cases before full implementation.
A frequently overlooked aspect of V&V is not validating your test tools or methods [50]. All formal V&V tool sets—including design modeling tools, formal proofing tools, and model checking tools—require their own validation to ensure they are functioning properly and providing accurate results. Using unvalidated tools risks propagating errors through the entire verification chain while providing false confidence in model correctness.
How to Avoid: Establish a formal tool qualification process that includes benchmark testing, version control, and regular calibration. Document tool limitations and operational boundaries. For critical applications, use multiple independent tools to cross-verify results and identify potential tool-specific errors.
Not giving independence to whoever is doing the evaluation introduces confirmation bias and undermines the objectivity of V&V activities [50]. When model developers verify their own work, they naturally tend to follow mental pathways that confirm the model's correctness rather than aggressively seeking to uncover defects. This compromised independence is particularly problematic in high-consequence applications where unbiased assessment is essential.
How to Avoid: Implement organizational separation between development and V&V teams. Establish independent review processes with clearly defined accountability. For highest-criticality applications, consider third-party verification to ensure complete objectivity. Independent verification and validation add significant value to systems and software development even when third-party certification is not mandatory [50].
A critical pitfall involves not testing the final product that will actually be deployed [50]. This occurs when requirements change during development but the V&V activities are not updated accordingly, resulting in a validated model that differs significantly from what is ultimately delivered. The disconnect emerges when teams fail to maintain strict configuration management between the model specification, implementation, and validation suite.
How to Avoid: Implement rigorous configuration management that synchronizes model requirements, implementation, and V&V activities. When requirements change, update both the model and the corresponding V&V tests. Maintain a comprehensive test suite that validates the integrated final product rather than just individual components.
Using inexperienced teams for V&V activities leads to underestimated timelines, inadequate test coverage, and failure to identify subtle but critical errors [50]. V&V complexity grows dramatically with model sophistication, and inexperienced teams often lack the perspective to anticipate edge cases and failure modes. Preparation alone for complex V&V activities can take months, a timeframe frequently underestimated by those unfamiliar with rigorous V&V processes.
How to Avoid: Invest in specialized V&V training and mentorship programs. Engage experienced personnel throughout the V&V lifecycle, particularly during planning and critical review stages. Develop realistic resource estimates based on historical data from similar projects with comparable complexity levels.
Table: Pitfall Summary and Mitigation Strategies
| Pitfall | Impact | Early Indicators | Mitigation Strategies |
|---|---|---|---|
| Poor Documentation | Unclear verification targets; incomplete testing; safety issues overlooked [50] | Vague requirements; frequent misinterpretations; missing traceability | Implement requirements management tools; maintain version-controlled documentation; establish traceability matrices |
| Late V&V Integration | 100x higher correction costs; architectural defects; major rework required [50] [52] | V&V treated as final phase; early design reviews skipped; no prototype validation | Embed V&V from project inception; use iterative development; conduct early feasibility studies |
| Unvalidated Test Tools | False confidence; systematic errors; incorrect results [50] | Tool errors discovered late; inconsistent results across platforms; missing tool documentation | Establish tool qualification process; use multiple independent tools; document tool limitations and boundaries |
| Lack of Independence | Unchallenged assumptions; confirmation bias; missed defects [50] | Developers self-verifying; lack of critical review; defensive responses to findings | Separate development and V&V teams; implement independent peer review; use third-party verification for critical systems |
| Inexperienced Teams | Unrealistic schedules; inadequate test coverage; overlooked failure modes [50] | Missed milestones; superficial test cases; inability to anticipate edge cases | Invest in specialized training; engage senior reviewers; develop competency frameworks |
Uncertainty Quantification (UQ) has emerged as a critical component of comprehensive V&V, particularly with the rise of predictive simulation in research and development. UQ systematically accounts for various sources of uncertainty, including parameter uncertainty (from imperfectly known inputs), model form uncertainty (from imperfect representations of physics), and numerical uncertainty (from discretization and solution approximations) [51].
UQ Workflow Protocol:
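As an illustrative sketch of such a protocol, the commented steps below (characterize inputs, propagate, summarize outputs) mirror a standard Monte Carlo workflow; the toy decay model and its input distributions are assumptions chosen for clarity:

```python
import numpy as np

rng = np.random.default_rng(42)

def model(k, c0, t=1.0):
    """Deliberately simple stand-in model: exponential decay c0*exp(-k*t)."""
    return c0 * np.exp(-k * t)

# 1. Characterize input uncertainty (distributions are illustrative)
k = rng.normal(0.5, 0.05, 100_000)    # uncertain rate constant
c0 = rng.normal(10.0, 1.0, 100_000)   # uncertain initial value

# 2. Propagate the samples through the model
y = model(k, c0)

# 3. Summarize output uncertainty
y_mean = y.mean()
y_lo, y_hi = np.percentile(y, [2.5, 97.5])  # 95% prediction interval
```

For expensive simulations the same three steps apply, but the direct sampling in step 2 is typically replaced by a surrogate model or a more sample-efficient scheme.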
Validation requires rigorous comparison between model predictions and experimental observations using quantitatively defined metrics. The validation process encompasses planning, execution with close collaboration between simulation and test teams, accuracy assessment, and eventual model acceptance decisions [51].
Validation Metrics Protocol:
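One widely used quantitative metric is the area validation metric: the integral of the absolute difference between the empirical CDFs of predictions and observations. A minimal sketch, using synthetic prediction and observation samples as stand-ins for real data:

```python
import numpy as np

def area_metric(model_samples, data_samples):
    """Area between the empirical CDFs of model predictions and
    experimental observations; the result carries the units of
    the compared quantity."""
    xs = np.sort(np.concatenate([model_samples, data_samples]))

    def ecdf(sample, x):
        return np.searchsorted(np.sort(sample), x, side="right") / len(sample)

    gaps = np.abs(ecdf(model_samples, xs) - ecdf(data_samples, xs))
    # Both ECDFs are constant between consecutive merged points,
    # so a rectangle rule integrates |F_model - F_data| exactly.
    return float(np.sum(gaps[:-1] * np.diff(xs)))

rng = np.random.default_rng(1)
pred = rng.normal(5.0, 1.0, 2000)   # synthetic model outputs
obs = rng.normal(5.5, 1.0, 2000)    # synthetic experimental data
d = area_metric(pred, obs)          # for a pure shift, approaches the 0.5 offset
```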
Verification encompasses both code verification (ensuring algorithms are implemented correctly) and solution verification (assessing numerical accuracy of specific computations).
Code Verification Protocol:
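The method of manufactured solutions can be sketched compactly: choose an exact solution, apply the code to it, and confirm that the error shrinks at the theoretically expected rate. Here a central-difference second derivative (a stand-in for any numerical kernel) is checked against u = sin(x):

```python
import numpy as np

def second_derivative(u, h):
    """Central-difference approximation of u'' at interior points."""
    return (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2

def mms_error(n):
    """Max error against the manufactured solution u = sin(x),
    whose exact second derivative is -sin(x)."""
    x = np.linspace(0.0, np.pi, n)
    h = x[1] - x[0]
    return np.max(np.abs(second_derivative(np.sin(x), h) + np.sin(x[1:-1])))

e_coarse, e_fine = mms_error(101), mms_error(201)
# Halving h should quarter the error for a correct second-order scheme
observed_order = np.log(e_coarse / e_fine) / np.log(2.0)
```

An observed order far from the scheme's theoretical order is strong evidence of an implementation bug, even when individual results look plausible.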
Solution Verification Protocol:
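A minimal sketch of grid-based solution verification: compute the same quantity on three systematically refined grids, estimate the observed convergence order, Richardson-extrapolate, and form a Grid Convergence Index. A trapezoid-rule integral stands in here for any grid-dependent simulation output:

```python
import numpy as np

def simulate(n):
    """Trapezoid estimate of the integral of exp(x) on [0, 1] with n cells,
    standing in for a grid-dependent simulation result."""
    x = np.linspace(0.0, 1.0, n + 1)
    fx = np.exp(x)
    return float(np.sum((fx[1:] + fx[:-1]) / 2) / n)

# Solutions on three systematically refined grids (refinement ratio r = 2)
f_coarse, f_medium, f_fine = simulate(25), simulate(50), simulate(100)

# Observed order of convergence from the three solutions
p = np.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / np.log(2.0)

# Richardson extrapolation estimates the numerical error without the exact answer
f_extrap = f_fine + (f_fine - f_medium) / (2.0**p - 1.0)
err_estimate = abs(f_fine - f_extrap)

# Grid Convergence Index with the customary safety factor of 1.25
gci_fine = 1.25 * abs((f_fine - f_medium) / f_fine) / (2.0**p - 1.0)
```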
Artificial intelligence is transforming V&V practices through multiple avenues. AI-powered tools like ChatGPT can assist with basic simulation questions, help debug code, and automate programming tasks, serving as valuable aids particularly for beginner users [53] [54]. More significantly, machine learning integration enables data scientists to test and refine AI-driven solutions in risk-free simulated environments, while simulation modelers benefit from more accurate, data-driven inputs that improve model fidelity [53].
Reinforcement learning (RL) integration through Python and Java APIs allows linking simulation models with popular RL libraries, enabling agents to explore different strategies within simulated environments and gradually improve by learning from actions and outcomes [53]. This approach helps fine-tune policies that can later be deployed in real-world operations.
The growing emphasis on digital twins is driving adoption of surrogate modeling techniques that create fast-running approximations of high-fidelity models. Reduced order modeling techniques based on eigenvalue analysis can compress million-degree-of-freedom systems down to manageable sizes while preserving essential behaviors [54]. Similarly, neural network-based surrogate models can be trained to provide accurate results in seconds instead of hours, enabling rapid parameter studies and interactive applications [54].
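A toy illustration of the surrogate idea, using a polynomial fit in place of a neural network or eigenvalue-based reduction; the "expensive" function is just a smooth stand-in for a high-fidelity model:

```python
import numpy as np

def expensive_model(x):
    """Stand-in for a high-fidelity simulation (just a smooth function here)."""
    return np.sin(2 * x) + 0.3 * x**2

# Train a cheap polynomial surrogate on a handful of "full model" runs
x_train = np.linspace(0.0, 3.0, 12)
surrogate = np.poly1d(np.polyfit(x_train, expensive_model(x_train), deg=6))

# Always check surrogate accuracy on unseen inputs before trusting it
x_test = np.linspace(0.1, 2.9, 50)
max_err = float(np.max(np.abs(surrogate(x_test) - expensive_model(x_test))))
```

The held-out accuracy check is the essential step: a surrogate inherits none of the credibility of the full model and must be verified in its own right over the intended input range.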
Real-time data streaming through protocols like MQTT (Message Queuing Telemetry Transport) enables digital twins to maintain synchronization with real-world assets, ensuring accurate real-time representation for validation against operational data [53].
The shift toward cloud-based simulation platforms transforms how models are developed, verified, and validated. Cloud environments eliminate hardware constraints, enable multi-user access, and facilitate version control for both models and V&V artifacts [53]. Emerging platforms allow users to not only run and share models but also build and edit them directly through web interfaces, enabling seamless real-time collaboration across geographically distributed teams [53].
Figure 2: Next-Generation V&V with AI and Cloud Technologies
Table: Essential Research Reagent Solutions for Computational V&V
| Tool/Category | Function | Application in V&V | Representative Examples |
|---|---|---|---|
| Code Verification Tools | Verify algorithm implementation and identify coding errors | Ensure computational algorithms are implemented correctly; detect software defects | Method of Manufactured Solutions; Method of Exact Solutions; Convergence testing tools [51] |
| Uncertainty Quantification Frameworks | Quantify and propagate uncertainties through computational models | Assess confidence in predictions; identify dominant uncertainty sources; support risk-informed decisions | Monte Carlo methods; Bayesian inference tools; Polynomial chaos expansions [51] |
| Validation Metrics & Comparison Tools | Quantitatively compare model predictions with experimental data | Assess model accuracy; establish predictive capability; support model acceptance decisions | Waveform metrics; area metric; statistical comparison tools [51] |
| AI-Assisted Programming Tools | Generate, debug, and optimize code through natural language interaction | Accelerate V&V tool development; automate repetitive tasks; assist beginners with simulation queries | ChatGPT; GitHub Copilot; specialized AI assistants for simulation [53] [54] |
| Reduced Order Modeling Tools | Create fast-running surrogate models from high-fidelity simulations | Enable rapid parameter studies; facilitate digital twin creation; support system-level simulation | Eigenvalue-based reduction; neural network surrogates; proper orthogonal decomposition [54] |
| Cloud-Based Collaboration Platforms | Enable multi-user model development and V&V in shared environments | Facilitate team collaboration; maintain version control; provide scalable computing resources | AnyLogic Cloud; COMSOL Server; custom web-based platforms [53] |
Effective verification and validation requires both technical rigor and organizational commitment. Beyond implementing specific techniques, successful organizations foster a culture of quality assurance where V&V is viewed as an essential investment rather than an inconvenient cost. This involves recognizing the warning signs of inadequate V&V—rushed design tests, unclear responsibilities, ignored standards, or dismissed safety-critical issues—and addressing them proactively [50].
For computational modeling researchers, particularly in drug development where decisions have profound consequences, comprehensive V&V provides the foundation for credible, defensible results. By understanding common pitfalls, implementing robust methodologies, and leveraging emerging technologies, research teams can develop models that truly advance scientific understanding while minimizing potentially costly errors. The integration of traditional V&V practices with modern approaches like AI assistance, surrogate modeling, and cloud collaboration represents the future of credible computational simulation across all scientific domains.
Verification, Validation, and Uncertainty Quantification (VVUQ) constitute a critical framework for establishing credibility in computational modeling research. Verification is "the process of determining that a computational model accurately represents the underlying mathematical model and its solution," while validation is "the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model" [3]. Simply put, verification ensures you are "solving the equations right," and validation ensures you are "solving the right equations" [3]. Uncertainty Quantification (UQ) is the formal process of tracking uncertainties throughout model calibration, simulation, and prediction [6].
When data is limited or incomplete, performing rigorous VVUQ becomes particularly challenging yet increasingly critical. This guide outlines practical strategies for researchers and drug development professionals to build model credibility despite data constraints, framed within the broader context of establishing trust in computational models.
The fundamental VVUQ process flow begins with verification, proceeds to validation, and incorporates uncertainty quantification throughout, as illustrated below:
VVUQ Process Flow. This diagram illustrates the interrelationship between core VVUQ activities, highlighting that verification must precede validation, with uncertainty quantification integrated throughout. Adapted from VVUQ literature [3].
In data-limited scenarios, two primary types of uncertainty must be addressed:

- Aleatory uncertainty: irreducible variability inherent in the system itself, such as patient-to-patient physiological variation.
- Epistemic uncertainty: reducible uncertainty stemming from incomplete knowledge, which limited data makes especially prominent.
The intended use of the model dictates the rigor required in VVUQ processes. For regulatory applications or clinical decision-making, more extensive VVUQ is necessary compared to models used for basic research or hypothesis generation [3] [6].
A risk-based approach, as formalized in the ASME V&V 40 standard for medical devices, prioritizes VVUQ activities based on the model's decision context and the associated impact of an incorrect prediction [1] [55]. This is particularly valuable when data is limited, as it directs resources to the most critical areas.
Implementing a risk-based approach involves defining the question of interest and the model's context of use, assessing model risk as a combination of model influence and decision consequence, establishing credibility goals commensurate with that risk, and then planning verification, validation, and UQ activities sufficient to meet those goals.
Verification is especially critical when validation data is scarce, as it ensures that any discrepancies during validation stem from the model itself rather than implementation errors [3].
Table 1: Verification Methods Under Data Constraints
| Method Category | Specific Technique | Application in Data-Limited Scenarios | Key Performance Metrics |
|---|---|---|---|
| Code Verification | Comparison to analytical solutions | Verify against simplified cases with known solutions | Agreement within 3% of analytical solution [3] |
| Calculation Verification | Mesh convergence studies | Assess discretization error with systematic refinement | Solution change <5% with further refinement [3] |
| Code Verification | Method of Manufactured Solutions (MMS) | Create test problems with predefined solutions | Convergence to known solution at expected rate |
| Calculation Verification | Grid Convergence Index (GCI) | Quantify numerical uncertainty | GCI <3.3% indicates sufficient grid resolution [55] |
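The Grid Convergence Index in Table 1 can be computed directly from three systematically refined grids using Roache's formulation. The sketch below is illustrative: the three solution values and the refinement ratio are assumed numbers, not data from the cited study.

```python
import math

# Grid Convergence Index sketch (Roache's formulation, as used with the
# <5% and GCI <3.3% criteria in Table 1). The solution values and the
# refinement ratio below are illustrative assumptions.
def gci(f_fine, f_med, f_coarse, r=2.0, fs=1.25):
    """Return (observed order p, fine-grid GCI in percent) from three
    solutions on systematically refined grids with refinement ratio r."""
    # Observed order of convergence from the three solutions
    p = math.log(abs(f_coarse - f_med) / abs(f_med - f_fine)) / math.log(r)
    e21 = abs((f_med - f_fine) / f_fine)   # relative change, two finest grids
    return p, 100.0 * fs * e21 / (r**p - 1.0)

p, g = gci(f_fine=1.001, f_med=1.004, f_coarse=1.016)
print(f"observed order p = {p:.2f}, GCI_fine = {g:.2f}%")
```

Here the solution change shrinks by a factor of four with each refinement, so the observed order is 2 and the resulting GCI of roughly 0.12% falls well below the 3.3% threshold in Table 1.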
When comprehensive validation experiments are not feasible, these strategies can provide evidence of model validity:
Leverage Multi-level Validation: Validate submodels and intermediate quantities against whatever data are available at each level of the system hierarchy, so that credibility is built up from component-level evidence rather than resting on a single system-level comparison [3].
Utilize Bayesian Methods: Bayesian approaches are particularly valuable for quantifying anatomical and parameter uncertainties from limited clinical data [6]. These methods combine prior knowledge with sparse observations, yielding full posterior distributions rather than point estimates and allowing model confidence to be updated as new data arrive.
Employ Temporal Validation: For digital twins that are continuously updated, temporal validation assesses how well the model predicts system evolution over time, even with limited data points [6].
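The Bayesian updating described above can be illustrated with a minimal grid-based sketch. Everything below — the drug-clearance scenario, the prior, and the noise level — is a hypothetical assumption chosen for illustration; real applications would use problem-specific likelihoods and samplers.

```python
import numpy as np

# Minimal Bayesian parameter estimation from sparse data: combine a prior
# with only three noisy measurements of a (hypothetical) clearance rate.
observations = np.array([0.52, 0.58, 0.49])    # sparse measurements, 1/h
sigma_noise = 0.05                             # assumed measurement noise

theta = np.linspace(0.1, 1.0, 901)             # grid over the parameter
prior = np.exp(-0.5 * ((theta - 0.6) / 0.15) ** 2)   # prior belief: N(0.6, 0.15)

# Gaussian likelihood of all observations at each grid point
loglik = sum(-0.5 * ((y - theta) / sigma_noise) ** 2 for y in observations)
posterior = prior * np.exp(loglik - loglik.max())
posterior /= posterior.sum()                   # normalize to a discrete pmf

post_mean = (theta * posterior).sum()
cdf = np.cumsum(posterior)
ci_lo, ci_hi = np.interp([0.025, 0.975], cdf, theta)   # 95% credible interval
print(f"posterior mean {post_mean:.3f}, 95% CI [{ci_lo:.3f}, {ci_hi:.3f}]")
```

The posterior mean sits between the prior mean and the data mean, and the credible interval makes the remaining parameter uncertainty explicit — exactly the confidence statement that data-limited validation needs.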
When data is limited, formal UQ is essential for understanding the reliability of model predictions:
Sensitivity Analysis: Global sensitivity analysis (e.g., Sobol indices) identifies which input parameters most significantly affect outputs, guiding where to focus resources for better characterization [55]. In cardiac flow modeling, sensitivity analysis revealed that "ejection fraction, the heart rate, and the pump performance curve coefficients are the most impactful inputs" [55].
Surrogate Modeling: Tools like EasySurrogate from the VECMA toolkit can create computationally efficient surrogate models, enabling comprehensive UQ even when original simulations are expensive [56].
Uncertainty Propagation: Use Monte Carlo or polynomial chaos methods to propagate input uncertainties through the model, providing confidence bounds on predictions [55].
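As a concrete illustration of the variance-based sensitivity analysis described above, the sketch below estimates first-order Sobol indices with the Saltelli pick-freeze Monte Carlo estimator. The three-input linear model is a stand-in whose indices are analytically 16/21, 4/21, and 1/21 — it is not the cardiac flow model from [55].

```python
import numpy as np

# Sketch of variance-based (Sobol) sensitivity analysis via the Saltelli
# "pick-freeze" Monte Carlo estimator, on an illustrative stand-in model:
# y = 4*x1 + 2*x2 + x3 with xi ~ U(0,1).
rng = np.random.default_rng(42)
model = lambda x: 4 * x[:, 0] + 2 * x[:, 1] + x[:, 2]

n, d = 100_000, 3
A, B = rng.random((n, d)), rng.random((n, d))
yA, yB = model(A), model(B)
var_y = np.var(np.concatenate([yA, yB]))

S = []
for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                        # replace only column i
    # First-order index, Saltelli (2010) estimator
    S.append(np.mean(yB * (model(ABi) - yA)) / var_y)

print("first-order Sobol indices:", np.round(S, 3))
```

The ranking of the indices is what drives resource allocation: inputs with large first-order indices deserve better experimental characterization, while near-zero indices can often be fixed at nominal values.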
Following a structured plan ensures comprehensive coverage despite data limitations. The workflow below illustrates a systematic approach:
VVUQ Implementation Workflow. This workflow outlines a systematic approach for executing VVUQ activities when data is limited, emphasizing the sequence from risk analysis to final credibility assessment. Based on ASME V&V40 implementation case study [55].
Protocol 1: Sensitivity-Driven Validation Resource Allocation. This protocol maximizes validation effectiveness when experimental data is limited by using global sensitivity analysis to direct scarce experimental resources toward the most influential model inputs.
Protocol 2: Credibility Assessment Scaling. This protocol scales the rigor of credibility activities to model risk, based on the Risk-Based Credibility Framework from ASME V&V 40 [1].
Table 2: Essential Computational Tools for VVUQ in Data-Limited Environments
| Tool Category | Specific Tool/Platform | Function in Data-Limited VVUQ | Application Context |
|---|---|---|---|
| VVUQ Workflow Management | EasyVVUQ (VECMA Toolkit) [56] | Simplifies implementation of VVUQ workflows | Multiscale applications in computational biomedicine |
| Surrogate Modeling | EasySurrogate (VECMA Toolkit) [56] | Creates efficient surrogate models for UQ | Applications where full simulation is computationally expensive |
| Automation & HPC Execution | FabSim3, QCG Tools [56] | Automates computational research activities | High-performance computing environments |
| Model Coupling | MUSCLE3 [56] | Supports coupling of multiscale models | Digital twins with multiple spatial/temporal scales |
| Uncertainty Quantification | Dakota [55] | Performs parameter studies, optimization, and UQ | General computational models including biomedical applications |
| Credibility Assessment | PCMM Framework [57] | Assesses predictive capability maturity | Organizational assessment of simulation credibility |
A comprehensive VVUQ plan for a numerical model of left ventricular flow after Left Ventricular Assist Device (LVAD) implantation demonstrates practical implementation of these strategies under constraints [55]:
Challenge: Predicting thrombus formation risk in LVAD patients requires understanding flow patterns, but comprehensive in vivo validation data is limited.
VVUQ Approach: Verification activities, including grid convergence assessment, were combined with validation against benchtop flow experiments in place of scarce in vivo data, while comprehensive uncertainty quantification with global sensitivity analysis identified the most impactful inputs, such as ejection fraction, heart rate, and the pump performance curve coefficients [55].
This case study demonstrates that even with limited clinical data, structured VVUQ using benchtop experiments and comprehensive UQ can build model credibility for critical medical applications.
Implementing robust VVUQ strategies when data is limited or incomplete requires a systematic, risk-based approach that prioritizes activities based on their impact on the model's decision context. By leveraging verification techniques, sensitivity analysis, Bayesian methods, and structured uncertainty quantification, researchers can establish reasonable confidence in their computational models even with substantial data constraints. The frameworks and protocols outlined in this guide provide a pathway for researchers and drug development professionals to build credibility in their models while transparently acknowledging and quantifying limitations arising from data scarcity. As computational models play increasingly important roles in precision medicine and regulatory decision-making, these strategies become essential for responsible model development and application.
Verification, Validation, and Uncertainty Quantification (VVUQ) constitutes a critical framework in computational modeling research to establish confidence in simulation results. Verification addresses whether the computational model is implemented correctly—essentially, "Are we solving the equations correctly?" Validation determines whether the model accurately represents reality—"Are we solving the correct equations?" Uncertainty Quantification (UQ) systematically assesses the effects of uncertainties in inputs, parameters, and models on simulation outputs [4] [1]. Together, these processes ensure that computational models produce credible, reliable results suitable for decision-making in scientific and engineering contexts, from aerospace design to pharmaceutical development [58] [59].
The growing complexity of computational models and the shift toward simulation-based engineering have heightened the importance of robust VVUQ processes. However, traditional VVUQ methods are often resource-intensive, requiring significant expert involvement, computational power, and time. These challenges have created opportunities for Artificial Intelligence (AI) and Machine Learning (ML) to transform VVUQ practices by introducing automation, enhancing efficiency, and enabling more sophisticated uncertainty analysis [56] [60].
Verification is fundamentally concerned with the mathematical correctness of the computational model. It ensures that the governing equations are solved accurately without unintended errors in the implementation. Key verification activities include code verification against analytical or manufactured solutions and calculation verification through mesh and time-step convergence studies.
Verification answers the question: "How do we know the computational model is solving the equations correctly?"
Validation evaluates the model's predictive capability for its intended real-world application by comparing computational results with experimental data. The validation process typically follows structured approaches, such as hierarchical validation against experiments of increasing complexity and quantitative validation metrics that account for uncertainty in both prediction and measurement.
The fundamental validation question is: "How do we know the computational model adequately represents reality?"
Uncertainty Quantification systematically characterizes and propagates the effects of various uncertainties on model predictions. Key aspects include distinguishing aleatoric from epistemic sources, propagating input uncertainties through the model, and ranking their influence with sensitivity analysis.
UQ provides confidence bounds on predictions, enabling risk-informed decision-making based on computational models [4] [1].
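These confidence bounds can be produced with plain Monte Carlo propagation. In the sketch below, a one-compartment pharmacokinetic model and its input distributions are illustrative assumptions used only to demonstrate the mechanics.

```python
import numpy as np

# Monte Carlo uncertainty propagation through a simple one-compartment
# pharmacokinetic model. All parameter values and distributions are
# assumed for illustration only.
rng = np.random.default_rng(1)
n = 100_000

dose = 100.0                                  # administered dose, mg (assumed)
V = rng.normal(50.0, 5.0, n)                  # volume of distribution, L
k = rng.lognormal(np.log(0.1), 0.2, n)        # elimination rate constant, 1/h

conc = dose / V * np.exp(-k * 6.0)            # concentration 6 h after dosing

p_lo, p_med, p_hi = np.percentile(conc, [2.5, 50.0, 97.5])
print(f"C(6 h): median {p_med:.2f} mg/L, "
      f"95% interval [{p_lo:.2f}, {p_hi:.2f}] mg/L")
```

Reporting the prediction as a median with an interval, rather than a single number, is what turns a simulation output into decision-grade evidence.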
Machine learning techniques can significantly accelerate verification tasks, reducing the manual effort required while improving comprehensiveness.
Machine learning transforms validation through data-driven approaches, making the process more thorough and efficient, particularly for systems with limited experimental data.
ML approaches have particularly transformative potential for UQ:
Table 1: ML Techniques for Uncertainty Quantification
| ML Technique | UQ Approach | Key Advantages | Application Examples |
|---|---|---|---|
| Gaussian Processes (GP) | Natural uncertainty estimates via predictive variance | Provides uncertainty bounds without additional computation | Surrogate modeling, inverse problems [60] |
| Bayesian Neural Networks (BNN) | Probabilistic weights and outputs | Quantifies both epistemic and aleatoric uncertainty | Nuclear engineering, turbulence modeling [60] |
| Monte Carlo Dropout (MCD) | Approximate Bayesian inference | Easy implementation with standard neural networks | CHF prediction, neutron flux modeling [60] |
| Deep Ensembles (DE) | Multiple models with different initializations | Improved accuracy and uncertainty quantification | Reactor safety case studies [60] |
| Conformal Prediction (CP) | Distribution-free uncertainty intervals | Strong theoretical guarantees for coverage | Nuclear component degradation [60] |
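Of the techniques in Table 1, conformal prediction is the simplest to sketch end to end. The split-conformal procedure below wraps a least-squares predictor with distribution-free 90% prediction intervals; the linear data-generating process is synthetic and chosen only for illustration.

```python
import numpy as np

# Split conformal prediction: fit a model on one half of the data,
# calibrate residual quantiles on the other half, and obtain prediction
# intervals with a finite-sample coverage guarantee.
rng = np.random.default_rng(7)

n = 2000
x = rng.uniform(0, 10, n)
y = 3.0 * x + rng.normal(0, 1.0, n)            # synthetic linear truth + noise

x_fit, y_fit = x[:1000], y[:1000]              # fitting split
x_cal, y_cal = x[1000:], y[1000:]              # calibration split
slope, intercept = np.polyfit(x_fit, y_fit, 1)
predict = lambda x: slope * x + intercept

# Conformal quantile of absolute calibration residuals for 90% coverage
scores = np.abs(y_cal - predict(x_cal))
q = np.quantile(scores, 0.9 * (1 + 1 / len(scores)))

# Empirical coverage on fresh data should land close to the 90% target
x_new = rng.uniform(0, 10, 5000)
y_new = 3.0 * x_new + rng.normal(0, 1.0, 5000)
covered = np.abs(y_new - predict(x_new)) <= q
print(f"interval half-width {q:.2f}, empirical coverage {covered.mean():.3f}")
```

The appeal for VVUQ is that the coverage guarantee holds regardless of the underlying predictor, which makes the method attractive for wrapping black-box ML surrogates.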
The VECMA toolkit (VECMAtk) represents a comprehensive open-source framework specifically designed to automate and facilitate VVUQ workflows for complex applications [56]. Its modular architecture provides integrated tools for the entire VVUQ pipeline, including EasyVVUQ for uncertainty quantification workflows, EasySurrogate for surrogate modeling, FabSim3 and the QCG tools for automated execution on HPC resources, and MUSCLE3 for coupling multiscale models.
The VECMA toolkit has been successfully applied across diverse domains, demonstrating how an integrated toolkit approach can significantly accelerate VVUQ processes while maintaining methodological rigor.
Objective: To efficiently propagate uncertainties through computationally expensive models using ML surrogates.
This approach typically reduces the computational cost of comprehensive UQ by orders of magnitude while maintaining acceptable accuracy [56] [60].
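A minimal version of this surrogate workflow might look like the following, where an inexpensive polynomial fit stands in for the ML surrogate and an analytic function plays the role of the expensive simulator; both choices are illustrative assumptions, not the protocol's prescribed tools.

```python
import numpy as np

# Surrogate-accelerated UQ sketch: fit a cheap surrogate to a handful of
# "expensive" runs, then run a large Monte Carlo study on the surrogate.
rng = np.random.default_rng(3)

def expensive_model(x):
    return np.exp(-0.5 * x) * np.sin(2 * x)    # pretend each call costs hours

# Step 1: small design of experiments (only 20 expensive runs)
x_train = np.linspace(0.0, 3.0, 20)
y_train = expensive_model(x_train)

# Step 2: train the surrogate (a degree-8 polynomial here; Gaussian
# processes or neural networks are common alternatives)
surrogate = np.poly1d(np.polyfit(x_train, y_train, 8))

# Step 3: cheap Monte Carlo on the surrogate (10^5 evaluations)
x_mc = rng.normal(1.5, 0.3, 100_000)
y_mc = surrogate(x_mc)
print(f"output mean {y_mc.mean():.3f}, std {y_mc.std():.3f}")

# Always check surrogate accuracy on held-out points before trusting the UQ
x_test = rng.uniform(0.0, 3.0, 200)
err = np.max(np.abs(surrogate(x_test) - expensive_model(x_test)))
print(f"max surrogate error on held-out points: {err:.2e}")
```

The held-out error check in the last step is essential: surrogate error is itself a source of uncertainty and must be small relative to the input uncertainty being propagated.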
Objective: To maximize validation insights from limited experimental data using ML techniques.
This protocol provides a structured approach for building confidence in models even when experimental data is scarce.
AI-Enhanced VVUQ Workflow Architecture
This architecture illustrates how AI/ML technologies complement and enhance traditional VVUQ processes, creating parallel pathways that accelerate verification, enhance validation, and enable more comprehensive uncertainty quantification.
Table 2: Essential Tools and Resources for AI-Enhanced VVUQ
| Tool/Resource | Category | Primary Function | Application in VVUQ |
|---|---|---|---|
| VECMA Toolkit | Integrated Framework | End-to-end VVUQ workflow management | Automation of complex UQ and validation campaigns [56] |
| EasyVVUQ | VVUQ Library | Structured UQ and sensitivity analysis | Standardizes UQ process across different simulation codes [56] |
| TensorFlow/PyTorch | ML Framework | Deep learning model development | Building surrogate models, BNNs for UQ [60] |
| scikit-learn | ML Library | Traditional machine learning algorithms | GP regression, sensitivity analysis, preprocessing [60] |
| FabSim3 | Automation Tool | Computational research automation | Reproducible simulation campaigns on HPC [56] |
| SmartUQ | Commercial UQ | Design of experiments, calibration | UQ for engineering systems, sensitivity analysis [4] |
| ASME VVUQ Standards | Guidelines | Standard procedures and terminology | Ensuring regulatory compliance, best practices [1] |
The pharmaceutical industry represents a particularly promising domain for AI-enhanced VVUQ, with several critical application areas:
ML-enhanced computational models enable virtual patient populations and simulated treatment outcomes, reducing the need for expensive and time-consuming physical trials. VVUQ ensures these models produce credible results for regulatory evaluation [56] [1].
AI-driven VVUQ can optimize complex drug delivery parameters while quantifying uncertainties related to physiological variability, manufacturing tolerances, and environmental factors.
In pharmaceutical testing, proper distinction between method validation (establishing fitness for purpose), verification (confirming validated methods work in new settings), and qualification (early-stage evaluation) is crucial for regulatory compliance [61]. AI can streamline these processes through automated data analysis and pattern recognition.
Despite significant progress, several challenges remain in fully realizing AI-automated VVUQ, not least the need to quantify the uncertainty of the ML tools themselves.
Future developments will likely focus on physics-informed ML (incorporating physical constraints into data-driven models), transfer learning (applying knowledge across related domains), and improved UQ for ML models themselves. Professional organizations including ASME and OECD/NEA are actively developing standards and benchmarks to advance the field [60] [1].
The integration of AI and machine learning with VVUQ processes represents a paradigm shift in computational modeling research. By automating labor-intensive tasks, enhancing validation capabilities, and enabling comprehensive uncertainty quantification, these technologies are making credible computational simulation more accessible and efficient. The ongoing development of integrated toolkits like VECMA, coupled with emerging standards and methodologies, promises to further accelerate this transformation across diverse domains from nuclear engineering to pharmaceutical development. As these approaches mature, they will increasingly support risk-informed decision-making based on computational models, ultimately advancing scientific discovery and engineering innovation.
Verification, Validation, and Uncertainty Quantification (VVUQ) represents a systematic framework essential for establishing credibility in computational modeling and simulation (CM&S). Within computational modeling research, verification is performed to determine if the computational model fits the mathematical description, while validation is implemented to determine if the model accurately represents the real-world application. Uncertainty quantification (UQ) is conducted to determine how variations in numerical and physical parameters affect simulation outcomes [1]. This triad of processes has become increasingly critical across risk-sensitive fields, particularly in precision medicine and drug development, where model predictions directly impact clinical decision-making and patient outcomes [6].
The fundamental question VVUQ addresses is one of trust: can the computational models be relied upon for specific decisions? For researchers and drug development professionals, this translates to ensuring that models of physiological systems, disease progression, or drug interactions produce predictions with known confidence bounds. This is especially crucial for digital twins in precision medicine, which provide tailored health recommendations by simulating patient-specific trajectories and interventions [6]. Without rigorous VVUQ, computational models remain unverified mathematical constructs whose real-world applicability is unknown.
A clear understanding of verification and validation reveals their complementary but distinct roles in computational research. Verification is the process of checking that software correctly implements the specific function, ensuring the model is built right according to specifications. It answers the question: "Are we building the product right?" [62] [63]. In contrast, validation determines whether the software that has been built is traceable to customer requirements, ensuring the right product is built to meet user needs. It answers the question: "Are we building the right product?" [62] [63].
This distinction is not merely semantic; it dictates different methodologies, timing, and responsibilities within the research workflow. Verification typically involves static testing methods such as code reviews, walkthroughs, and inspections performed by the quality assurance team, while validation employs dynamic testing methods including actual program execution in realistic environments by the testing team [63].
Uncertainty Quantification provides the formal process of tracking uncertainties throughout model calibration, simulation, and prediction. These uncertainties can be epistemic (stemming from incomplete knowledge) or aleatoric (arising from natural variabilities not captured by the model) [6]. By quantifying these uncertainties, UQ enables the prescription of confidence bounds, which demonstrate the degree of confidence researchers should have in their predictions—a critical requirement for models intended to inform clinical decisions [6].
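The practical difference between the two uncertainty classes can be shown with a toy measurement example (the true value and noise level are assumed): the epistemic component, here the standard error of an estimated mean, shrinks as data accumulate, while the aleatoric scatter of individual measurements does not.

```python
import numpy as np

# Toy illustration: epistemic uncertainty (uncertainty in an estimated
# quantity) is reducible with more data; aleatoric uncertainty (irreducible
# measurement scatter) is not.
rng = np.random.default_rng(5)
true_value, noise_sd = 10.0, 2.0               # assumed for the example

results = {}
for n in (5, 50, 500):
    data = rng.normal(true_value, noise_sd, n)
    epistemic = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    aleatoric = data.std(ddof=1)               # scatter of single measurements
    results[n] = (epistemic, aleatoric)
    print(f"n={n:3d}  epistemic ≈ {epistemic:.3f}  aleatoric ≈ {aleatoric:.3f}")
```

This distinction matters for planning: collecting more of the same data only attacks the epistemic term, so a prediction dominated by aleatoric variability needs a different model or measurement strategy, not a bigger dataset.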
Table 1: Core Components of VVUQ in Computational Modeling
| Component | Primary Question | Focus | Methods | Timing |
|---|---|---|---|---|
| Verification | Are we building the model right? | Checking mathematical correctness and code implementation [1] | Code reviews, solution verification, software quality assurance [6] | Before validation [63] |
| Validation | Are we building the right model? | Assessing accuracy in representing real-world physics [1] | Comparison with experimental data, validation metrics [6] | After verification [63] |
| Uncertainty Quantification | How confident are we in the predictions? | Quantifying numerical and physical uncertainties [1] | Sensitivity analysis, Bayesian methods, polynomial chaos [64] | Throughout the modeling lifecycle |
Implementing rigorous VVUQ processes provides compelling financial and strategic benefits that justify the required investment. Manufacturers are increasingly shifting from physical testing toward computational modeling techniques specifically because performing CM&S can decrease the number of physical tests necessary for product development [1]. This transition generates substantial cost savings while accelerating development timelines, particularly in fields like medical device development where physical testing can be prohibitively expensive and time-consuming.
The strategic value extends beyond immediate cost savings. VVUQ processes are specifically designed to improve the efficacy and streamline costs throughout both pre-market and post-market stages of a product's life cycle [1]. In pharmaceutical research and development, this translates to better-informed go/no-go decisions, reduced late-stage failures, and more targeted clinical trials through digital twin methodologies [6]. The NASEM report specifically highlighted VVUQ as essential for building trust in the use of digital twins for risk-critical applications in medicine [6].
From a risk perspective, VVUQ provides crucial protection against the significant costs of model-based errors. Unquantified uncertainty may prevent physicians from taking appropriate actions—or any action—due to safety concerns and an inability to gauge confidence in model output [6]. In regulatory contexts, models without proper VVUQ are increasingly unlikely to gain acceptance by bodies like the FDA, potentially derailing years of research investment [6]. The ASME VVUQ standards provide the guidance that helps practitioners better assess and enhance the credibility of their computational models, directly addressing regulatory concerns [1].
Table 2: Cost-Benefit Analysis of VVUQ Implementation
| Cost Category | Without VVUQ | With VVUQ | Quantitative Benefit |
|---|---|---|---|
| Physical Testing | High volume of physical tests required [1] | Reduced physical testing through simulation [1] | Decreased number of physical tests necessary for product development [1] |
| Error Detection | Late-stage error discovery (expensive rework) [63] | Early bug detection through verification [63] | Verification finds 50-60% of defects early [63] |
| Regulatory Compliance | Potential rejection due to unquantified uncertainty [6] | Built-in credibility for regulatory submissions [1] | Adherence to ASME VVUQ Standards (V&V 40 for medical devices) [1] |
| Clinical Decision Support | Unactionable predictions due to unknown confidence [6] | Predictions with prescribed confidence bounds [6] | Enables informed decisions based on causal relationships [6] |
Implementing VVUQ requires structured methodologies tailored to the specific application domain. The following workflow provides a generalized protocol applicable to computational models in drug development and precision medicine:
Phase 1: Verification Protocol. Confirm correct code implementation through software quality assurance and code verification, then quantify numerical error with solution verification techniques such as grid convergence studies [1] [6].
Phase 2: Validation Protocol. Compare model predictions against experimental data using quantitative validation metrics appropriate to the model's context of use [1] [64].
Phase 3: Uncertainty Quantification Protocol. Characterize aleatoric and epistemic uncertainties, propagate them through the model (e.g., via Monte Carlo sampling or polynomial chaos expansion), and report confidence bounds on predictions [6] [64].
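One concrete instance of an uncertainty quantification protocol step is a non-intrusive polynomial chaos expansion. The sketch below handles a single input uniform on [-1, 1], projecting the response onto Legendre polynomials with Gauss-Legendre quadrature; the response function is an arbitrary stand-in, and real workflows would use a dedicated UQ package.

```python
import numpy as np

# Non-intrusive polynomial chaos sketch for x ~ U(-1, 1): project the
# response onto Legendre polynomials, then read mean and variance directly
# off the expansion coefficients.
f = lambda x: np.exp(0.5 * x)                  # illustrative model response

deg = 8
nodes, weights = np.polynomial.legendre.leggauss(deg + 1)

coeffs = np.zeros(deg + 1)
for k in range(deg + 1):
    basis = np.zeros(k + 1)
    basis[k] = 1.0
    Pk = np.polynomial.legendre.legval(nodes, basis)   # P_k at the nodes
    # c_k = <f, P_k> / <P_k, P_k>, with integral <P_k, P_k> = 2/(2k+1)
    coeffs[k] = (weights * f(nodes) * Pk).sum() / (2.0 / (2 * k + 1))

# For x ~ U(-1, 1), orthogonality gives the output moments directly:
pce_mean = coeffs[0]
pce_var = sum(coeffs[k] ** 2 / (2 * k + 1) for k in range(1, deg + 1))
print(f"PCE mean {pce_mean:.6f}, variance {pce_var:.6f}")
```

Because the moments fall out of the coefficients algebraically, a converged expansion replaces millions of Monte Carlo model evaluations with a handful of quadrature runs.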
Table 3: Essential Research Reagents and Tools for VVUQ Implementation
| Tool Category | Specific Solution | Function in VVUQ Process |
|---|---|---|
| Software Verification Tools | Static code analyzers, Unit testing frameworks | Verify correct algorithm implementation and code functionality [6] [63] |
| Solution Verification Tools | Grid convergence tools, Iterative error estimators | Quantify numerical errors in mathematical model discretization [1] [6] |
| Validation Metrics | Multivariate validation metrics [1], Experimental data repositories | Assess similarity between model predictions and experimental data [1] [64] |
| UQ Methodologies | Bayesian calibration tools [6], Polynomial Chaos Expansion [64], Monte Carlo Sampling [64] | Quantify and propagate uncertainties through computational models |
| Standardized Protocols | ASME V&V Standards (e.g., V&V 10 for solid mechanics, V&V 20 for CFD, V&V 40 for medical devices) [1] | Provide domain-specific methodologies for VVUQ implementation |
The application of VVUQ to digital twins in precision medicine illustrates its critical role in high-stakes research environments. Digital twins for cardiovascular and oncology applications consist of five main components: virtual representation, observational data, data assimilation, twin prediction, and decision support [6]. Each component introduces distinct VVUQ requirements.
For cardiac electrophysiological models, verification ensures that computational codes correctly solve the governing PDEs for electrical propagation through personalized heart anatomies derived from CT scans [6]. Validation tests whether these simulations accurately represent individual patients' electrical behavior, particularly for diagnosing arrhythmias such as atrial fibrillation [6]. Uncertainty quantification becomes essential for accounting for anatomical uncertainties from clinical data, such as MRI artifacts that affect predictive capabilities of electrophysiology simulations [6].
In oncology applications, models predicting tumor growth and therapy response must undergo rigorous VVUQ before they can reliably inform treatment selection [6]. The dynamic nature of digital twins—continuously updated with new patient data—introduces unique VVUQ challenges compared to traditional modeling approaches. Specifically, the question arises: how frequently should a digital twin be re-validated to ensure ongoing accuracy? [6] This necessitates more flexible and iterative temporal validation approaches.
Digital Twin Components with VVUQ Integration
Effective resource allocation for VVUQ should follow a risk-informed approach that prioritizes activities based on the model's intended use and potential impact. The ASME V&V 40 standard for medical devices provides a risk-based framework that categorizes model credibility requirements according to the consequence of model error [1]. This framework can be adapted for computational models in drug development by considering two key dimensions: decision consequence (the impact of an incorrect model prediction) and model influence (the weight the model will carry in the decision-making process).
For high-consequence decisions—such as predicting individual patient response to therapy or simulating clinical trial outcomes—investment in comprehensive VVUQ is non-negotiable. The NASEM report specifically emphasizes that VVUQ is essential for building trust in digital twins for risk-critical medical applications [6]. In these contexts, the cost of VVUQ implementation should be framed as essential insurance against catastrophic clinical decisions based on faulty predictions.
A phased implementation strategy optimizes resource allocation while building VVUQ capabilities:
Phase 1: Foundation (0-6 months)
Phase 2: Expansion (6-18 months)
Phase 3: Maturity (18+ months)
Risk-Informed VVUQ Resource Allocation Framework
Justifying investment in VVUQ requires framing it not as an optional expense but as an essential component of credible computational research. The business case rests on four pillars: (1) cost efficiency through reduced physical testing and early error detection; (2) risk mitigation against erroneous decisions based on faulty models; (3) regulatory compliance through adherence to emerging standards; and (4) strategic advantage enabled by trustworthy predictive capabilities.
For computational models intended to inform clinical decisions or regulatory submissions, VVUQ transitions from recommended practice to mandatory requirement. The framework, methodologies, and resource allocation strategies presented provide researchers and drug development professionals with a structured approach to implementing VVUQ that balances comprehensive rigor with practical resource constraints. By systematically building VVUQ into the computational modeling lifecycle, organizations can realize the full potential of simulation-based research while maintaining scientific integrity and patient safety.
In the competitive landscapes of drug development, aerospace, and materials science, predictive computational modeling has become indispensable for accelerating innovation and reducing development costs [19]. However, the utility of these complex models is often criticized due to inadequate verification and validation (V&V), undermining their credibility and acceptance among peers, clinicians, and regulators [7]. Establishing model credibility is not merely a technical challenge but an organizational imperative. Verification is the process of determining that a model implementation accurately represents the conceptual description and solution to the underlying mathematical model—essentially, "solving the equations right" [7] [65]. In contrast, validation is the process of assessing how accurately computational predictions compare to experimental data, or "solving the right equations" [7] [65]. For computational models to reliably support high-stakes decision-making, organizations must overcome significant cultural, procedural, and resource-based hurdles to embed rigorous V&V practices into their core workflows.
A clear, shared vocabulary is essential for effective interdisciplinary collaboration; the definitions of verification and validation given above, together with the framework that follows, establish this foundation.
The field of digital medicine has evolved the V&V framework into a three-component model (V3), comprising verification, analytical validation, and clinical validation, for evaluating Biometric Monitoring Technologies (BioMeTs); this model is highly instructive for computational modeling in drug development [67].
This framework underscores that a model can be perfectly verified and analytically valid yet still fail if it does not accurately represent the relevant real-world clinical or physical phenomena.
A crucial aspect of V&V is the clear distinction between error and uncertainty, as this informs the strategy for credibility building [7].
Table 1: Classification of Errors and Uncertainties in Computational Modeling
| Category | Source | Examples | Mitigation Strategy |
|---|---|---|---|
| Numerical Error | Discretization, iterative convergence, round-off [7] | Discretization error from mesh resolution, tolerance for iterative solvers | Solution verification, grid convergence studies [65] |
| Modeling Error | Assumptions in mathematical representation [7] | Simplified geometry, inaccurate boundary conditions, inadequate constitutive models | Validation against high-quality experimental data, sensitivity studies |
| Parameter Uncertainty | Inherent variation or lack of knowledge [7] | Unknown material properties, variable initial conditions | Uncertainty quantification (UQ), probabilistic analysis, Monte Carlo simulation [7] |
| Experimental Uncertainty | Random and bias errors in validation data [65] | Measurement noise, calibration drift | Improved experimental design, rigorous uncertainty estimation in experiments |
Despite its proven value, robust V&V remains underutilized across many research and industrial sectors [19]. Common barriers include the perception of V&V as a bureaucratic burden, disciplinary silos between modelers and experimentalists, a shortage of standardized protocols and specialized training, and constrained budgets and timelines.
Building a culture that prioritizes model credibility requires a multi-faceted approach combining leadership, process, and education.
Implementing V&V requires the application of specific, repeatable technical methods.
The fundamental strategy of verification is the identification and quantification of errors in the computational model and its solution [65].
The following workflow outlines a robust verification process:
Validation assesses the modeling error by comparing computational results to experimental data from a carefully designed validation experiment [7] [65].
The logical structure of a hierarchical validation methodology is shown below:
Successful V&V relies on a suite of methodological "reagents" and tools.
Table 2: Essential Tools and Methods for a V&V Program
| Tool/Method | Function | Application Context |
|---|---|---|
| Analytical Solutions | Provides exact solution to simplified problem for code verification [65] | Benchmarking numerical solvers; confirming correct implementation of governing equations. |
| Method of Manufactured Solutions (MMS) | Generates a synthetic solution for verifying code on complex problems without known analytical solutions [65] | Code verification for complex PDEs and boundary conditions. |
| Grid Convergence Index (GCI) | Provides a standardized method for reporting discretization error from grid convergence studies [65] | Solution verification; quantifying spatial and temporal numerical error. |
| Validation Metric | A quantitative measure for comparing computational and experimental data, accounting for uncertainty [65] | Validation assessment; moving beyond graphical comparison to statistical confidence. |
| Uncertainty Quantification (UQ) Software | Tools for propagating input uncertainties through a model (e.g., Monte Carlo, polynomial chaos) [7] | Probabilistic analysis; determining output confidence intervals. |
| Centralized QMS Platform | Software to manage design controls, link verification protocols to inputs, and maintain audit trails [66] | Regulatory compliance; ensuring traceability for medical devices and pharmaceuticals. |
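The Method of Manufactured Solutions row in Table 2 can be made concrete with a small example: choose the solution u(x) = sin(πx) in advance, derive the source term f = π² sin(πx) for -u'' = f on (0, 1) with u(0) = u(1) = 0, and confirm that a second-order finite-difference solver converges at the expected rate. The solver below is an illustrative sketch, not taken from the cited references (here the manufactured solution happens to coincide with an analytical one).

```python
import numpy as np

# Code verification via a manufactured/exact solution: the observed
# convergence order approaching 2 is the verification evidence.
def max_error(n):
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]     # interior nodes
    # Standard second-order tridiagonal 1-D Laplacian
    A = (2.0 * np.eye(n - 1)
         - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h**2
    u = np.linalg.solve(A, np.pi**2 * np.sin(np.pi * x))
    return np.max(np.abs(u - np.sin(np.pi * x)))

errors = [max_error(n) for n in (16, 32, 64)]
orders = [np.log2(errors[i] / errors[i + 1]) for i in range(2)]
print("max errors:", ["%.2e" % e for e in errors])
print("observed orders:", ["%.2f" % p for p in orders])
```

An observed order that matches the scheme's formal order is strong evidence that the discretization is implemented correctly; a persistent mismatch usually signals a coding error, which is exactly what this technique is designed to expose.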
Overcoming organizational hurdles to foster a culture of model credibility is not a simple task, but it is a necessary one for organizations that rely on predictive computational modeling. The journey requires committed leadership to champion the value of V&V, the implementation of structured methodologies like hierarchical validation and solution verification, and the breaking down of disciplinary silos in favor of integrated teams. By adopting a strategic framework that includes early planning, standardized protocols, specialized training, and robust tooling, organizations can transform V&V from a perceived bureaucratic burden into a powerful engine for building credible, defensible, and impactful models that accelerate innovation and reduce risk.
Within the broader framework of Verification and Validation (V&V) in computational modeling research, establishing robust validation metrics is fundamental for assessing a model's accuracy and predictive capability. While verification answers the question "Are we solving the equations correctly?" by ensuring the computational model accurately represents the underlying mathematical model, validation addresses "Are we solving the right equations?" by determining the degree to which a model is an accurate representation of the real world from the perspective of its intended uses [3]. Validation is therefore the process of quantifying a model's ability to replicate physical reality, providing the essential evidence needed to build credibility in its predictions, especially in high-stakes fields like drug development and biomedical engineering [3] [1]. This guide details the core metrics and experimental methodologies for rigorously establishing that predictive capability.
The following diagram illustrates how validation fits into the broader V&V workflow and its critical role in linking the computational model to real-world observations.
Validation metrics are quantitative measures used to assess the performance and effectiveness of a statistical, computational, or machine learning model [68]. The choice of metric is dictated by the type of model (e.g., regression vs. classification) and the specific context of its intended use.
Regression models predict a continuous output. Their accuracy is typically assessed by measuring the difference between the model's predictions and the experimentally observed values from the real-world system [68]. Common metrics are summarized in the table below.
Table 1: Key Validation Metrics for Regression Models
| Metric | Mathematical Formula | Interpretation | Advantages | Disadvantages |
|---|---|---|---|---|
| Mean Absolute Error (MAE) | (\frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert) | Average magnitude of error, in the same units as the data. | Easy to understand; robust to outliers. | Does not penalize large errors heavily. |
| Root Mean Squared Error (RMSE) | (\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}) | Standard deviation of prediction errors. | Punishes larger errors more severely; common in reporting. | Sensitive to outliers. |
| Mean Absolute Percentage Error (MAPE) | (\frac{100\%}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert) | Average percentage error. | Scale-independent; easy for stakeholders to interpret. | Undefined for zero values; can be biased towards low forecasts. |
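To make the table concrete, all three metrics can be computed in a few lines. The following sketch uses hypothetical observed and predicted values; the data and variable names are illustrative, not drawn from any study cited here.

```python
import math

def mae(y, yhat):
    # Mean Absolute Error: average magnitude of error, same units as the data
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    # Root Mean Squared Error: penalizes large errors more severely
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mape(y, yhat):
    # Mean Absolute Percentage Error: undefined if any observed value is zero
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

# Hypothetical observed vs. model-predicted values
y_obs = [10.0, 12.0, 8.0, 15.0]
y_pred = [11.0, 11.5, 9.0, 14.0]
print(f"MAE  = {mae(y_obs, y_pred):.3f}")
print(f"RMSE = {rmse(y_obs, y_pred):.3f}")
print(f"MAPE = {mape(y_obs, y_pred):.2f}%")
```

Because RMSE squares each error before averaging, a single outlier inflates it far more than MAE, which is why the two are usually reported together.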
Classification models predict a discrete class or category output. Their validation is often based on a Confusion Matrix, a table that describes the performance of a classifier by comparing predicted classes to actual classes [68].
Table 2: Key Validation Metrics Derived from the Confusion Matrix
| Metric | Formula | Description | Use-Case Focus |
|---|---|---|---|
| Accuracy | ((TP + TN) / (TP + TN + FP + FN)) | Overall proportion of correct predictions. | General model performance when classes are balanced. |
| Precision | (TP / (TP + FP)) | Proportion of positive predictions that are correct. | Minimizing false positives (e.g., drug safety screening). |
| Recall (Sensitivity) | (TP / (TP + FN)) | Proportion of actual positives correctly identified. | Minimizing false negatives (e.g., disease diagnosis). |
| Specificity | (TN / (TN + FP)) | Proportion of actual negatives correctly identified. | Critical when correctly identifying negatives is key. |
| F1-Score | (2 \times (Precision \times Recall) / (Precision + Recall)) | Harmonic mean of precision and recall. | Balanced measure when class distribution is uneven. |
For models that output probabilities, the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) is a powerful metric. It measures the model's ability to separate classes across all possible thresholds and is independent of the proportion of responders [68].
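The confusion-matrix metrics in Table 2 follow mechanically from the four counts. A minimal sketch, using hypothetical counts from a diagnostic classifier (the numbers are illustrative only):

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive the Table 2 metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts: 80 true positives, 90 true negatives,
# 10 false positives, 20 false negatives
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the classifier misses more actual positives than it falsely flags, so recall is lower than precision; in a disease-diagnosis setting that asymmetry would argue for adjusting the decision threshold.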
A rigorous validation experiment requires a carefully controlled methodology to generate high-quality data for comparing model predictions. The protocol must be designed to isolate the specific physical phenomena the model is intended to simulate.
A 2025 study on optimizing a mathematical model for precision exercise load assessment provides a robust template for a validation protocol [69]. The following workflow maps the key stages of this experimental process.
1. Subject Recruitment & Screening
2. Baseline Characterization
3. Controlled Intervention
4. Longitudinal Data Collection
5. Data Processing & Model Fitting
6. Model Performance Comparison
This table details the essential materials and instruments used in the featured validation experiment, along with their critical functions [69].
Table 3: Essential Research Materials for a Validation Study in Exercise Biomechanics
| Item / Solution | Function in the Experiment |
|---|---|
| Calibrated Ergometer / Cycle Ergometer (Ergoselect 100) | Provides a controlled and quantifiable external workload. Precisely measures power output (Watts) and cadence, which are used to calculate the exact external load. |
| Heart Rate Monitor (POLAR H10) | Measures the subject's heart rate in real-time. Used to ensure exercise intensity remains within the target range and to collect data for calculating Heart Rate Variability (HRV), a key internal load metric. |
| International Physical Activity Questionnaire (IPAQ) | A standardized tool for assessing participants' baseline physical activity levels. Used for screening and characterizing the subject cohort. |
| Data Processing & Modeling Software | Custom scripts or software platforms (e.g., Python, R, MATLAB) are used to fit the mathematical models to the collected data, estimate parameters, and compute validation metrics (MAE, RMSE, etc.). |
A complete validation process must account for uncertainty. Uncertainty Quantification (UQ) is the process of determining how variations in numerical and physical parameters affect simulation outcomes [1]. This is intrinsically linked to Sensitivity Analysis, which measures how uncertainty in a model's output can be apportioned to different sources of uncertainty in its inputs [3].
Sensitivity studies are a critical component performed before or after validation experiments. When conducted prior to validation, they help identify critical parameters that the validation experiment must tightly control. When conducted after, they provide assurance that the experimental results are within initial uncertainty estimates [3]. In patient-specific biomechanical models, for example, key sources of uncertainty include experimentally derived material coefficients and the resolution of medical image data used for 3D geometry reconstruction [3].
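The propagation step described above is often done with plain Monte Carlo sampling. The sketch below substitutes a toy surrogate for an expensive simulation and assumes hypothetical input distributions for a material coefficient and an image-derived thickness; all names and numbers are illustrative, not from the cited studies.

```python
import random
import statistics

def tissue_stress(modulus, thickness, load=5.0):
    """Toy surrogate standing in for an expensive patient-specific simulation."""
    return load / (modulus * thickness)

random.seed(42)  # reproducible sampling
samples = []
for _ in range(10_000):
    modulus = random.gauss(2.0, 0.2)     # uncertainty in an experimentally derived coefficient
    thickness = random.gauss(1.0, 0.05)  # uncertainty from medical-image resolution
    samples.append(tissue_stress(modulus, thickness))

samples.sort()
mean = statistics.mean(samples)
lo, hi = samples[250], samples[9750]  # empirical 95% interval
print(f"output mean = {mean:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

The empirical interval, rather than the point prediction alone, is what a validation comparison against experimental scatter should use.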
In computational modeling research, the processes of verification and validation (V&V) serve as the fundamental pillars for establishing model credibility. Verification is defined as "the process of determining that a computational model accurately represents the underlying mathematical model and its solution," while validation is "the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model" [70]. Succinctly, verification ensures you are "solving the equations right" (mathematics), and validation ensures you are "solving the right equations" (physics) [70]. A comparative analysis of different computational models for the same context of use rigorously applies these V&V principles to determine which model is most credible and fit-for-purpose.
The American Society of Mechanical Engineers (ASME) has developed specialized standards for Verification, Validation, and Uncertainty Quantification (VVUQ), including the V&V 40 standard for assessing credibility of computational modeling applied to medical devices [1] [41]. This standard provides a risk-based framework that is highly applicable to drug development, where model predictions can significantly impact patient safety and therapeutic efficacy. Performing computational modeling and simulation can decrease the number of physical tests necessary for product development, but assuring that the model has been formed using sound procedures is key [1].
The verification and validation process follows a specific logical progression, as verification must necessarily precede validation [70]. This sequence is critical because it separates errors due to model implementation from uncertainty due to model formulation. Verification involves confirming that the model is correctly implemented with respect to its conceptual description and that numerical errors are minimized [58] [70]. Validation checks the accuracy of the model's representation of the real system by comparing computational results with experimental data [58].
The validation process typically employs a structured approach. Naylor and Finger (1967) formulated a three-step methodology that remains widely followed: (1) Build a model that has high face validity, (2) Validate model assumptions, and (3) Compare the model input-output transformations to corresponding input-output transformations for the real system [58]. This framework ensures systematic assessment of a model's predictive capabilities within its intended domain of applicability.
When comparing multiple computational models for the same application context, researchers should consider both quantitative and qualitative criteria. The primary quantitative criteria are descriptive adequacy, complexity, and generalizability, summarized in Table 1 below [71].
These criteria are interdependent and must be considered simultaneously for comprehensive model assessment. Goodness-of-fit measures alone are insufficient for model selection because they don't distinguish between fit to meaningful regularity and fit to experimental noise [71]. Generalizability has emerged as the preferred criterion because it evaluates a model's predictive accuracy across repeated experimental samplings, thus penalizing overfitting and rewarding models that capture underlying regularities [71].
Table 1: Core Quantitative Criteria for Model Evaluation
| Criterion | Definition | Common Measures | Interpretation |
|---|---|---|---|
| Descriptive Adequacy | Ability to reproduce observed data | SSE, R², Maximum Likelihood | Lower SSE, or higher R² and likelihood, indicates better fit to available data |
| Complexity | Model flexibility to fit diverse patterns | Number of parameters, Functional form | Simpler models with comparable fit are preferred |
| Generalizability | Predictive accuracy for new data | AIC, BIC, Cross-validation | Lower AIC/BIC (or higher cross-validated accuracy) indicates better prediction beyond sample data |
Verification comprises two complementary processes: code verification and calculation verification [70]. Code verification ensures the mathematical model and solution algorithms are working as intended by comparing numerical results to benchmark problems with known analytical solutions [70]. For example, Ionescu et al. verified a hyperelastic constitutive model implementation by demonstrating it could predict stresses to within 3% of an analytical solution for equibiaxial stretch [70].
Calculation verification focuses on errors arising from discretization of the problem domain, typically assessed through mesh convergence studies [70]. A model is considered mesh-converged when subsequent refinement changes the solution output by an acceptably small amount (e.g., <5%) [70]. When comparing multiple models, consistent verification protocols must be applied to all candidates to ensure fair comparison.
Table 2: Quantitative Verification Metrics for Model Comparison
| Verification Type | Assessment Method | Acceptance Criterion | Comparative Metric |
|---|---|---|---|
| Code Verification | Comparison to analytical solutions | <5% deviation from known solution | Relative error percentage across models |
| Mesh Convergence | Systematic refinement of discretization | <5% change in solution with refinement | Convergence rate and asymptotic value |
| Iterative Convergence | Monitoring solution residuals | Reduction below specified tolerance | Computational cost to achieve tolerance |
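The mesh-convergence row above can be checked numerically. The sketch below applies Richardson extrapolation, a standard technique for estimating the observed order of convergence and the grid-converged value from three systematically refined solutions; the peak-stress values are hypothetical, and this particular technique is an assumption, not prescribed by the cited sources.

```python
import math

def observed_order(f_coarse, f_medium, f_fine, r):
    """Observed order of convergence for a constant refinement ratio r."""
    return math.log(abs((f_coarse - f_medium) / (f_medium - f_fine))) / math.log(r)

def richardson_extrapolate(f_medium, f_fine, p, r):
    """Estimate of the discretization-free value from the two finest solutions."""
    return f_fine + (f_fine - f_medium) / (r ** p - 1)

# Hypothetical peak-stress outputs (MPa) from three meshes, each refined by r = 2
coarse, medium, fine = 104.0, 101.0, 100.25
p = observed_order(coarse, medium, fine, r=2.0)
f_exact = richardson_extrapolate(medium, fine, p, r=2.0)
rel_change = abs(fine - medium) / abs(fine)  # the <5% criterion from the table
print(f"observed order p = {p:.2f}")
print(f"extrapolated value = {f_exact:.2f} MPa, relative change = {rel_change:.2%}")
```

Here the solution changes by well under 5% on the final refinement, so by the table's criterion the finest mesh would be accepted as converged.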
Validation employs both quantitative and qualitative approaches to assess model accuracy against experimental data. Face validity represents the initial assessment, where domain experts examine model output for reasonableness [58]. This subjective evaluation is particularly valuable in complex biological systems where comprehensive quantitative validation may be impractical.
For quantitative validation, statistical methods provide objective measures of model accuracy. Hypothesis testing can be used to evaluate whether model predictions match experimental observations within acceptable tolerances [58]. The null hypothesis (H₀) states that the model measure of performance equals the system measure of performance, while the alternative hypothesis (H₁) states they are different [58]. The test statistic is calculated as:
(t_0 = \frac{E(Y) - \mu_0}{S/\sqrt{n}})
where E(Y) is the expected value from the model, μ₀ is the observed system value, S is the sample standard deviation, and n is the sample size [58]. The model is rejected if |t₀| exceeds the critical t-value for the chosen significance level.
Confidence intervals provide an alternative approach that estimates the range of accuracy. If both the upper and lower confidence bounds fall within an acceptable error range (ε) around the experimental value, the model is considered acceptable [58]. This method is particularly useful when small sample sizes limit statistical power for hypothesis testing.
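A minimal sketch of both acceptance checks, using hypothetical replicate model outputs against a single observed system value; the critical value 2.262 is the standard two-sided t-table entry for α = 0.05 with 9 degrees of freedom.

```python
import statistics

# Hypothetical replicate model outputs and the observed system value
model_outputs = [101.2, 99.8, 100.5, 102.1, 98.9, 100.9, 101.5, 99.4, 100.2, 101.0]
mu_0 = 100.0
n = len(model_outputs)
y_bar = statistics.mean(model_outputs)
s = statistics.stdev(model_outputs)  # sample standard deviation

# Hypothesis test: t0 = (E(Y) - mu_0) / (S / sqrt(n))
t_0 = (y_bar - mu_0) / (s / n ** 0.5)
t_crit = 2.262  # two-sided critical value, alpha = 0.05, df = 9
print(f"t_0 = {t_0:.3f}; reject H0: {abs(t_0) > t_crit}")

# Confidence-interval check: accept the model if this CI lies within mu_0 +/- epsilon
half_width = t_crit * s / n ** 0.5
ci = (y_bar - half_width, y_bar + half_width)
print(f"95% CI for the model mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

In this example |t₀| falls below the critical value, so the hypothesis-test route would not reject the model; the interval route additionally shows how far the model mean could plausibly sit from the observation.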
When comparing multiple models, selection criteria based on generalizability provide robust protection against overfitting. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) formalize the trade-off between goodness-of-fit and model complexity [71]. These criteria help identify models that are sufficiently complex to capture underlying regularities but not unnecessarily complex to capitalize on random noise, thereby implementing the principle of Occam's razor [71].
The relationship between model complexity, goodness-of-fit, and generalizability follows a predictable pattern. As complexity increases, goodness-of-fit continually improves, but generalizability peaks at intermediate complexity before declining due to overfitting [71]. In comparative analysis, this means the model with the best fit is not necessarily the best predictor—the optimal model achieves the best balance between fit and complexity.
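Using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, the fit-versus-complexity trade-off can be made explicit; the log-likelihoods and parameter counts below are hypothetical.

```python
import math

def aic(log_likelihood, k):
    # Akaike Information Criterion: lower is better
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # Bayesian Information Criterion: penalizes parameters more heavily as n grows
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits on n = 50 observations: the 8-parameter model fits slightly
# better (higher log-likelihood), but the complexity penalty outweighs the gain.
n = 50
candidates = {"simple (k=3)": (-62.0, 3), "complex (k=8)": (-60.5, 8)}
for name, (ll, k) in candidates.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")
```

Both criteria select the simpler model here, illustrating how generalizability-based selection penalizes overfitting even when the complex model's raw fit is better.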
The ASME V&V 40 standard provides a risk-based framework for establishing model credibility that is particularly relevant for drug development applications [41]. This approach ties the required level of V&V effort to the model influence on decision-making and the consequence of making an incorrect prediction [41]. For high-risk applications (e.g., clinical trial design), extensive validation with comprehensive uncertainty quantification is required, while lower-risk applications (e.g., preliminary screening) may warrant less rigorous assessment.
The credibility assessment process involves defining specific context of use statements that precisely describe the intended application of the computational models [41]. For comparative analysis, all candidate models must be evaluated against the same context of use to ensure fair comparison. The assessment then identifies credibility factors (e.g., conceptual model validation, mathematical model validation, numerical solution verification) and establishes credibility goals for each factor based on the risk assessment [41].
Sensitivity studies are an essential component of model comparison, examining how variations in input parameters affect model outputs [70]. These analyses identify critical parameters that dominate model behavior and help determine if validation results are sensitive to specific inputs [70]. When comparing models, those with similar predictive performance but lower sensitivity to poorly-characterized inputs are generally preferred.
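The simplest form of such a study is a one-at-a-time perturbation. The sketch below perturbs each input of a toy response function by 10%; the function and values are illustrative assumptions, and global methods such as Sobol' indices are preferred when input interactions matter.

```python
def model(params):
    """Toy response surface standing in for a computational model."""
    a, b, c = params["a"], params["b"], params["c"]
    return a ** 2 + 0.5 * b + 0.01 * c

base = {"a": 1.0, "b": 2.0, "c": 3.0}
baseline = model(base)

# One-at-a-time screening: perturb each input by 10%, record the output change
deltas = {}
for name in base:
    perturbed = dict(base)
    perturbed[name] *= 1.10
    deltas[name] = model(perturbed) - baseline
    print(f"{name}: output change = {deltas[name]:+.4f}")
```

Here parameter a dominates the response, so a validation experiment for this model would need to control a most tightly, while c could be characterized loosely.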
Uncertainty Quantification (UQ) systematically accounts for variations in numerical and physical parameters on simulation outcomes [1]. In comparative analysis, models with comprehensively characterized uncertainties provide more reliable predictions, as the uncertainty bounds reflect confidence in predictions. The ASME VVUQ 10.2-2021 standard specifically addresses the role of uncertainty quantification in verification and validation of computational solid mechanics models, providing guidance applicable across domains [1].
Table 3: Research Reagent Solutions for Computational Model Evaluation
| Tool Category | Specific Solutions | Function in Comparative Analysis | Application Context |
|---|---|---|---|
| Statistical Analysis | R Programming, Python (Pandas, NumPy, SciPy) | Advanced statistical testing and model comparison metrics | General computational modeling across domains |
| Commercial VVUQ Tools | ASME VVUQ Challenge Problem datasets | Benchmarking against standardized problems with known solutions | Method validation and inter-model comparison |
| Visualization Platforms | ChartExpo, Ninja Tables | Creating standardized comparison charts and visualizations | Results communication and pattern identification |
| Specialized Biomechanics | Custom FE software, Image-based modeling tools | Patient-specific model creation and validation | Drug delivery system evaluation, tissue response prediction |
Comparative model evaluation requires carefully designed validation experiments that produce data suitable for discriminating between competing models. The experimental protocol should target conditions under which the candidate models' predictions diverge, tightly control the critical input parameters identified by prior sensitivity studies, and quantify the measurement uncertainty of the reference data.
For drug development applications, validation experiments might include in vitro assays, animal models, or clinical data, depending on the model's context of use and development stage. The validation dataset should be independent of the data used for model calibration to avoid bias in the comparative assessment.
Comparative analysis of computational models for the same context of use requires a systematic, multi-faceted approach grounded in verification and validation principles. The framework presented integrates rigorous statistical testing with risk-informed credibility assessment to support model selection in drug development and related fields. By applying consistent verification protocols, comprehensive validation against high-quality experimental data, and advanced model comparison criteria such as generalizability, researchers can confidently identify the most suitable model for their specific application context.
The evolving landscape of computational modeling continues to develop new standards and methodologies for model evaluation. Engagement with the broader VVUQ community through ASME symposiums and challenge problems provides valuable opportunities for benchmarking and methodological refinement [1] [28]. As computational models play increasingly important roles in drug development decisions, robust comparative analysis ensures these powerful tools deliver reliable, actionable insights.
Model-Informed Drug Development (MIDD) is a multidisciplinary approach that uses a variety of quantitative models to inform drug development and regulatory decision-making. These approaches integrate data from preclinical and clinical sources to develop, validate, and utilize exposure-based, biological, and statistical models [72]. MIDD encompasses a broad spectrum of modeling techniques, including Physiologically-Based Pharmacokinetic (PBPK) modeling, Quantitative Systems Pharmacology (QSP), Population PK, Exposure-Response analysis, Model-Based Meta-Analysis (MBMA), and increasingly, Artificial Intelligence/Machine Learning (AI/ML) methods [73]. When successfully applied, MIDD approaches can improve clinical trial efficiency, increase the probability of regulatory success, optimize drug dosing, and support therapeutic individualization, sometimes even reducing the need for dedicated clinical trials [72].
The release of the draft ICH M15 guidance, "General Principles for Model-Informed Drug Development," in December 2024 represents a pivotal moment in the formalization and harmonization of MIDD practices globally [74] [75]. Developed under the auspices of the International Council for Harmonisation (ICH), this guidance provides a globally harmonized framework for assessing evidence derived from MIDD and discusses multidisciplinary principles including MIDD planning, model evaluation, and evidence documentation [74] [76]. The guidance aims to facilitate multidisciplinary understanding, appropriate use, and harmonized assessment of MIDD and its associated evidence, ultimately enabling greater efficiency in drug development while promoting consistent and transparent regulatory evaluation [75]. It introduces a structured framework to assess model credibility and regulatory influence, emphasizing a totality-of-evidence approach and transparent communication of assumptions, risks, and impact [73].
Verification, Validation, and Uncertainty Quantification (VVUQ) constitute a fundamental discipline for ensuring the credibility and reliability of computational models used in MIDD. As stated in the M15 guidance, appropriate use of MIDD requires harmonized approaches to assessment to promote consistent and transparent evaluation of MIDD evidence [75]. In the context of regulatory decision-making for pharmaceutical products, where patient safety and public health are paramount, establishing model credibility through rigorous VVUQ processes is not merely academic—it is a regulatory expectation.
The core components of VVUQ in MIDD encompass:
Verification: The process of determining that a computational model accurately represents the underlying conceptual model and its mathematical representation. This includes code verification (ensuring the mathematical model is correctly implemented in software) and calculation verification (assessing the numerical accuracy of the computed solution) [77] [49].
Validation: The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model [77]. This involves comparison of model predictions with experimental data not used in model development.
Uncertainty Quantification (UQ): The systematic characterization and propagation of uncertainties in model inputs, parameters, and structure to their effects on model outputs [77] [49]. This includes distinguishing between aleatory uncertainty (inherent randomness) and epistemic uncertainty (limited knowledge).
For MIDD approaches, VVUQ provides the scientific evidence that models are fit-for-purpose—that they possess sufficient predictive capability to inform specific regulatory decisions, whether related to dose selection, clinical trial simulation, or predictive safety evaluation [72]. The M15 guidance emphasizes this by calling for clarity in modeling context and transparent communication of assumptions and risks [73].
Table 1: VVUQ Terminology in Regulatory and Engineering Contexts
| Term | Definition in Regulatory MIDD | Engineering Simulation Context |
|---|---|---|
| Verification | Ensuring the computational implementation accurately solves the intended mathematical model | Code verification (correct implementation) and solution verification (numerical accuracy) [77] |
| Validation | Assessing how well the model represents reality for its intended use [77] | Determining model accuracy by comparison with experimental data [77] [49] |
| Uncertainty Quantification | Characterizing uncertainties in model predictions to inform decision-making | Systematic treatment of aleatory and epistemic uncertainties [77] |
| Credibility | Evidence that a model is fit-for-purpose for regulatory decisions | Established through comprehensive VVUQ activities [77] |
The ICH M15 draft guidance establishes several foundational principles that directly impact how VVUQ should be planned and executed for MIDD submissions. While the guidance does not prescribe specific technical methods, it provides a framework for assessing the credibility and relevance of MIDD evidence [75] [76]. The guidance emphasizes that model development should follow its general recommendations in conjunction with current accepted standards and scientific practices for specific modeling and simulation methods [76].
A central concept in M15 is the context of use (COU), which defines the specific role and scope of the model in informing regulatory decisions. The COU directly determines the level of VVUQ rigor required—models supporting critical decisions with significant patient impact necessitate more extensive VVUQ evidence [72]. The guidance promotes a risk-informed approach to VVUQ, where the extent of evaluation is proportionate to the model's influence on decision-making and the consequences of an incorrect decision [72]. Furthermore, M15 encourages a totality-of-evidence perspective, recognizing that model credibility may be supported by multiple lines of evidence rather than a single validation exercise [73].
The guidance also introduces a structured framework for assessing model credibility and regulatory influence, emphasizing transparent communication of assumptions, risks, and impact [73]. This includes documentation of model development, validation, simulation plans, and results in meeting packages submitted to regulatory agencies [72]. For researchers, this means that VVUQ activities must be carefully documented and presented in a manner that allows regulatory reviewers to assess the model's suitability for its intended purpose.
Table 2: M15 Recommendations for VVUQ Planning and Documentation
| M15 Element | VVUQ Implications | Documentation Requirements |
|---|---|---|
| MIDD Planning | Early definition of VVUQ strategy aligned with context of use | Justification of VVUQ approach based on model risk assessment [72] |
| Model Evaluation | Comprehensive assessment of model credibility | Description of data used for model development and validation [72] |
| Evidence Documentation | Transparent reporting of VVUQ activities and results | Model development, validation, simulation plan, and results [72] |
| Risk Assessment | Proportional VVUQ rigor based on decision consequence | Rationale for model risk level considering potential impact of incorrect decisions [72] |
Implementing an effective VVUQ strategy for MIDD begins with a thorough assessment of model risk and criticality. The FDA's MIDD Paired Meeting Program recommends that sponsors include a risk assessment in their meeting packages, considering both the weight of model predictions in the totality of data (model influence) and the potential risk of making an incorrect decision (decision consequence) [72]. This risk assessment should be conducted early in model development and guide the scope and intensity of VVUQ activities.
High-risk contexts, such as models intended to support dose selection in pivotal trials or to replace clinical endpoints, demand more extensive VVUQ, potentially including validation against independent external datasets, comprehensive uncertainty quantification, and sensitivity analyses of the most influential parameters.
In contrast, models with lower decision consequence, such as those used for internal decision support or early exploratory analysis, may require less extensive but still rigorous VVUQ.
Verification ensures that the computational implementation accurately represents the intended mathematical model and that solutions are numerically accurate. For MIDD applications, verification should address both code verification and solution verification [77].
Code verification methodologies include comparison of numerical results against benchmark problems with known analytical solutions and the method of manufactured solutions, in which an exact solution is constructed so the implementation can be tested directly [77].
Solution verification focuses on quantifying numerical errors in specific simulations, typically through convergence analysis: systematically refining the discretization (mesh or time step) and confirming that iterative residuals fall below a specified tolerance [77].
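A minimal code-verification sketch in the spirit described above: a forward-Euler solver for dy/dt = −ky is checked against the known analytical solution y(t) = y₀e^(−kt), confirming that the error shrinks under refinement. The toy problem is an assumption for illustration, not drawn from the cited sources.

```python
import math

def euler_decay(y0, k, t_end, steps):
    """Forward-Euler solution of dy/dt = -k*y."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)
    return y

y0, k, t_end = 1.0, 1.0, 1.0
exact = y0 * math.exp(-k * t_end)  # analytical benchmark solution

# Code verification: the discretization error should decrease with refinement
errors = []
for steps in (10, 100, 1000):
    err = abs(euler_decay(y0, k, t_end, steps) - exact)
    errors.append(err)
    print(f"steps = {steps:5d}, |error| = {err:.2e}")
```

For forward Euler the error falls roughly tenfold per tenfold refinement, consistent with its first-order accuracy; a failure of this trend would signal an implementation error.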
Validation establishes the model's accuracy in representing real-world phenomena for its intended use. The M15 guidance emphasizes a fit-for-purpose approach to validation, where the extent and methods are aligned with the model's context of use [75]. A comprehensive validation strategy should include comparison against independent experimental data not used in model calibration, coverage of the intended domain of applicability, and expert review of outputs for face validity.
The validation process should employ appropriate validation metrics tailored to the model's context of use. These may include regression-error measures such as mean squared error, the area under the ROC curve for classification tasks, and visual predictive checks for population models [77].
Diagram 1: Model Validation Workflow for MIDD - This diagram illustrates the comprehensive validation process from planning through documentation, incorporating both internal and external validation components essential for regulatory submissions.
Uncertainty Quantification (UQ) systematically characterizes how uncertainties in model inputs, parameters, and structure propagate to uncertainties in model outputs. For MIDD applications, a comprehensive UQ approach should address parameter uncertainty, structural (model-form) uncertainty, and input data variability, distinguishing aleatory from epistemic sources [77] [49].
Effective UQ methodologies for MIDD include Monte Carlo simulation, Latin hypercube sampling, and polynomial chaos expansions for propagating input uncertainties, complemented by global sensitivity analysis (e.g., Sobol' indices) to apportion output uncertainty to its sources [77].
The results of UQ should be presented in terms that are meaningful for decision-makers, such as confidence intervals around predicted responses, probability of achieving target outcomes, or risk-benefit distributions.
The FDA's MIDD Paired Meeting Program provides a valuable mechanism for sponsors to obtain regulatory feedback on MIDD approaches, including VVUQ strategies, during drug development [72]. This program, conducted by FDA's Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER) during fiscal years 2023-2027, affords selected sponsors the opportunity to meet with Agency staff to discuss MIDD approaches in medical product development [72].
The program features a paired meeting structure consisting of an initial meeting to discuss proposed MIDD approaches, followed by a follow-up meeting approximately 60 days after submission of additional information [72]. This structure allows for iterative feedback and alignment on complex MIDD approaches, including VVUQ plans. The FDA prioritizes selecting requests that focus on dose selection or estimation, clinical trial simulation, and predictive or mechanistic safety evaluation [72].
To participate, sponsors must submit a meeting request that includes specific information about the product, proposed indication, question of interest, MIDD approach(es), context of use, and specific questions for the Agency [72]. For granted requests, a comprehensive meeting package must be submitted 47 days before the initial meeting, including a risk assessment that considers the weight of model predictions and potential decision consequences [72].
Table 3: FDA MIDD Paired Meeting Program Key Dates and Requirements
| Program Element | Timeline/Requirement | Key Considerations |
|---|---|---|
| Meeting Request Due Dates | Quarterly (March 1, June 1, September 1, December 1) [72] | Requests should focus on dose selection, trial simulation, or safety prediction |
| Meeting Grant Notifications | Approximately 1 month after submission due date [72] | FDA grants 1-2 meetings per quarter, with additional depending on resources |
| Meeting Package Deadline | 47 days before initial meeting [72] | Must include risk assessment and detailed MIDD approach |
| Follow-up Meeting | Within ~60 days of package submission [72] | Allows for discussion of refined approach based on initial feedback |
Comprehensive documentation of VVUQ activities is essential for regulatory submissions involving MIDD approaches. The M15 guidance emphasizes transparent communication of assumptions, risks, and impact, requiring detailed documentation that allows regulatory reviewers to assess model credibility [73]. Effective VVUQ documentation should include model development reports, validation protocols and results, simulation plans, and explicit statements of assumptions and remaining uncertainties [72].
This documentation should follow a totality-of-evidence approach, demonstrating through multiple lines of evidence that the model is sufficiently credible for its intended use [73]. The level of detail should be proportionate to the model's risk level, with higher-risk applications requiring more comprehensive documentation.
Real-world applications demonstrate the significant impact that well-executed MIDD with proper VVUQ can have on drug development and regulatory outcomes, as documented in industry case studies highlighted by Certara experts.
These successes share common elements in their VVUQ approaches: risk-informed planning aligned with the context of use, transparent documentation, and validation rigor proportionate to the consequence of the decision being supported.
The integration of these approaches within the M15 framework facilitates more consistent and transparent evaluation of MIDD evidence across regulatory agencies [73].
Implementing robust VVUQ for MIDD requires both methodological expertise and appropriate tools. The following table summarizes key resources and their roles in establishing model credibility.
Table 4: Essential Research Reagents for VVUQ in MIDD
| Tool Category | Specific Tools/Methods | Function in VVUQ Process |
|---|---|---|
| Software Verification Tools | Method of Manufactured Solutions, Convergence Analysis [77] | Verify correct implementation of mathematical models and numerical methods |
| Sensitivity Analysis Methods | Sobol' Indices, Morris Method, Partial Rank Correlation [77] | Identify influential parameters and prioritize uncertainty reduction efforts |
| Uncertainty Propagation Methods | Monte Carlo Simulation, Latin Hypercube Sampling, Polynomial Chaos [77] | Quantify how input uncertainties affect model outputs |
| Model Validation Metrics | Area Under ROC Curve, Visual Predictive Check, Mean Square Error [77] | Quantitatively assess model accuracy and predictive performance |
| Experimental Data for Validation | Clinical trial data, In vitro data, Literature data [72] | Provide reference for model validation and credibility assessment |
| Documentation Frameworks | Model development reports, Validation protocols, Uncertainty documentation [72] | Communicate VVUQ activities and results to regulatory agencies |
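As a concrete illustration of the uncertainty-propagation row above, the sketch below combines Latin Hypercube Sampling with Monte Carlo propagation through a hypothetical one-compartment pharmacokinetic model. The dose, clearance, and volume ranges are illustrative assumptions, not values from any cited study.

```python
import math
import random
import statistics

def latin_hypercube(n_samples, n_dims, rng):
    """Stratified sampling on [0, 1): one point per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return list(zip(*columns))  # rows are samples

def concentration(dose, cl, v, t):
    """Hypothetical one-compartment model: C(t) = (dose / V) * exp(-(CL/V) * t)."""
    return (dose / v) * math.exp(-(cl / v) * t)

def propagate(n_samples=1000, seed=0):
    """Propagate assumed input uncertainty to a distribution of outputs."""
    rng = random.Random(seed)
    outputs = []
    for u_cl, u_v in latin_hypercube(n_samples, 2, rng):
        cl = 4.0 + 2.0 * u_cl   # clearance uniform on [4, 6] L/h (assumed)
        v = 40.0 + 20.0 * u_v   # volume uniform on [40, 60] L (assumed)
        outputs.append(concentration(dose=100.0, cl=cl, v=v, t=8.0))
    mean = statistics.mean(outputs)
    outputs.sort()
    lo, hi = outputs[int(0.025 * n_samples)], outputs[int(0.975 * n_samples)]
    return mean, (lo, hi)
```

The returned interval is an empirical 95% band on the predicted concentration, the kind of output-uncertainty statement a regulatory reviewer would expect alongside a point prediction.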
Diagram 2: VVUQ Evidence Generation for Regulatory Decisions - This diagram shows how different VVUQ components contribute evidence to support regulatory decision-making within the MIDD framework.
The formalization of MIDD through the ICH M15 guidance represents a significant evolution in drug development and regulatory science. As these approaches become increasingly central to drug development, VVUQ practices will continue to mature and standardize. Emerging trends include:
For researchers and drug development professionals, successfully navigating this evolving landscape requires proactive VVUQ planning integrated throughout the model development lifecycle rather than as a final compliance step. As emphasized by Certara experts, "Impactful MIDD only happens when the entire development team embraces modeling as a core decision-making tool—not just a technical exercise" [73]. This cultural shift, combined with rigorous VVUQ practices aligned with M15 principles, will be essential for realizing the full potential of MIDD to accelerate the development of safe and effective therapies.
The ICH M15 guidance, though currently in draft form with comments accepted until February 28, 2025, already provides a clear direction for the future of MIDD [75]. By establishing a harmonized framework for assessing MIDD evidence, it enables more consistent application of VVUQ principles across regulatory submissions globally. As sponsors and regulators gain experience with this framework, best practices for VVUQ in MIDD will continue to evolve, further enhancing the efficiency and robustness of drug development.
Verification, Validation, and Uncertainty Quantification (VVUQ) constitute a critical framework for establishing credibility in computational modeling and simulation (CM&S). As computational methods increasingly support high-stakes decisions in medicine and healthcare—particularly in emerging applications like in silico clinical trials and digital twins—rigorous VVUQ processes become essential for ensuring reliability, safety, and efficacy [1]. These methodologies provide structured approaches to answer fundamental questions: whether models are implemented correctly (verification), whether they accurately represent reality (validation), and how variations in parameters affect outcomes (uncertainty quantification) [1] [79].
The adoption of CM&S in regulated medical applications represents a paradigm shift, with manufacturers increasingly moving from physical testing to computational modeling throughout product life cycles [1]. This transition is particularly evident in medical device innovation and pharmaceutical development, where in silico methods can reduce the need for animal or human testing while providing valuable insights into device performance, safety, and effectiveness [80]. However, because computational models are idealized digital representations built on inherent assumptions, establishing their credibility through VVUQ is a prerequisite for their use in decisions that could impact patient safety [80].
This technical guide examines the application of VVUQ frameworks to digital twins and in silico trials, with specific focus on credibility assessment methodologies, standards, and implementation protocols for researchers and drug development professionals.
Within computational modeling research, verification and validation serve distinct but complementary functions:
Verification addresses "Are we building the model right?" by ensuring computational models correctly implement their intended mathematical formulations and that numerical solutions are accurate [1] [9]. This process includes code verification (confirming algorithms solve equations correctly) and solution verification (assessing numerical accuracy, including discretization and iterative errors) [6] [79].
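Code verification is commonly demonstrated with a manufactured solution whose derivatives are known exactly. As a minimal, self-contained sketch (not tied to any particular solver), the following checks that a central-difference second derivative achieves its theoretical second-order accuracy against the manufactured solution u(x) = sin(pi*x):

```python
import math

def fd_second_derivative(u, h):
    """Central-difference approximation of u'' at interior grid points."""
    return [(u[i - 1] - 2.0 * u[i] + u[i + 1]) / h ** 2
            for i in range(1, len(u) - 1)]

def max_error(n):
    """Max error of the scheme against the manufactured solution
    u(x) = sin(pi x), whose exact second derivative is -pi^2 sin(pi x)."""
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    u = [math.sin(math.pi * x) for x in xs]
    approx = fd_second_derivative(u, h)
    exact = [-math.pi ** 2 * math.sin(math.pi * x) for x in xs[1:-1]]
    return max(abs(a - e) for a, e in zip(approx, exact))

def observed_order(n):
    """Halving h should cut the error roughly 4x for a second-order scheme."""
    return math.log(max_error(n) / max_error(2 * n)) / math.log(2.0)
```

If the observed order deviates from the theoretical order, the implementation (not the physics) is suspect, which is exactly the question code verification is designed to answer.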
Validation answers "Are we building the right model?" by determining how accurately computational models represent real-world phenomena through comparison with experimental data [1] [9]. Validation assesses a model's ability to predict outcomes beyond the conditions used for its calibration, establishing its predictive capability for intended use contexts [81] [79].
Uncertainty Quantification characterizes and quantifies uncertainties in model inputs, parameters, and predictions, distinguishing between aleatoric uncertainty (inherent variability in physical systems) and epistemic uncertainty (limited knowledge about model parameters) [6] [79]. UQ provides confidence bounds on predictions, enabling risk-informed decision-making [6].
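The aleatoric/epistemic distinction can be made concrete with a nested Monte Carlo sketch based on the law of total variance; the parameter prior and noise level below are illustrative assumptions, not values from the cited sources.

```python
import random
import statistics

def variance_decomposition(n_outer=400, n_inner=200, seed=0):
    """Split total predictive variance via the law of total variance:
        Var(Y) = Var(E[Y | theta])  (epistemic, reducible with knowledge)
               + E[Var(Y | theta)]  (aleatoric, irreducible variability)
    """
    rng = random.Random(seed)
    cond_means, cond_vars = [], []
    for _ in range(n_outer):
        # epistemic: the true parameter is only known to within a prior
        theta = rng.gauss(1.0, 0.2)            # assumed prior
        ys = [theta + rng.gauss(0.0, 0.5)      # aleatoric: measurement noise
              for _ in range(n_inner)]
        cond_means.append(statistics.mean(ys))
        cond_vars.append(statistics.pvariance(ys))
    epistemic = statistics.pvariance(cond_means)
    aleatoric = statistics.mean(cond_vars)
    return epistemic, aleatoric
```

Reporting the two components separately tells decision-makers which part of the uncertainty could shrink with more data and which cannot.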
The following diagram illustrates the integrated relationship between VVUQ components in the model development lifecycle:
Figure 1: VVUQ Process Workflow. Verification, validation, and uncertainty quantification activities are shown as integrated, iterative stages spanning the computational model development lifecycle.
Multiple standardized frameworks provide guidance for implementing VVUQ in computational modeling:
Table 1: Key VVUQ Standards and Their Applications
| Standard | Domain | Key Focus Areas | Status |
|---|---|---|---|
| ASME VVUQ 1-2022 [1] | General Terminology | Standardized VVUQ terminology across computational modeling | Published |
| ASME V&V 10-2019 [1] | Computational Solid Mechanics | Verification & validation in solid mechanics applications | Published |
| ASME V&V 20-2009 [1] | CFD & Heat Transfer | Standards for fluid dynamics and heat transfer simulations | Published |
| ASME V&V 40-2018 [1] [81] | Medical Devices | Risk-based credibility framework for medical device applications | Published |
| VVUQ 20.1-2024 [1] | Multivariate Validation | Advanced metrics for validation across multiple variables | Published |
| VVUQ 50.1-20XX [1] | Model Life Cycle | Guide incorporating VVUQ throughout model lifecycle | Coming Soon |
The ASME V&V 40 standard provides a particularly influential risk-based credibility framework for medical device applications, establishing procedures for determining the appropriate level of VVUQ evidence based on the model's role in decision-making and the consequences of incorrect predictions [1] [81]. This framework has been adapted specifically for in silico clinical trial applications, expanding consideration to factors including scope, coverage, and severity [81].
The risk assessment framework for in silico clinical trials evaluates model risk based on three independent factors [81]:
This risk assessment directly informs credibility targets, determining the appropriate level of validation evidence required. The following diagram illustrates this risk-based decision framework:
Figure 2: Risk-Based Credibility Assessment Framework. Credibility targets are derived from a risk assessment of the in silico trial application's scope, coverage, and severity.
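The standards describe scope, coverage, and severity qualitatively. Purely as an illustration of how such a framework operates, a risk assessment of this shape could be encoded as below; the numeric levels, the max-combination rule, and the evidence tiers are hypothetical choices, not definitions from ASME V&V 40 or [81].

```python
# Purely illustrative scoring; the standards define these factors
# qualitatively, not numerically.
LEVELS = {"low": 1, "medium": 2, "high": 3}

def model_risk(scope, coverage, severity):
    """Combine the three ordinal factors into an overall risk level.
    Taking the maximum is a conservative, illustrative choice."""
    return max(LEVELS[scope], LEVELS[coverage], LEVELS[severity])

def credibility_target(risk):
    """Map risk level to an illustrative tier of required validation evidence."""
    return {
        1: "benchmark comparisons and literature data may suffice",
        2: "validation against controlled in vitro or retrospective clinical data",
        3: "validation against prospective clinical data with predefined margins",
    }[risk]
```

The key property the real framework shares with this sketch is monotonicity: raising any one risk factor can only raise, never lower, the evidence required.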
Digital twins in precision medicine comprise five main components that each present distinct VVUQ challenges [6]:
The continuous updating nature of digital twins introduces unique VVUQ challenges, particularly regarding temporal validation approaches. Unlike traditional static models, digital twins require ongoing validation throughout their lifecycle as they incorporate new patient data [6]. This dynamic nature amplifies the importance of uncertainty quantification, as each new data point introduces different levels of uncertainty in model predictions [6].
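One common way to realize this continuous updating, sketched here under the simplifying assumption of a scalar twin parameter with Gaussian prior and known measurement noise, is a conjugate Bayesian update applied as each new patient measurement arrives. The posterior variance, an explicit measure of remaining epistemic uncertainty, shrinks with every observation.

```python
def normal_update(prior_mean, prior_var, obs, obs_var):
    """Conjugate normal-normal update of a scalar twin parameter
    given one new measurement with known noise variance."""
    w = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + w * (obs - prior_mean)
    post_var = (1.0 - w) * prior_var
    return post_mean, post_var

def assimilate(stream, prior_mean=60.0, prior_var=100.0, obs_var=25.0):
    """Fold a stream of patient measurements into the twin, one at a time.
    Prior and noise values are illustrative placeholders."""
    mean, var = prior_mean, prior_var
    history = []
    for obs in stream:
        mean, var = normal_update(mean, var, obs, obs_var)
        history.append((mean, var))
    return history
```

Temporal validation then amounts to checking that the sequence of posterior predictions stays calibrated against each new measurement before it is assimilated.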
Validation of healthcare digital twins requires specialized methodologies to address their unique characteristics:
For cardiology applications, personalized cardiac electrophysiological models incorporating CT scans have demonstrated potential for diagnosing arrhythmias like atrial fibrillation when properly validated [6]. Similarly, in oncology, models predicting tumor growth and treatment response require rigorous validation against patient-specific data before clinical application [6].
In silico clinical trials (ISCTs) present unique credibility challenges as they aim to generate clinically relevant data through computational modeling rather than traditional clinical studies [81]. A specialized framework has been developed to evaluate model risk and establish credibility requirements for ISCT applications [81].
This framework assesses credibility of clinical validation activities based on multiple factors:
Table 2: Credibility Factors for In Silico Clinical Trial Validation
| Credibility Factor | Assessment Criteria | High-Credibility Example |
|---|---|---|
| Clinical Comparator [81] | Quality of reference clinical data used for validation | Prospective, controlled clinical trial data with comprehensive patient characterization |
| Validation Model [81] | Capability to represent clinical variability | Model incorporating population-level anatomical and physiological variability |
| Input Agreement [81] | Correspondence between model inputs and clinical scenarios | Virtual patient cohort matching real population demographics and disease severity |
| Output Agreement [81] | Statistical rigor in comparing outcomes | Comprehensive statistical analysis demonstrating equivalence within predefined margins |
| Applicability [81] | Relevance of validation to intended use | Validation against clinical data for the same medical device and similar patient population |
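Output agreement "within predefined margins" is often assessed with equivalence testing. The sketch below implements two one-sided tests (TOST) under a large-sample normal approximation; the margin, alpha, and the normal approximation itself are illustrative choices (a t-based version would be preferred for small samples).

```python
from statistics import NormalDist, mean, stdev

def tost_equivalence(model_outputs, clinical_outputs, margin, alpha=0.05):
    """Two one-sided tests for equivalence of means, large-sample
    normal approximation. Declares equivalence if the mean difference
    is confidently inside (-margin, +margin)."""
    n1, n2 = len(model_outputs), len(clinical_outputs)
    diff = mean(model_outputs) - mean(clinical_outputs)
    se = (stdev(model_outputs) ** 2 / n1 + stdev(clinical_outputs) ** 2 / n2) ** 0.5
    z_lower = (diff + margin) / se   # tests H0: diff <= -margin
    z_upper = (diff - margin) / se   # tests H0: diff >= +margin
    nd = NormalDist()
    p = max(1.0 - nd.cdf(z_lower), nd.cdf(z_upper))
    return p, p < alpha
```

Unlike a conventional significance test, failing to detect a difference is not evidence of agreement; TOST makes the agreement claim itself the tested hypothesis, which is why it suits validation margins.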
A robust validation protocol for in silico trials involves multiple experimental components:
Clinical Data Collection Protocol:
Virtual Cohort Generation:
Treatment Simulation and Outcome Prediction:
Operational Simulation:
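The virtual cohort generation step above can be sketched as simple rejection sampling against plausibility bounds. Every distribution, bound, and attribute below is a hypothetical placeholder for a real target-population specification.

```python
import random

def generate_virtual_cohort(n, seed=0):
    """Sample a hypothetical virtual cohort whose marginals loosely match
    a target population (all distributional choices are assumptions)."""
    rng = random.Random(seed)
    cohort = []
    while len(cohort) < n:
        age = rng.gauss(62.0, 10.0)       # years (assumed)
        weight = rng.gauss(78.0, 14.0)    # kg (assumed)
        severity = rng.choice(["mild", "moderate", "severe"])
        # reject physiologically implausible draws
        if 18.0 <= age <= 90.0 and 40.0 <= weight <= 150.0:
            cohort.append({"age": age, "weight": weight, "severity": severity})
    return cohort

def summarize(cohort, key):
    """Cohort mean for one numeric attribute, for input-agreement checks."""
    vals = [p[key] for p in cohort]
    return sum(vals) / len(vals)
```

Input agreement (Table 2) is then checked by comparing such cohort summaries against the demographics of the real population the trial intends to represent.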
Successful implementation of VVUQ for digital twins and in silico trials requires specific computational tools and methodologies:
Table 3: Essential Research Reagents for Digital Twin and In Silico Trial Development
| Tool Category | Specific Examples | Function in VVUQ Process |
|---|---|---|
| Mechanistic Modeling Platforms [83] | Physiologically Based Pharmacokinetic (PBPK) models, Quantitative Systems Pharmacology (QSP) models | Simulate how therapies interact with biological systems across virtual patient cohorts |
| Uncertainty Quantification Tools [79] | Monte Carlo methods, Bayesian inference techniques, Sensitivity analysis tools | Quantify and propagate uncertainties through computational models |
| Data Generation & Curation [83] | Generative Adversarial Networks (GANs), Large Language Models (LLMs), FAIR data principles | Create synthetic patient cohorts and ensure data findability, accessibility, interoperability, and reusability |
| Validation Metrics & Statistical Packages [81] [79] | Area metric, Z metric, waveform comparison algorithms, Statistical equivalence testing | Quantify agreement between model predictions and experimental/clinical data |
| Model Integration & Interoperability [83] | Modular model architectures, API-based integration frameworks | Enable coordinated ecosystems of interoperable models across trial simulation, outcome prediction, and operational planning |
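Of the validation metrics listed above, the area metric has a particularly compact definition: the area between the empirical CDFs of model predictions and observations, expressed in the units of the quantity of interest. A minimal sketch:

```python
def area_metric(model_samples, data_samples):
    """Area between the empirical CDFs of model predictions and
    observations; 0 means the distributions coincide, and a pure
    shift of d yields an area of d."""
    xs = sorted(set(model_samples) | set(data_samples))

    def ecdf(samples, x):
        return sum(1 for s in samples if s <= x) / len(samples)

    area = 0.0
    for left, right in zip(xs[:-1], xs[1:]):
        area += abs(ecdf(model_samples, left) - ecdf(data_samples, left)) * (right - left)
    return area
```

Because it compares whole distributions rather than means, the metric penalizes a model that matches the average response but misrepresents population variability.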
Implementing effective VVUQ processes requires organizational commitment beyond technical solutions:
As computational modeling approaches like digital twins and in silico trials assume increasingly critical roles in medical product development and regulatory evaluation, robust VVUQ frameworks provide the essential foundation for establishing model credibility. The risk-based approaches outlined in this guide enable researchers and drug development professionals to implement appropriate verification, validation, and uncertainty quantification strategies matched to their specific application contexts.
Successful adoption requires both technical rigor—through standardized methodologies, comprehensive validation protocols, and advanced uncertainty quantification—and organizational commitment to simulation-first cultures, cross-functional collaboration, and systematic credibility assessment. When properly implemented, these approaches enable the trustworthy application of computational modeling to accelerate medical innovation while maintaining rigorous safety and efficacy standards.
The ongoing development of VVUQ standards, particularly through organizations like ASME and NAFEMS, continues to refine best practices for credibility assessment in high-stakes applications [1] [79]. Researchers should maintain awareness of evolving standards and methodologies as the field of computational medicine advances.
Verification and Validation (V&V) form the cornerstone of credible computational modeling research. In the context of computational modeling, verification is defined as "the process of determining that a computational model accurately represents the underlying mathematical model and its solution," while validation is "the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model" [3]. Succinctly, verification is "solving the equations right" (mathematics), and validation is "solving the right equations" (physics) [3]. For models intended to inform regulatory submissions or clinical trials, a rigorous V&V process is not merely academic—it is essential for mitigating risk, ensuring accurate predictions, and building confidence in model-based decisions [3] [19] [1].
The process of establishing model credibility is increasingly guided by standardized frameworks. The ASME (American Society of Mechanical Engineers) has developed a series of V&V standards, including V&V 10 for computational solid mechanics and V&V 40 for application to medical devices [1]. Furthermore, in fields involving digital health technologies like Biometric Monitoring Technologies (BioMeTs), the framework has been expanded to V3: Verification, Analytical Validation, and Clinical Validation [84]. This underscores the necessity of ensuring that not only is the computational model itself sound, but its outputs are also clinically meaningful and relevant to the target population [84].
This guide provides a detailed roadmap for creating the essential documentation—the Model Analysis Plan (MAP) and the subsequent V&V Report—that formally captures this evidence and provides a compelling argument for a model's fitness-for-purpose.
A clear understanding of V&V concepts is a prerequisite for creating effective documentation. The relationship between the real world, mathematical model, and computational model is foundational to V&V practices [3].
The following diagram illustrates the general workflow and key relationships in the verification and validation process, from conceptual model to validated simulation.
For biomedical applications, particularly those involving sensor data and algorithms, the V3 framework provides greater specificity. It breaks down the traditional validation step into analytical and clinical components to ensure both technical and clinical relevance [84].
The Model Analysis Plan (MAP) is a proactive, living document created before model development begins. It serves as a blueprint for the entire modeling effort, ensuring that the model is built and evaluated with its intended use and submission requirements in mind [85] [86].
A comprehensive MAP should contain the following key sections, with particular emphasis on the quantitative acceptance criteria that will guide the subsequent V&V activities.
Pre-defining quantitative acceptance criteria is the most critical step in making the V&V process objective and auditable. The following table provides examples of criteria for different types of models, inspired by standards like ASME V&V 40 [1].
Table 1: Examples of Quantitative V&V Acceptance Criteria in a MAP
| Model Component | V&V Activity | Example Acceptance Criterion | Rationale |
|---|---|---|---|
| Numerical Solution | Mesh Convergence Study | Change in Quantity of Interest (QoI) is < 2% with further mesh refinement [3]. | Ensures numerical accuracy is not dominated by discretization error. |
| Solver Settings | Iterative Convergence Study | Residual norms for the system equations are reduced by six orders of magnitude (a factor of 10⁻⁶). | Confirms that the numerical solution has sufficiently converged. |
| Code Verification | Comparison to Analytical Solution | Predicted stresses are within 3% of an analytical solution for a benchmark problem [3]. | Verifies the correct implementation of the mathematical model. |
| Model Validation | Comparison to Experimental Data | The model predicts strain fields within 10% of experimental measurements across 80% of the data points. | Establishes the model's ability to replicate real-world physics to an acceptable degree. |
| Biomarker Assay | Analytical Validation | The coefficient of variation (CV) for the assay is < 20% [88]. | Ensures the analytical method is sufficiently precise for its intended use. |
This section provides detailed methodologies for critical experiments cited in V&V documentation, drawing from established practices in computational mechanics and biomarker development.
Objective: To ensure that the computational solution is sufficiently insensitive to the discretization (mesh) of the geometry [3].
Detailed Workflow:
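A mesh convergence study of this kind typically reports the observed order of convergence and a Richardson-extrapolated estimate of the mesh-independent QoI, alongside the <2% change check from Table 1. A minimal sketch, assuming three solutions at a constant refinement ratio:

```python
import math

def observed_order(f_coarse, f_medium, f_fine, r):
    """Observed order of convergence p from three solutions on
    systematically refined meshes with constant refinement ratio r."""
    return math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(r)

def richardson_extrapolate(f_medium, f_fine, r, p):
    """Estimate of the mesh-independent value of the QoI."""
    return f_fine + (f_fine - f_medium) / (r ** p - 1.0)

def converged(f_medium, f_fine, tol_pct=2.0):
    """The <2% change acceptance criterion from the MAP example."""
    return abs(f_fine - f_medium) / abs(f_fine) * 100.0 < tol_pct
```

If the observed order differs markedly from the scheme's theoretical order, the solutions are not yet in the asymptotic range and further refinement is needed before the <2% check is meaningful.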
Objective: To quantify the model's ability to replicate real-world behavior by comparing its predictions to experimental data [3] [1].
Detailed Workflow:
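The comparison step can be quantified as the fraction of validation points at which predictions fall within a relative tolerance, matching the example criterion from Table 1 (within 10% at 80% of the data points). A minimal sketch:

```python
def fraction_within_tolerance(predicted, measured, rel_tol_pct):
    """Fraction of validation points where the model prediction falls
    within a relative tolerance of the corresponding measurement."""
    assert len(predicted) == len(measured)
    hits = sum(
        1 for p, m in zip(predicted, measured)
        if m != 0 and abs(p - m) / abs(m) * 100.0 <= rel_tol_pct
    )
    return hits / len(measured)

def passes_validation(predicted, measured, rel_tol_pct=10.0, coverage=0.8):
    """Example criterion from the MAP: predictions within 10% of the
    measurements at >= 80% of the data points."""
    return fraction_within_tolerance(predicted, measured, rel_tol_pct) >= coverage
```

Reporting the full fraction, not just pass/fail, preserves the auditable trail between the result and the pre-defined criterion.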
Objective: To establish that the analytical assay used to generate data for a biologically based model is accurate, precise, and reproducible within the required limits [87] [84] [88].
Detailed Workflow:
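Intra- and inter-assay precision are typically summarized with coefficients of variation over replicate runs of a control sample. A minimal sketch (the replicate layout, runs of replicate measurements, is an illustrative assumption):

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation as a percentage."""
    return stdev(values) / mean(values) * 100.0

def assay_precision(runs):
    """Intra-assay CV (within each run, averaged across runs) and
    inter-assay CV (across run means) for replicate measurements
    of a single control sample."""
    intra = mean(cv_percent(run) for run in runs)
    inter = cv_percent([mean(run) for run in runs])
    return intra, inter
```

Both values would then be compared against the pre-specified limits in the MAP (e.g., the CV < 20% criterion in Table 1).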
The V&V Report is the culminating document that presents the evidence gathered from executing the MAP. It must tell a clear, compelling story about the model's credibility for its specific COU.
The following table provides a template for summarizing V&V results, directly linking evidence to the pre-defined criteria from the MAP. This creates a clear and auditable trail for regulatory reviewers.
Table 2: V&V Report Summary: Results vs. Acceptance Criteria
| V&V Activity | Pre-Defined Acceptance Criterion (from MAP) | Result Obtained | Pass/Fail | Evidence Location |
|---|---|---|---|---|
| Mesh Convergence (Peak Stress) | < 2% change with refinement | 1.5% change | Pass | Fig. 4.1, Section 4.2.1 |
| Code Verification (Biaxial Test) | Stress within 3% of analytical | 2.8% error | Pass | Table B.2, Appendix B |
| Validation (Strain Field) | ≥ 80% of data points within 10% error | 92% of data points within 10% error | Pass | Fig. 5.3, Section 5.1 |
| Validation (Natural Frequency) | First mode within 5% of experimental | 4.1% error | Pass | Table 5.4 |
| Biomarker Assay Precision (CV) | Inter-assay CV < 15% | 12% CV | Pass | Section 5.5, Table D.1 |
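The pass/fail column of such a summary can be generated mechanically once criteria are pre-registered. A minimal sketch, assuming each criterion reduces to an upper bound expressed in percent (real criteria may take more varied forms); the numeric values below mirror the illustrative entries of Table 2:

```python
def evaluate(criteria, results):
    """Check each obtained result against its pre-defined acceptance
    criterion (here: an upper bound in percent) and produce an
    auditable pass/fail summary row per activity."""
    rows = []
    for name, bound in criteria.items():
        value = results[name]
        rows.append((name, bound, value, "Pass" if value < bound else "Fail"))
    return rows
```

Generating the table from the same machine-readable criteria cited in the MAP removes a common source of transcription error in V&V reports.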
The following table details key resources, software, and materials essential for conducting rigorous V&V studies.
Table 3: Research Reagent Solutions for Model V&V
| Item / Solution | Function in V&V Process | Application Notes |
|---|---|---|
| ASME V&V Standards (e.g., V&V 10, V&V 40) [1] | Provides standardized terminology, procedures, and a risk-based framework for assessing model credibility. | Essential for medical device submissions. Guides the level of V&V effort based on model risk. |
| Challenge Problems [1] | Specific engineering problems with defined solutions used to assess and benchmark VVUQ methodologies. | Allows testing and comparison of different V&V approaches on a common problem. |
| Statistical Analysis Software (e.g., R, SAS, SPSS) [89] [85] | Used for the statistical comparison of model predictions to experimental data, uncertainty quantification, and power calculations. | Critical for executing the pre-specified analysis plan and deriving objective validation metrics. |
| Digital Image Correlation (DIC) Systems | An experimental method for measuring full-field surface displacements and strains. | Provides rich, spatially-resolved data that is ideal for validating computational stress/strain fields. |
| Biorepositories and Specimen Archives [87] [88] | Sources of high-quality, well-annotated clinical samples for biomarker discovery and validation. | Large sample sizes from multiple sites are often needed for adequate statistical power in clinical validation [88]. |
| Reference Materials & Phantoms | Physical objects with known properties used to calibrate equipment and validate computational models of the measurement process. | For example, tissue-mimicking phantoms for validating medical imaging or surgical simulation models. |
In computational modeling research, particularly for high-consequence applications in drug development and medical devices, documentation is not an end-point activity. The proactive creation of a detailed Model Analysis Plan and the systematic compilation of evidence in a V&V Report are integral to the scientific process. By adhering to the structured frameworks and detailed protocols outlined in this guide, researchers can build a defensible case for their model's fitness-for-purpose, thereby accelerating innovation and ensuring the safety and efficacy of the technologies that rely on computational predictions.
Verification and Validation are not mere checkboxes but are foundational to credible and impactful computational modeling in drug development. A rigorous VVUQ process, guided by standards like ASME V&V 40 and aligned with regulatory expectations, is crucial for making risk-informed decisions. The future points toward greater integration of AI to democratize MIDD, the expansion of in silico methodologies to reduce animal testing, and the increased acceptance of virtual evidence in regulatory submissions. By systematically applying the principles outlined, researchers can significantly enhance model reliability, accelerate development timelines, and ultimately deliver safer and more effective therapies to patients with greater confidence.