This article provides a comprehensive guide to Verification and Validation (V&V) standards for computational models, tailored for researchers and professionals in drug development and biomedical fields. It explores the foundational principles of model credibility, with a focus on the widely adopted ASME V&V 40 risk-based framework. The scope covers methodological applications across medical devices, drug design, and manufacturing, alongside troubleshooting common pitfalls in techniques like QSAR, molecular dynamics, and AI/ML. It further examines quantitative validation metrics, regulatory perspectives from the FDA and EMA, and comparative analysis of standards. The article synthesizes key takeaways to offer a clear path for implementing robust V&V practices, enhancing model reliability for high-stakes decision-making in research and regulatory submissions.
In computational modeling and simulation (CM&S), the adoption of Verification, Validation, and Uncertainty Quantification (VVUQ) is fundamental for establishing model credibility. As industries from medical devices to aerospace increasingly rely on computational predictions for critical decision-making, the rigorous application of VVUQ processes ensures that simulations are fit-for-purpose and yield reliable results. This framework is particularly vital in regulatory contexts and high-consequence applications where model inaccuracies could lead to significant risks [1] [2].
Model credibility is not a binary attribute but a risk-informed judgment on whether a computational model's outputs are adequate to support decision-making for a specific Context of Use (COU). The ASME V&V 40 standard provides a foundational, risk-based framework for establishing these credibility requirements, which has become a key enabler for regulatory submissions, including those to the US FDA CDRH [3]. This guide details the core principles of VVUQ, providing researchers and drug development professionals with the methodologies and standards needed to credibly employ computational models in research and development.
VVUQ encompasses three distinct but interconnected processes:

- **Verification**: confirming that the computational model is implemented and solved correctly ("solving the equations right").
- **Validation**: assessing how accurately the model represents the real-world system by comparison with experimental data ("solving the right equations").
- **Uncertainty Quantification (UQ)**: quantifying the uncertainties in model inputs, parameters, and predictions.
The following diagram illustrates the logical workflow and relationships between the core VVUQ activities, from problem definition to a credible model prediction.
The ASME V&V 40-2018 standard provides a structured, risk-informed framework for establishing the credibility of a computational model. The core of this methodology is a risk analysis that assesses the model's COU and the decision consequence based on the model's output. The level of credibility required, and thus the rigor of the VVUQ activities, is directly proportional to the perceived risk of an incorrect prediction [3].
The framework guides users through assessing six primary credibility factors, each with a set of credibility activities that can be tiered based on the required level of rigor [3].
The table below summarizes the key credibility factors and example activities as guided by ASME V&V 40.
Table 1: Credibility Factors and Example Activities Based on ASME V&V 40
| Credibility Factor | Objective | Example Activities |
|---|---|---|
| Model Validation | Determine model accuracy for the COU by comparing to experimental data. | Conduct a validation experiment; Use validation metrics (e.g., area metric, waveform metrics); Assess predictive capability [2] [3]. |
| Uncertainty Quantification | Quantify the impact of input and model uncertainties on output. | Propagate input uncertainties (Monte Carlo, Taylor Series); Distinguish aleatory and epistemic uncertainty; Perform sensitivity analysis [2]. |
| Solution Verification | Estimate numerical accuracy of a specific simulation (e.g., discretization error). | Perform grid convergence studies (systematic mesh refinement); Estimate iterative errors [2] [3]. |
| Code Verification | Ensure software correctly implements the mathematical model. | Use method of manufactured solutions; Perform benchmark comparisons [2]. |
Quantitative data analysis is the backbone of VVUQ, transforming raw numerical data into meaningful insights about model performance and reliability. The process relies on two main statistical approaches [4]:
For VVUQ, specific quantitative techniques are applied at different stages:
Table 2: Key Quantitative Analysis Methods for VVUQ
| VVUQ Stage | Quantitative Method | Application in VVUQ |
|---|---|---|
| Verification | Grid Convergence Index (GCI) | Quantifies discretization error based on systematic mesh refinement [2]. |
| Validation | Validation Metrics (e.g., Area Metric) | Provides a quantitative measure of the difference between model predictions and experimental data [2]. |
| UQ | Monte Carlo Simulation | Propagates input uncertainties by repeatedly sampling from their distributions to build a distribution of the output [2]. |
| UQ | Sensitivity Analysis (e.g., Variance-Based) | Identifies which input parameters are the most significant contributors to output uncertainty [2]. |
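As a minimal illustration of the Monte Carlo row above, the sketch below propagates two normally distributed inputs through a hypothetical closed-form model and summarizes the resulting output distribution. The model function and input distributions are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

def model(k, c):
    """Toy deterministic model (hypothetical, for illustration)."""
    return 1.0 / np.sqrt(k**2 + c**2)

# Assumed input uncertainties, sampled directly from their distributions
k = rng.normal(loc=10.0, scale=0.5, size=100_000)   # stiffness-like input
c = rng.normal(loc=2.0, scale=0.2, size=100_000)    # damping-like input

y = model(k, c)                          # one model evaluation per sample
mean, sd = y.mean(), y.std(ddof=1)
lo, hi = np.percentile(y, [2.5, 97.5])   # empirical 95% uncertainty interval
print(f"mean={mean:.4f}, sd={sd:.4f}, 95% interval=({lo:.4f}, {hi:.4f})")
```

The cost is one model run per sample, which is why the table notes that Monte Carlo becomes expensive when each evaluation is itself a large simulation.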
A cornerstone of model credibility is a well-designed validation experiment. The protocol must provide high-quality, relevant data for comparison with computational results.
The design of a validation experiment must adhere to strict principles to ensure the data is suitable for assessing the model's predictive capability [2]:
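Once validation data are in hand, a quantitative validation metric such as the area metric mentioned earlier provides an objective comparison: it measures the area between the empirical CDFs of model predictions and experimental measurements, so zero means the two distributions coincide. The sample data below are synthetic, chosen only to show the mechanics.

```python
import numpy as np

def area_metric(model_samples, exp_samples):
    """Area between the two empirical CDFs (the area validation metric)."""
    grid = np.sort(np.concatenate([model_samples, exp_samples]))
    F_m = np.searchsorted(np.sort(model_samples), grid, side="right") / len(model_samples)
    F_e = np.searchsorted(np.sort(exp_samples), grid, side="right") / len(exp_samples)
    # Both CDFs are piecewise constant between grid points, so the integral
    # of |F_m - F_e| is an exact sum of rectangle areas.
    return np.sum(np.abs(F_m - F_e)[:-1] * np.diff(grid))

rng = np.random.default_rng(0)
pred = rng.normal(10.0, 1.0, 5000)   # model prediction samples (synthetic)
meas = rng.normal(10.5, 1.0, 200)    # experimental measurements (synthetic)
print(f"area metric = {area_metric(pred, meas):.3f}")  # near the 0.5 mean offset
```

For two equal-variance normal distributions the metric equals the offset between their means, which makes it easy to sanity-check the implementation.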
The process of executing a validation activity follows a structured workflow, from planning to final assessment, as shown in the diagram below.
UQ is critical for understanding the reliability of model predictions. The process involves a systematic workflow [2]:
Table 3: Methods for Uncertainty Quantification and Sensitivity Analysis
| Method | Description | Typical Application |
|---|---|---|
| Monte Carlo Simulation | A computational technique that uses random sampling of input distributions to simulate a large number of model evaluations and build an output distribution. | Robust and widely applicable for nonlinear models and large uncertainties. Computationally expensive [2]. |
| Taylor Series Method / Variance Transmission Equation | An approximation method that uses first-order derivatives to estimate the output variance based on input variances. | Efficient for models with small uncertainties and near-linear behavior [2]. |
| Bayesian Inference | A statistical method for updating the probability estimate for a hypothesis (e.g., model parameters) as more evidence or data becomes available. | Used for model calibration to estimate parameter uncertainties based on experimental data [2]. |
| Variance-Based Sensitivity Analysis (Sobol' Indices) | A global sensitivity analysis method that quantifies how much of the output variance each input parameter (or combination of parameters) is responsible for. | Identifies the most important sources of uncertainty to prioritize reduction efforts [2]. |
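The Taylor series (variance transmission) method from the table can be sketched as follows, with a Monte Carlo run as a reference; for a near-linear model with small input uncertainties the two should agree closely. The toy model and input uncertainties are assumptions for illustration only.

```python
import numpy as np

def model(x):
    k, c = x
    return 1.0 / (k + c**2)   # arbitrary smooth toy model

mu = np.array([10.0, 2.0])      # input means (assumed)
sigma = np.array([0.3, 0.1])    # input standard deviations, independent inputs

# First-order Taylor series: Var(Y) ~= sum_i (df/dx_i)^2 * Var(x_i),
# with gradients estimated by central finite differences
eps = 1e-6
grad = np.array([
    (model(mu + eps * e) - model(mu - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
var_taylor = np.sum((grad * sigma) ** 2)

# Monte Carlo reference for comparison
rng = np.random.default_rng(1)
samples = rng.normal(mu, sigma, size=(200_000, 2))
var_mc = model(samples.T).var(ddof=1)
print(f"Taylor variance: {var_taylor:.3e}   Monte Carlo variance: {var_mc:.3e}")
```

The squared gradient-times-sigma terms also double as crude local sensitivity measures, indicating which input dominates the output variance in this approximation.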
Implementing VVUQ requires a combination of standards, software tools, and computational techniques. The following table details essential components of the VVUQ toolkit for researchers.
Table 4: Essential Research Tools and Resources for VVUQ
| Tool/Resource Category | Examples | Function in VVUQ |
|---|---|---|
| VVUQ Standards | ASME VVUQ 1 (Terminology), V&V 10 (Solid Mechanics), V&V 20 (CFD/Heat Transfer), V&V 40 (Medical Devices) [1] [3] | Provide standardized definitions, recommended practices, and risk-based frameworks for applying VVUQ in specific disciplines. |
| Software Tools (Statistical Analysis) | R, Python (Pandas, NumPy, SciPy), SPSS [5] [4] | Perform statistical analysis, uncertainty propagation, sensitivity analysis, and generate quantitative data visualizations. |
| Software Tools (Visualization) | Python (Matplotlib), ChartExpo, Ajelix BI [6] [4] | Create charts, graphs, and dashboards to communicate VVUQ results effectively, showing trends, comparisons, and uncertainties. |
| Computational Methods | Method of Manufactured Solutions (Code Verification), Systematic Mesh Refinement (Solution Verification) [2] [3] | Provide methodologies for verifying the numerical implementation and solution accuracy of computational models. |
| Challenge Problems | ASME VVUQ Symposium Workshop Problems [1] | Provide specific engineering problems to study, assess, and compare different VVUQ methods and approaches. |
The rigorous application of Verification, Validation, and Uncertainty Quantification is indispensable for establishing model credibility in computational research. Frameworks like ASME V&V 40 provide a structured, risk-informed approach to determine the necessary level of VVUQ effort, ensuring computational models are credible for their intended use. As computational methods continue to evolve, including the incorporation of Artificial Intelligence and Machine Learning, and as applications expand into high-stakes areas like In Silico Clinical Trials, the principles of VVUQ will remain the foundation for trustworthy and reliable simulation-based science and engineering [3] [7]. For researchers and drug development professionals, mastering these processes is no longer optional but a core competency for advancing innovation safely and effectively.
In the realm of computational modeling and simulation, the Context of Use (COU) is formally defined as a concise description of a model's specified role and scope within a development process [8] [9]. For computational models used in biomedical product development, the COU provides the critical foundation for planning and executing risk-based Verification and Validation (V&V) activities. It serves as the definitive statement that clarifies how the model will be applied to address a specific question or decision point, thereby establishing the boundaries for assessing model credibility.
The American Society of Mechanical Engineers (ASME) V&V 40 standard, recognized by the U.S. Food and Drug Administration (FDA), has established a risk-informed credibility assessment framework where the COU is central to determining the appropriate level of V&V evidence required [10] [11] [1]. This framework contends that the rigor of V&V activities should be commensurate with the risk associated with the model's application, with the COU directly informing this risk assessment [10]. Without a precisely defined COU, model developers and regulatory reviewers lack the necessary context to determine what constitutes sufficient evidence of model credibility for regulatory decision-making.
The ASME V&V 40 standard provides a structured framework for assessing the credibility of computational models, with the COU serving as its cornerstone [10] [1]. This framework outlines a systematic process for establishing model credibility that is proportional to model risk, ensuring efficient resource allocation while maintaining scientific rigor.
The risk-informed credibility assessment framework comprises several interconnected concepts that guide the V&V process.
The following diagram illustrates the logical workflow of the ASME V&V 40 risk-informed credibility assessment framework:
The COU directly influences model risk assessment, which in turn determines the necessary level of credibility evidence. Model risk is evaluated based on two key factors: the consequence of the decision the model informs and the influence of the model's output on that decision.
The table below illustrates how different combinations of these factors determine overall model risk and corresponding credibility requirements:
Table 1: Model Risk Assessment Matrix and Corresponding Credibility Requirements
| Decision Consequence | Low Model Influence | Medium Model Influence | High Model Influence |
|---|---|---|---|
| Low | Low Risk / Basic V&V | Low-Medium Risk / Moderate V&V | Medium Risk / Substantial V&V |
| Medium | Low-Medium Risk / Moderate V&V | Medium Risk / Substantial V&V | Medium-High Risk / Rigorous V&V |
| High | Medium Risk / Substantial V&V | Medium-High Risk / Rigorous V&V | High Risk / Comprehensive V&V |
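The risk matrix above is deterministic enough to script. The encoding below is a hypothetical illustration of how a team might automate the lookup from (decision consequence, model influence) to the required V&V rigor; the tier labels mirror Table 1 and are not prescribed by the standard.

```python
# Hypothetical encoding of the Table 1 risk matrix (labels are illustrative)
RISK_MATRIX = {
    ("low", "low"): "Low Risk / Basic V&V",
    ("low", "medium"): "Low-Medium Risk / Moderate V&V",
    ("low", "high"): "Medium Risk / Substantial V&V",
    ("medium", "low"): "Low-Medium Risk / Moderate V&V",
    ("medium", "medium"): "Medium Risk / Substantial V&V",
    ("medium", "high"): "Medium-High Risk / Rigorous V&V",
    ("high", "low"): "Medium Risk / Substantial V&V",
    ("high", "medium"): "Medium-High Risk / Rigorous V&V",
    ("high", "high"): "High Risk / Comprehensive V&V",
}

def required_vv_rigor(decision_consequence: str, model_influence: str) -> str:
    """Look up the risk tier for a (decision consequence, model influence) pair."""
    return RISK_MATRIX[(decision_consequence.lower(), model_influence.lower())]

print(required_vv_rigor("High", "Medium"))
```

Encoding the matrix once and reusing it across projects helps keep risk assessments consistent and auditable.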
A well-defined COU is essential for establishing a model's purpose, boundaries, and applicability. The COU should be articulated with sufficient detail to guide all subsequent V&V activities and credibility assessments.
A comprehensive COU statement should explicitly address several key elements, detailed in the protocol below.
The following protocol provides detailed methodology for defining a comprehensive COU:
1. **Identify the Regulatory or Development Question**
2. **Delineate Model Scope and Boundaries**
3. **Characterize Technical Specifications**
4. **Document Implementation Context**
5. **Review and Refine COU Statement**
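As an illustration only, the COU elements outlined above could be captured in a lightweight structured record so that every model carries its context with it. The schema and field names below are hypothetical, not prescribed by ASME V&V 40.

```python
from dataclasses import dataclass, field

@dataclass
class ContextOfUse:
    """Minimal record of the COU elements described above (illustrative schema)."""
    question: str               # regulatory or development question the model addresses
    quantities_of_interest: list  # model outputs used in the decision
    scope: str                  # model boundaries and applicability limits
    model_role: str             # how the output feeds the decision (model influence)
    decision_consequence: str   # "low" / "medium" / "high"
    assumptions: list = field(default_factory=list)

# Hypothetical example entry
cou = ContextOfUse(
    question="Predict peak stress in a valve frame under fatigue loading",
    quantities_of_interest=["peak von Mises stress", "strain amplitude"],
    scope="Adult anatomies; quasi-static loading only",
    model_role="Supplements bench testing for design verification",
    decision_consequence="high",
)
print(cou.question)
```

Keeping the COU in a machine-readable form makes it straightforward to link each credibility activity back to the element of the COU it supports.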
Once the COU is clearly defined, it drives the planning and execution of risk-based V&V activities. The level of rigor applied to each credibility factor should be commensurate with the model risk determined through the COU.
The ASME V&V 40 standard identifies 13 credibility factors across verification, validation, and applicability activities that contribute to establishing overall model credibility [9]. The table below details these factors and provides examples of corresponding activities:
Table 2: Credibility Factors and Associated V&V Activities
| Activity Category | Credibility Factor | Example V&V Activities |
|---|---|---|
| Verification | Software Quality Assurance | Code validation; version control; bug tracking |
| Verification | Numerical Code Verification | Comparison to analytical solutions; order-of-accuracy testing |
| Verification | Discretization Error | Grid convergence studies; mesh refinement analysis |
| Verification | Numerical Solver Error | Iterative convergence analysis; solver parameter studies |
| Verification | Use Error | User training; interface design evaluation; workflow documentation |
| Validation | Model Form | Evaluation of mathematical foundations; comparison to alternative model structures |
| Validation | Model Inputs | Sensitivity analysis; uncertainty quantification of input parameters |
| Validation | Test Samples | Representative sampling; appropriate sample size determination |
| Validation | Test Conditions | Coverage of relevant operational conditions; boundary condition testing |
| Validation | Equivalency of Input Parameters | Assessment of parameter consistency between validation and application contexts |
| Validation | Output Comparison | Quantitative comparison to experimental data; acceptance criterion testing |
| Applicability | Relevance of Quantities of Interest | Assessment of output relevance to COU; uncertainty analysis for predicted quantities |
| Applicability | Relevance of Validation Activities to COU | Evaluation of how well validation conditions represent the actual COU |
The following protocol guides the implementation of risk-based V&V activities according to model risk:
1. **Map Credibility Factors to COU Requirements**
2. **Establish Acceptance Criteria**
3. **Execute Verification Activities**
4. **Execute Validation Activities**
5. **Assess Applicability**
A physiologically-based pharmacokinetic (PBPK) model for a small molecule drug eliminated primarily by cytochrome P450 3A4 demonstrates how the same model can have multiple COUs with different risk profiles [9]:
COU 1: Predict effects of weak and moderate CYP3A4 inhibitors and inducers on drug pharmacokinetics in adult patients
COU 2: Predict pharmacokinetic profiles in children and adolescent patients
A computational fluid dynamics (CFD) model evaluating hemolysis levels in a centrifugal blood pump illustrates how the same model applied to different device classifications carries different risk levels [10]:
COU 1: Cardiopulmonary Bypass (CPB)
COU 2: Short-Term Ventricular Assist Device (VAD)
Table 3: Essential Research Reagents and Resources for Computational Model V&V
| Resource Category | Specific Solution | Function in V&V Process |
|---|---|---|
| Software Tools | Commercial CFD/FEA Solvers (e.g., ANSYS) | Numerical simulation of physical phenomena [10] |
| Software Tools | PBPK Modeling Platforms (e.g., GastroPlus, Simcyp) | Prediction of pharmacokinetic behavior [9] |
| Software Tools | Statistical Analysis Software (e.g., R, SAS) | Quantitative comparison of model outputs to experimental data |
| Experimental Comparators | In Vitro Test Systems | Provide validation data under controlled conditions [10] |
| Experimental Comparators | Particle Image Velocimetry | Experimental flow field measurement for CFD validation [10] |
| Experimental Comparators | Clinical Data Sources | Gold-standard comparator for models predicting clinical outcomes [9] |
| Documentation Frameworks | ASME V&V 40 Standard | Risk-informed framework for establishing model credibility [10] [1] |
| Documentation Frameworks | FDA Guidance Documents | Regulatory expectations for model submission and evaluation [12] |
| Documentation Frameworks | Credibility Assessment Plan | Documented strategy for model V&V specific to COU [12] |
| Quality Management | Version Control Systems | Track model changes and ensure reproducibility |
| Quality Management | Uncertainty Quantification Tools | Quantify and communicate limitations in model predictions [1] |
The Context of Use is not merely an administrative requirement but a fundamental scientific concept that enables efficient, risk-informed V&V planning for computational models. By precisely defining the COU, model developers can establish a clear roadmap for credibility activities that addresses regulatory expectations while optimizing resource allocation. The ASME V&V 40 framework provides a standardized methodology for implementing this risk-based approach across diverse modeling applications, from medical devices to pharmaceutical development.
As computational models assume increasingly prominent roles in regulatory decision-making, the disciplined application of COU-driven V&V planning will be essential for demonstrating model credibility and ensuring the safety and efficacy of biomedical products. The continued evolution of standards and best practices in this area will further strengthen the scientific rigor of computational modeling in regulatory science.
The use of computational modeling and simulation (M&S) has become increasingly critical in both medical device and pharmaceutical development, enabling faster innovation and more robust evaluation of product safety and efficacy. However, the regulatory frameworks governing these computational approaches have evolved along distinct pathways for devices versus pharmaceuticals, creating a complex landscape for researchers and developers. The core regulatory challenge lies in establishing and demonstrating the credibility of computational models for specific decision-making contexts, a requirement now recognized by major regulatory bodies worldwide.
For medical devices, the American Society of Mechanical Engineers (ASME) V&V 40 standard provides the foundational risk-based framework for establishing model credibility. This FDA-recognized standard specifically addresses verification and validation (V&V) activities needed to build confidence in computational models used for medical device evaluation [11]. In parallel, the pharmaceutical domain has developed the Model-Informed Drug Development (MIDD) framework, which utilizes quantitative modeling to integrate nonclinical and clinical data. The International Council for Harmonisation (ICH) M15 guideline, released for public consultation in late 2024, now provides harmonized principles for MIDD applications across regulatory jurisdictions [13] [14].
The European Medicines Agency (EMA) complements this landscape with its own guidance on modeling and simulation, particularly emphasizing mechanistic models and pediatric applications [15] [16]. Together, these frameworks represent a transformative shift in regulatory science, moving toward standardized approaches for evaluating computational models across the product development lifecycle.
The ASME V&V 40-2018 standard, titled "Assessing Credibility of Computational Modeling through Verification and Validation: Application to Medical Devices," provides a structured framework for establishing model credibility based on risk analysis [11] [14]. This standard has been formally recognized by the U.S. Food and Drug Administration (FDA) and is widely implemented across the medical device industry.
The fundamental principle of V&V 40 is that the extent of validation evidence required for a computational model should be commensurate with the risk associated with the decision the model informs. The standard introduces several key conceptual innovations, including the Context of Use definition, risk-informed credibility assessment, and tiered verification, validation, and uncertainty quantification activities.
The practical application of ASME V&V 40 is exemplified in cardiac device development. Dr. Tinen Iles demonstrated how the standard applies to finite element analysis (FEA) models of transcatheter aortic valves used for structural component stress/strain analysis in accordance with ISO 5840-1:2021 requirements [11]. In this context, the model credibility assessment directly supports design verification activities under 21 CFR 820.30(f). Similarly, Dr. Snehal Shetye from the FDA highlighted the importance of V&V 40 in evaluating lumbar interbody fusion devices, where subtle variations in contact friction and stiffness parameters significantly impact both global and local mechanical response predictions [11].
Table: Core Components of ASME V&V 40 Standard
| Component | Description | Application Example |
|---|---|---|
| Context of Use Definition | Precise statement of model purpose, boundaries, and decision role | FEA model for heart valve fatigue analysis |
| Risk Assessment | Evaluation of decision consequence and model influence | Determining validation rigor for implant stress predictions |
| Verification | Ensuring computational model is solved correctly | Code verification, calculation checks |
| Validation | Ensuring model accurately represents reality | Bench test comparison, clinical data correlation |
| Uncertainty Quantification | Characterizing statistical and modeling uncertainties | Sensitivity analysis, probabilistic methods |
The FDA's approach to computational model credibility spans both device and pharmaceutical domains, with evolving frameworks that increasingly emphasize harmonization. For medical devices, the FDA has formally recognized ASME V&V 40 as a consensus standard and has published complementary guidelines that adopt its risk-based credibility assessment framework [14].
In the pharmaceutical sector, the FDA's Division of Pharmacometrics (DPM), established within the Office of Clinical Pharmacology, has pioneered the use of quantitative modeling approaches since the early 2000s [17]. The Division's 2020 strategic plan resulted in significant achievements, including the training of 91 pharmacometricians and the development of 14 disease-specific models to support trial design and regulatory decision-making [17]. These disease models span conditions from non-small cell lung cancer to rheumatoid arthritis and have been instrumental in supporting endpoint selection, patient enrichment strategies, and pediatric extrapolation.
The FDA's most recent contribution to harmonization is the December 2024 draft guidance on ICH M15 General Principles for Model-Informed Drug Development [13]. This document represents a multinational consensus on MIDD approaches and provides recommendations on planning, model evaluation, and evidence documentation. The ICH M15 framework explicitly adapts credibility assessment concepts from ASME V&V 40 to pharmaceutical applications, creating a bridge between the device and drug regulatory paradigms [14].
A critical innovation in the FDA's approach is the Model Analysis Plan (MAP), a comprehensive document that pre-specifies modeling objectives, data sources, methodologies, and evaluation criteria [14]. This structured documentation approach ensures transparency and reproducibility in regulatory submissions.
The European Medicines Agency has developed a complementary but distinct regulatory framework for computational models, with particular emphasis on mechanistic models and pediatric applications. The EMA's 2025 concept paper on mechanistic models represents a significant milestone in regulatory science, providing guidance for assessing and reporting sophisticated modeling approaches such as Physiologically Based Pharmacokinetic (PBPK), Quantitative Systems Pharmacology (QSP), and multi-scale mechanistic models [16].
A distinctive feature of EMA's guidance is its detailed treatment of pediatric drug development, where practical and ethical limitations make computational approaches particularly valuable [15]. The EMA recommends specific methodologies for accounting for ontogeny and maturation effects in pediatric models.
The EMA also emphasizes the importance of visualization and documentation standards for regulatory evaluation. The agency recommends specific graphical representations showing exposure metrics versus body weight and age on continuous scales, with separate visualizations for children 0-1 years of age [15]. These visualization requirements facilitate transparent assessment of proposed dosing regimens across pediatric subpopulations.
Table: Comparative Analysis of Regulatory Frameworks for Computational Models
| Aspect | ASME V&V 40 | FDA ICH M15 | EMA Guidelines |
|---|---|---|---|
| Primary Scope | Medical Devices | Pharmaceuticals | Pharmaceuticals |
| Core Principle | Risk-based credibility | Model-informed development | Mechanistic model credibility |
| Key Innovation | Context of Use definition | Model Analysis Plan (MAP) | Pediatric ontogeny integration |
| Validation Approach | Credibility evidence matrix | Multidisciplinary assessment | Uncertainty quantification |
| Documentation | V&V protocol and report | MAP and summary documents | Model justification and reporting |
| Regulatory Status | FDA-recognized standard | Draft ICH guideline (2024) | Concept paper (2025) |
The following workflow diagram illustrates the integrated process for establishing model credibility across regulatory frameworks:
Model Credibility Assessment Workflow
This integrated workflow begins with precisely defining the Context of Use, which establishes the model's purpose, boundaries, and decision-making role across all regulatory frameworks [11] [14]. The subsequent risk assessment evaluates the consequences of an incorrect model output and the model's influence on the decision, determining the required level of validation evidence [11]. The planning phase then specifies the detailed verification, validation, and uncertainty quantification activities needed to meet credibility goals.
Implementing the credibility assessment framework requires rigorous experimental protocols tailored to specific model types and contexts of use. Based on case studies from the regulatory standards, the following protocols provide detailed methodologies for key validation scenarios:
Protocol 1: Medical Device Component Stress Analysis This protocol aligns with ASME V&V 40 requirements for finite element analysis of implantable device components, as demonstrated in transcatheter aortic valve development [11].
1. **Model Verification**
2. **Experimental Validation**
3. **Model-Experiment Comparison**
Protocol 2: Pharmacokinetic Model Pediatric Extrapolation This protocol follows EMA and ICH M15 requirements for leveraging adult data to predict pediatric exposures, particularly crucial for orphan diseases [15] [14].
1. **Base Model Development**
2. **Pediatric Extrapolation**
3. **Dosing Regimen Optimization**
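The pediatric extrapolation step commonly rests on allometric scaling of clearance, optionally combined with a maturation (ontogeny) factor as discussed in the EMA guidance. The sketch below uses the conventional 0.75 body-weight exponent against a 70 kg adult reference; the clearance value, weight, and maturation factor are illustrative assumptions, not from the source.

```python
def pediatric_clearance(cl_adult_l_per_h: float, weight_kg: float,
                        maturation: float = 1.0) -> float:
    """Allometric scaling of clearance with an optional maturation factor.

    Uses the common 0.75 exponent on body weight (70 kg adult reference);
    the maturation factor (0 to 1) stands in for a full ontogeny model.
    """
    return cl_adult_l_per_h * (weight_kg / 70.0) ** 0.75 * maturation

# Illustrative values only
cl_adult = 20.0   # L/h in a 70 kg adult
cl_child = pediatric_clearance(cl_adult, weight_kg=20.0, maturation=0.9)
print(f"predicted pediatric clearance: {cl_child:.1f} L/h")
```

In practice the maturation factor would come from a validated ontogeny model for the relevant elimination pathway, and predictions would be checked against any available pediatric data before informing a dosing regimen.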
Successful implementation of regulatory-compliant computational models requires carefully selected tools and methodologies. The following table catalogues essential research reagents and their functions in model development and validation:
Table: Essential Research Reagents for Computational Model Development
| Reagent/Material | Function | Regulatory Application |
|---|---|---|
| ASTM F2996 Standard | Guidance for finite element analysis of medical devices | ASME V&V 40 compliance for implant models |
| Physiological Bench Test Apparatus | Experimental validation under simulated physiological conditions | Device model validation per FDA recognized standards |
| Virtual Population Software | Generation of anthropometrically and physiologically diverse virtual subjects | Pediatric extrapolation for EMA submissions |
| PBPK Modeling Platform | Mechanistic prediction of absorption, distribution, metabolism, excretion | ICH M15 compliance for drug-drug interaction assessment |
| Uncertainty Quantification Toolkit | Propagation of parameter and model form uncertainties | Risk-informed credibility assessment across frameworks |
| Model Documentation Framework | Standardized reporting of assumptions, methods, and results | Regulatory submission preparation for all agencies |
The regulatory landscape for computational model verification and validation is rapidly converging toward harmonized principles centered on risk-informed credibility assessment. The ASME V&V 40 standard provides the foundational framework for medical devices, while the ICH M15 guideline extends similar principles to pharmaceutical development, creating unexpected alignment between previously separate regulatory pathways [11] [13] [14].
The most significant evolution in this landscape is the transition from validation as a checklist activity to a comprehensive credibility assessment that considers the decision context, risk profile, and available evidence [11] [14]. This evolution enables more efficient regulatory evaluation while maintaining rigorous standards for patient safety.
Future developments will likely focus on several key areas. First, the integration of artificial intelligence and machine learning components into computational models presents new challenges for verification and validation [14]. Second, regulatory agencies are increasingly emphasizing real-world evidence integration with computational models to enhance their predictive capability [14]. Finally, global harmonization initiatives will continue to align assessment criteria across FDA, EMA, and other international regulators, reducing the burden on developers seeking simultaneous market approval in multiple regions [16].
For researchers and drug development professionals, successfully navigating this landscape requires proactive engagement with regulatory agencies through pre-submission meetings and early dialogue about modeling strategies. By adopting the structured approaches outlined in ASME V&V 40, ICH M15, and EMA guidelines, developers can build robust evidence of model credibility that supports efficient regulatory review and, ultimately, accelerates patient access to innovative therapies.
The integration of computational modeling and simulation, particularly through advanced frameworks like Digital Twins, is poised to revolutionize precision medicine. These technologies promise to enable highly personalized treatment strategies by simulating patient-specific health trajectories and interventions. However, their potential cannot be realized without establishing rigorous, standardized Verification, Validation, and Uncertainty Quantification (VVUQ) processes. The current lack of specific guidance creates a critical gap, hindering the reliable adoption of in-silico methods in drug development and regulatory decision-making. This whitepaper examines the urgent need for definitive V&V standards to ensure the safety, efficacy, and credibility of computational models, thereby accelerating the delivery of innovative therapies to patients.
The context for drug development is shifting from a one-size-fits-all model toward precision medicine, which tailors health delivery to an individual's unique physiological and disease characteristics [18]. Several key drivers are accelerating this paradigm shift.
Despite these advancements, a significant gap remains. While frameworks like the ASME V&V 40 standard provide a risk-based approach for establishing model credibility in medical devices [3] [22], similarly mature and specific guidance for the unique challenges of drug development and precision medicine is still emerging [18]. This lack of a standardized framework is the primary unmet need that must be addressed.
Verification, Validation, and Uncertainty Quantification form the foundational trilogy for establishing confidence in computational models. Their definitions, while sometimes varying across disciplines, are critical for scientific rigor.
Table 1: Core Definitions in VVUQ for Computational Science
| Term | Definition | Core Question |
|---|---|---|
| Verification | The process of ensuring that the computational model is solved correctly. It assesses software correctness and numerical accuracy. [23] | "Are we solving the equations right?" |
| Code Verification | Assessing the reliability of the software coding and the numerical algorithms. [23] | "Is the software implemented correctly?" |
| Solution Verification | Assessing the numerical accuracy of the solution to a computational model (e.g., estimating discretization errors). [18] [23] | "What is the numerical error of this specific solution?" |
| Validation | The process of assessing a model's physical accuracy by comparing computational results with experimental data. [24] [23] | "Are we solving the right equations?" |
| Uncertainty Quantification (UQ) | The formal process of quantifying uncertainties in model inputs, parameters, and predictions. [18] | "What is the confidence bound on the prediction?" |
The ASME V&V 40 standard, though developed for medical devices, offers a valuable risk-informed framework that can be adapted for drug development. It guides the level of V&V effort based on the Model Risk—the potential impact of an erroneous model result on the decision to be made. This risk is a function of the Context of Use (COU) and the Impact of the Model Result on the decision [3]. This risk-based tiering is essential for efficiently allocating resources.
A critical methodology in validation is the use of quantitative validation metrics. These provide an objective, standardized measure of similarity between the computational model's output and experimental data, moving beyond subjective "face validity" checks [24]. The development of these metrics is an active area of research, as they are essential for both validating the virtual component of a Digital Twin and for supporting automated decision-making within a Digital Twin framework [24].
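As a minimal illustration of a quantitative validation metric, the sketch below computes a normalized root-mean-square error (NRMSE) between simulated and measured values and compares it against an acceptance threshold. The paired data, the metric choice, and the 5% threshold are all hypothetical; formal standards such as ASME V&V 20 define more rigorous metrics that also fold in measurement uncertainty.

```python
import math

def validation_metric(sim, exp):
    """Normalized root-mean-square error between simulation and experiment.

    Returns NRMSE as a fraction of the experimental data range; lower means
    better agreement. This is one simple quantitative metric, not the only
    acceptable choice.
    """
    if len(sim) != len(exp) or not exp:
        raise ValueError("paired, non-empty data required")
    mse = sum((s - e) ** 2 for s, e in zip(sim, exp)) / len(exp)
    data_range = max(exp) - min(exp)
    return math.sqrt(mse) / data_range

# Hypothetical paired data: simulated vs. measured peak velocities (m/s)
sim = [1.02, 1.48, 2.05, 2.51]
exp = [1.00, 1.50, 2.00, 2.60]
nrmse = validation_metric(sim, exp)
passes = nrmse < 0.05  # acceptance threshold set by the credibility plan
```

The key design point is that the pass/fail threshold is not intrinsic to the metric: it is set in advance by the credibility plan, commensurate with model risk.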
Diagram Title: VVUQ Process for Model Credibility
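To make the "manufactured analytical solutions" idea from the toolkit above concrete, the sketch below performs a small code-verification exercise: a finite-difference solver for the 1D Poisson problem is checked against the manufactured solution u(x) = sin(πx), and the observed order of convergence is recovered from two mesh levels. The solver and problem are illustrative choices, not tied to any cited study.

```python
import math

def solve_poisson_fd(n, forcing):
    """Solve -u'' = f on (0,1) with u(0)=u(1)=0 using second-order
    central differences on n interior points (Thomas algorithm)."""
    h = 1.0 / (n + 1)
    a = [-1.0] * n  # sub-diagonal
    b = [2.0] * n   # diagonal
    c = [-1.0] * n  # super-diagonal
    d = [h * h * forcing((i + 1) * h) for i in range(n)]
    for i in range(1, n):          # forward elimination
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    u = [0.0] * n                  # back substitution
    u[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (d[i] - c[i] * u[i + 1]) / b[i]
    return u

# Manufactured solution u(x) = sin(pi x)  =>  f(x) = pi^2 sin(pi x)
def exact(x): return math.sin(math.pi * x)
def f(x): return math.pi ** 2 * math.sin(math.pi * x)

def max_error(n):
    u = solve_poisson_fd(n, f)
    return max(abs(u[i] - exact((i + 1) / (n + 1))) for i in range(n))

# Halve the mesh spacing (h = 1/20 -> 1/40) and recover the order
e_coarse, e_fine = max_error(19), max_error(39)
observed_order = math.log(e_coarse / e_fine) / math.log(2)  # expect ~2
```

If the observed order matches the formal order of the discretization scheme (here, second order), the code is solving the intended equations correctly; a mismatch signals an implementation error.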
Translating V&V principles into practice requires structured protocols and tools. The following experimental workflow and "toolkit" outline a systematic approach.
Diagram Title: Validation Benchmarking Workflow
Table 2: The Scientist's Toolkit: Key Research Reagent Solutions for V&V
| Category | Item | Function in V&V |
|---|---|---|
| Computational Tools | Software Quality Engineering (SQE) Tools | Automated testing suites for code verification, ensuring software performs as expected. [18] |
| Computational Tools | Grid Convergence Tools | Tools for systematic mesh refinement to perform solution verification and quantify numerical accuracy. [3] [23] |
| Computational Tools | Uncertainty Quantification (UQ) Software | Libraries (e.g., for Bayesian calibration, sensitivity analysis) to quantify input and predictive uncertainties. [18] |
| Experimental Benchmarks | Perturbed Parameter Ensembles (PPEs) | A suite of model runs with varying parameter values to expose systematic errors and assess model sensitivity. [25] |
| Experimental Benchmarks | Validation Databases & Benchmarks | Curated, high-quality experimental datasets with documented uncertainties for quantitative model validation. [23] |
| Experimental Benchmarks | Manufactured Analytical Solutions | Exact solutions to simplified versions of the governing equations, used for rigorous code verification. [23] |
| Methodological Frameworks | Risk-Based Credibility Framework (ASME V&V 40) | A structured framework to determine the necessary level of V&V effort based on the model's decision-making impact. [3] [22] |
| Methodological Frameworks | Digital Validation Management Systems (DVMS) | Paperless systems (e.g., ValGenesis, Kneat Gx) to automate and manage validation documentation and workflows. [21] |
A specific example of an advanced V&V protocol is the use of Perturbed Parameter Ensembles (PPEs). In this methodology, dozens of model parameters are systematically varied across a defined range to create an ensemble of hundreds of model variants [25]. This approach is highly effective for:
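The ensemble construction described in this protocol can be sketched in a few lines: draw many parameter sets within stated bounds, run the model for each, and examine the spread of the quantity of interest. The toy dose-response model, parameter names, and bounds below are purely illustrative stand-ins for an expensive simulation.

```python
import random

def run_model(params):
    """Stand-in for an expensive simulation: a hypothetical Emax
    dose-response model evaluated at a fixed dose."""
    dose = 10.0
    return params["emax"] * dose / (params["ec50"] + dose)

# Parameter ranges to perturb (hypothetical bounds)
ranges = {"emax": (80.0, 120.0), "ec50": (2.0, 8.0)}

def perturbed_parameter_ensemble(ranges, n_members, seed=0):
    """Draw n_members random parameter sets within the stated ranges.
    Real PPE studies often use structured designs (e.g., Latin hypercube)
    rather than plain random sampling."""
    rng = random.Random(seed)
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()}
            for _ in range(n_members)]

ensemble = perturbed_parameter_ensemble(ranges, n_members=200)
outputs = [run_model(p) for p in ensemble]
spread = max(outputs) - min(outputs)  # sensitivity of the QoI to inputs
```

A wide spread flags parameters that dominate predictive uncertainty and therefore deserve tighter experimental characterization.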
To close the current guidance gap, a concerted effort from researchers, industry, and regulators is required. Key priorities include:
The adoption of computational modeling and Digital Twins represents a frontier for innovation in drug development. However, this potential is tethered to our ability to demonstrate that these complex models are credible, reliable, and fit-for-purpose. The current lack of specific V&V guidance is not merely an academic concern; it is a tangible barrier to translating cutting-edge science into safe and effective patient therapies. By championing the development and adoption of rigorous, standardized, and risk-informed VVUQ processes, the research community can provide the evidence base needed to build trust with regulators, clinicians, and patients. Addressing this unmet need is not just crucial—it is imperative for the future of precision medicine.
In the rapidly evolving field of computational modeling and simulation, the alignment of stakeholders through common standards has emerged as a critical enabler of technological progress and regulatory acceptance. The development and implementation of standards for verification, validation, and uncertainty quantification (VVUQ) create an essential framework that bridges the methodological gaps between regulators, academia, and industry. This alignment is particularly crucial in safety-critical sectors such as medical device development and pharmaceutical research, where computational models increasingly inform high-stakes decisions without traditional physical validation. The ASME V&V 40 standard, specifically developed for assessing the credibility of computational models in medical devices, represents a paradigm of such stakeholder-driven standardization efforts [3]. This standard provides a risk-based framework that has become a key enabler for the US Food and Drug Administration's Center for Devices and Radiological Health (FDA CDRH) framework for evaluating computational modeling and simulation data in regulatory submissions [3]. Without such common frameworks, the credibility of computational models remains fragmented, impeding innovation and potentially compromising patient safety through inconsistent evaluation methodologies.
The current standards ecosystem for computational modeling encompasses a diverse portfolio of guidelines and technical reports addressing different aspects of verification, validation, and uncertainty quantification. These standards provide the foundational language and methodological approaches that enable consistent implementation across organizations and sectors.
Table: Key ASME VVUQ Standards and Their Applications
| Standard | Title | Focus Area | Status |
|---|---|---|---|
| VVUQ 1-2022 | Verification, Validation, and Uncertainty Quantification Terminology | Standardized terminology | Published |
| V&V 10-2019 | Standard for Verification and Validation in Computational Solid Mechanics | Solid mechanics | Published |
| V&V 20-2009 | Standard for Verification and Validation in Computational Fluid Dynamics and Heat Transfer | Fluid dynamics, heat transfer | Published |
| V&V 40-2018 | Assessing Credibility of Computational Modeling through Verification and Validation: Application to Medical Devices | Medical devices | Published |
| VVUQ 40.1-20XX | Tibial Tray Component Worst-Case Size Identification for Fatigue Testing | Medical device example | Coming Soon |
| VVUQ 50.1-20XX | Guide to a Model Life Cycle Approach that Incorporates VVUQ | Model life cycle | Coming Soon |
The ASME V&V 40 standard, initially published in 2018, provides a risk-based framework for establishing credibility requirements of computational models [3]. This standard has been particularly influential in medical device regulation, serving as the foundation for FDA CDRH's evaluation framework for computational modeling and simulation data in regulatory submissions. The standard's risk-based approach allows for scalable implementation, where the level of VVUQ rigor is proportionate to the model's intended use and the associated decision consequences [3].
Beyond the core standards, supplementary technical reports provide practical implementation guidance. The upcoming VVUQ 40.1 technical report offers an end-to-end example applying the ASME V&V 40-2018 standard to a computational model assessing the durability of a fictional tibial tray [3]. This report demonstrates the planning and execution of validation and verification activities for each credibility factor, including discussions on additional work that could be performed if greater credibility were required.
The standards landscape continues to evolve to address emerging computational methodologies and applications. Ongoing standardization efforts include:
Patient-Specific Computational Models: The ASME VVUQ 40 Sub-Committee is developing a new technical report applying the ASME V&V 40 standard to patient-specific applications, specifically femur-fracture prediction [3]. This effort includes developing a classification framework for comparators used to assess the credibility of patient-specific computational models, highlighting the strengths and weaknesses of each comparator type.
In Silico Clinical Trials (ISCT): Standards are being adapted for the emerging field of in silico clinical trials, where simulated patients augment or replace results from human patients [3]. These applications present unique credibility challenges, particularly regarding validation against human data, which is rarely possible for practical reasons.
Artificial Intelligence and Machine Learning: As computational modeling increasingly incorporates AI and machine learning components, standardization efforts are expanding to address the unique verification and validation challenges these technologies present [26].
Regulatory agencies approach computational model credibility from a risk-management perspective, requiring sufficient evidence that model predictions are trustworthy for specific decision contexts. The FDA CDRH has formally incorporated the ASME V&V 40 standard into its evaluation framework for medical device submissions, creating a clear pathway for industry implementation [3]. This regulatory adoption provides a compelling case study in how standards can bridge the gap between innovation and public safety.
Regulators particularly value standards that provide:
For in silico clinical trials, regulators face the particular challenge of validating models against human data when such direct validation is rarely possible [3]. This necessitates specialized approaches to model credibility that may differ from traditional physical testing paradigms.
Industry stakeholders implement VVUQ standards to streamline product development, reduce physical testing requirements, and strengthen regulatory submissions. The medical device industry has been particularly active in adopting these standards, with companies like Medtronic, Boston Scientific, and W. L. Gore & Associates actively contributing to standards development and implementation [3].
Industry implementation highlights include:
Medical Device Durability Assessment: The upcoming VVUQ 40.1 technical report provides a practical example of how the ASME V&V 40 standard can be applied to identify worst-case sizes for fatigue testing of tibial tray components [3]. This example demonstrates how standards enable efficient, targeted physical testing informed by computational models.
Multicore System Verification: In safety-critical applications such as aerospace systems, industry faces new verification challenges with the transition to multicore architectures [27]. Standards and best practices are emerging to address these challenges, including formal specifications of processor memory models and methodologies for bounding multicore interference [27].
Systematic Mesh Refinement: Industry practitioners emphasize the importance of systematic mesh refinement for code and calculation verification, particularly highlighting how misleading results can arise when systematic approaches are not applied [3].
Academic institutions contribute to the standards ecosystem through fundamental research, methodological development, and educational initiatives. Researchers are extending VVUQ methodologies to address emerging challenges such as:
Patient-Specific Modeling: Academic researchers are developing classification frameworks for comparators used in validating patient-specific computational models [3]. These frameworks define, classify, and compare different types of comparators, providing rationale for selecting appropriate validation approaches based on model context and application.
Verification and Validation of Modeling Methods: Academic research is clarifying the distinctions between verification, validation, and evaluation (VVE) of modeling methods themselves [28]. This work adapts software engineering principles to modeling methods, asking "Am I building the method right?" (verification), "Am I building the right method?" (validation), and "Is my method worthwhile?" (evaluation) [28].
Formal Method Specifications: Academic-industry collaborations are advancing formal specifications of complex systems, such as the formal definition of Arm's architecture specification language and concurrency model [27]. These efforts improve the ability to verify programs running on complex modern processors.
The ASME V&V 40 standard provides a systematic, risk-based methodology for establishing model credibility requirements. The experimental protocol for implementing this framework involves sequential phases:
Phase 1: Context of Use Definition
Phase 2: Model Risk Assessment
Phase 3: Credibility Factor Identification
Phase 4: Credibility Plan Development
This protocol enables efficient allocation of VVUQ resources by focusing efforts on the areas of highest risk and impact, avoiding both insufficient rigor for high-risk applications and excessive verification for low-risk applications [3].
For computational models using discretization methods such as finite element analysis, systematic mesh refinement represents a critical verification methodology. The experimental protocol for implementation involves:
Step 1: Initial Mesh Generation
Step 2: Systematic Refinement
Step 3: Solution Calculation
Step 4: Discretization Error Estimation
Step 5: Uncertainty Quantification
This methodology is particularly critical for avoiding misleading results in complex simulations, as demonstrated in applications such as blood hemolysis modeling where nonsystematic approaches can produce erroneous conclusions [3].
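Steps 2 through 5 of the protocol above can be sketched with Richardson extrapolation: given a quantity of interest computed on three systematically refined meshes with a constant refinement ratio, one estimates the observed order of convergence and an extrapolated "mesh-independent" value. The three QoI values below (nominally a drag coefficient) are hypothetical.

```python
import math

def richardson_estimate(f_coarse, f_medium, f_fine, r):
    """Estimate the observed order of convergence p and the Richardson-
    extrapolated QoI from three solutions on systematically refined
    meshes with constant refinement ratio r."""
    p = math.log((f_coarse - f_medium) / (f_medium - f_fine)) / math.log(r)
    f_extrap = f_fine + (f_fine - f_medium) / (r ** p - 1)
    error = abs(f_fine - f_extrap)  # estimated discretization error
    return p, f_extrap, error

# Hypothetical drag-coefficient QoI from three meshes, refinement ratio 2
p, f_extrap, err = richardson_estimate(0.5280, 0.5120, 0.5080, r=2.0)
```

An observed order close to the scheme's formal order supports the claim that the solutions lie in the asymptotic convergence range, a prerequisite for trusting the error estimate.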
Table: Essential VVUQ Standards and Implementation Resources
| Resource | Type | Function | Access |
|---|---|---|---|
| ASME V&V 40-2018 | Standard | Provides risk-based framework for establishing credibility of computational models | ASME Standards |
| VVUQ 40.1 (Upcoming) | Technical Report | End-to-end example applying V&V 40 to medical device fatigue testing | ASME Publications |
| VVUQ 1-2022 | Terminology Standard | Standardized terminology for VVUQ activities | ASME Standards |
| SISAQOL-IMI Guidelines | Consensus Guidelines | Standardized PRO assessment in cancer clinical trials | The Lancet Oncology |
| Method VVE Framework | Methodological Framework | Verification, Validation, Evaluation for modeling methods | Springer Publications |
Systematic Mesh Refinement Tools: Software capabilities for controlled mesh refinement maintaining element quality and geometric fidelity, particularly important for unstructured meshes with nonuniform element sizes [3].
Formal Concurrency Modeling Tools: Specialized tools such as "herd7" for interpreting formal concurrency models and "litmus" for running concurrency tests, essential for verifying software on multicore architectures [27].
Uncertainty Quantification Frameworks: Methodologies for quantifying both aleatory and epistemic uncertainties and their propagation through computational models.
Multicore Interference Analysis Tools: Tooling solutions that combine targeted interference generators and measurement capabilities to analyze interference channels in multicore hardware platforms [27].
Reference Interpreters: Formally defined reference interpreters for architecture specification languages, such as those developed for Arm's Architecture Specification Language (ASL) [27].
The alignment of regulators, academia, and industry through common standards represents a critical enabling factor for the advancement and adoption of computational modeling in high-stakes applications. The ASME V&V 40 standard and its expanding ecosystem of technical reports and implementation guides demonstrate how risk-based, practical frameworks can bridge stakeholder perspectives while maintaining scientific rigor. As computational methods continue to evolve—embracing artificial intelligence, digital twins, and in silico clinical trials—the continued collaboration of stakeholders in standards development will be essential for ensuring both innovation and public safety. The upcoming ASME VVUQ Symposium in 2026 provides a forum for this ongoing collaboration, addressing emerging topics including AI/ML models, digital twins, and advanced manufacturing [26]. Through continued commitment to common standards, the computational modeling community can ensure that increasingly sophisticated models deliver trustworthy results that benefit researchers, regulators, and ultimately, the public they serve.
The ASME V&V 40-2018 standard, titled "Assessing Credibility of Computational Modeling through Verification and Validation: Application to Medical Devices," provides a structured, risk-informed framework for establishing the trustworthiness of computational models used in medical device development and regulatory evaluation [22] [30]. Developed through collaboration between the U.S. Food and Drug Administration (FDA), medical device companies, and software providers, this standard addresses a critical industry need for consensus on the evidentiary requirements for model validation [10] [31]. Unlike traditional V&V methodologies that prescribe specific technical procedures, V&V 40 introduces a risk-based approach that determines "how much" verification and validation evidence is sufficient based on the model's intended role and the potential consequences of an incorrect decision [10] [30].
The core tenet of the V&V 40 framework is that credibility requirements should be commensurate with model risk [10]. This principle acknowledges that different applications demand different levels of evidence, allowing organizations to allocate resources efficiently while ensuring patient safety. The standard has gained significant recognition since its publication, including FDA recognition, making it a critical tool for manufacturers seeking regulatory approval for devices developed or evaluated using computational modeling [11] [10]. The framework is flexible enough to accommodate various computational disciplines—including computational fluid dynamics, solid mechanics, heat transfer, and electromagnetism—across the total product life cycle [10].
Credibility: "The trust, obtained through the collection of evidence, in the predictive capability of a computational model for a context of use (COU)" [10]. This trust is established through structured V&V activities rather than assumed.
Context of Use (COU): A detailed statement that defines the specific role and scope of the computational model in addressing a question of interest [10]. The COU precisely specifies what the model will predict, under what conditions, and how the results will inform decision-making.
Verification: The process of determining that a computational model accurately represents the underlying mathematical model and its solution [1] [30]. It answers the question: "Are we solving the equations correctly?" Verification encompasses code verification (checking for programming errors) and calculation verification (estimating numerical errors).
Validation: The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended applications [1] [30]. It answers the question: "Are we solving the correct equations?" Validation involves comparing computational results with experimental data.
Uncertainty Quantification (UQ): The process of characterizing and assessing uncertainties in modeling and simulation [1]. This includes identifying uncertainties in input parameters, numerical approximations, and physical experiments used for validation.
The V&V 40 standard identifies 13 key factors that contribute to establishing model credibility, categorized under verification, validation, and applicability [30]. These factors provide a systematic way to plan and evaluate V&V activities:
Verification Factors:
Validation Factors:
Applicability Factors:
The first step involves precisely defining the fundamental question that the computational model will help address. This question typically relates to device safety, performance, or effectiveness. For example: "Are the flow-induced hemolysis levels of the centrifugal pump acceptable for the intended use?" [10]. The question should be specific, measurable, and directly relevant to the device's regulatory evaluation or design verification.
The COU provides the critical foundation for all subsequent credibility assessment activities. A well-defined COU includes:
Table: Examples of Context of Use Statements for Different Applications
| Device Type | Sample Context of Use |
|---|---|
| Centrifugal Blood Pump (CPB) | "Use CFD to predict hemolysis index at the nominal operating condition (5 L/min, 3000 RPM) to complement in vitro hemolysis testing for cardiopulmonary bypass applications." [10] |
| Centrifugal Blood Pump (VAD) | "Use CFD to predict hemolysis index across the operating range (2.5-6 L/min, 2500-3500 RPM) to support device safety assessment for short-term ventricular assist device applications." [10] |
| Tibial Tray Component | "Use finite element analysis to identify worst-case size for fatigue testing of a tibial tray component." [3] |
| Hip Fracture Risk Prediction | "Use the Bologna Biomechanical Computed Tomography (BBCT) solution to predict the absolute risk of fracture at the femur for a subject." [32] |
The V&V 40 framework introduces a two-dimensional risk assessment approach that evaluates both the influence of the model on decision-making and the consequence of an incorrect decision [10].
Model Risk = f(Model Influence, Decision Consequence)
The following diagram illustrates the key relationships and workflow for establishing model risk:
Model Influence categories:
Decision Consequence categories:
Table: Model Risk Assessment Matrix
| Decision Consequence | Low Model Influence | Medium Model Influence | High Model Influence |
|---|---|---|---|
| Low | Low Risk | Low Risk | Medium Risk |
| Medium | Low Risk | Medium Risk | High Risk |
| High | Medium Risk | High Risk | High Risk |
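The risk matrix above reduces to a simple lookup, which is convenient when the assessment is embedded in a credibility-planning checklist or tool. The dictionary encoding below is ours; the level assignments follow the matrix as tabulated.

```python
# Model-risk lookup implementing the matrix above:
# (decision consequence, model influence) -> model risk
RISK = {
    ("low", "low"): "low",      ("low", "medium"): "low",      ("low", "high"): "medium",
    ("medium", "low"): "low",   ("medium", "medium"): "medium", ("medium", "high"): "high",
    ("high", "low"): "medium",  ("high", "medium"): "high",     ("high", "high"): "high",
}

def model_risk(decision_consequence, model_influence):
    """Return the model-risk level for the given matrix coordinates."""
    return RISK[(decision_consequence.lower(), model_influence.lower())]
```

For example, a model with medium influence on a high-consequence decision maps to high risk, triggering correspondingly rigorous credibility goals.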
Based on the model risk assessment, establish specific credibility goals for each relevant credibility factor [10]. The V&V 40 standard does not prescribe fixed thresholds but instead provides guidance for determining appropriate levels of evidence based on risk. For each credibility factor, the goals should specify:
For example, a high-risk model might require comprehensive grid convergence studies for calculation verification, while a low-risk model might only require a single grid study with estimated discretization errors.
Execute the planned verification, validation, and uncertainty quantification activities according to the established credibility goals. This typically includes:
Verification Activities:
Validation Activities:
The final step involves assessing whether the collected evidence meets the predefined credibility goals and documenting the entire process. This assessment should be performed by a multidisciplinary team with expertise in computational modeling, experimental methods, and the specific device application [10]. Documentation should be comprehensive and include:
Dr. Tinen Iles and colleagues demonstrated the application of the V&V 40 framework to computational modeling of a transcatheter aortic valve (TAV) for finite element analysis of structural component stress/strain for metal fatigue analysis [11]. The implementation followed these key protocols:
Context of Use: "Utilize FEA model for structural component stress/strain (metal fatigue) analysis, in accordance with practices outlined in ISO5840-1:2021, as part of Design Verification activities." [11]
Model Risk Assessment: The risk was assessed as medium-high due to the critical nature of heart valve safety and the substantial influence of the model on fatigue life predictions.
Verification Protocol:
Validation Protocol:
Uncertainty Quantification: Considered uncertainties in material properties, boundary conditions, and manufacturing variations.
A detailed example application to hemolysis prediction in a centrifugal blood pump demonstrates how the same computational model requires different credibility evidence for different contexts of use [10] [31].
Table: Credibility Requirements for Different COUs for a Blood Pump
| Credibility Factor | Cardiopulmonary Bypass (Lower Risk) | Ventricular Assist Device (Higher Risk) |
|---|---|---|
| Calculation Verification | Single mesh resolution with error estimation | Comprehensive grid convergence study |
| Validation Experiments | Comparison with particle image velocimetry (PIV) at nominal condition | PIV across operating range plus in vitro hemolysis testing |
| Validation Metrics | Qualitative comparison of flow fields | Quantitative metrics for velocity, shear stress, and hemolysis index |
| Uncertainty Quantification | Parameter uncertainty analysis | Comprehensive UQ including model form uncertainty |
| Applicability Domain | Analysis at nominal operating point | Analysis across full operating range (2.5-6 L/min, 2500-3500 RPM) |
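Hemolysis predictions of the kind tabulated above are often built on power-law damage correlations of the form HI = C · τ^a · t^b, accumulated along flow paths. The sketch below shows that structure only: the coefficients and the simple per-segment accumulation scheme are illustrative placeholders, not the fitted constants (e.g., Giersiepen-type correlations) used in any cited study.

```python
def hemolysis_index(tau_pa, exposure_s, C=1.8e-6, a=1.99, b=0.77):
    """Power-law hemolysis correlation HI = C * tau^a * t^b.
    tau_pa: scalar shear stress (Pa); exposure_s: exposure time (s).
    C, a, b are illustrative placeholders; real studies fit them to
    in vitro blood-damage data."""
    return C * tau_pa ** a * exposure_s ** b

# Accumulate damage along a hypothetical streamline:
# list of (shear stress in Pa, residence time in s) segments
path = [(50.0, 0.01), (120.0, 0.002), (80.0, 0.005)]
hi_total = sum(hemolysis_index(tau, dt) for tau, dt in path)
```

Because the exponent on shear stress is large, short excursions through high-shear regions (e.g., blade-tip gaps) can dominate the accumulated index, which is why validation across the full operating range matters for the higher-risk VAD context of use.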
The experimental protocol for this application included:
Computational Methods:
Experimental Comparators:
Table: Key Computational and Experimental Resources for V&V 40 Implementation
| Tool/Resource | Function in V&V Process | Application Examples |
|---|---|---|
| Commercial CFD/FEA Software (ANSYS, etc.) | Provides verified computational physics solvers with built-in verification tools | Blood flow simulation, structural analysis, heat transfer [10] |
| Grid Convergence Tools | Enables systematic mesh refinement for calculation verification | Estimation of discretization errors in fluid and solid mechanics [3] |
| Particle Image Velocimetry | Provides non-invasive flow field measurements for validation | Velocity field comparison in blood pumps, valve models [10] |
| Digital Image Correlation | Enables full-field strain measurement for structural validation | Strain validation in orthopaedic implants, structural components [11] |
| Uncertainty Quantification Software | Facilitates probabilistic analysis and uncertainty propagation | Quantification of input and model form uncertainties [1] |
| Biomechanical Testing Systems | Generates validation data under controlled loading conditions | Material property characterization, device performance testing [11] |
The ASME V&V 40 standard continues to evolve with several important developments:
The ASME VVUQ 40 subcommittee is developing additional technical reports to provide detailed implementation guidance:
VVUQ 40.1: A technical report providing a comprehensive example of applying the V&V 40 standard to a tibial tray component for worst-case size identification in fatigue testing [3] [1]. This report demonstrates how to select appropriate V&V activities along a continuum rather than simply adopting predefined gradations.
Patient-Specific Modeling: A new technical report is in development focusing on credibility assessment for patient-specific computational models, using femur fracture prediction as an example application [3] [32]. This initiative includes developing a classification framework for comparators used to assess patient-specific model credibility.
In Silico Clinical Trials: The medical device industry is increasingly exploring the use of "In Silico Clinical Trials" (ISCT) where simulated patients augment or replace results from human patients [3]. This application places particularly high credibility demands on computational models, requiring robust validation strategies when direct validation against human data may be limited.
Regulatory Qualification: The Bologna Biomechanical Computed Tomography (BBCT) solution recently underwent a credibility assessment using the V&V 40 framework as part of a qualification advice request to the European Medicines Agency [32]. This represents one of the first public examples of using the standard for regulatory qualification of a computational methodology in the EU.
Systematic Mesh Refinement: Recent work has emphasized the importance of systematic mesh refinement practices, particularly for unstructured meshes with nonuniform element sizes [3]. Proper implementation is critical for both code and calculation verification.
Historical Data as Comparators: Research continues on establishing appropriate use of historical data as validation comparators, which can potentially reduce the need for new physical experiments [3].
The ASME V&V 40 standard represents a significant advancement in establishing credibility for computational models in medical device applications. Its risk-informed approach provides a flexible yet structured framework that enables model developers to determine the appropriate level of evidence required for their specific application while ensuring patient safety. As computational modeling continues to play an increasingly important role in medical device development and regulatory evaluation, the principles and methodologies outlined in V&V 40 will serve as a critical foundation for establishing trust in computational predictions.
In computational modeling, whether for predicting the behavior of a new medical device or simulating fluid dynamics, the credibility of the simulation results is paramount. Code and calculation verification are foundational processes for establishing this credibility. Verification is the process of ensuring that the computational model correctly solves the underlying mathematical equations—"solving the equations right" [33]. A cornerstone of this process is systematic mesh refinement, a method to quantify and reduce the errors introduced by the discretization of the geometry into a finite mesh of elements [34].
This guide details the best practices for performing systematic mesh refinement, a critical component of verification. Within a broader model verification and validation (V&V) framework, verification builds confidence that the model is implemented correctly, while validation determines if the model accurately represents reality [33] [35]. For researchers and drug development professionals, adhering to rigorous verification practices is increasingly critical as regulatory agencies like the FDA and EMA more frequently accept in silico evidence in regulatory submissions [35]. A well-executed mesh refinement study provides quantifiable evidence that your computational results are trustworthy.
In computational modeling, it is vital to distinguish between error and uncertainty. Error is a recognizable deficiency in a model that is not due to a lack of knowledge. Uncertainty is a potential deficiency that stems from a lack of knowledge about the true behavior of the physical system [33].
Verification and Validation (V&V) are coupled processes essential for establishing model credibility [33].
Systematic mesh refinement is a primary methodology for calculation verification. It provides a quantitative estimate of the discretization error, which must be accounted for before a model can be meaningfully validated against experimental results [33].
The core principle of systematic mesh refinement is to observe how a key computational result, known as a Quantity of Interest (QoI), changes as the computational mesh is progressively refined. The QoI is a specific, scalar value critical to the engineering analysis, such as the drag coefficient on an airfoil, the maximum stress in a bone implant, or the flow rate through a vessel [34].
The goal is to reach a mesh-independent solution, a state where further refinement of the mesh does not significantly change the QoI. At this point, the discretization error is considered acceptably small for the Context of Use [34].
There are three primary strategies for refining a mesh, each with advantages and applications: h-refinement (subdividing elements to reduce their size), p-refinement (raising the polynomial order of the element shape functions), and r-refinement (relocating nodes to concentrate resolution without changing their number).
For complex simulations involving localized phenomena like shock waves or stress concentrations, Adaptive Mesh Refinement (AMR) is a powerful technique. AMR uses physics-based refinement indicators to dynamically adapt the mesh during the simulation, refining and coarsening regions based on the evolving solution [36]. This ensures computational resources are focused where they are most needed.
To quantitatively assess mesh convergence, specific error metrics and performance indicators are used. The error between a computed solution \( u_h \) and a reference or exact solution \( u \) can be measured using standard norms, such as the discrete \( \ell^2 \) (root-mean-square) and \( \ell^{\infty} \) (maximum) norms [36].
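A minimal sketch of these norm computations; the perturbed sine wave below is a synthetic stand-in for real simulation output:

```python
import numpy as np

def error_norms(u_h, u_exact):
    """Discrete error norms between a computed solution u_h and a
    reference (or exact) solution u sampled on the same grid."""
    e = u_h - u_exact
    l2 = np.sqrt(np.mean(e**2))    # discrete ell-2 (RMS) norm
    linf = np.max(np.abs(e))       # ell-infinity (maximum) norm
    return l2, linf

# Synthetic example: sin(x) perturbed by an artificial discretization error
x = np.linspace(0.0, np.pi, 50)
u_exact = np.sin(x)
u_h = u_exact + 1e-3 * np.cos(5 * x)
l2, linf = error_norms(u_h, u_exact)
```

Both norms shrink under mesh refinement; the \( \ell^{\infty} \) norm is the stricter measure because a single poorly resolved region dominates it.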
To evaluate the performance of an AMR approach versus a uniform mesh, metrics such as the number of degrees of freedom \( n_{\Omega} \), the \( \ell^2 \) error \( e_2 \), and the relative time-to-solution \( r_{\text{t-to-sol}} \) are useful [36].
A rigorous mesh refinement study follows a structured protocol.
The Grid Convergence Index (GCI) is a standardized method for reporting the estimated discretization error. It provides a conservative estimate of how far the computed solution is from the asymptotic value that would be obtained with infinite mesh resolution [34].
The following workflow outlines the key steps in a systematic mesh refinement study, culminating in the calculation of the GCI.
The GCI for a fine mesh solution is calculated as: \[ GCI_{\text{fine}} = F_s \cdot \frac{|\epsilon|}{r^p - 1} \] where \( \epsilon \) is the relative difference in the QoI between the medium- and fine-mesh solutions, \( r \) is the grid refinement ratio, \( p \) is the observed order of convergence, and \( F_s \) is a safety factor (commonly 1.25 for a three-mesh study).
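A sketch of a complete three-mesh study, assuming monotonic convergence and a constant refinement ratio; the peak-stress QoI values are hypothetical, and `gci_study` is an illustrative helper rather than code from the standard:

```python
import math

def gci_study(f_coarse, f_medium, f_fine, r, Fs=1.25):
    """Observed order p, Richardson-extrapolated solution, and fine-grid
    GCI (as a fraction of the fine-grid QoI) from three mesh solutions.
    Fs = 1.25 is the safety factor commonly used for three-mesh studies."""
    # Observed order of convergence from the three solutions
    p = math.log((f_coarse - f_medium) / (f_medium - f_fine)) / math.log(r)
    # Richardson extrapolation toward the zero-spacing solution
    f_exact = f_fine + (f_fine - f_medium) / (r**p - 1.0)
    # Relative difference between medium- and fine-mesh solutions
    eps = (f_medium - f_fine) / f_fine
    gci_fine = Fs * abs(eps) / (r**p - 1.0)
    return p, f_exact, gci_fine

# Hypothetical peak-stress QoI (MPa) on coarse, medium, and fine meshes
p, f_exact, gci = gci_study(105.0, 101.0, 100.0, r=2.0)
# p = 2.0 (second-order convergence); gci ~ 0.42% of the fine-mesh value
```

In a report, the GCI would be quoted as an uncertainty band on the fine-mesh QoI (here roughly \(\pm 0.42\%\)), alongside the observed order \( p \) as evidence that the solutions lie in the asymptotic convergence range.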
The table below lists key "research reagents"—the essential metrics, parameters, and tools required to conduct a successful mesh refinement study.
Table 1: Essential Components for a Mesh Refinement Study
| Component | Function & Description |
|---|---|
| Quantity of Interest (QoI) | A specific, scalar output of the simulation used to judge convergence (e.g., peak stress, drag coefficient, flow rate). Must be relevant to the Context of Use [35]. |
| Grid Refinement Ratio \( r \) | The ratio of element sizes between successive meshes (e.g., \( r = \sqrt{2} \)). Must be constant and greater than 1 for a formal study [34]. |
| Observed Order of Convergence \( p \) | A numerical value calculated from the solutions on three different meshes. It indicates the rate at which the numerical error decreases with mesh refinement and should approach the theoretical order of the numerical method. |
| Grid Convergence Index (GCI) | A dimensionless, conservative estimate of the percentage error (uncertainty) in the QoI due to spatial discretization. Used to build credibility and report results [34]. |
| Richardson Extrapolation | A technique that uses solutions from multiple meshes to estimate the exact solution and the discretization error. It is the basis for the GCI calculation [34]. |
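The last two entries of the table can be written explicitly. With \( f_1 \), \( f_2 \), and \( f_3 \) denoting the QoI on the fine, medium, and coarse meshes respectively, and a constant refinement ratio \( r \), the standard expressions are:

```latex
p = \frac{\ln\!\left(\dfrac{f_3 - f_2}{f_2 - f_1}\right)}{\ln r},
\qquad
f_{\text{exact}} \approx f_1 + \frac{f_1 - f_2}{r^p - 1}
```

The magnitude of the extrapolation correction, multiplied by a safety factor, yields the GCI reported for the fine mesh.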
For problems with strongly localized features, Adaptive Mesh Refinement (AMR) can dramatically improve computational efficiency. A study on a 2D shallow water solver demonstrated this using specific performance metrics [36]. The results showed that AMR could maintain a low numerical error while using significantly fewer degrees of freedom than a uniform mesh, leading to a much better time-to-solution metric, \( r_{\text{t-to-sol}} \) [36].
Table 2: Example Performance Metrics for AMR vs. Uniform Mesh (Shallow Water Solver)
| Mesh Type | Number of DOF \( n_{\Omega} \) | \( \ell^2 \) Error \( e_2 \) | Time-to-Solution \( r_{\text{t-to-sol}} \) |
|---|---|---|---|
| Uniform Mesh | 1,000,000 | 1.5e-3 | 1.0 (Baseline) |
| AMR Mesh | 125,000 | 1.7e-3 | 0.15 |
| Performance Gain | 8x fewer DOF | Comparable Error | ~6.7x faster |
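The derived "Performance Gain" row follows directly from the raw metrics; an explicit computation using the Table 2 values:

```python
def amr_performance(n_dof_uniform, n_dof_amr, t_uniform, t_amr):
    """Fold-reduction in degrees of freedom and time-to-solution speedup
    for an AMR mesh relative to a uniform-mesh baseline."""
    return n_dof_uniform / n_dof_amr, t_uniform / t_amr

# Table 2 values (time-to-solution normalized so the uniform mesh = 1.0)
dof_reduction, speedup = amr_performance(1_000_000, 125_000, 1.0, 0.15)
# dof_reduction = 8.0, speedup ~ 6.7
```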
The principles of verification, including mesh refinement, are now formally recognized in regulatory frameworks. The ASME V&V 40 standard provides a risk-informed credibility assessment framework for computational models used in medical device evaluation [35]. The model's Context of Use determines the required level of accuracy, which in turn dictates the stringency of the mesh refinement study needed. A high-risk decision, such as one supporting the safety of a novel heart valve implant, would require a much lower GCI (and thus a finer, more thoroughly verified mesh) than a model used for early-stage conceptual design [35].
The following diagram illustrates how systematic mesh refinement is integrated within a broader V&V framework aimed at building model credibility for regulatory decision-making.
Systematic mesh refinement is not an optional exercise but a critical, non-negotiable component of code and calculation verification. It transforms a subjective belief in mesh quality into a quantitative, defensible estimate of numerical uncertainty. For researchers and professionals in drug development and biomedical engineering, mastering these practices is essential for generating trustworthy simulation data.
Adopting the best practices outlined—defining a relevant QoI, using a minimum of three meshes, calculating the GCI, and comprehensively reporting the process—directly builds the credibility of computational models. As the regulatory landscape evolves to embrace in silico evidence, a rigorous and well-documented mesh refinement study provides the foundational proof that your simulations are solving the equations right, a necessary precursor to demonstrating that you are solving the right equations for advancing human health.
In the realm of modern drug discovery and development, computational methodologies have transitioned from supportive tools to central drivers of innovation. Techniques including Quantitative Structure-Activity Relationship (QSAR), molecular docking, molecular dynamics (MD), and artificial intelligence/machine learning (AI/ML) form the backbone of contemporary in-silico research. The reliability and predictive power of these methods are paramount, making their standardization within a Verification, Validation, and Uncertainty Quantification (VVUQ) framework essential for building trust and facilitating regulatory acceptance [18].
VVUQ provides a structured paradigm for ensuring computational models are fit for their intended purpose. Verification addresses whether a model is implemented correctly, Validation assesses how accurately a model represents reality, and Uncertainty Quantification characterizes confidence bounds and potential errors in predictions [18]. This framework is particularly critical as computational models, especially AI/ML, become more complex and integral to high-stakes decisions in precision medicine and therapeutic development [18]. This guide details the methodological standards for these core techniques, providing researchers with the protocols and metrics necessary to establish credibility and reproducibility in their computational work.
QSAR modeling correlates numerical descriptors of chemical structures with biological activity, enabling the prediction of compound properties and activities. The evolution from classical statistical methods to AI-enhanced models has dramatically expanded its capabilities [38].
Robust QSAR modeling demands rigorous validation to ensure predictive reliability and avoid overfitting. Best-practice standards include internal cross-validation, evaluation on an external test set, and Y-randomization; the key metrics and acceptance thresholds are summarized in Table 1.
Classical QSAR relies on statistical methods like Multiple Linear Regression (MLR) and Partial Least Squares (PLS). These are valued for their interpretability and remain effective for linearly related data with a limited number of variables [38].
AI-Enhanced QSAR utilizes machine learning algorithms such as Random Forest (RF), Support Vector Machines (SVM), and Multilayer Perceptron (MLP) to capture complex, non-linear relationships in large chemical datasets [38] [40]. For instance, a study predicting estrogen receptor-binding activity found that ML-based 3D-QSAR models (RF, SVM, MLP) outperformed traditional VEGA models in accuracy, sensitivity, and selectivity [40]. The integration of graph neural networks (GNNs) allows for the creation of "deep descriptors" directly from molecular structures, further advancing the field [38].
Table 1: Key Metrics for QSAR Model Validation
| Metric | Description | Acceptance Threshold Guideline |
|---|---|---|
| R² (training) | Coefficient of determination for the training set. | > 0.6 |
| Q² (cross-validation) | Cross-validated R² for internal predictivity. | > 0.5 |
| R² (test) | Coefficient of determination for the external test set. | > 0.5 |
| Y-Randomization | Validates model is not fitting to noise. | Significant performance drop in randomized model |
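These metrics can be computed with standard tooling. The sketch below uses scikit-learn on a synthetic descriptor matrix (a stand-in for real molecular descriptors) to illustrate cross-validated Q² and a Y-randomization check; the data and model choice are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in: 120 compounds x 8 descriptors, activity driven by two
X = rng.normal(size=(120, 8))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=120)

model = RandomForestRegressor(n_estimators=200, random_state=0)

# Internal validation: Q2 computed from predictions on held-out folds
q2 = r2_score(y, cross_val_predict(model, X, y, cv=5))

# Y-randomization: performance should collapse on shuffled activities
y_shuf = rng.permutation(y)
q2_rand = r2_score(y_shuf, cross_val_predict(model, X, y_shuf, cv=5))
# q2 should greatly exceed q2_rand, which should sit near or below zero
```

A large gap between Q² and the Y-randomized score is evidence that the model captures structure-activity signal rather than noise.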
Molecular docking predicts the preferred orientation and binding affinity of a small molecule (ligand) within a target protein's binding site. It is a cornerstone of structure-based drug design for virtual screening and lead optimization [41] [42].
To yield biologically relevant and reproducible results, docking studies must adhere to strict protocols, such as validating the setup by re-docking a co-crystallized ligand and confirming that the reproduced pose lies within 2 Å RMSD of the crystal structure.
Scoring functions are mathematical models used to predict binding affinity by estimating the enthalpy of binding \( \Delta H \) [42]. They are broadly classified as force-field based, empirical, or knowledge-based. A major challenge is their limited ability to fully account for entropic effects \( \Delta S \) and solvation [41] [42].
The integration of Artificial Intelligence is transforming molecular docking, enhancing the speed and accuracy of traditional methods.
MD simulations model the physical movements of atoms and molecules over time, providing a dynamic view of biomolecular processes that static models cannot offer.
Robust MD protocols, covering system preparation, equilibration, and adequate sampling, are necessary to generate physically meaningful data.
MD serves as a powerful complementary technique to docking and QSAR. It is used in a post-docking refinement step to relax the docked pose, account for full receptor flexibility, and provide a more realistic model of the binding interaction [42]. Furthermore, MD simulations can be used to generate multiple receptor conformations for use in ensemble docking, a pre-docking step that helps account for inherent protein flexibility [42].
AI and ML are revolutionizing drug discovery by enabling the analysis of vast, complex datasets to predict bioactivity, toxicity, and molecular interactions with unprecedented speed and scale [38] [43].
The "black box" nature of many AI/ML models necessitates rigorous standards for their development and deployment.
AI/ML is being applied across the drug discovery pipeline. A prominent application is in AI-enhanced QSAR, where models like graph neural networks automatically learn relevant features from molecular structures, moving beyond manually engineered descriptors [38]. Integrated workflows are also emerging. For example, one study used network toxicology, machine learning, and molecular docking to identify the core targets and mechanism of polyethylene terephthalate microplastics (PET-MPs) in inducing periodontitis, which was subsequently validated experimentally [44].
The true power of these computational techniques is realized when they are integrated within a VVUQ-compliant workflow. The following diagram and protocol outline this integrated approach.
Integrated VVUQ Workflow for Drug Discovery
This protocol describes a step-by-step process for lead candidate identification and validation, integrating all previously discussed techniques.
Table 2: Key Software and Databases for Computational Research
| Category | Tool/Resource | Primary Function |
|---|---|---|
| Docking Software | AutoDock Vina, GOLD, Glide, DOCK | Predict ligand binding pose and affinity [41] [42]. |
| MD Software | Desmond, GROMACS, AMBER | Simulate dynamic behavior of biomolecules over time [39]. |
| QSAR/AI Platforms | scikit-learn, KNIME, RDKit, PaDEL | Build ML models, compute molecular descriptors [38]. |
| Structural Databases | Protein Data Bank (PDB), AlphaFold DB | Provide 3D structures of proteins and complexes [41]. |
| Chemical Databases | PubChem, ZINC, ChEMBL, DrugBank | Supply chemical structures, bioactivity data, and compound libraries [41]. |
| Visualization Tools | PyMOL, UCSF Chimera | Visualize molecular structures, surfaces, and interactions [41]. |
The adoption of In Silico Clinical Trials (ISCTs) is accelerating across biomedical research, driven by regulatory modernization, escalating drug development costs, and advancements in computational power. This paradigm shift, underscored by the U.S. Food and Drug Administration's (FDA) landmark decision to phase out mandatory animal testing for many drug types, positions computational modeling and simulation as a central pillar of evidence generation [45]. This whitepaper provides an in-depth technical guide to the credibility considerations essential for ISCTs, framed within the broader context of standards for computational model verification and validation (V&V) research. We detail the risk-based credibility frameworks emerging as industry standards, elaborate on experimental protocols for clinical validation, and analyze current market and application trends. The objective is to equip researchers and drug development professionals with the methodologies and tools necessary to ensure that in silico evidence is robust, reproducible, and regulatory-grade.
In silico clinical trials represent a profound structural transformation across drug development and medical device evaluation. ISCTs use computational models to simulate drug candidates, medical devices, and their effects in virtual patient populations, thereby reducing the experimental workload, enhancing prediction accuracy, and shortening development timelines [46]. The impetus for this shift is clear: traditional drug development often takes over a decade and costs between $314 million and $4.46 billion per approved drug, with the majority of candidates failing in late-stage trials [45].
Regulatory agencies worldwide are now endorsing these methodologies. The FDA's 2025 ruling on animal testing, its Model-Informed Drug Development (MIDD) pilot programs, and analogous efforts by the European Medicines Agency (EMA) and Japan's Pharmaceuticals and Medical Devices Agency (PMDA) signal a coordinated push toward accepting computational evidence [45] [47]. This transition is supported by the maturation of key technologies, including High-Performance Computing (HPC), Artificial Intelligence (AI), and digital twin simulations that can replicate human physiology with remarkable granularity [45] [46].
For ISCTs to fulfill their potential, the models underpinning them must be credible. Credibility is demonstrated through rigorous Verification, Validation, and Uncertainty Quantification (VVUQ). Verification ensures the computational model is solved correctly, while validation confirms the model accurately represents the real-world clinical environment [48] [49].
A pivotal advancement is the adaptation of risk-based credibility frameworks, such as the ASME V&V 40 standard, specifically for ISCTs [48]. This framework assesses model risk based on three independent factors and establishes corresponding credibility targets.
The following workflow outlines the process for applying this risk-based credibility assessment:
A critical step in the workflow above is the execution of clinical validation. The following protocol provides a generalizable methodology for establishing the clinical credibility of an in silico model.
The adoption of ISCTs is reflected in a rapidly growing market, providing quantitative context for its expanding role.
Table 1: In-Silico Clinical Trials Market Overview and Forecast [46] [47]
| Metric | 2023/2024 Value | 2033 Forecast | CAGR (2025-2033) | Key Drivers |
|---|---|---|---|---|
| Global Market Size | USD 3.95 B (2024) | USD 6.39 B | 5.5% | Regulatory endorsement, rising R&D costs, AI/digital twins |
| Segment by Application | ||||
| Drug Development | USD 2.06 B (52%) | - | - | Dose optimization, toxicity prediction, virtual cohorts |
| Medical Devices | USD 1.10 B (28%) | - | - | Implant behavior, biomechanics, failure probability |
| Segment by End-User | ||||
| Pharma & Biotech | USD 1.86 B (47%) | - | - | Reducing R&D risk and optimizing protocols |
| Medical Device Mfrs. | USD 1.15 B (29%) | - | - | Replacing physical prototypes with digital twins |
| Regional Analysis | ||||
| United States | USD 1.74 B (44%) | >USD 3.0 B | - | FDA initiatives, mature AI ecosystem |
| Japan | USD 355 M (9%) | >USD 700 M | - | PMDA's structured approach to digital evidence |
The market data demonstrates that ISCTs are no longer a niche tool but an integral part of the R&D landscape. The application is strongest in drug development, where it is used for virtual screening, PK/PD modeling, and predicting population variability [46]. The following diagram illustrates a generalized workflow for an ISCT in drug development, integrating the credibility processes.
Success in the ISCT field requires a suite of computational tools, platforms, and standards. The table below details key resources.
Table 2: Research Reagent Solutions for In Silico Clinical Trials
| Category / Item | Function & Application | Examples |
|---|---|---|
| Simulation & Modeling Platforms | ||
| Quantitative Systems Pharmacology (QSP) Platforms | Mechanistic modeling of drug effects on biological systems; predicts efficacy and toxicity. | Certara's QSP Platforms |
| PBPK/PD Modeling Software | Simulates Absorption, Distribution, Metabolism, Excretion (ADME) and Pharmacodynamics. | Simulations Plus' GastroPlus, Certara's Simcyp Simulator |
| Medical Device Simulation Suites | Finite Element Analysis (FEA) and Computational Fluid Dynamics (CFD) for implant behavior and biomechanics. | Dassault Systèmes' SIMULIA |
| AI & Toxicity Prediction | ||
| Toxicity Prediction Tools | In silico prediction of drug toxicity, including off-target effects. | DeepTox, ProTox-3.0, ADMETlab |
| Protein Folding AI | Predicts 3D protein structures, aiding target identification and drug design. | AlphaFold [45] |
| Validation & Standards | ||
| Risk-Based Credibility Framework | Standard for establishing model credibility for medical devices. | ASME V&V 40 [48] |
| VVUQ Methodologies | Training and standards for Verification, Validation, and Uncertainty Quantification. | NAFEMS VVUQ Courses, ASME VVUQ 10 & 20 [49] |
The trajectory of ISCTs points toward their deepening integration into the core of biomedical research and regulatory science. Key future trends include the use of virtual patient avatars to replace early-phase studies, AI-powered toxicity prediction largely replacing animal testing, and the expansion of hybrid trials that seamlessly combine real-world data with mechanistic simulation [45] [46]. The regulatory landscape will continue to evolve, with model-based approvals becoming more common, particularly for rare diseases and precision therapies where conventional trials are impractical [45] [50].
In conclusion, the era of in silico clinical trials has arrived. The transformative potential of ISCTs to accelerate drug development, reduce costs, and uphold ethical standards is undeniable. However, this potential is contingent upon an unwavering commitment to scientific rigor and credibility. The adoption of risk-based frameworks, rigorous VVUQ methodologies, and standardized protocols is not optional but essential. For researchers and drug development professionals, mastering these credibility considerations is the key to unlocking a smarter, safer, and more personalized future for medicine.
The increasing reliance on Computational Modeling and Simulation (CM&S) within regulated industries like medical devices and biopharmaceuticals has necessitated the development of rigorous frameworks to ensure model credibility. Verification, Validation, and Uncertainty Quantification (VVUQ) provides a systematic methodology for assessing the accuracy and reliability of computational models used in critical decision-making processes, from pre-market submissions to manufacturing control strategies. These processes are essential for building confidence among manufacturers, regulatory bodies, and end-users that model predictions can be trusted in lieu of, or to reduce, extensive physical testing [1].
Adherence to VVUQ standards is particularly crucial when CM&S data supports applications to regulatory bodies such as the U.S. Food and Drug Administration (FDA). For medical devices, this often involves submissions for 510(k) clearance, De Novo requests, or Premarket Approval (PMA) [51]. Similarly, in biopharmaceuticals, computational models are increasingly pivotal in process development and quality control. The core principles of VVUQ, as defined by standards like those from ASME, include verification (ensuring the equations are solved correctly), validation (ensuring the model represents the real-world system), and uncertainty quantification (characterizing confidence in model predictions) [1].
This guide explores the application of these principles through specific case studies and provides detailed protocols for implementation, framed within the broader thesis that standardized VVUQ processes are fundamental to advancing credible computational research in drug and device development.
The ASME VVUQ standards provide the foundational lexicon and methodology for credibility assessment. VVUQ 1-2022 establishes a common terminology, which is critical for clear communication between model developers, reviewers, and regulatory agencies [1]. Discipline-specific standards, such as V&V 10 for computational solid mechanics and V&V 20 for computational fluid dynamics and heat transfer, offer detailed application guidance [1].
A significant advancement is the ASME V&V 40-2018 standard, which introduces a risk-informed credibility assessment framework. This standard is a key enabler for the FDA's Center for Devices and Radiological Health (CDRH) framework for evaluating CM&S in submissions [3]. The risk-based approach ties the required level of model credibility to the context of use (COU) and the model influence on the decision at hand. Higher-risk decisions, where the consequence of an incorrect model prediction is severe, demand a higher degree of credibility, which is achieved through more extensive VVUQ activities.
Understanding the regulatory landscape is essential for planning appropriate VVUQ activities. The FDA's primary submission pathways for medical devices have distinct requirements for evidence of safety and effectiveness, which directly impact the scope of necessary model validation [51].
Table: FDA Regulatory Pathways for Medical Devices
| Pathway | Purpose & Qualification | Submission Requirements | Evidence Standard | Typical Review Timeline |
|---|---|---|---|---|
| 510(k) | For devices substantially equivalent to a legally marketed predicate; most common for Class II devices [51]. | Premarket Notification; demonstration of substantial equivalence to a predicate device [51]. | Substantial equivalence to a predicate in intended use, technological characteristics, and performance [51]. | Typically 90 days, but can vary [51]. |
| De Novo | For novel, low-to-moderate risk devices with no predicate; allows reclassification from default Class III to Class I or II [51]. | De Novo request; proof of safety and effectiveness for the novel device [51]. | Valid scientific evidence to provide reasonable assurance of safety and effectiveness [51]. | Approximately 120 days [51]. |
| PMA | For high-risk (Class III) devices that support or sustain human life or present potential unreasonable risk [51]. | Premarket Approval Application; comprehensive scientific evidence, typically including clinical trial data [51]. | Extensive scientific evidence, including from clinical investigations, demonstrating safety and effectiveness [51]. | 6 months to a year or more [51]. |
The role of CM&S varies across these pathways. For a 510(k) submission, a model might be used to demonstrate performance equivalence to a predicate device. In a De Novo or PMA, a model might play a more central role, potentially supporting claims in lieu of some clinical data, which necessitates a higher degree of model credibility as per V&V 40 [3].
The ASME VVUQ 40.1 technical report provides a practical example of applying the V&V 40 risk-based framework to a medical device. The case study involves a computational model with a critical role: identifying the worst-case size configuration of a tibial tray component for subsequent physical fatigue testing [3]. The Context of Use (COU) for the model is precisely defined: to predict the stress distribution under physiological loading conditions to determine which size and geometry will experience the highest stress, and thus should be subjected to destructive physical testing.
The model's risk was assessed as moderate-to-high because an error in identifying the worst-case size could lead to testing a non-worst-case device, potentially allowing a design with a higher risk of fatigue failure to reach the market. This risk level directly informed the credibility requirements, necessitating rigorous verification and validation activities [3].
The validation process for the tibial tray model illustrates a comprehensive approach to building credibility.
1. Define Validation Objectives and COU: The objective was to assess the model's ability to accurately predict stress concentrations in the tibial tray. The COU specifically focused on quasi-static structural performance under standard gait-cycle loading [3].
2. Establish a Validation Hierarchy: A multi-level approach was employed, progressing from simplified benchtop comparisons to tests on the full device geometry.
3. Design and Execute Validation Experiments: Physical tests were performed on instrumented tibial trays, with strain gauges at critical locations providing measured stresses for comparison against model predictions [3].
4. Compute Validation Metrics: The primary metric used was a normalized stress difference between the computational predictions and experimental measurements at each strain gauge location. For a scalar quantity like peak stress, a direct percentage difference was calculated. The model was considered validated if the difference fell within a pre-defined acceptance threshold, justified by the model's risk and COU [3].
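A minimal sketch of this comparison step; the stress values and the 5% threshold are hypothetical illustrations, not figures from the case study:

```python
def normalized_stress_difference(predicted, measured):
    """Percentage difference between model-predicted and experimentally
    measured stress, normalized by the measurement."""
    return 100.0 * abs(predicted - measured) / abs(measured)

# Hypothetical peak stresses (MPa) at one strain-gauge location
diff = normalized_stress_difference(predicted=412.0, measured=400.0)
passes = diff <= 5.0   # example acceptance threshold from a V&V plan
# diff = 3.0 (%), so this location passes
```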
5. Uncertainty Quantification: Both aleatory (inherent variability in material properties and loading) and epistemic (model form and numerical solution) uncertainties were quantified. A Monte Carlo simulation was performed by propagating input uncertainties (e.g., elastic modulus, load magnitude) through the model to establish a confidence interval on the predicted peak stress [52].
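The Monte Carlo step can be sketched as follows. The closed-form stress expression is a cheap surrogate standing in for the actual finite-element model (in practice each sample would trigger a full simulation or a surrogate evaluation), and all distributions and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 100_000

# Aleatory input uncertainties (illustrative normal distributions)
E = rng.normal(110e9, 5e9, n_samples)     # elastic modulus [Pa]
F = rng.normal(2400.0, 120.0, n_samples)  # gait-cycle load magnitude [N]

# Surrogate for the FE model: nominal stress with a weak stiffness
# correction (stand-in for the real simulation)
A = 1.2e-4                                # effective cross-section [m^2]
stress = (F / A) * (1.0 + 0.05 * (E - 110e9) / 110e9)

mean_stress = stress.mean()
ci_lo, ci_hi = np.percentile(stress, [2.5, 97.5])  # 95% interval [Pa]
```

The resulting interval on predicted peak stress is what gets compared against the acceptance criteria defined by the model's risk and Context of Use.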
Diagram: VVUQ Workflow for Medical Device Submission. This flowchart outlines the risk-informed process for building model credibility, from defining the Context of Use to regulatory submission, as illustrated in the ASME V&V 40 case study.
In biopharmaceutical development, computational models of biological pathways are used to analyze and visualize complex biological processes, such as cell signaling networks or metabolic pathways relevant to drug action or bioproduction in cells [53]. These pathway models are not just graphical figures; they are structured knowledge bases that encode interactions between biological entities (e.g., proteins, metabolites) using standardized formats like Systems Biology Markup Language (SBML) or Biological Pathway Exchange (BioPAX) [53]. The COU for such a model could include: predicting cellular responses to a drug candidate, identifying potential off-target effects, or optimizing a metabolic pathway in a production cell line to increase biopharmaceutical yield.
The credibility of these models is assessed differently from engineering physics-based models but follows the same fundamental VVUQ principles. The "validation" of a pathway model involves assessing its biological accuracy and predictive capability against experimental data.
The "Ten simple rules" framework provides a robust protocol for creating reusable and credible pathway models [53].
1. Research and Reuse Existing Models (Rule 1): Before creating a new model, researchers should interrogate existing databases such as Reactome, WikiPathways, KEGG, and Pathway Commons [53]. Reusing and extending a previously curated model enhances interoperability and builds upon established community knowledge. All sources should be formally cited within the model's metadata.
2. Determine Scope and Level of Detail (Rule 2): The model's scope—the entities and boundaries—must be defined by the specific biological question. For a model focused on insulin signaling, the scope might be limited to core receptors, kinases, and effectors, excluding peripheral interactions to maintain clarity and computational tractability [53].
3. Use Standard Nomenclature and Annotation (Rule 3): All entities (proteins, genes, compounds) must be annotated with unique, standardized identifiers from authoritative databases such as UniProt (proteins), Ensembl (genes), and ChEBI (chemicals) [53]. This ensures the model is computationally actionable and can be linked to omics data (e.g., transcriptomics, proteomics).
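A lightweight way to enforce this rule is to validate identifiers against the patterns published by the source databases. The sketch below checks UniProt accessions (using UniProt's documented accession regex) and ChEBI IDs; the helper function and its prefix keys are illustrative:

```python
import re

# Identifier patterns: the UniProt accession format follows UniProt's
# published regex; ChEBI identifiers are "CHEBI:" followed by digits.
PATTERNS = {
    "uniprot": re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
    ),
    "chebi": re.compile(r"^CHEBI:\d+$"),
}

def is_valid_annotation(database, identifier):
    """True if the identifier matches the expected pattern for database."""
    pattern = PATTERNS.get(database)
    return bool(pattern and pattern.match(identifier))

ok_protein = is_valid_annotation("uniprot", "P04637")     # a valid accession
ok_chemical = is_valid_annotation("chebi", "CHEBI:17234") # a valid ChEBI ID
bad_symbol = is_valid_annotation("uniprot", "TP53")       # gene symbol, rejected
```

Running such a check over every entity in a pathway model catches unannotated or mis-annotated nodes before the model is shared or linked to omics data.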
4. Provide Sufficient Metadata and Documentation (Rule 5): Comprehensive metadata is crucial for credibility and reuse, including a statement of the model's purpose and scope, literature citations supporting each interaction, authorship and provenance, and version information [53].
5. Validation against Experimental Data: The model is validated by testing its ability to explain or predict independent experimental data.
Table: Key Research Reagent Solutions for Computational Pathway Modeling
| Resource / Tool | Type | Primary Function in VVUQ |
|---|---|---|
| WikiPathways / Reactome | Pathway Database | Provides existing, peer-reviewed pathway models for reuse and extension, forming the basis for validation [53]. |
| PathVisio / CellDesigner | Pathway Editing Tool | Software for creating, visualizing, and annotating pathway models in standard formats (GPML, SBML) [53]. |
| UniProt / Ensembl / ChEBI | Biological Reference Database | Provides standardized identifiers and annotations for model entities, ensuring accuracy and interoperability [53]. |
| SBML (Systems Biology Markup Language) | Model Format Standard | A canonical format for representing models, enabling exchange, reuse, and simulation by different software tools [53]. |
| Python (with libSBML, etc.) | Programming Environment | Enables custom scripts for model validation, simulation, and uncertainty quantification (e.g., parameter sensitivity analysis) [53]. |
An emerging, high-impact application of CM&S is the In Silico Clinical Trial (ISCT), which uses simulated patient populations to augment or replace traditional clinical trials [3]. ISCTs have the potential to reduce trial costs and duration while improving the quality of information. However, the credibility demands for models used in ISCTs are exceptionally high, as the outcomes directly impact regulatory assessments of safety and efficacy.
The ASME V&V 40 framework is directly applicable to establishing the necessary credibility for these models. A key challenge is validation when direct comparison to human data is limited. This often necessitates a validation hierarchy that leverages all available evidence, from benchtop experiments and animal models to limited human data from early-phase trials [3]. The working group is actively developing best practices for classifying and using different types of comparators to assess the credibility of patient-specific computational models [3].
A critical step in all VVUQ processes is Uncertainty Quantification (UQ), which distinguishes a rigorous predictive model from a simple curve fit. The master class on VVUQ outlines a systematic UQ workflow: characterize the uncertainties in model inputs, propagate them through the model (for example, via Monte Carlo sampling), and report the resulting uncertainty in the quantities of interest [52].
Diagram: Risk-Based Credibility Assessment. This diagram visualizes the ASME V&V 40 framework, where the Context of Use and Model Risk determine the credibility goals, which in turn drive the planning of specific VVUQ activities across multiple credibility factors.
Implementing VVUQ requires a suite of tools and resources. The following table details key items referenced in the case studies and standards.
Table: Essential Research Reagent Solutions for VVUQ Implementation
| Tool / Resource | Category | Function in VVUQ | Relevant Standard / Case |
|---|---|---|---|
| ASME VVUQ Standards (e.g., V&V 10, 20, 40) | Standard | Provides the foundational framework, terminology, and risk-based methodology for planning and executing VVUQ [1]. | Core reference for all applications. |
| Code Verification Test Suite (e.g., MMS) | Verification | Provides problems with exact solutions to verify that a computational code is solving the underlying equations correctly [52]. | ASME V&V 10 & 20. |
| Strain Gauge & Test Frame | Validation | Used in physical experiments to collect high-fidelity data for validating computational stress/strain predictions [3]. | Tibial Tray Case Study. |
| Monte Carlo Simulation Software | UQ | Propagates input uncertainties through a model to quantify the uncertainty and reliability of the output predictions [52]. | UQ Workflow. |
| Pathway Databases (e.g., WikiPathways) | Biological Model | Provides curated, reusable biological pathway models that serve as a starting point for development and validation [53]. | Biopharma Pathway Modeling. |
| Standard Biological Identifiers (UniProt, ChEBI) | Annotation | Ensures biological entities in a model are unambiguously defined, which is critical for model accuracy, reuse, and data integration [53]. | Rule 3 for Pathway Modeling. |
| SBML (Systems Biology Markup Language) | Model Format | A standardized format for representing computational models of biological processes, enabling model exchange, simulation, and reproducibility [53]. | Rule 1 & 5 for Pathway Modeling. |
The case studies presented—from the ASME V&V 40-based assessment of a medical device to the construction of reusable pathway models for biopharmaceutical analysis—demonstrate that standardized VVUQ processes are universally critical for establishing trust in computational models. The core thesis is that a risk-informed approach, which tailors the rigor of VVUQ activities to the model's context of use and decision consequence, provides a scientifically sound and economically viable path forward.
Adherence to established standards like ASME V&V 40 not only facilitates smoother regulatory reviews but also enhances the internal product development and drug discovery processes by identifying critical knowledge gaps and controlling risks early. As computational methods continue to evolve, embracing these rigorous practices for verification, validation, and uncertainty quantification will be the cornerstone of credible and impactful computational science in regulated industries.
Computational studies across scientific disciplines—from materials science to artificial intelligence and drug development—face increasing scrutiny regarding the credibility and reliability of their predictions. Research manuscripts can receive immediate desk rejection for failing to meet fundamental standards in verification and validation (V&V) and related methodological requirements. This technical guide synthesizes current standards and common pitfalls that lead to manuscript rejection, providing researchers with a framework for developing compliant computational studies that meet the evolving demands of scientific peer review.
The foundation of credible computational research lies in establishing rigorous processes for Verification, Validation, and Uncertainty Quantification (VVUQ). As noted in engineering simulation contexts, "key engineering decisions depend on computational simulations, shifting the role of physical tests from product compliance demonstration to simulation model validation" [54]. This paradigm shift places greater responsibility on researchers to ensure their simulations are reliable, particularly as computational methods increasingly support critical decisions in fields like drug development and medical device design.
Manuscripts may be immediately rejected without full peer review for failing to meet formal submission requirements. These administrative issues, while seemingly procedural, are frequently enforced strictly by editorial teams.
Methodological flaws represent the most substantive category of immediate rejection criteria, directly impacting the scientific validity of computational studies.
Table 1: Common Methodological Grounds for Immediate Rejection
| Rejection Category | Specific Deficiencies | Manifestation in Computational Studies |
|---|---|---|
| Research Design | Misalignment between methods and research questions [57] | Using machine learning approaches for problems requiring mechanistic models or vice versa |
| | Inadequate research design [57] | Single-method studies without cross-validation or robustness checks |
| | Lack of transparency [57] | Insufficient detail in describing methodology, preventing assessment of rigor |
| Data Issues | Sampling issues [57] | Non-representative data sampling undermining generalizability |
| | Sample size inadequacy [57] | Insufficient data for model training or statistical power |
| | Data quality concerns [57] | Outdated, incomplete, or unreliable datasets |
| Analytical Approach | Inappropriate analysis techniques [57] | Incorrect statistical methods or evaluation metrics |
| | Inadequate operationalization of constructs [57] | Poor definition and measurement of key variables |
| | Lack of cross-validation [57] | No replication using different samples or settings |
Verification establishes that the computational model correctly implements the intended mathematical formalism and numerical algorithms. This process confirms that the model is solved accurately, focusing on code correctness and numerical error reduction.
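A minimal form of such a check, assuming a simple first-order solver and a problem with a known analytical solution, is to confirm that the observed order of accuracy matches the theoretical one; the method of manufactured solutions generalizes this idea to problems without closed-form solutions:

```python
import math

# Illustrative solution-verification check: solve dy/dt = -y, y(0) = 1,
# whose exact solution is exp(-t), with forward Euler, and confirm that
# the observed order of accuracy matches the scheme's theoretical
# first-order convergence.

def euler_solve(n_steps, t_end=1.0):
    dt = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += dt * (-y)
    return y

def observed_order():
    exact = math.exp(-1.0)
    e_coarse = abs(euler_solve(100) - exact)
    e_fine = abs(euler_solve(200) - exact)   # halve the step size
    return math.log2(e_coarse / e_fine)      # ~1.0 for a first-order scheme

if __name__ == "__main__":
    print(f"observed order of accuracy: {observed_order():.3f}")
```

A systematic discrepancy between observed and theoretical order is strong evidence of a coding or discretization error, independent of any experimental data.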
Validation determines how accurately the computational model represents the real-world system being studied, establishing the model's predictive capability within its intended domain.
The following diagram illustrates the integrated verification and validation workflow essential for credible computational studies:
Uncertainty Quantification (UQ) systematically characterizes and propagates uncertainties from multiple sources to establish confidence in computational predictions.
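One elementary step in such characterization is a one-at-a-time sensitivity screen. The sketch below, using a hypothetical three-parameter exposure model, ranks inputs by their relative effect on the output:

```python
# One-at-a-time (OAT) sensitivity screen: perturb each input by a fixed
# fraction and rank inputs by their relative effect on the output.
# The three-parameter model and its baseline values are hypothetical.

def model(params):
    return params["dose"] / params["volume"] * params["absorption"]

def oat_sensitivity(baseline, delta=0.10):
    y0 = model(baseline)
    effects = {}
    for name, value in baseline.items():
        perturbed = dict(baseline, **{name: value * (1 + delta)})
        effects[name] = (model(perturbed) - y0) / y0
    # sort inputs by the magnitude of their effect, largest first
    return dict(sorted(effects.items(), key=lambda kv: -abs(kv[1])))

if __name__ == "__main__":
    base = {"dose": 100.0, "volume": 40.0, "absorption": 0.8}
    print(oat_sensitivity(base))
```

Ranking inputs this way identifies which parameter uncertainties dominate the output and therefore deserve the most characterization effort.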
Artificial intelligence and machine learning models introduce additional verification and validation challenges that warrant specific methodological attention.
Table 2: AI/ML-Specific V&V Requirements and Common Pitfalls
| Requirement Category | Standard/Best Practice | Common Pitfalls Leading to Rejection |
|---|---|---|
| Model Documentation | NIST AI Standards Zero Draft [59] | Insufficient documentation of training data, architecture, or hyperparameters |
| Validation Approach | High-fidelity computational or experimental validation [60] | Validation only on held-out test sets without external validation |
| Physical Plausibility | Physics-based regularization [60] | Physically impossible predictions without constraint mechanisms |
| Uncertainty Quantification | Aleatory and epistemic uncertainty characterization [54] | Point predictions without confidence intervals or uncertainty estimates |
| Reproducibility | FAIR data and model sharing principles | No access to code, data, or training procedures |
Adherence to specific structural and reporting standards is essential for manuscript acceptance in computational fields.
Table 3: Essential Research Reagents for Computational Model V&V
| Reagent Category | Specific Tools/Methods | Function in V&V Process |
|---|---|---|
| Code Verification | Method of Manufactured Solutions (MMS) [54] | Verifies code correctness by testing with analytically known solutions |
| | Software Quality Assurance (SQA) processes [54] | Ensures code reliability through version control, testing, and documentation |
| Solution Verification | Discretization error estimators [54] | Quantifies numerical errors from mesh/grid resolution |
| | Iterative convergence criteria [54] | Ensures numerical solutions are fully converged |
| Validation Metrics | Area metric, Z metric [54] | Quantifies agreement between model predictions and experimental data |
| | Waveform comparison metrics [54] | Assesses agreement for time-series or spatial distribution data |
| Uncertainty Quantification | Monte Carlo methods [54] | Propagates input uncertainties through computational models |
| | Sensitivity analysis [54] | Identifies key contributors to output uncertainty |
| Validation Data | Designed validation experiments [54] | Provides high-quality data for model validation |
| | Standard reference problems [54] | Enables model comparison against community-accepted benchmarks |
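Of the validation metrics in the table, the area metric admits a particularly compact implementation. The sketch below computes the area between the empirical CDFs of model predictions and experimental observations for scalar samples:

```python
import bisect

# Area validation metric sketch: the area between the empirical CDFs of
# model predictions and experimental observations. Zero means identical
# sample distributions; for a pure shift, the metric equals the shift.

def _ecdf(sorted_xs, x):
    """Empirical CDF of a pre-sorted sample, evaluated at x."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)

def area_metric(model_samples, data_samples):
    m, d = sorted(model_samples), sorted(data_samples)
    points = sorted(set(m) | set(d))
    area = 0.0
    for left, right in zip(points, points[1:]):
        # both step-function ECDFs are constant on [left, right)
        area += abs(_ecdf(m, left) - _ecdf(d, left)) * (right - left)
    return area

if __name__ == "__main__":
    print(area_metric([9.8, 10.1, 10.4], [10.0, 10.2, 10.6]))
```

Because it operates on whole distributions rather than means, the metric penalizes both bias and mismatched spread between simulation and experiment.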
Establishing credibility for computational models requires systematic processes beyond technical V&V activities.
The following diagram illustrates the credibility assurance process for building confidence in computational models:
Even with a solid understanding of V&V principles, researchers often stumble in implementation.
Avoiding immediate rejection in computational studies requires methodical attention to verification, validation, and uncertainty quantification throughout the research process. By implementing the standards and methodologies outlined in this guide, researchers can enhance the credibility of their computational work and make meaningful contributions to their fields. The evolving landscape of computational research increasingly demands not just novel results, but demonstrably reliable ones, making rigorous V&V practices essential for successful publication and scientific impact.
The escalating complexity of therapeutic modalities, particularly beyond Rule of 5 (bRo5) compounds and patient-specific models, demands equally sophisticated approaches to computational model credibility. As drug discovery increasingly targets protein-protein interactions and previously "undruggable" targets, researchers are employing larger, more complex molecules including PROTACs (Proteolysis Targeting Chimeras), macrocycles, and antibody-drug conjugates that operate outside traditional Lipinski guidelines [61] [62]. These advanced modalities present unique challenges for predictive modeling, necessitating robust verification and validation (V&V) frameworks tailored to their distinct molecular characteristics. The credibility of computational models used in drug development has direct implications for regulatory decision-making, clinical trial design, and ultimately patient access to safe and effective treatments [9]. Establishing trust in these models is particularly crucial when they serve as primary evidence for decisions in cases where clinical trials are not feasible or ethical. This technical guide examines specialized strategies for optimizing model credibility within the context of complex bRo5 modalities and patient-specific applications, providing a structured framework aligned with emerging regulatory science principles.
A risk-informed framework provides the foundation for establishing computational model credibility. Adapted from the American Society of Mechanical Engineers (ASME) standards, this approach assesses model risk through two primary dimensions: (1) model influence, representing the weight of the model in the totality of evidence for a given decision, and (2) decision consequence, reflecting the significance of an adverse outcome resulting from an incorrect decision [9]. The rigor of credibility activities should be commensurate with the determined model risk level, ensuring efficient allocation of verification and validation resources while maintaining appropriate scientific standards.
The credibility assessment process encompasses five key concepts, applied iteratively throughout model development and application. These include precisely stating the question of interest (the specific decision being addressed), defining the context of use (COU) describing how the model will address the question, assessing model risk based on influence and decision consequence, establishing credibility through appropriate V&V activities, and finally assessing credibility to determine if the model is sufficiently trustworthy for its intended purpose [9]. This framework enables consistent evaluation across different modeling approaches and therapeutic areas.
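To make the risk-informed logic concrete, the toy sketch below encodes model risk as a function of model influence and decision consequence. The category labels and the coarse additive mapping are illustrative assumptions, not values prescribed by the standard:

```python
# Toy encoding of the risk-informed principle: model risk combines
# model influence and decision consequence, and the required rigor of
# credibility activities scales with that risk. The 3x3 granularity,
# labels, and rigor descriptions are illustrative assumptions only.

LEVELS = ["low", "medium", "high"]

def model_risk(influence, consequence):
    score = LEVELS.index(influence) + LEVELS.index(consequence)
    return ["low", "medium", "high", "high", "very high"][score]

def required_rigor(risk):
    return {
        "low": "minimal credibility evidence",
        "medium": "moderate V&V with documented uncertainty",
        "high": "rigorous V&V with quantified uncertainty",
        "very high": "comprehensive VVUQ and independent review",
    }[risk]

if __name__ == "__main__":
    risk = model_risk("high", "medium")
    print(risk, "->", required_rigor(risk))
```

The point of such a mapping is not the specific thresholds but the discipline it enforces: credibility activities are planned against an explicit, documented risk level rather than ad hoc judgment.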
Table 1: Credibility Factors for Computational Model Verification and Validation
| Activity Category | Credibility Factor | Description |
|---|---|---|
| Verification | Software Quality Assurance | Ensuring software reliability and correctness |
| | Numerical Code Verification | Confirming accurate implementation of numerical methods |
| | Discretization Error | Assessing errors from continuous system discretization |
| | Numerical Solver Error | Evaluating errors from numerical solution techniques |
| | Use Error | Identifying and mitigating user implementation mistakes |
| Validation | Model Form | Assessing appropriateness of mathematical structure |
| | Model Inputs | Verifying accuracy of input parameters and data |
| | Test Samples | Ensuring representative experimental design |
| | Test Conditions | Validating under conditions relevant to COU |
| | Equivalency of Input Parameters | Confirming parameter consistency across scales |
| | Output Comparison | Quantifying agreement with experimental data |
| Applicability | Relevance of Quantities of Interest | Ensuring model outputs address COU needs |
| | Relevance of Validation Activities | Confirming V&V activities adequately support the COU |
Thirteen distinct credibility factors collectively establish model trustworthiness, spanning verification, validation, and applicability domains [9]. Verification activities confirm that the computational model correctly implements the underlying mathematical model and solution, while validation activities determine how accurately the model represents real-world phenomena. Applicability factors ensure that the conducted V&V activities appropriately support the specific context of use. For bRo5 modalities, particular attention should be paid to model form validation, as traditional quantitative structure-activity relationship (QSAR) approaches often fail to capture the complex conformational dynamics and property landscapes of these larger, more flexible molecules [63] [64].
Beyond Rule of 5 molecules exhibit structural and physicochemical properties that present distinctive challenges for computational modeling. These compounds typically have molecular weight >500 Da, high flexibility, and increased polar surface area, leading to complex conformational dynamics and property landscapes [63] [62]. For PROTACs specifically, their heterobifunctional nature (combining a target-binding warhead, linker, and E3 ligase recruiter) creates unique challenges for predicting ternary complex formation, degradation efficiency, and pharmacokinetic properties [65]. The conformational flexibility of PROTACs enables them to adopt different orientations in various environments—a property known as chameleonicity—which significantly influences their cellular permeability, efflux ratio, and ultimately oral bioavailability [63].
The rise of proximity-inducing modalities like PROTACs has changed the demands placed on computational models. Recent milestones highlighting their promise include the advancement of vepdegestrant (ARV-471), a PROTAC targeting the estrogen receptor, into Phase 3 clinical trials for metastatic breast cancer, and the development of luxdegalutamide (ARV-766), an oral androgen receptor degrader showing promise in treating metastatic castration-resistant prostate cancer [61]. Accurately predicting the behavior of these complex molecules requires specialized approaches that go beyond traditional small molecule modeling techniques.
For bRo5 compounds, molecular conformation significantly influences physicochemical properties and biological performance. Recent research on PROTACs with different linker methylation levels demonstrated that conformational sampling in polar and nonpolar environments directly impacts efflux ratio and oral bioavailability [63]. Linker methylation drives chameleonic folding behavior, allowing molecules to adopt more polar, extended conformations in aqueous environments and less polar, compact conformations in lipid-rich environments—a property critical for membrane permeability.
Chromatographic methods have emerged as valuable tools for evaluating permeability-relevant lipophilicity of bRo5 compounds. Studies show that chromatographic retention times can reveal subtle conformational effects and correlate with the ability to sequester hydrogen bond donors in low dielectric media [64]. These chromatographic approaches provide high-throughput methods for estimating hydrocarbon-water partition coefficients for macrocyclic peptides and PROTACs, facilitating prediction of passive cell permeability trends. Molecular dynamics simulations in different dielectric environments further complement experimental measurements by providing atomic-level insights into conformational preferences [63] [64].
Diagram 1: bRo5 Compound Modeling Workflow
Advanced computational methods are essential for addressing the unique challenges of bRo5 modalities. Structure-based PROTAC design benefits significantly from prior protein-protein docking, which greatly increases the success of structure-based design approaches [65]. The Rosetta suite excels among structure-based ternary complex prediction methods, while emerging deep learning approaches show promise for modeling ligand-dependent multicomponent assemblies' conformations [65]. Alpha-Pharm3D represents a recent advancement in deep learning methods that predicts ligand-protein interactions using 3D pharmacophore fingerprints by explicitly incorporating geometric constraints [66]. This approach enhances both prediction interpretability and accuracy of binding affinities, achieving competitive performance (AUROC ~90%) across diverse datasets even with limited training data.
For predicting degradation efficiency, lysine density in the ubiquitination zone has emerged as a reliable predictor [65]. This parameter can be incorporated into quantitative models to prioritize PROTAC candidates with higher likelihood of successful target degradation. Additionally, conformational sampling through molecular dynamics simulations in polar and nonpolar environments provides critical insights into chameleonic properties that influence permeability and efflux [63]. These simulations should explicitly model the molecular environment, as aqueous and lipid-rich conditions promote distinctly different conformational states for flexible bRo5 molecules.
Table 2: Key Experimental Assays for bRo5 Model Validation
| Assay Category | Specific Methods | Relevant Output Metrics | Application to bRo5 Modalities |
|---|---|---|---|
| Physicochemical Properties | Chromatographic lipophilicity (log k′80 PLRP-S) | Hydrocarbon-water partition coefficients | Predicts passive permeability for macrocycles and PROTACs [64] |
| | Solubility measurements | Kinetic and thermodynamic solubility | Informs formulation strategies for low-solubility compounds |
| | Hydrogen bonding capacity | Δlog kW IAM | Quantifies molecular chameleonicity [63] |
| In Vitro ADME | Caco-2 permeability | Apparent permeability (Papp), Efflux Ratio (ER) | Strong predictor of oral bioavailability for PROTACs [63] |
| | MDCK assays | Passive cell permeability | Correlates with chromatographic lipophilicity [64] |
| | Metabolic stability | Microsomal/hepatocyte clearance | Informs first-pass metabolism predictions |
| Biological Activity | Ternary complex formation | CETSA, FRET-based assays | Validates computational ternary complex predictions [65] |
| | Degradation efficiency | Western blot, cellular thermal shift assay | Confirms target degradation and ubiquitination |
| | Binding affinity | Ki, IC50, KD measurements | Validates target engagement predictions |
Robust experimental validation is essential for establishing bRo5 model credibility. For PROTACs, the efflux ratio (ER) from Caco-2 assays has been demonstrated as a strong predictor of oral bioavailability (F%), with the chromatographic descriptor log k′80 PLRP-S providing a high-throughput method for estimating ER [63]. This correlation enables early prioritization of candidates with favorable absorption properties. Additionally, chromatographic approaches for estimating hydrocarbon-water shake-flask partition coefficients show strong correlation with MDCK passive cell permeability for various thioether-cyclized decapeptides, providing a convenient, high-throughput method for predicting permeability trends in bRo5 compounds [64].
The Scientist's Toolkit: Essential Research Reagents and Materials
Physiologically-based pharmacokinetic (PBPK) modeling provides a powerful framework for predicting drug behavior in specific patient populations, including pediatric patients, organ impairment groups, and populations with genetic polymorphisms affecting drug metabolism. The risk-informed credibility framework can be effectively applied to PBPK models to establish their suitability for regulatory decision-making [9]. For example, a PBPK model might be developed to predict pharmacokinetic changes resulting from drug-drug interactions with CYP3A4 modulators in adult patients, and subsequently qualified to predict PK profiles in children and adolescent patients.
When defining the context of use for patient-specific PBPK models, precise specification of the target population, relevant physiological parameters, and metabolic pathways is critical. For a hypothetical small molecule drug primarily eliminated by CYP3A4, model credibility might be established through validation against clinical DDI studies with strong CYP3A4 inhibitors and inducers in adults, followed by extrapolation to pediatric populations using physiologically-scaled parameters [9]. The credibility assessment would evaluate the applicability of the adult validation to the pediatric context of use, considering differences in enzyme maturation, organ size, and body composition.
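The kind of physiological scaling involved in such an extrapolation can be illustrated with a toy clearance calculation combining allometric body-weight scaling with a sigmoidal enzyme-maturation function; all parameter values below are hypothetical and not drawn from any qualified model:

```python
# Toy illustration of physiological scaling used when extrapolating an
# adult clearance to a pediatric population: allometric body-weight
# scaling combined with a sigmoidal enzyme-maturation function of
# postmenstrual age (PMA). All parameter values are hypothetical.

def pediatric_clearance(adult_cl, weight_kg, pma_weeks,
                        adult_weight_kg=70.0, tm50_weeks=45.0, hill=3.0):
    size = (weight_kg / adult_weight_kg) ** 0.75               # allometry
    maturation = pma_weeks**hill / (tm50_weeks**hill + pma_weeks**hill)
    return adult_cl * size * maturation

if __name__ == "__main__":
    # e.g. scale a 20 L/h adult clearance to a ~2-year-old (12 kg, PMA ~144 wk)
    print(round(pediatric_clearance(20.0, 12.0, 144.0), 2))
```

In a credibility assessment, each such scaling assumption (the allometric exponent, the maturation parameters) is itself a model input whose uncertainty must be justified for the pediatric context of use.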
For targeted protein degraders like PROTACs, patient-specific modeling can incorporate biomarker data to predict differential responses across population subgroups. Models can integrate expression levels of target proteins, E3 ligases, and components of the ubiquitin-proteasome system to stratify patients according to likelihood of treatment response [65]. This approach is particularly valuable for optimizing therapy with complex modalities where multiple biological factors influence efficacy.
Validation of personalized models requires specialized approaches that account for population heterogeneity. Cross-validation techniques using stratified sampling can help ensure model performance across relevant subgroups. When clinical data for specific subpopulations is limited, quantitative systems pharmacology (QSP) models incorporating pathway biology may provide supportive evidence of model applicability. However, the uncertainty associated with limited validation data should be explicitly quantified and reflected in model influence on decisions.
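A minimal pure-Python sketch of such stratified fold construction, serving as a stand-in for library routines such as scikit-learn's StratifiedKFold, might look like this:

```python
import random
from collections import defaultdict

# Sketch of stratified k-fold assignment that preserves subgroup
# representation (e.g. patient subpopulations) in every fold. Indices
# are shuffled within each subgroup, then dealt round-robin to folds.

def stratified_folds(labels, k=5, seed=0):
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, group in enumerate(labels):
        by_group[group].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_group.values():
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds

if __name__ == "__main__":
    labels = ["responder"] * 10 + ["non-responder"] * 5
    for fold in stratified_folds(labels, k=5):
        print(sorted(fold))
```

Keeping every subgroup represented in every fold ensures that reported cross-validation performance reflects the model's behavior on each subpopulation, not just the majority group.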
Diagram 2: Patient-Specific Model Credibility
A recent academic-industrial collaboration demonstrated the application of integrated computational and experimental approaches to optimize PROTAC bioavailability through linker modifications [63]. Researchers profiled 11 structurally related von Hippel-Lindau (VHL)-based PROTACs differing in linker length, methylation patterns, and stereochemistry, evaluating in vivo pharmacokinetics in mice alongside in vitro ADME properties and key physicochemical traits. Conformational sampling and molecular dynamics in polar and nonpolar environments revealed that strategic linker methylation drives chameleonic folding behavior, influencing efflux ratio and ultimately oral bioavailability.
This systematic approach enabled prediction of oral bioavailability for VHL-based PROTACs with different linker methylation levels throughout drug discovery. The study highlighted that Caco-2 permeability alone did not correlate with oral bioavailability, while efflux ratio proved to be a strong predictor [63]. This case study illustrates how molecular-level understanding of conformational dynamics can inform design strategies for bRo5 compounds, with linker methylation serving as a minimalist approach to achieving partial rigidification within flexible linkers—a concept the authors termed "linkerology" [63].
The Alpha-Pharm3D platform represents an advanced approach to ligand-based 3D pharmacophore modeling that explicitly incorporates conformational ensembles of ligands and geometric constraints of receptors to construct trainable pharmacophore fingerprints [66]. This method addresses key limitations of conventional pharmacophore modeling, including bias toward specific functional groups, limited interpretability, and reliance on external software for model interpretation. When applied to the neurokinin-1 receptor (NK1R), a cancer-related GPCR, Alpha-Pharm3D prioritized three experimentally active compounds with significantly distinct scaffolds, two of which were optimized through chemical modification to exhibit EC50 values of approximately 20 nM [66].
This case study demonstrates how advanced computational methods can enhance screening efficiency for difficult targets, with performance maintained even with limited training data. The platform achieved a mean recall rate exceeding 25% regardless of data scarcity, performing equal to or better than prevailing traditional and AI-based screening methods [66]. Such approaches are particularly valuable for bRo5 modalities where structural complexity challenges conventional screening methods.
The field of computational modeling for bRo5 modalities continues to evolve rapidly, with several emerging technologies promising to enhance model credibility. Deep learning methods trained on experimental data show increasing capability to model ligand-dependent multicomponent assemblies' conformations [65]. AlphaFold and related structure prediction tools offer potential to reshape PROTAC design by improving predictions of ternary complex formation [65]. Additionally, automated high-throughput experimentation platforms, such as SpiroChem's Hercules+ synthesis platform, generate crucial data for model training and validation across diverse chemical spaces [61].
Advancements in chromatographic methods for evaluating permeability-relevant lipophilicity provide high-throughput experimental data that correlates well with cellular permeability measurements [64]. These methods enable efficient screening of complex library mixtures and pure compounds, supporting more robust model training. Similarly, benchmark sets of bioactive molecules tailored for diversity analysis, such as the recently developed sets of 3k, 25k, and 379k structures mined from ChEMBL, enable more systematic evaluation of chemical space coverage [67].
Establishing credibility for computational models applied to patient-specific scenarios and complex bRo5 modalities requires specialized strategies that address their unique challenges. The risk-informed framework provides a structured approach for determining appropriate V&V activities based on model influence and decision consequence. For bRo5 compounds, particular attention should be paid to conformational dynamics and their influence on physicochemical properties, leveraging both computational sampling and experimental measurements of chameleonicity.
As the field advances, development of standardized benchmark sets, robust validation protocols, and explicit uncertainty quantification will strengthen model credibility across diverse applications. The ultimate goal is a comprehensive, standardized framework for credibility assessment that enables reliable application of computational models to accelerate development of innovative therapeutics while maintaining scientific rigor and regulatory standards. Through continued refinement of these approaches, computational models will play an increasingly central role in navigating the complex landscape of bRo5 drug discovery and personalized therapy optimization.
Computational reproducibility, defined as obtaining consistent results using the same input data, computational steps, methods, and code, represents a fundamental pillar of scientific progress. Despite this, numerous scientific fields are experiencing a severe reproducibility crisis that undermines the credibility of computational findings and wastes substantial resources. Recent quantitative assessments document the alarming severity of this crisis across multiple domains, with reproducibility rates varying dramatically from one field to another.
The financial impact of computational irreproducibility represents a substantial drain on global scientific resources. The pharmaceutical industry alone is estimated to waste $40 billion annually on irreproducible computational research, with individual study replications requiring 3 to 24 months and an additional $500,000 to $2,000,000 in investment. When extrapolated globally across all computational sciences, the total economic impact approaches $200 billion annually [68].
Table 1: Computational Reproducibility Rates Across Scientific Domains
| Domain | Reproducibility Rate | Primary Causes of Failure |
|---|---|---|
| Data Science (Jupyter Notebooks) | 5.9% (245 of 4,169 notebooks) | Missing dependencies, broken libraries, environment differences [68] |
| Computational Physics | ~26% | Software version issues, inadequate documentation [68] |
| Bioinformatics Workflows | Near 0% | Missing data, technical complexity, workflow management issues [68] |
| Computational Chemistry | Variable (15 software packages gave different answers for same crystals) | Algorithmic differences, implementation variations [68] |
The technical roots of this crisis stem from systemic barriers that compound across the entire computing stack. Even theoretically deterministic computational research faces challenges from parallel execution order variations, floating-point arithmetic differences across architectures, compiler optimization choices, and GPU atomic operations that can produce variations of several percent in simulations depending on the specific hardware model and driver version [68]. In the context of quantitative modeling in systems biology, these challenges are exacerbated by inconsistent annotation practices, insufficient model documentation, and the lack of standardized curation processes.
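One of these root causes is easy to demonstrate directly: floating-point addition is not associative, so any change in evaluation order, as parallel reductions routinely introduce, can change the computed result:

```python
# Floating-point addition is not associative: regrouping the same three
# terms, as a parallel reduction might, produces different bit patterns.

a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)

print(left == right)   # False: the two groupings differ in the last bits
print(left, right)
```

At the scale of millions of accumulations in a simulation, such last-bit differences can compound into the percent-level output variations described above.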
The Minimum Information Requested In the Annotation of Models (MIRIAM) is a community-developed set of guidelines designed to standardize the annotation and curation processes of quantitative models of biological systems [69]. Established as part of the BioModels.net initiative, MIRIAM addresses the critical need for consistent model documentation to facilitate model exchange, reuse, and verification across different research groups and software platforms.
MIRIAM is structured around three core components that deal with different aspects of the information required for effective model curation [69]: reference correspondence, attribution annotation, and external resource annotation.
The MIRIAM guidelines establish specific requirements for each of the three annotation components, creating a comprehensive framework for model quality assurance.
Table 2: Core Components of MIRIAM Guidelines
| Component | Key Requirements | Implementation Examples |
|---|---|---|
| Reference Correspondence | • Machine-readable, standardized format (SBML, CellML) • Valid encoding schema • Association with reference description • Reflects biological processes in reference • Instantiable with necessary parameters • Reproduces representative results | SBML model file with referenced publication; Provided initial conditions and parameters for simulation [69] |
| Attribution Annotation | • Model name • Citation and author identification • Creator contact details • Creation and modification timestamps • Terms of use statement | Model name: "Calcium Oscillation Model"; Creator: Jane Doe (j.doe@lab.edu); License: CC BY 4.0 [69] |
| External Resource Annotation | • Unambiguous relationship between knowledge and model constituent • Triple structure: {data collection, identifier, qualifier} • URI-based expression • Proper identifier framework analysis • Qualifier usage for link refinement | URI: http://identifiers.org/uniprot/P12345; Qualifier: is_version_of [70] [69] |
A critical innovation of MIRIAM is its approach to external resource annotation through the use of Uniform Resource Identifiers (URIs). MIRIAM URIs are structured identifiers composed of two parts: the URI of the data type (a controlled description of the data type) and the element identifier (specific piece of knowledge within that data type context) [70]. These identifiers are designed to be unique, perennial, standards-compliant, resolvable, and free to use, addressing the fundamental requirements for reliable scientific identifiers.
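The two-part URI structure maps naturally onto a small parser. The sketch below (stdlib only; the UniProt example mirrors Table 2) splits an identifiers.org-style URI into its data-collection and element-identifier parts and builds the {data collection, identifier, qualifier} triple described above.

```python
# Illustrative parser for identifiers.org-style MIRIAM URIs: splits the
# URI into the data-collection namespace and the element identifier,
# then forms the {collection, identifier, qualifier} annotation triple.
from urllib.parse import urlparse

def parse_miriam_uri(uri: str):
    """Return (data_collection, element_id) for an identifiers.org URI."""
    parts = urlparse(uri)
    if parts.netloc != "identifiers.org":
        raise ValueError(f"not an identifiers.org URI: {uri}")
    collection, _, element_id = parts.path.lstrip("/").partition("/")
    if not element_id:
        raise ValueError("URI lacks an element identifier")
    return collection, element_id

def annotation_triple(uri: str, qualifier: str):
    """Build the {data collection, identifier, qualifier} triple."""
    collection, element_id = parse_miriam_uri(uri)
    return {"collection": collection, "identifier": element_id,
            "qualifier": qualifier}

triple = annotation_triple("http://identifiers.org/uniprot/P12345",
                           "is_version_of")
# triple == {"collection": "uniprot", "identifier": "P12345",
#            "qualifier": "is_version_of"}
```

Validating URIs at annotation time, rather than at reuse time, is what keeps the "resolvable and perennial" guarantee meaningful in practice.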
The MIRIAM Resources project provides the technical infrastructure to support these annotations, consisting of four interconnected components: the MIRIAM Database (stores information about data types), MIRIAM Web Services (SOAP-based API), MIRIAM Library (provides access to web services), and MIRIAM Web Application (human-readable browsing and editing interface) [70].
The Systems Biology Markup Language (SBML) represents a foundational standard for encoding computational models in systems biology that aligns with MIRIAM principles. As a machine-readable format based on XML, SBML provides a structured framework for representing biochemical models, including metabolic networks, cell signaling pathways, and gene regulatory networks [69].
SBML serves as an ideal implementation vehicle for MIRIAM annotations: its schema assigns each model component a unique metaid and provides a dedicated annotation element for embedding external references.
The integration of MIRIAM annotations within SBML files transforms them from mere computational representations into semantically rich models that explicitly reference established biological knowledge bases. This enables both human comprehension and machine-actionable processing of model components.
SBML Annotation Structure
The technical implementation of MIRIAM annotations in SBML utilizes the Resource Description Framework (RDF) embedded within SBML files. This approach enables the creation of semantic triples that connect model components to external database entries using standardized relationship qualifiers.
The annotation workflow follows this sequence:
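A minimal SBML fragment illustrating the RDF-embedded triple described above (the species name, metaid, and UniProt identifier are illustrative):

```xml
<species id="s1" metaid="_meta_s1" name="ExampleProtein">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <!-- rdf:about links this description to the component's metaid -->
      <rdf:Description rdf:about="#_meta_s1">
        <!-- biology qualifier refines the meaning of the link -->
        <bqbiol:isVersionOf>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/uniprot/P12345"/>
          </rdf:Bag>
        </bqbiol:isVersionOf>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
```

The metaid attribute anchors the RDF description to the model component, the qualifier element states the relationship, and the rdf:resource carries the MIRIAM URI, together forming the semantic triple.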
Objective: Systematically evaluate a computational model's adherence to MIRIAM guidelines for annotation completeness and correctness.
Materials:
Methodology:
Reference Correspondence Verification
Attribution Annotation Check
External Resource Annotation Audit
Compliance Scoring
Validation Metrics: Percentage of annotated components, URI resolution success rate, qualifier appropriateness score
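The compliance-scoring step can be automated as a simple audit over model components. In the sketch below, the field names and the all-or-nothing pass criterion are illustrative assumptions, not part of the MIRIAM guidelines themselves.

```python
# Illustrative MIRIAM-compliance audit: fraction of model components
# carrying at least one external-resource annotation, plus a check for
# the required attribution metadata fields.
REQUIRED_ATTRIBUTION = {"name", "citation", "creator_contact",
                        "created", "modified", "terms_of_use"}

def compliance_report(components, attribution):
    """components: list of dicts, each with an 'annotations' list of URIs.
    attribution: dict of attribution metadata fields."""
    annotated = sum(1 for c in components if c.get("annotations"))
    coverage = annotated / len(components) if components else 0.0
    missing = sorted(REQUIRED_ATTRIBUTION - attribution.keys())
    return {"annotation_coverage": coverage,
            "missing_attribution_fields": missing,
            "compliant": coverage == 1.0 and not missing}

report = compliance_report(
    [{"id": "s1", "annotations": ["http://identifiers.org/uniprot/P12345"]},
     {"id": "s2", "annotations": []}],
    {"name": "Calcium Oscillation Model", "citation": "doi:10.1000/example",
     "creator_contact": "j.doe@lab.edu", "created": "2024-01-01",
     "modified": "2024-06-01", "terms_of_use": "CC BY 4.0"})
# Half the components are annotated; attribution is complete, so the
# model fails only on annotation coverage.
```

A report of this shape makes the validation metrics above (percentage annotated, missing fields) directly computable and auditable.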
Objective: Quantitatively assess the reproducibility of simulation results across multiple software platforms and environments using MIRIAM-annotated SBML models.
Materials:
Methodology:
Environment Configuration
Cross-Platform Execution
Result Comparison and Variance Analysis
Reproducibility Assessment
Validation Metrics: Numerical variance between simulations, reproducibility classification, annotation density correlation
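The variance analysis and reproducibility classification above can be sketched as follows; the tolerance bands (1e-6 and 1e-2 relative spread) are illustrative assumptions, not values from the source.

```python
# Illustrative cross-platform reproducibility check: maximum relative
# spread of a simulated time course across platforms, with a coarse
# classification against assumed tolerance bands.
import statistics

def max_relative_spread(results_by_platform):
    """results_by_platform: {platform: [values]} over the same time points."""
    spreads = []
    for values in zip(*results_by_platform.values()):
        mean = statistics.fmean(values)
        if mean != 0:
            spreads.append((max(values) - min(values)) / abs(mean))
    return max(spreads, default=0.0)

def classify(spread, tight=1e-6, loose=1e-2):
    if spread <= tight:
        return "bitwise-equivalent (within numerical tolerance)"
    return "reproducible" if spread <= loose else "non-reproducible"

runs = {"COPASI":     [1.000, 0.500, 0.250],
        "JWS Online": [1.000, 0.501, 0.250],
        "VCell":      [1.000, 0.499, 0.251]}
spread = max_relative_spread(runs)
verdict = classify(spread)
```

Correlating this spread with annotation density across a model corpus is then a matter of pairing each model's verdict with its compliance report.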
Table 3: Essential Research Tools for MIRIAM-Compliant Model Development
| Tool Category | Specific Solutions | Function and Application |
|---|---|---|
| Model Format Standards | SBML (Systems Biology Markup Language) | Machine-readable format for representing biochemical models [69] |
| Annotation Databases | MIRIAM Registry (Identifiers.org) | Catalog of standard URIs for unambiguous biological entity identification [70] |
| Model Repositories | BioModels Database | Curated repository of annotated, published computational models [69] |
| Simulation Platforms | COPASI, JWS Online, Virtual Cell | SBML-compliant software for model simulation and analysis [69] |
| Validation Tools | SBML Validator | Online service checking SBML syntax and semantic consistency [69] |
| Containerization | Docker, Singularity | Environment standardization for reproducible execution [71] |
| Workflow Management | Nextflow, Snakemake | Computational pipeline orchestration and dependency management [68] |
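Containerization, listed above, pins the execution environment. A minimal Dockerfile sketch follows; the base-image tag and package version are illustrative assumptions to show version pinning, not recommendations.

```dockerfile
# Illustrative container recipe for a reproducible SBML simulation
# environment; image tag and package version are examples of pinning.
FROM python:3.11-slim

# Pin exact versions so every rebuild resolves the same dependencies.
RUN pip install --no-cache-dir python-libsbml==5.20.2

WORKDIR /model
COPY model.xml run_simulation.py ./

# Non-interactive entry point: same command, same inputs, every run.
CMD ["python", "run_simulation.py", "model.xml"]
```

Building once and archiving the image digest alongside the model gives later users a byte-identical environment to rerun the simulation in.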
Reproducible Model Development Workflow
The integrated workflow for developing reproducible computational models combines MIRIAM annotation principles with SBML implementation in a sequential process that emphasizes verification at each stage. This systematic approach ensures that models are not only computationally functional but also biologically meaningful and reproducible across different research environments.
The workflow incorporates critical verification checkpoints at each transition between stages, with particular emphasis on the annotation and validation phases where MIRIAM compliance is assessed. This integrated approach aligns with broader verification and validation frameworks such as ASME's VVUQ (Verification, Validation, and Uncertainty Quantification) standards, which provide structured methodologies for assessing computational model credibility [1].
The reproducibility crisis in computational science represents both a significant challenge and an opportunity for establishing more rigorous scientific practices. The combined application of MIRIAM guidelines and SBML standardization provides a robust foundation for creating computationally reproducible models that can be reliably shared, verified, and built upon by the scientific community.
Future developments in this area will likely include increased automation of annotation processes through AI-assisted tools, expanded standardization efforts into new modeling domains, and tighter integration with reproducibility verification frameworks. Emerging technologies such as AI-powered replication engines that automatically verify computational findings at the time of publication show particular promise for scaling reproducibility assurance across the entire scientific ecosystem [72]. By adopting and further developing these standards and practices, the research community can transform the reproducibility crisis from a fundamental weakness into a demonstrated strength of computational science.
The integration of adaptive Artificial Intelligence and Machine Learning (AI/ML) models into regulated GxP environments (Good Practice quality guidelines for drug development, manufacturing, and clinical trials) represents a paradigm shift in pharmaceutical research and development. Unlike traditional static software, adaptive AI/ML systems can learn from real-world data, improving their performance over time but also introducing novel challenges for computational model verification and validation [73] [74]. This creates a fundamental tension: the very characteristic that makes these models powerful—their adaptability—clashes with traditional regulatory frameworks designed for static medical products [73]. Within the context of academic research on model verification, this landscape necessitates a new rigorous methodology for ensuring that continuously evolving models remain safe, effective, and reliable throughout their entire lifecycle.
Regulatory bodies, including the U.S. Food and Drug Administration (FDA), have recognized that their "traditional paradigm of medical device regulation was not designed for adaptive artificial intelligence and machine learning technologies" [73]. The core challenge is that a model that changes post-deployment could potentially deviate from its validated state, compromising the integrity of GxP processes and decision-making. Consequently, a new framework for lifecycle management has emerged, centered on principles of robust validation, continuous monitoring, and controlled adaptation. This technical guide details the protocols and experimental methodologies required to meet these regulatory and scientific standards, providing a foundation for verifiable and validated adaptive AI in critical drug development applications.
The regulatory landscape for adaptive AI/ML in GxP is rapidly evolving, with recent guidance crystallizing around a Total Product Lifecycle (TPLC) approach [74]. This approach demands oversight from initial development through post-market performance monitoring, a significant shift from traditional models where post-market changes often triggered new submissions.
In October 2021, the FDA, Health Canada, and the UK's MHRA published Good Machine Learning Practice (GMLP) guiding principles, which have become the cornerstone for AI/ML development in regulated sectors [73] [75] [74]. These principles emphasize:
A revolutionary regulatory mechanism for adaptive AI is the Predetermined Change Control Plan (PCCP). Finalized in FDA guidance in December 2024, the PCCP allows manufacturers to specify planned algorithm modifications in their initial submission [73] [74]. Once authorized, these changes can be implemented without additional premarket review, creating a pathway for controlled, iterative improvement.
A robust PCCP, as outlined in regulatory documents, must contain three essential components [74]:
Table 1: Core Components of a Predetermined Change Control Plan (PCCP)
| PCCP Component | Key Elements | Regulatory Purpose |
|---|---|---|
| Description of Modifications | • Scope and boundaries of changes • Type of modification (e.g., architecture, input data) • Automation level (manual vs. automatic) | To provide a clear, pre-approved envelope for model adaptation, preventing uncontrolled "scope creep." |
| Modification Protocol | • Data management strategies • Retraining procedures and triggers • Performance evaluation methods • Update deployment and rollback processes | To ensure that all changes are implemented using a rigorous, repeatable, and validated methodology. |
| Impact Assessment | • Benefit-Risk analysis of changes • Risk mitigation strategies • Plan for assessing impact on different patient populations | To proactively demonstrate that planned modifications will not compromise device safety and effectiveness. |
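The three PCCP components can be captured as a structured, version-controlled document. The sketch below uses illustrative field names, not a regulatory template:

```yaml
# Illustrative PCCP skeleton mirroring the three components in Table 1;
# all field names and values are assumptions for demonstration.
description_of_modifications:
  scope: "retraining on new site data; no architecture changes"
  modification_types: [input_data_update, threshold_recalibration]
  automation: manual_deployment_after_review
modification_protocol:
  retraining_trigger: "data drift detected on any monitored input feature"
  evaluation: "frozen holdout set; predefined metrics and acceptance criteria"
  rollback: "previous model version redeployed if any criterion fails"
impact_assessment:
  benefit_risk: "documented per modification type"
  subgroup_analysis: "performance re-checked on all demographic subgroups"
```

Keeping this document under version control alongside the model code gives auditors a single traceable artifact for every change made under the plan.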
Managing an adaptive AI/ML model in a GxP environment requires a seamless, integrated workflow that spans from initial development to post-market surveillance and controlled adaptation. The following diagram maps this complex lifecycle, integrating core development phases with continuous monitoring and the PCCP-driven modification cycle.
Diagram 1: Integrated Lifecycle Management for Adaptive AI/ML in GxP
A cornerstone of lifecycle management is rigorous, ongoing validation. The following protocols are essential for initial authorization and for validating changes under a PCCP.
Objective: To quantitatively evaluate model performance and robustness across diverse operational conditions and patient demographics, ensuring fairness and mitigating bias [75].
Methodology:
Table 2: Key Performance Metrics for Model Validation & Monitoring
| Metric Category | Specific Metrics | Target Acceptance Criterion (Example) |
|---|---|---|
| Overall Performance | Area Under the Curve (AUC-ROC), Balanced Accuracy, F1-Score | AUC-ROC > 0.90 |
| Clinical Sensitivity | Sensitivity (Recall), Positive Predictive Value (PPV) | Sensitivity > 0.95 |
| Clinical Specificity | Specificity, Negative Predictive Value (NPV) | Specificity > 0.85 |
| Fairness & Bias | Minimum Subgroup Performance, Maximum Subgroup Disparity | Max AUC disparity < 0.05 |
| Robustness | Performance under input perturbation | Performance degradation < 5% |
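A validation run can be checked against the example acceptance criteria in Table 2 mechanically. The sketch below uses made-up confusion-matrix counts purely for demonstration.

```python
# Illustrative acceptance check against the example criteria in Table 2
# (sensitivity > 0.95, specificity > 0.85); counts are demonstration data.
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

CRITERIA = {"sensitivity": 0.95, "specificity": 0.85}

def acceptance_check(tp, fp, tn, fn):
    observed = {"sensitivity": sensitivity(tp, fn),
                "specificity": specificity(tn, fp)}
    # Each metric maps to (observed value, criterion met?).
    return {name: (value, value > CRITERIA[name])
            for name, value in observed.items()}

result = acceptance_check(tp=96, fp=10, tn=90, fn=4)
# sensitivity = 0.96 (pass), specificity = 0.90 (pass)
```

In a PCCP context the same check runs unchanged before and after every retraining cycle, making the pass/fail decision reproducible and auditable.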
Objective: To continuously monitor the deployed model and trigger the PCCP adaptation cycle when significant drift is detected, indicating model performance may be degrading [76].
Methodology:
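One common way to implement the drift trigger is the Population Stability Index (PSI) over binned input-feature distributions. In the sketch below, the 0.2 trigger threshold is a widely used industry convention, not a regulatory requirement.

```python
# Illustrative data-drift monitor using the Population Stability Index
# over binned feature distributions.
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """PSI between two binned distributions (fractions summing to 1)."""
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total

def drift_triggered(expected, actual, threshold=0.2):
    """True when drift exceeds the (conventional) PSI trigger level."""
    return psi(expected, actual) > threshold

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time distribution
stable   = [0.24, 0.26, 0.25, 0.25]   # minor fluctuation: no trigger
shifted  = [0.05, 0.15, 0.30, 0.50]   # severe shift: triggers the PCCP cycle
```

When the trigger fires, the PCCP modification protocol, rather than ad-hoc judgment, dictates what happens next: retraining, re-evaluation, and controlled deployment or rollback.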
The experimental protocols for developing and maintaining adaptive AI/ML models rely on a suite of specialized tools and frameworks. The following table details these essential "research reagents" and their critical functions in the validation and lifecycle management process.
Table 3: Essential Research Reagents for Adaptive AI/ML Model Validation
| Tool/Category | Function in Lifecycle Management | Example Use-Case |
|---|---|---|
| NIST AI RMF | A comprehensive risk management framework to identify, assess, and manage risks throughout the AI lifecycle [75]. | Providing the overarching structure for the risk management activities required in the PCCP Impact Assessment. |
| ISO/IEC 42001 | An international standard for establishing, implementing, and maintaining an Artificial Intelligence Management System (AIMS) [75]. | Creating the quality management system framework for governing AI development, deployment, and monitoring processes. |
| Good Machine Learning Practice (GMLP) | A set of guiding principles for quality system development of ML-based medical devices, covering data management, model training, and evaluation [73] [74]. | Informing the entire Model Development & Training phase, ensuring robust, reproducible, and transparent practices. |
| ALCOA+ Principles | A framework for data integrity ensuring data is Attributable, Legible, Contemporaneous, Original, Accurate, Complete, Consistent, Enduring, and Available [76] [75]. | Governing all data pipelines used for model training and monitoring, which is a fundamental requirement for GxP compliance. |
| Version Control Systems (e.g., Git) | To track and manage changes to code, model architectures, and hyperparameters, ensuring full traceability and reproducibility [75]. | Maintaining an immutable record of every model version deployed, including the exact code state used for retraining under a PCCP. |
| Model Monitoring Platforms | Software tools designed to automatically track model performance, data drift, and concept drift in production environments [76]. | Executing the continuous Performance Monitoring and Drift Detection phases, providing the data for PCCP change triggers. |
The successful integration of adaptive AI/ML into GxP environments hinges on a fundamental shift from a static, point-in-time validation model to a dynamic, evidence-driven lifecycle management paradigm. This requires a deep synergy between regulatory strategy, epitomized by the Predetermined Change Control Plan (PCCP), and rigorous scientific practice, embodied by continuous monitoring and robust validation protocols. For researchers dedicated to computational model verification and validation, this new landscape presents a compelling challenge: to develop methodologies that can prove the ongoing reliability of systems designed to change. By adopting the frameworks, experimental protocols, and tools outlined in this guide, scientists and drug development professionals can not only navigate the current regulatory expectations but also contribute to the foundational research needed to build trustworthy, adaptive intelligence for the future of medicine.
The advent of Beyond Rule of Five (bRo5) therapeutics represents a paradigm shift in drug discovery, enabling targeting of previously "undruggable" proteins with large, flat binding sites. These compounds—which include macrocyclic peptides and proteolysis-targeting chimeras (PROTACs)—violate at least one of Lipinski's Rule of Five criteria, typically exhibiting molecular weights >500 Da, high polar surface area, or increased hydrogen bonding capacity [77]. While this expanded chemical space offers unprecedented therapeutic opportunities, it introduces profound verification and validation (V&V) challenges that conventional small molecule frameworks cannot address. The structural complexity, chameleonic behavior, and unique mechanism of action of bRo5 compounds necessitate equally sophisticated computational and experimental V&V methodologies integrated within a robust regulatory science context.
The validation gap is particularly critical for macrocyclic peptides, which can combine the specificity of biologics with the synthetic accessibility of small molecules, and PROTACs, which operate through event-driven pharmacology by inducing ternary complexes for targeted protein degradation [78] [79]. This technical guide establishes tailored V&V frameworks for these bRo5 modalities, emphasizing computational model credibility, experimental corroboration, and regulatory alignment to ensure research standards meet the demands of this expanding chemical space.
For PROTACs, accurate prediction of ternary complex structure represents the foundational V&V challenge. Recent benchmarking against 36 crystallographically resolved ternary complexes reveals significant performance differences between leading modeling approaches [80]. When assessed using DockQ quantitative interface scoring, PRosettaC outperformed AlphaFold3 in predicting geometrically accurate ternary complexes, though both show limitations.
Table 1: Performance Benchmarking of Ternary Complex Prediction Tools
| Modeling Tool | Methodology | Key Strength | Key Limitation | Optimal Use Case |
|---|---|---|---|---|
| PRosettaC | Rosetta-based protocol with geometric constraints | Superior interface geometry prediction (higher DockQ scores) | Limited linker sampling; fails with misaligned anchors | Systems with well-defined warhead binding modes |
| AlphaFold3 | Deep learning-based multimer prediction | Holistic complex modeling | Performance inflated by accessory proteins | Complexes with stabilizing scaffold proteins |
| SILCS-PROTAC | Monte Carlo/MD simulations with FragMaps | Incorporates protein flexibility and ensemble docking | Computationally intensive for large-scale screening | Predicting PROTAC activity (DC50 correlation) |
The SILCS-PROTAC method addresses critical flexibility considerations by using precomputed ensembles of functional group affinity patterns (FragMaps) and putative protein-protein interaction dimer structures as docking targets [81]. This approach employs a two-step docking method that relaxes PROTAC molecules into dimer FragMaps, with scoring metrics extracted from the most favorable ternary complex in the ensemble. Validation studies demonstrate satisfactory correlation with DC50 values across diverse systems, highlighting its utility for PROTAC optimization [81].
Predicting CNS penetration for bRo5 compounds presents particular challenges, as traditional models like Pfizer's CNS MPO perform poorly with these larger molecules. The CANDID-CNS AI model represents a V&V breakthrough, employing an attentive graph neural network architecture that achieves 87% AUPRC on bRo5 molecules compared to 56% for traditional methods [82]. Importantly, the model distinguishes CNS-penetrant stereoisomers with an AUROC of 68%, versus the chance-level 50% of conventional approaches, demonstrating critical sensitivity to the stereochemical features that govern bRo5 bioavailability.
Table 2: Performance Comparison of BBB Penetration Prediction Models
| Model | bRo5 AUPRC | Stereoisomer Discrimination AUROC | Chemical Space Coverage | Key Innovation |
|---|---|---|---|---|
| CANDID-CNS | 87% | 68% | Extended bRo5 space | Learns thermodynamic determinants of passive permeability |
| Pfizer CNS MPO | 56% | 50% | Primarily Ro5 space | Rule-based scoring system |
| Traditional QSAR | <50% (estimated) | Not reported | Limited to Ro5 space | Linear regression models |
The model's validation included demonstration that its predictions correlate with quantum mechanical hydration free energy, indicating implicit learning of thermodynamic permeability determinants—a crucial verification for mechanistic credibility in bRo5 applications [82].
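For reference, the AUPRC figures in Table 2 reduce to an average-precision computation over the model's ranked predictions. A minimal sketch with toy scores and labels:

```python
# Minimal average-precision computation (the standard AUPRC summary)
# for a ranked classifier output; scores and labels are toy data.
def average_precision(scores, labels):
    """labels: 1 = positive class. Returns AP over the ranking by score."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall point
    return ap / n_pos

# Perfect ranking: every positive outranks every negative.
perfect = average_precision([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
# One inversion (a negative outranking a positive) lowers the score.
mixed = average_precision([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

Because AUPRC weights performance on the positive (penetrant) class, it is a more informative summary than AUROC when, as here, the classes are imbalanced.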
For macrocyclic peptides, computational V&V employs structure-guided interface mapping approaches. The Des3PI 2.0 pipeline generates macrocyclic peptides through contact-based scoring functions, with top candidates synthesized for experimental validation [83]. This approach successfully designed peptides targeting the challenging SLIT2/ROBO1 interface—a shallow, extended protein-protein interaction surface resistant to conventional small molecule inhibition. Biophysical validation using TR-FRET and BLI assays confirmed direct binding to the target interface, while pharmacokinetic assessments demonstrated favorable stability profiles, establishing an integrated computational-to-experimental V&V pipeline for macrocyclic peptides [83].
Rigorous experimental validation of PROTAC efficacy requires multi-tiered biochemical and cellular assays. For LAG-3 targeting PROTACs, western blot analysis in Raji-LAG3 cells demonstrated potent, dose-dependent degradation with DC50 values of 0.27 μM for LAG-3 PROTAC-1 and 0.42 μM for LAG-3 PROTAC-3 [84], exemplifying the standard dose-response approach for cellular PROTAC V&V.
Molecular docking and molecular dynamics simulations provided structural insights into PROTAC-mediated ternary complex formation, correlating computational predictions with experimental degradation efficacy [84].
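Reported DC50 values are typically extracted by fitting degradation data to a Hill-type dose-response model. The sketch below uses the 0.27 μM value from the text; Dmax and the Hill coefficient are assumed values for illustration.

```python
# Illustrative Hill dose-response model relating PROTAC concentration to
# fractional target degradation. DC50 = 0.27 uM is taken from the text;
# d_max and the Hill coefficient are assumptions.
def degradation(conc_uM, dc50_uM=0.27, d_max=0.95, hill=1.0):
    """Fraction of target degraded at a given PROTAC concentration."""
    return d_max * conc_uM**hill / (dc50_uM**hill + conc_uM**hill)

# By definition, at conc == DC50 the response is half of d_max.
half = degradation(0.27)   # 0.475 with the assumed d_max of 0.95
```

Fitting this curve to western-blot band intensities across a concentration series yields the DC50 and Dmax estimates quoted in degradation studies.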
For macrocyclic peptides targeting protein-protein interactions like SLIT2/ROBO1, comprehensive biophysical and functional characterization is essential:
Binding Affinity Quantification:
In Vitro Pharmacokinetic Profiling:
Lead peptide SP4 from the SLIT2/ROBO1 program demonstrated favorable stability in simulated intestinal fluid and high plasma integrity, establishing a benchmark for macrocyclic peptide V&V [83].
Regulatory frameworks for AI-driven drug development are evolving rapidly, with distinct approaches emerging between major agencies. The FDA employs a flexible, dialog-driven model, while the European Medicines Agency has established a structured, risk-tiered approach outlined in its 2024 Reflection Paper [85]. Both frameworks emphasize:
bRo5 compounds present unique manufacturing challenges that impact validation strategies:
Table 3: Key Research Reagent Solutions for bRo5 V&V
| Reagent/Platform | Function | Application Example |
|---|---|---|
| Digital Validation Platforms (ValGenesis, Kneat Gx) | Automated validation documentation and workflow management | End-to-end validation lifecycle management for FDA compliance [21] |
| CANDID-CNS AI Model | BBB penetration prediction for bRo5 compounds and stereoisomers | Identifying brain-penetrant candidates for CNS targets [82] |
| PRosettaC | Ternary complex structure prediction | PROTAC degrader optimization and mechanistic studies [80] |
| Des3PI 2.0 Pipeline | Structure-guided macrocyclic peptide design | Generating inhibitors for challenging PPIs [83] |
| TR-FRET/BLI Assay Systems | Binding affinity and kinetics quantification | Experimental validation of macrocyclic peptide-target engagement [83] |
| SILCS-PROTAC | Ensemble docking for PROTAC ternary complexes | Predicting PROTAC activity and optimizing linker geometry [81] |
The complexity of bRo5 therapeutics demands integrated V&V workflows that bridge computational predictions and experimental confirmation. The following diagram illustrates a comprehensive validation pipeline for bRo5 drug development:
Integrated V&V Workflow for bRo5 Therapeutics
This workflow emphasizes the iterative nature of bRo5 V&V, where computational predictions inform experimental design, and experimental results refine computational models.
The expansion into bRo5 chemical space represents both a tremendous opportunity and a formidable validation challenge for drug discovery. Macrocyclic peptides and PROTACs demand tailored V&V approaches that address their unique structural and mechanistic features. Computational methods must evolve to accurately model ternary complex formation and membrane permeation, while experimental protocols require enhanced sensitivity to quantify binding and degradation efficacy. Throughout this process, regulatory alignment ensures that V&V frameworks meet the rigorous standards required for therapeutic development.
As AI-driven tools advance and regulatory pathways mature, the field must prioritize transparent model documentation, robust experimental corroboration, and interdisciplinary collaboration. By establishing comprehensive V&V standards specifically designed for bRo5 therapeutics, researchers can fully harness the potential of these innovative modalities to target previously inaccessible disease pathways.
In the field of computational modeling and simulation (CM&S), establishing acceptance criteria is a critical step in the validation process, serving as the definitive measure that determines whether a model's predictions are sufficiently accurate for its specific Context of Use (COU). The U.S. Food and Drug Administration (FDA) has recognized that while standards like ASME V&V 40 provide a vital risk-based framework for establishing model credibility, they do not, by themselves, offer a mechanism for setting the specific acceptance criterion for comparison error—the difference between simulation results and validation experiments [3] [86].
To address this gap, the FDA's Center for Devices and Radiological Health (CDRH) developed a "threshold-based" validation method as a Regulatory Science Tool (RST). This methodology is intended for scenarios where a well-accepted safety or performance criterion for the specific COU is available. It provides a statistically grounded, science-driven means to determine an acceptance criterion, thereby enabling a more objective and defensible determination of model validity for assessing medical device safety [86]. This approach is particularly powerful because it directly links the model's allowed discrepancy to a clinically or safety-relevant threshold, moving beyond arbitrary error margins to a risk-informed validation practice.
The foundational principle of the FDA's threshold-based approach is that the allowable error between a computational model and experimental validation data should be governed by the safety or performance threshold relevant to the medical device's function and patient safety. The method answers a pivotal question: "How close is close enough?" by referencing an independent, clinically significant benchmark [86].
The logical workflow of this method can be visualized as a process that integrates inputs from the model, experiment, and clinical context to arrive at a validation decision.
The method calculates a maximum tolerable model error (E_{max}) based on the known safety threshold (T) and the uncertainty inherent in the validation experiments themselves (U_{exp}) [86]. The core logic ensures that the model's error, when combined with experimental uncertainty, does not risk misclassifying an unsafe condition as safe.
The acceptance criterion is derived as follows:
Table 1: Key Inputs and Outputs of the Threshold-Based Framework
| Component | Symbol | Description | Source |
|---|---|---|---|
| Safety/Performance Threshold | (T) | A clinically or biologically established limit for the Quantity of Interest (QoI). | Scientific literature, regulatory guidance, consensus standards. |
| Experimental Mean Value | (M_{exp}) | The mean value of the QoI obtained from validation experiments. | Physical bench tests, in-vivo studies, or high-fidelity reference data. |
| Computational Prediction | (M_{comp}) | The value of the QoI predicted by the computational model. | Finite Element Analysis, Computational Fluid Dynamics, etc. |
| Experimental Uncertainty | (U_{exp}) | The combined uncertainty associated with the validation experimental data. | Uncertainty quantification of the experimental setup and measurements. |
| Comparison Error | (E = \|M_{comp} - M_{exp}\|) | The absolute difference between the computational prediction and the experimental mean. | Calculated from validation activity. |
| Max Tolerable Error | (E_{max}) | The calculated acceptance criterion for the comparison error. | Output of the FDA's threshold-based RST algorithm. |
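The source describes the logic but not the closed-form rule. One formulation consistent with it, treating the margin between the experimental mean and the safety threshold, less the experimental uncertainty, as the allowable error, can be sketched as follows. This is an illustrative assumption, not the published RST algorithm.

```python
# Illustrative implementation of the threshold-based acceptance logic.
# The rule E_max = |T - M_exp| - U_exp is an assumed formulation that
# matches the described logic, not the FDA RST24CM03.01 algorithm itself.
def max_tolerable_error(threshold, exp_mean, exp_uncertainty):
    """Margin between experiment and safety threshold, reduced by the
    experimental uncertainty that could mask an unsafe condition."""
    margin = abs(threshold - exp_mean)
    return max(margin - exp_uncertainty, 0.0)

def model_accepted(threshold, exp_mean, exp_uncertainty, comp_value):
    e_max = max_tolerable_error(threshold, exp_mean, exp_uncertainty)
    comparison_error = abs(comp_value - exp_mean)
    return comparison_error <= e_max, comparison_error, e_max

# Example: safety threshold T = 10.0, experimental mean 7.0 +/- 1.0,
# model predicts 8.5 -> comparison error 1.5 within E_max of 2.0.
accepted, error, e_max = model_accepted(10.0, 7.0, 1.0, 8.5)
```

Note how a large experimental uncertainty shrinks E_{max} toward zero: a poorly characterized experiment cannot certify an accurate model, which is exactly the risk-informed behavior the method intends.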
The FDA's threshold-based approach is formalized as a Regulatory Science Tool (RST) with the reference number RST24CM03.01 [86]. Its intended purpose is to be used in conjunction with other established verification and validation methods, not to replace them.
The following protocol outlines the application of the threshold-based approach, synthesizing the FDA's methodology and its demonstrated use cases.
Step 1: Pre-Validation and Verification
Step 2: Experimental Validation and Comparison
Step 3: Decision and Documentation
Table 2: Essential Research Reagents and Materials for Threshold-Based Validation
| Category | Item / Solution | Function in Validation Protocol |
|---|---|---|
| Computational Tools | Finite Element Analysis (FEA) Software | Solves complex biomechanical problems (e.g., stress/strain in implants). |
| Computational Tools | Computational Fluid Dynamics (CFD) Software | Models fluid flow and related phenomena (e.g., blood flow, drug delivery). |
| Computational Tools | FDA RST24CM03.01 Algorithm | Computes the maximum tolerable model error (E_{max}) from (T) and (U_{exp}) [86]. |
| Experimental Equipment | Biomechanical Test System | Provides controlled mechanical loading for device performance validation. |
| Experimental Equipment | Flow Loop & Pressure Sensors | Generates and measures fluid dynamics conditions for CFD validation [86]. |
| Experimental Equipment | High-Speed Imaging / PIV | Captures flow fields or structural deformations for quantitative comparison. |
| Data & Standards | Safety/Performance Threshold (T) | Provides the clinical benchmark against which model accuracy is gauged [86]. |
| Data & Standards | ASME V&V 40-2018 Standard | Provides the overarching risk-based framework for establishing model credibility [3]. |
| Data & Standards | ISO 10993 (Biological Evaluation) | May provide safety thresholds for certain biological endpoints. |
The FDA's threshold-based approach was demonstrated in a peer-reviewed study using the FDA nozzle model to illustrate validation techniques in Computational Fluid Dynamics (CFD) simulations for blood damage prediction [86].
This case study underscores the method's utility in a high-stakes application where model accuracy is directly linked to patient safety.
The threshold-based approach is not a standalone standard but a powerful tool that operationalizes the principles of the ASME V&V 40 risk-informed framework [3]. V&V 40 guides users in determining the level of effort required for verification and validation activities based on model risk and the COU. The threshold-based RST directly addresses the "Validation" pillar of this framework by providing a quantitative method to fulfill credibility goals related to model accuracy [86]. The relationship between the high-level framework and the specific tool is synergistic, as illustrated below.
The use of CM&S and in silico methods is projected to become the largest proportion of evidence in medical device submissions [88]. Simultaneously, the FDA is actively promoting the use of AI/ML in drug and device development, as evidenced by its 2025 draft guidance documents [89] [90] [87]. A core tenet of these new guidelines is the establishment of model credibility through a risk-based framework [87] [91].
The threshold-based approach is perfectly aligned with this evolving landscape. It provides a rigorous, quantitative method to establish credibility for computational models, including many AI/ML models, especially those of a mechanistic or physics-based nature. Furthermore, the focus on the COU in the threshold method echoes the FDA's emphasis on the "Context of Use" as a critical element in assessing the credibility of AI models for drug development [90] [87]. By adopting such a scientifically rigorous and regulatory-endorsed method, researchers and drug developers can build the robust evidence needed for successful submissions in this modern paradigm.
Verification and Validation (V&V) frameworks provide the foundational methodology for establishing credibility in computational modeling and simulation, a discipline of increasing importance across engineering and scientific fields. As computational models replace costly physical testing for critical decision-making, the need for standardized processes to ensure their reliability has become paramount [92] [93]. This analysis examines three prominent V&V frameworks: ASME V&V 40, developed for medical devices but applicable more broadly; NASA standards, representing aerospace industry rigor; and NAFEMS guidelines, offering a comprehensive perspective for general engineering simulation. Each framework addresses the fundamental challenge of demonstrating that computational models are both mathematically correct (verification) and scientifically grounded in reality (validation), but they approach this challenge through different philosophical and methodological structures [94] [95]. Understanding their distinct characteristics enables researchers to select and implement the most appropriate framework for their specific context, particularly in regulated fields like drug development and medical device innovation.
Before examining individual frameworks, it is essential to establish the fundamental principles and definitions that underpin V&V practices across disciplines.
Verification: The process of determining that a computational model accurately represents the underlying mathematical model and its solution [1]. Essentially, it answers the question: "Are we solving the equations correctly?"
Validation: The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model [1]. It answers the question: "Are we solving the correct equations?"
Uncertainty Quantification (UQ): The process of characterizing and quantifying uncertainties in modeling and simulation, including those in numerical parameters, physical parameters, and model form [1].
Credibility: The trust in the predictive capability of a computational model for a specific context of use, established through evidence gathered from V&V activities [92].
These foundational concepts are consistently recognized across frameworks, though their implementation varies based on industry-specific requirements and risk considerations.
The ASME V&V 40 standard provides a risk-informed credibility assessment framework specifically developed for medical devices but designed to be general enough for application to other physics-based disciplines [92] [22]. This framework establishes credibility goals based on model risk and context of use, recognizing that not all models require the same level of rigor.
Key Framework Steps: defining the context of use, assessing model risk, and setting credibility goals commensurate with that risk [92].
The standard has gained significant traction in regulatory contexts, with the U.S. Food and Drug Administration (FDA) recognizing it as a consensus standard [92] [11]. This regulatory acceptance makes it particularly valuable for drug development and medical device applications where submissions to regulatory bodies are required.
NASA's approach to V&V is characterized by its systematic, rigorous methodology documented in standards such as NASA-STD-7009 and detailed in verification and validation plan outlines [96]. The NASA framework emphasizes structured V&V planning, bidirectional requirements traceability, and formally prescribed verification methods (analysis, inspection, demonstration, and test) [96].
NASA's methodology is particularly noted for prescribing required levels for each V&V activity for each risk level, making it highly structured for critical applications [92].
NAFEMS provides comprehensive guidance through publications like the "Guidelines for Validation of Engineering Simulations" and specialized training programs [94] [97]. The NAFEMS approach introduces several key concepts: hierarchical validation, characterization of validation rigor, and a spectrum of validation methods [94] [97].
NAFEMS adopts ISO 9000 definitions, which embed the more stringent ASME requirements as a subset while allowing for a wider range of validation referents [94]. This flexibility makes it applicable across diverse industrial contexts with varying criticality requirements.
Table 1: Comparative Analysis of V&V Frameworks
| Aspect | ASME V&V 40 | NASA Standards | NAFEMS Guidelines |
|---|---|---|---|
| Primary Domain | Medical devices (generalizable) [92] | Aerospace and space systems [96] | General engineering simulation [94] |
| Core Philosophy | Risk-informed credibility assessment [92] | Systematic, prescribed rigor [92] [96] | Spectrum of validation methods [94] |
| Risk Framework | Model risk categorization (low/medium/high) [92] | Prescribed levels for each risk category [92] | Criticality-based rigor assessment [94] |
| Regulatory Status | FDA-recognized consensus standard [92] [11] | Internal agency standard with government application | Industry consensus guidelines [94] |
| Implementation Flexibility | Moderate - risk-based goals with implementation flexibility [92] | Low - highly structured and prescribed [96] | High - adaptable to application criticality [94] |
| Key Innovation | Context of use-driven credibility goals [92] | Bidirectional requirements traceability [96] | Validation rigor attributes and spectrum concept [94] |
Table 2: V&V Methodology Comparison
| Method Category | ASME V&V 40 | NASA | NAFEMS |
|---|---|---|---|
| Verification Methods | Code and calculation verification [92] | Analysis, inspection, demonstration, test [96] | Code verification, solution verification [94] [97] |
| Validation Referents | Primarily physical experiments [92] | Physical testing under realistic conditions [95] | Physical measurements, simulation results, expert review [94] |
| Uncertainty Quantification | Integrated into credibility assessment [1] | Embedded in verification and validation activities [96] | Explicit methodologies including Monte Carlo, Latin Hypercube, polynomial chaos [97] [98] |
| Documentation Approach | Credibility evidence reporting [92] | Comprehensive V&V plans with detailed sections [96] | Validation plans and rigor characterization [94] |
The ASME V&V 40 framework implements a systematic process for establishing model credibility:
Context of Use Definition: Precise specification of how the model will be used to address a specific question, including all relevant operating conditions, outputs of interest, and decision thresholds [92] [11]. This step defines the boundaries for all subsequent V&V activities.
Model Risk Assessment: Evaluation of the potential consequences of an incorrect decision based on the model results. Risk levels are typically categorized as low, medium, or high [92].
Credibility Goal Setting: Determination of the required level of rigor for each V&V activity based on the model risk. Higher risk models require more extensive and rigorous V&V evidence [92].
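As an illustration, these three planning steps can be encoded as a small helper that maps model influence and decision consequence to a risk category and rigor goals. The risk matrix and numeric rigor levels below are hypothetical examples for illustration only, not values prescribed by ASME V&V 40:

```python
# Illustrative sketch of the ASME V&V 40 risk-informed planning steps.
# The scoring rule and rigor levels are hypothetical, not from the standard.

def model_risk(model_influence: int, decision_consequence: int) -> str:
    """Combine model influence and decision consequence (each rated 1-3)
    into a qualitative model-risk category."""
    score = model_influence * decision_consequence
    if score <= 2:
        return "low"
    if score <= 6:
        return "medium"
    return "high"

# Hypothetical mapping from model risk to required rigor per credibility factor.
RIGOR_GOALS = {
    "low":    {"code_verification": 1, "calculation_verification": 1, "validation": 1},
    "medium": {"code_verification": 2, "calculation_verification": 2, "validation": 2},
    "high":   {"code_verification": 3, "calculation_verification": 3, "validation": 3},
}

# A model that strongly drives a high-consequence decision lands in the
# high-risk category, demanding the most rigorous V&V evidence.
risk = model_risk(model_influence=3, decision_consequence=3)
print(risk, RIGOR_GOALS[risk])
```

The key design point is that rigor is not fixed globally: it is looked up from the risk category, so low-risk research models are not burdened with high-risk evidence requirements.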
This methodology was successfully applied in a computational fatigue analysis of a tibial tray component of an artificial knee implant, demonstrating how credibility goals are established and met for a medical device application [92].
NASA's methodology emphasizes thorough planning and documentation through a structured V&V Plan outline:
Verification Methods Implementation: NASA employs four core verification methods (analysis, inspection, demonstration, and test), selecting the method appropriate to each requirement [96].
Validation Approach: Validation testing conducted under realistic or simulated conditions on end products to determine effectiveness and suitability for mission operations [95]. NASA emphasizes that validation should occur throughout development phases, not only at delivery, enabling early course corrections.
Certification Process: Integration of V&V results with supporting documentation (reports, safety documentation, drawings, waivers) to certify system readiness for operation [96].
NAFEMS promotes a structured yet flexible approach to validation:
Hierarchical Validation: Implementation of validation activities across multiple levels of model complexity, from simple component models to full system representations [94] [97]. This builds confidence incrementally and identifies model limitations at appropriate complexity levels.
Rigor Characterization: Assessment of validation activities against key rigor attributes to characterize the level of confidence each activity supports [94].
Validation Methods Spectrum: Selection from three categories of validation referents: physical measurements, results from other simulations, and expert review [94].
This approach allows appropriate validation strategies based on application criticality and available resources.
Figure 1: V&V Framework Relationships and Application Domains
Table 3: Essential Research Reagents and Tools for V&V Implementation
| Tool Category | Specific Examples | Function in V&V Process |
|---|---|---|
| Software Verification Tools | Method of Manufactured Solutions, Method of Exact Solutions [93] [97] | Verify computational code implementation against analytical solutions |
| Discretization Error Estimators | Richardson Extrapolation, Grid Convergence Index (GCI) [93] [98] | Quantify numerical errors from mesh or time step discretization |
| Uncertainty Quantification Methods | Monte Carlo Simulation, Latin Hypercube Sampling, Polynomial Chaos [93] [97] [98] | Propagate and quantify uncertainties in input parameters |
| Validation Metrics | Area metric, deterministic comparison metrics, waveform metrics [93] | Quantify agreement between model predictions and experimental data |
| Sensitivity Analysis Methods | Analysis of Variance (ANOVA), FORM-SORM methods [93] [97] | Identify key parameters driving model outcomes |
| Physical Validation Referents | Dedicated validation experiments, quality-controlled test data [94] | Provide empirical basis for model validation |
| Credibility Assessment Frameworks | Risk-based assessment matrices, credibility scales [92] [94] | Systematically evaluate and document model credibility |
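Two of the tools in Table 3, Richardson extrapolation and the Grid Convergence Index (GCI), can be sketched in a few lines. The three-mesh results below are hypothetical; the safety factor of 1.25 follows common GCI practice for three systematically refined grids:

```python
import math

def observed_order(f1, f2, f3, r):
    """Richardson-extrapolation estimate of the observed order of accuracy.
    f1 is the finest-grid solution, f3 the coarsest; r is the constant
    grid refinement ratio between successive meshes."""
    return math.log((f3 - f2) / (f2 - f1)) / math.log(r)

def gci_fine(f1, f2, r, p, Fs=1.25):
    """Grid Convergence Index on the fine grid: a conservative relative
    error band on f1 due to discretization."""
    e21 = abs((f2 - f1) / f1)  # relative change between fine and medium grids
    return Fs * e21 / (r**p - 1.0)

# Hypothetical peak-stress results (MPa) from three meshes refined by r = 2
f1, f2, f3, r = 102.5, 104.1, 107.3, 2.0
p = observed_order(f1, f2, f3, r)
print(f"observed order p = {p:.2f}, GCI = {gci_fine(f1, f2, r, p):.4f}")
```

Here the solution changes halve with each refinement, so the observed order is 1.0 and the GCI reports roughly a 2% discretization uncertainty band on the fine-grid result.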
For researchers and professionals in drug development and medical device fields, several critical implementation considerations emerge from this comparative analysis:
Regulatory Alignment: The FDA's recognition of ASME V&V 40 makes it particularly relevant for regulatory submissions [92] [11]. Implementation should focus on clear documentation of the context of use, risk assessment, and credibility evidence generation.
Resource Optimization: The risk-informed approach of ASME V&V 40 enables efficient allocation of V&V resources, focusing rigorous activities on high-risk modeling applications while employing appropriate but less resource-intensive approaches for lower-risk applications [92].
Leveraging Clinical Data: Emerging approaches enhance model credibility using clinical data alongside traditional benchtop validation, particularly valuable in applications like shoulder arthroplasty and cardiac device development [11].
Cross-Framework Integration: Organizations can leverage strengths from multiple frameworks, such as applying NASA's rigorous documentation practices within ASME's risk-based structure or incorporating NAFEMS' spectrum of validation methods for non-critical model components.
The comparative analysis of ASME V&V 40, NASA, and NAFEMS V&V frameworks reveals distinct philosophical approaches united by common objectives of ensuring computational model credibility. ASME V&V 40's risk-based methodology provides a structured yet flexible approach particularly valuable for regulated medical product development. NASA's prescribed rigor offers comprehensive coverage for high-consequence applications, while NAFEMS' spectrum concept enables appropriate implementation across diverse industrial contexts. For drug development researchers and professionals, understanding these frameworks enables informed selection and implementation of V&V strategies that balance scientific rigor with regulatory requirements and resource constraints. As computational modeling continues to expand its role in product development and regulatory decision-making, these frameworks provide the essential foundation for ensuring model credibility and building stakeholder confidence in simulation results.
The adoption of computational modeling and simulation (CM&S) is transforming biomedical research and drug development. These tools enable personalized treatment strategies and can accelerate medical innovation by reducing reliance on traditional physical tests and clinical trials [99]. A pivotal challenge in this field is establishing model credibility—the trust in a model's predictive capability for a specific context of use. Credibility is primarily assessed through Verification, Validation, and Uncertainty Quantification (VVUQ) processes [18] [1].
This technical guide focuses on the critical role of credible comparators—real-world data used as a benchmark to validate computational models. Specifically, we explore the use of historical datasets from existing studies and patient-specific data acquired from individuals. Within a framework of standards for computational model V&V research, leveraging these data sources effectively is key to demonstrating that a model accurately represents the real-world system it is intended to simulate [100]. The proper use of comparators is essential for advancing high-stakes applications, including In Silico Clinical Trials (ISCTs) and patient-specific digital twins [3] [18].
In VVUQ, a comparator is the reference data against which computational model predictions are evaluated to assess their physical accuracy [100].
The ASME V&V 40-2018 standard provides a risk-based framework for establishing model credibility, where the required level of VVUQ effort is determined by the model risk—the consequence of a model producing an incorrect answer in its specific Context of Use (COU) [3] [1]. The standard identifies several "credibility factors" related to the comparator, including the quality of test samples and the equivalency of inputs used in both the simulation and the real-world experiment [100].
Table: Classification of Data Sources for Model Validation
| Data Source Type | Definition | Key Characteristics | Primary Use in Validation |
|---|---|---|---|
| Historical Data | Pre-existing data from previous studies, trials, or literature. | Often large sample sizes; potential variability in collection protocols; retrospective. | Building virtual cohorts; validating population-level model performance; assessing generalizability. |
| Prospective Experimental Data | Data specifically collected for the purpose of model validation. | Controlled conditions; designed for model input/output alignment; can be costly and time-consuming. | High-risk validations where comparator input/output equivalence is critical. |
| Patient-Specific Data | Clinical, imaging, and biomarker data acquired from an individual patient. | High relevance for personalized predictions; often limited quantity per patient; requires personalization techniques. | Validating patient-specific models (PSMs) and digital twins; clinical decision support. |
The selection of a comparator type is driven by the model's COU. For example, a model designed to predict a population-level treatment effect may be validated against aggregated historical data from a clinical trial cohort [101]. In contrast, a model developed to optimize stent placement for a specific patient must be validated against data from that individual [99] [100].
Figure 1: A hierarchical breakdown of the core components that constitute a credible comparator strategy for computational model validation.
The ASME V&V 40 standard provides a structured, risk-informed methodology for planning and assessing VVUQ activities [3] [1]. The framework's workflow begins with three preliminary steps: defining the question of interest, defining the context of use (COU), and assessing model risk.
This model risk directly informs the necessary level of credibility for each credibility factor, including those related to the comparator. For a high-risk model, such as one used to plan a surgical intervention, the requirements for comparator data quality and validation thoroughness will be substantially higher than for a low-risk model used for early-stage research [3].
When using patient-specific data as a comparator, unique challenges and considerations emerge that extend beyond the validation of generic models. A primary challenge is the assessment of "every-patient" error—the need to understand the model's predictive accuracy not just for a population, but for each individual patient [100]. This is complicated by inter- and intra-user variability in the process of creating the patient-specific model itself, such as differences in how medical images are segmented by different operators [100].
Furthermore, it is critical to distinguish between uncertainties arising from personalized inputs (e.g., a patient's heart geometry from a CT scan) and non-personalized inputs (e.g., generic tissue properties from the literature) [100]. Effective UQ must propagate these different uncertainty sources to the model output to provide a meaningful confidence interval for the individual prediction.
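A minimal Monte Carlo sketch of this propagation follows, using a toy surrogate model and hypothetical input distributions: a personalized geometric input with small measurement (segmentation) uncertainty, and a non-personalized literature value with larger population-level uncertainty. The model form and all numbers are illustrative assumptions:

```python
import random
import statistics

# Toy surrogate for a patient-specific model output (hypothetical form):
# output depends on a personalized input (vessel radius from imaging)
# and a non-personalized input (tissue stiffness from the literature).
def model(radius_mm, stiffness_kpa):
    return stiffness_kpa / radius_mm**2  # illustrative relationship only

random.seed(0)
outputs = []
for _ in range(10_000):
    # Personalized input: measured for this patient, small segmentation error
    radius = random.gauss(2.0, 0.05)
    # Non-personalized input: population literature value, larger uncertainty
    stiffness = random.gauss(50.0, 10.0)
    outputs.append(model(radius, stiffness))

mean = statistics.fmean(outputs)
sd = statistics.stdev(outputs)
print(f"prediction: {mean:.2f} +/- {1.96 * sd:.2f} (approx. 95% interval)")
```

Sampling both uncertainty sources jointly yields a confidence interval for the individual prediction; freezing one source at its nominal value would isolate the contribution of the other, which is how the personalized and non-personalized contributions can be compared.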
Historical data is often used to construct and validate virtual patient cohorts. The PERMIT project outlines a pipeline for personalized medicine research where the first building block is the "Design, building and management of stratification and validation cohorts" [101]. In this context, patient stratification involves identifying homogeneous patient subgroups based on multimodal profiling, which can include genomic, clinical, imaging, and lifestyle data [101].
Prospective cohorts are often preferred as they enable optimal measurement conditions and controlled data collection. However, retrospective cohorts, built from existing datasets, are also widely used, especially when prospective collection is impractical [101]. A key challenge is the frequent scarcity of information and standards for calculating the optimal size of these cohorts and for integrating multiple retrospective datasets, which can hinder the reproducibility and robustness of the resulting patient clusters [101].
To be credible, historical data must undergo rigorous data validation to ensure its integrity. In clinical data management, this process focuses on three key components [102].
Modern techniques like Targeted Source Data Validation (tSDV), guided by a Risk-Based Quality Management plan, focus validation efforts on the most critical data fields, such as primary endpoints and adverse events, thereby optimizing resource allocation [102]. Batch validation using automated tools is essential for efficiently handling large historical datasets, ensuring consistent application of validation rules, and maintaining high data quality [102].
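The automated range, format, and logic checks used in batch validation can be sketched as a small rule set applied to each record. The field names, rules, and records below are hypothetical:

```python
# Minimal sketch of automated batch validation for a historical dataset.
# All field names, ranges, and records are hypothetical.
RECORDS = [
    {"subject_id": "S001", "age": 54, "sbp_mmhg": 132, "visit": "baseline"},
    {"subject_id": "S002", "age": -3, "sbp_mmhg": 128, "visit": "baseline"},  # range error
    {"subject_id": "S003", "age": 61, "sbp_mmhg": None, "visit": "week4"},    # missing value
]

def validate(record):
    """Apply range, completeness, and code-list checks; return issue list."""
    issues = []
    if record["age"] is None or not (0 <= record["age"] <= 120):
        issues.append("age out of range")
    if record["sbp_mmhg"] is None:
        issues.append("missing systolic blood pressure")
    elif not (60 <= record["sbp_mmhg"] <= 260):
        issues.append("sbp out of range")
    if record["visit"] not in {"baseline", "week4", "week12"}:
        issues.append("unknown visit code")
    return issues

# Batch run: consistent rules applied uniformly across the whole dataset
report = {r["subject_id"]: validate(r) for r in RECORDS}
print(report)
```

Running the same rule set over the entire batch is what guarantees consistent application of validation rules; a risk-based plan would then concentrate manual source-data review on the critical fields flagged here.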
A foundational protocol for validating a computational model against patient-specific data is the comparison of methods experiment. Its purpose is to estimate the systematic error (inaccuracy) of the test method (the computational model) by comparing its outputs to those from a comparator method on the same patient samples [103].
Table: Key Experimental Parameters for a Comparison of Methods Study
| Parameter | Recommended Guideline | Rationale & Considerations |
|---|---|---|
| Number of Specimens | Minimum of 40 different patient specimens. | Specimen quality and range of values are more critical than sheer quantity. 100-200 specimens help assess method specificity. |
| Specimen Selection | Cover the entire working range; represent the spectrum of expected diseases. | Ensures validation across all clinically relevant conditions, not just a narrow band. |
| Measurement Replication | Single measurements are common, but duplicates are ideal. | Duplicates help identify sample mix-ups, transposition errors, and confirm if large differences are repeatable. |
| Time Period | Minimum of 5 days, ideally extended over a longer period (e.g., 20 days). | Minimizes systematic errors that could occur in a single analytical run. |
| Specimen Stability | Analyze specimens within two hours of each other by both methods. | Prevents differences due to specimen degradation rather than analytical error. |
The analysis of data from a comparison of methods experiment should combine graphical and statistical techniques [103].
The correlation coefficient (r) is more useful for verifying that the data range is wide enough to provide reliable regression estimates (r ≥ 0.99) than for judging method acceptability [103].
Table: Key Tools and Methods for Comparator-Based Validation
| Tool / Method | Category | Function in Validation |
|---|---|---|
| Electronic Data Capture (EDC) Systems | Data Management | Facilitate real-time data validation at point of entry; automate range, format, and logic checks to reduce manual errors [102]. |
| Statistical Software (e.g., SAS, R) | Data Analysis | Provide robust environments for performing complex statistical analyses, regression, and generating validation graphics [102]. |
| Medical Imaging Data (CT, MRI) | Patient-Specific Inputs | Serve as the primary source for generating patient-specific anatomical geometries for models in cardiology, orthopaedics, etc. [99] [100]. |
| Biosensor & Wearable Data | Patient-Specific Comparator | Provide real-time, continuous physiological data (e.g., ECG, activity) for dynamic calibration and validation of digital twins [18]. |
| Linear Regression Analysis | Statistical Method | Quantifies proportional and constant systematic error between model predictions and comparator data across a range of values [103]. |
| Uncertainty Quantification (UQ) Methods | Analytical Framework | Characterizes how input uncertainties (e.g., measurement error) propagate to uncertainty in model outputs, defining prediction confidence bounds [18] [100]. |
Figure 2: A workflow for the validation of a patient-specific computational model, highlighting the integration of patient data and comparator analysis at each stage.
Globally, regulatory bodies are developing frameworks to evaluate AI/ML-enabled medical devices and computational models. As of late 2025, the U.S. Food and Drug Administration (FDA) has cleared nearly 950 AI/ML-enabled devices and has issued finalized guidance on their review [104]. The European Union's AI Act classifies many medical AI systems as "high-risk," imposing additional requirements on top of the existing Medical Device Regulation [104]. These evolving regulations underscore the necessity of robust validation practices using credible comparators.
Initiatives like the European Health Data Space and the Virtual Human Twins Initiative aim to foster the development and application of computational medicine by addressing challenges related to data access, standardization, and model credibility [99].
For computational models to transition from research to clinical practice, several barriers must be addressed, including data access and standardization, regulatory acceptance, and the clear communication of prediction confidence and limitations to clinicians [99] [104].
A key to building trust is the recognition that computational models, including digital twins, are tools to augment, not replace, clinical expertise. Their predictions should enhance a physician's ability to make decisions under uncertainty, provided the limitations and confidence of those predictions are clearly communicated [18] [104].
The credibility of computational models in biomedical research and drug development hinges on a rigorous, evidence-based demonstration of their predictive capability. This process, framed within the broader context of model verification and validation (V&V), provides the foundation for trusting model predictions, especially when they are used to inform high-stakes decisions in areas like medical device design or therapeutic development [33]. The American Society of Mechanical Engineers (ASME) V&V 40 standard offers a risk-informed framework specifically for establishing this credibility, where the required level of evidence is directly tied to the model's context of use (COU) and the potential impact of a model error [22] [3].
Assessing predictive capability evolves in complexity when moving from simple scalar outputs to complex, time-varying waveforms. A scalar quantity, such as a single peak stress value or an average concentration, provides a discrete data point for comparison. In contrast, a complex waveform—such as a blood pressure trace over a cardiac cycle or an electrophysiological signal—contains multidimensional information on magnitude, phase, frequency, and shape, demanding more sophisticated comparison methodologies [33]. This guide details the principles, metrics, and experimental protocols for quantifying predictive capability across this spectrum, providing a technical roadmap for researchers and drug development professionals engaged in computational modeling.
The terms verification and validation represent two distinct but interconnected processes, often summarized as "solving the equations right" and "solving the right equations," respectively [33].
Verification is the process of ensuring that the computational model is implemented correctly, without errors in the code or the numerical solution of the underlying mathematical equations. It answers the question: "Is the model being solved correctly?" Key activities include code verification (ensuring the software is bug-free) and calculation verification (ensuring the numerical solution is accurate, e.g., through mesh refinement studies) [33] [3]. Systematic mesh refinement is cited as being at the heart of calculation verification, crucial for avoiding misleading results [3].
Validation, in contrast, is the process of determining how accurately the computational model represents the real-world physics it is intended to simulate. It is a comparison against experimental data, which serves as the "gold standard" [33]. Validation answers the question: "Is the right model being solved?" The ASME V&V 40 standard emphasizes that validation is not a binary pass/fail exercise but a risk-informed process of building credibility sufficient for a model's specific context of use [22].
Error and uncertainty are central concepts motivating V&V. Error is a recognizable deficiency, while uncertainty is a potential deficiency arising from a lack of knowledge [33]. The required level of accuracy for a model is not absolute but is determined by its intended use, and credibility is established through repeated statistical testing against appropriate null hypotheses [33].
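As a sketch of such statistical testing against a null hypothesis, a paired t-test of "model bias is zero" can be run on paired experimental/simulated values. The data are hypothetical; the critical value 2.262 is the standard two-sided value for α = 0.05 with 9 degrees of freedom:

```python
import math
import statistics

# Hypothetical paired experimental and simulated values for 10 specimens
y_exp = [10.2, 11.5, 9.8, 10.9, 11.1, 10.4, 9.9, 11.3, 10.7, 10.5]
y_sim = [10.0, 11.2, 10.1, 10.6, 11.4, 10.2, 10.0, 11.0, 10.9, 10.3]

# Paired t-test of the null hypothesis: mean(y_exp - y_sim) = 0
d = [e - s for e, s in zip(y_exp, y_sim)]
n = len(d)
t_stat = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(n))

# Two-sided critical t value for alpha = 0.05 with n - 1 = 9 degrees of freedom
T_CRIT = 2.262
significant = abs(t_stat) >= T_CRIT
print(f"t = {t_stat:.3f}; bias statistically significant: {significant}")
```

Failing to reject the null here means the observed bias is consistent with noise at this sample size; it does not by itself prove adequacy, which must still be judged against the accuracy requirement set by the context of use.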
The choice of metric for assessing predictive capability is dictated by the nature of the model output and its context of use. The following tables and sections summarize standardized metrics for different data types.
Table 1: Standard Scalar Metrics for Predictive Capability Assessment
| Metric | Formula | Application Context | Interpretation |
|---|---|---|---|
| Absolute Error | ( AE = \lvert y_\text{exp} - y_\text{sim} \rvert ) | Single-point comparison | Direct measure of deviation; scale-dependent. |
| Relative Error | ( RE = \frac{ \lvert y_\text{exp} - y_\text{sim} \rvert }{ \lvert y_\text{exp} \rvert } ) | Single-point comparison | Dimensionless; expresses error as a fraction of the measured value. |
| Bias | ( \text{Bias} = \frac{1}{n} \sum (y_\text{exp} - y_\text{sim}) ) | Multiple data points | Indicates systematic over- or under-prediction. |
| Root Mean Square Error (RMSE) | ( \text{RMSE} = \sqrt{ \frac{1}{n} \sum (y_\text{exp} - y_\text{sim})^2 } ) | Multiple data points | Overall accuracy measure; sensitive to outliers. |
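The scalar metrics in Table 1 can be computed directly; the peak-strain values below are hypothetical:

```python
import math

def scalar_metrics(y_exp, y_sim):
    """Bias and RMSE over paired experimental/simulated scalar outputs."""
    n = len(y_exp)
    diffs = [e - s for e, s in zip(y_exp, y_sim)]
    bias = sum(diffs) / n                                 # systematic error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)       # overall accuracy
    return bias, rmse

# Hypothetical experimental vs simulated peak strains
y_exp = [0.012, 0.015, 0.011, 0.018]
y_sim = [0.013, 0.014, 0.012, 0.017]

bias, rmse = scalar_metrics(y_exp, y_sim)
ae = abs(y_exp[0] - y_sim[0])      # absolute error, first specimen
re = ae / abs(y_exp[0])            # relative error, first specimen
print(bias, rmse, ae, re)
```

In this toy dataset the over- and under-predictions cancel (bias near zero) while the RMSE of 0.001 still captures the typical magnitude of deviation, illustrating why both metrics are reported together.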
Table 2: Metrics for Complex Waveforms and Field Data
| Metric | Formula / Description | Application Context | Interpretation |
|---|---|---|---|
| Correlation Coefficient (R) | ( R = \frac{ \sum (y_e - \bar{y_e})(y_s - \bar{y_s}) }{ \sqrt{ \sum (y_e - \bar{y_e})^2 \sum (y_s - \bar{y_s})^2 } } ) | Waveform shape similarity | Measures linear relationship and phase agreement; R=1 indicates perfect correlation. |
| Normalized Root Mean Square Error (NRMSE) | ( \text{NRMSE} = \frac{ \text{RMSE} }{ y_{\text{exp},\,\max} - y_{\text{exp},\,\min} } ) | Overall waveform magnitude and shape | Normalizes RMSE by the range of experimental data for cross-study comparison. |
| Magnitude-Squared Coherence (MSC) | ( C_{xy}(f) = \frac{ \lvert S_{xy}(f) \rvert^2 }{ S_{xx}(f) S_{yy}(f) } ) | Frequency-domain agreement | Assesses frequency-specific correlation; 1 indicates perfect linear dependence at that frequency. |
| Feature-Specific Analysis | Direct comparison of key features (e.g., peak timing, amplitude, rise time, area under the curve). | Critical performance parameters | Provides direct, clinically or biologically relevant error measures. |
For complex waveforms, a multi-faceted approach is necessary. Analysts should not rely on a single metric but should decompose the waveform into its constituent elements: magnitude, phase, and frequency content [33]. Time-domain metrics like NRMSE provide an overall error measure, while frequency-domain analysis via the MSC can reveal if the model correctly captures dominant oscillatory modes, even if there is a phase shift. Furthermore, a feature-based analysis is often the most insightful, as it focuses validation efforts on the specific aspects of the waveform that are most critical for the model's decision-making purpose [3].
A robust validation protocol is a combined computational and experimental effort designed to provide a stringent test of the model's predictive capability for its context of use [33].
This protocol is suitable for validating models that predict discrete outcomes, such as a maximum principal strain or a diffusion coefficient.
Validating a model that outputs a time-series or spatial field requires a more nuanced protocol.
The following table catalogues key resources, both physical and computational, essential for conducting the verification, validation, and sensitivity analyses described in this guide.
Table 3: Research Reagent Solutions for Computational V&V
| Item / Resource | Category | Function in Predictive Capability Assessment |
|---|---|---|
| Bench-top Mechanical Testing System | Experimental Equipment | Generates gold-standard experimental data for model validation under controlled loading and boundary conditions [33]. |
| Calibrated Sensors & Transducers (e.g., load cells, pressure catheters) | Experimental Equipment | Provides high-fidelity, quantitative measurements of physical quantities (force, pressure, strain) with characterized uncertainty [33]. |
| Strain Gauges or Digital Image Correlation (DIC) | Experimental Equipment | Provides full-field displacement and strain data for comprehensive validation against spatial field outputs from computational models. |
| A Posteriori Error Estimator | Computational Tool | Provides quantitative estimates of numerical error in finite element solutions, guiding mesh refinement for calculation verification [105]. |
| Statistical Analysis Software (e.g., R, Python SciPy) | Computational Tool | Performs quantitative comparison of data (scalar and waveform), hypothesis testing, and uncertainty quantification. |
| ASME V&V 40-2018 Standard | Guidance Document | Provides the risk-based framework for planning and assessing the credibility of computational models, defining concepts like context of use [22]. |
| ASME VVUQ 40.1 Technical Report | Guidance Document | Provides a detailed, end-to-end example of applying the V&V 40 standard, offering practical strategies for defining credibility activities [3]. |
A systematic, metrics-driven approach is paramount for assessing the predictive capability of computational models, from simple scalars to complex waveforms. The process is anchored by the fundamental principles of verification and validation, which together provide evidence that a model is both solved correctly and is representative of reality. The ASME V&V 40 standard's risk-informed framework ensures that the level of rigor applied is appropriate for the model's context of use, particularly in regulated fields like medical device development and drug development [22] [3]. By adhering to the detailed methodologies and metrics outlined in this guide—including scalar error quantification, multi-faceted waveform analysis, and rigorous experimental protocols—researchers can robustly establish model credibility and foster peer acceptance, thereby enabling the confident use of computational simulations in scientific discovery and clinical translation.
The use of computational modeling and simulation (CM&S) in drug development and medical device regulation represents a paradigm shift, offering the potential to reduce reliance on animal testing, lower development costs, and accelerate the delivery of life-saving treatments [106]. However, this potential is contingent upon one critical factor: demonstrating model credibility to regulatory bodies. A Credibility Evidence Dossier is the comprehensive collection of evidence and documentation that substantiates a model's reliability for its specific context of use. Framed within the broader thesis of standards for computational model verification and validation research, this guide provides a structured approach to building this essential dossier, drawing upon current regulatory guidance and industry best practices. The transition is already underway; regulatory agencies like the FDA have issued a roadmap outlining a phased plan for the use of "New Approach Methodologies" (NAMs), which include in silico modelling, in drug development [106].
Regulatory guidance provides the foundation for building a credible dossier. The primary documents governing this space are the FDA guidance "Assessing the Credibility of Computational Modeling and Simulation in Medical Device Submissions" and the ASME V&V 40-2018 standard, which provides a risk-informed framework for assessing model credibility [107] [108].
Understanding the following key concepts is essential for navigating regulatory expectations:
The following workflow outlines the process of defining the model's purpose and planning the evidence generation strategy based on a risk-informed framework.
A well-structured dossier systematically addresses the following core components, with the depth of evidence scaled to the model's risk.
Verification answers the question: "Did I build the model correctly?" It ensures the computational model has been implemented accurately and is free from numerical errors.
Detailed Methodology:
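One standard piece of verification evidence is a convergence study demonstrating that numerical error shrinks at the solver's formal rate as the discretization is refined. The sketch below is purely illustrative: it assumes a hypothetical explicit Euler solver for the test problem dC/dt = -kC, which has a known analytical solution, and estimates the observed order of accuracy from errors at successively halved step sizes (for Euler, the observed order should approach 1).

```python
import math

def euler_decay(k, c0, t_end, n_steps):
    """Explicit Euler solution of dC/dt = -k*C (hypothetical test problem)."""
    dt = t_end / n_steps
    c = c0
    for _ in range(n_steps):
        c += dt * (-k * c)
    return c

def observed_order(k=0.5, c0=100.0, t_end=2.0):
    """Estimate convergence order from errors at successively halved step sizes."""
    exact = c0 * math.exp(-k * t_end)  # analytical reference solution
    errors = [abs(euler_decay(k, c0, t_end, n) - exact) for n in (100, 200, 400)]
    # Order p from the error ratio under step halving: p = log2(e_coarse / e_fine)
    return math.log2(errors[0] / errors[1]), math.log2(errors[1] / errors[2])

p1, p2 = observed_order()
# Both estimates should be close to the scheme's formal order of 1
```

Documenting such a study, with the observed order matching the theoretical order, provides direct evidence that the numerical implementation is solving the governing equations correctly.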
Validation answers the question: "Did I build the right model?" It ensures the model accurately represents the real-world physics, biology, or chemistry of the system for its intended COU.
Detailed Methodology:
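At its core, quantitative validation compares model predictions against independent experimental data using a pre-specified metric and acceptance criterion. The minimal sketch below assumes hypothetical paired model/experiment values for a single quantity of interest and uses root-mean-square error (RMSE), one of the common metrics discussed in this guide; the specific numbers and criterion are illustrative only.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between paired model and experimental values."""
    if len(predicted) != len(observed):
        raise ValueError("prediction/observation lengths must match")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def validate(predicted, observed, criterion):
    """Compare the metric against a pre-specified acceptance criterion."""
    metric = rmse(predicted, observed)
    return metric, metric < criterion

# Hypothetical paired data for one quantity of interest
model_values = [118.0, 121.5, 119.0, 120.1]
experiment_values = [125.0, 126.3, 124.1, 126.6]
metric, passed = validate(model_values, experiment_values, criterion=15.0)
```

The key regulatory expectation is that both the metric and its acceptance criterion are defined before the comparison is performed, so the validation exercise cannot be tuned after the fact.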
Table 1: Summary of Key Credibility Factors and Evidence Types
| Credibility Factor | Description | Recommended Evidence |
|---|---|---|
| Verification | Ensuring the computational model is solved correctly. | Code documentation, unit test results, convergence studies. |
| Validation | Ensuring the model accurately represents reality. | Comparison to experimental/clinical data, validation metrics. |
| Uncertainty Quantification | Assessing the impact of uncertainties on model outputs. | Sensitivity analysis, probabilistic analysis, confidence intervals. |
| Technical Review | Independent assessment of the model and its use. | Review report from subject matter experts independent of the development team. |
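The uncertainty quantification row of Table 1 can be sketched as a Monte Carlo propagation: sample the uncertain inputs, evaluate the model for each sample, and summarize the resulting output distribution. The example below is a toy illustration, assuming a first-order decay model with a normally distributed rate constant; all parameter values are hypothetical.

```python
import math
import random
import statistics

def model_output(k, c0=100.0, t=6.0):
    """Toy deterministic model: concentration after time t under first-order decay."""
    return c0 * math.exp(-k * t)

def propagate(n=5000, k_mean=0.2, k_sd=0.02, seed=1):
    """Monte Carlo propagation of input uncertainty to the model output."""
    rng = random.Random(seed)
    outputs = [model_output(max(rng.gauss(k_mean, k_sd), 0.0)) for _ in range(n)]
    outputs.sort()
    lo, hi = outputs[int(0.025 * n)], outputs[int(0.975 * n)]  # ~95% interval
    return statistics.mean(outputs), (lo, hi)

mean_out, (lo, hi) = propagate()
```

Reporting an interval rather than a single point estimate lets reviewers judge whether the output uncertainty is small enough for the decision the model is meant to support.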
The required level of credibility evidence is not one-size-fits-all. The ASME V&V 40 standard introduces a risk-informed framework in which model risk, determined by the model's influence on the decision and the consequence of an incorrect decision, sets the position on the "Credibility Assessment Scale." The following diagram illustrates how different levels of model risk dictate the necessary rigor of evidence for each credibility factor.
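This risk-to-rigor logic can be expressed programmatically. The sketch below is a hypothetical simplification, not the standard's normative content: it collapses the influence-by-consequence combination into a single 1-5 risk level and maps that level to an illustrative rigor tier for each credibility factor.

```python
# Illustrative only: ASME V&V 40 combines model influence and decision
# consequence into a model risk, which drives the rigor required for each
# credibility factor. The specific matrix here is a hypothetical example.

INFLUENCE = ("low", "medium", "high")      # weight of the model in the decision
CONSEQUENCE = ("low", "medium", "high")    # severity if the decision is wrong

def model_risk(influence, consequence):
    """Map influence x consequence ratings to a 1-5 model risk level."""
    score = INFLUENCE.index(influence) + CONSEQUENCE.index(consequence)
    return score + 1  # 1 (lowest risk) .. 5 (highest risk)

def required_rigor(risk_level):
    """Hypothetical mapping from risk level to evidence rigor per factor."""
    tiers = {1: "minimal", 2: "basic", 3: "moderate", 4: "substantial", 5: "maximal"}
    return {factor: tiers[risk_level]
            for factor in ("verification", "validation", "uncertainty quantification")}

risk = model_risk("high", "medium")  # -> 4
plan = required_rigor(risk)
```

In practice the standard expects a qualitative, documented judgment rather than a formula, but encoding the mapping makes the planned rigor auditable and consistent across projects.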
Structured presentation of quantitative data is crucial for regulatory reviewers to assess model performance efficiently.
Table 2: Example Validation Metrics Table for a Pharmacokinetic Model
| Output Quantity of Interest | Experimental Mean | Model Prediction | Validation Metric (RMSE) | Acceptance Criterion | Status |
|---|---|---|---|---|---|
| Cmax (ng/mL) | 125.5 | 119.2 | 6.3 | < 15 | Pass |
| Tmax (h) | 2.0 | 2.1 | 0.1 | < 0.5 | Pass |
| AUC0-24 (h*ng/mL) | 845.2 | 880.7 | 35.5 | < 50 | Pass |
| Half-life (h) | 12.5 | 11.8 | 0.7 | < 1.5 | Pass |
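The pass/fail logic behind Table 2 can be automated so that the dossier's summary tables are generated, not hand-maintained. The sketch below re-checks each row's reported metric against its acceptance criterion; the values are taken directly from Table 2, while the code structure itself is illustrative.

```python
# Each row: (quantity of interest, reported validation metric (RMSE), acceptance criterion)
TABLE_2 = [
    ("Cmax (ng/mL)",       6.3,  15.0),
    ("Tmax (h)",           0.1,   0.5),
    ("AUC0-24 (h*ng/mL)", 35.5,  50.0),
    ("Half-life (h)",      0.7,   1.5),
]

def assess(rows):
    """Return {quantity: 'Pass'/'Fail'} by comparing each metric to its criterion."""
    return {name: ("Pass" if metric < criterion else "Fail")
            for name, metric, criterion in rows}

status = assess(TABLE_2)
all_pass = all(v == "Pass" for v in status.values())  # True for the Table 2 values
```

Scripting the comparison also guards against transcription errors between the analysis outputs and the submitted tables.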
Building and validating a credible model requires a suite of tools and methodologies. The following table details key resources used in this field.
Table 3: Key Research Reagent Solutions for Computational Modeling
| Item / Solution | Function in Credibility Evidence Generation |
|---|---|
| ASME V&V 40 Standard | Provides the foundational risk-informed framework for planning credibility activities and defining evidence requirements [108]. |
| FDA Credibility Guidance | Offers specific recommendations on assessing and documenting credibility for regulatory submissions to the FDA [107]. |
| High-Fidelity Experimental Data | Serves as the gold standard for model validation; used to quantify the accuracy of model predictions. |
| Uncertainty Quantification (UQ) Software | Tools to perform sensitivity analysis and propagate uncertainties to understand their impact on model outputs. |
| Version Control System (e.g., Git) | Tracks all changes to the model code and documentation, ensuring reproducibility and auditability. |
| Unit Testing Frameworks | Automates the process of code verification, ensuring that individual model components function as intended. |
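The last two rows of Table 3 work together in practice: model components live under version control and are exercised by automated unit tests on every change. A hypothetical pytest-style suite for a one-compartment IV bolus function (the model and its tests are illustrative, not from any cited submission) might look like:

```python
import math

def concentration(dose, vd, ke, t):
    """One-compartment IV bolus model: C(t) = (dose/Vd) * exp(-ke*t).
    Hypothetical component under test."""
    if vd <= 0 or ke < 0:
        raise ValueError("Vd must be positive and ke non-negative")
    return (dose / vd) * math.exp(-ke * t)

# pytest-style unit tests (discovered and run with `pytest`)
def test_initial_concentration_equals_dose_over_vd():
    assert concentration(dose=100.0, vd=10.0, ke=0.1, t=0.0) == 10.0

def test_concentration_decays_monotonically():
    c1 = concentration(100.0, 10.0, 0.1, 1.0)
    c2 = concentration(100.0, 10.0, 0.1, 2.0)
    assert c2 < c1

def test_invalid_volume_raises():
    try:
        concentration(100.0, 0.0, 0.1, 1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-positive Vd")
```

Archiving the test results alongside the tagged code revision in the version control system gives reviewers a direct, reproducible trail from the dossier's verification claims to the evidence behind them.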
Building a persuasive Credibility Evidence Dossier is a systematic, risk-informed process grounded in established standards like ASME V&V 40 and regulatory guidance from the FDA. The dossier must convincingly bridge the gap between a model's digital predictions and its real-world clinical context of use. By strategically focusing verification, validation, and uncertainty quantification efforts on the questions of highest impact to patient safety and decision-making, researchers can construct a robust argument for model credibility. This structured approach not only paves the way to regulatory acceptance but also fosters the development of more reliable, human-relevant tools that promise to make drug development faster, safer, and more efficient.
The establishment of credibility through rigorous Verification and Validation is no longer optional but a fundamental requirement for computational models in biomedical research and development. This guide has shown that a risk-based framework, centered on the model's Context of Use, is the cornerstone of an effective V&V strategy, as exemplified by the ASME V&V 40 standard. Success hinges on the meticulous application of best practices across the model lifecycle, from code verification and validation against high-quality experiments to transparent uncertainty quantification. The future will see these principles further embedded in regulatory pathways, enabling greater reliance on in silico evidence, particularly for niche populations and complex therapeutic modalities. The ongoing collaboration between industry, academia, and regulators to refine and harmonize these standards will be paramount in accelerating the delivery of safe and effective therapies to patients.