Artificial intelligence virtual cells as a new paradigm in nephrology: from multiomics integration to clinical translation
Abstract
Kidney diseases remain heterogeneous and mechanistically complex, and current experimental models only partially capture patient-specific dynamics. We advance artificial-intelligence virtual cells (AIVCs) as a translational framework that learns cross-modal representations of renal cell states, performs high-throughput in silico perturbations, and improves via experiment-in-the-loop updates with organoid and real-world readouts. We outline a methodological pathway—data, representation, intervention, evaluation, deployment—anchored to atlas-grade tissue references (e.g., Kidney Precision Medicine Project) and complementary Human Cell Atlas resources, paired molecular-clinical cohorts, and minimally invasive urinary modalities to enable longitudinal, patient-centered modeling. AIVCs complement physical cells: molecular signals are harmonized into uncertainty-calibrated latent states; counterfactual simulations explore dose-time-context responses; and multicellular interactions are formalized as testable niche dynamics. To address competence and trust, we adopt a risk-proportionate verification-validation-uncertainty perspective aligned with contemporary guidance, emphasizing multisite external validation, counterfactual validity on held-out perturbations, probabilistic calibration with decision utility, and subgroup fairness auditing. Clinically, we map opportunities across mechanism reconstruction, individualized treatment-response prediction, nephrotoxicity triage, trial emulation, and routes toward kidney digital twins. Finally, we propose a practical AIVC-organoid partnership—simulate → experiment → validate → iterate—to prioritize hypotheses, shrink the experimental search space, and link mechanistic evidence to bedside decisions. By integrating representation learning, virtual experimentation, and continual updating with interoperable data standards and governance, AIVCs offer an actionable roadmap for precision nephrology.
Introduction
Chronic kidney disease (CKD) has become a major global public health challenge, affecting more than 10% of the population—roughly 850 million individuals [1]. The burden is particularly heavy in low- and middle-income countries (LMICs) [2], where prevalence and mortality continue to increase and constrained resources limit early screening, thereby exacerbating health inequities [3,4]. By 2030, the number of people requiring kidney replacement therapy is projected to double [5], and even in high-income settings, 15%–20% of patients initiating dialysis die within 1 year [6]. Acute kidney injury (AKI) poses an equally stark threat: an estimated 13.3 million people are affected annually, approximately 85% of episodes arise in community settings—predominantly within LMICs—and approximately 1.4 million deaths are attributed to AKI each year [7,8]. Among hospitalized patients, AKI occurs in 7%–18% and in 25%–30% of those in intensive care units, with case fatality rates approaching 50% [9]. Crucially, CKD and AKI do not occur in isolation. Through bidirectional “acute-chronic” coupling, AKI accelerates CKD progression, whereas CKD increases susceptibility to AKI, compounding long-term outcomes [10,11].
Despite advances in molecular technologies that have illuminated mechanisms across CKD [12–14], AKI [15,16], diabetic kidney disease (DKD) [17,18], and polycystic kidney disease (PKD) [19,20], kidney disorders are characterized by pronounced heterogeneity, diverse clinical phenotypes, and complex trajectories. Traditional models—including animal models and two-dimensional cell culture—have supported discovery but lack structural complexity and individual specificity, limiting their ability to recapitulate human disease [21]. Three-dimensional (3D) organoids partially bridge this gap by better approximating the renal microenvironment; however, their incomplete maturation, substantial batch-to-batch variability, and limited scalability for high-throughput experimentation limit their utility for large-scale mechanistic studies and drug screening [22,23].
A confluence of scientific and technological advances is beginning to change this landscape. Whole-genome, transcriptomic, proteomic, metabolomic, and spatial-omics datasets are being generated at an unprecedented pace [24,25], while artificial intelligence (AI), notably deep learning, generative models such as variational autoencoders, and explainable-AI approaches, has matured in biomedicine [26]. These tools enable integrative analysis of multimodal data and are pushing biological system modeling toward greater precision and individualization.
In this context, the concept of the AI virtual cell (AIVC) has emerged [27]. AIVCs aim to model, in a data-driven manner, the dynamic behavior of cells under diverse genetic backgrounds, environmental contexts, and disease states, with the distinctive potential to integrate multiomics information, perform high-throughput in silico experimentation, and predict individualized therapeutic responses.
Against the backdrop of persistent unmet needs in CKD and AKI and the accelerating momentum of enabling technologies, this review advances a translational program centered on AIVCs in nephrology. We first delineate two gaps that presently impede progress—the paucity of temporally resolved, interventional datasets linking molecular states to bedside phenotypes and the limited generalizability of models across care settings and ancestries—and then show how AIVCs, in synergy with kidney organoids, can begin to close them. We survey the prospects of AIVCs for risk stratification, treatment selection, and nephrotoxicity avoidance; outline an experiment-in-the-loop framework that couples in silico counterfactuals to organoid readouts; and trace the scientific, technical, and ethical prerequisites for robust deployment, from data standards and causal inference to uncertainty quantification, privacy, and fairness. Our aim is to identify actionable strategies that connect mechanisms to decision-making in precision nephrology.
Core concept and distinctive value of artificial intelligence virtual cell
Definition and principles
Over the past decade, cell modeling has shifted from rule-based, mechanism-explicit simulations to a virtual-cell paradigm. Early whole-cell efforts—famously the Mycoplasma genitalium model—demonstrated that molecularly detailed simulations could reproduce a bacterial life cycle [28], but also revealed limits in scaling to the nonlinear, multiscale complexity of human systems [29].
Building on rapid advances in multiomics and modern AI, the AIVC has emerged as a data-driven framework. Large neural models learn universal, cross-modal representations of biological states across molecular, cellular, and tissue scales [27]. They also support interpretable in silico experiments via “virtual instruments” (decoders and manipulators) and improve with lab-in-the-loop/active-learning updates as new data arrive. These capabilities—representation, experimentation, and continual improvement—have been explicitly articulated in recent community roadmaps for AIVC [30].
Operationally, an AIVC comprises a cross-modal encoder that maps heterogeneous omics, imaging, and context into a shared latent state; task-specific decoders for readouts (e.g., gene expression, protein abundance, and morphology); and manipulators that implement counterfactual perturbations. Clinical credibility requires calibrated uncertainty (epistemic and aleatoric), the ability to answer causal queries rather than associations alone, and the injection of structural priors from pathways and genetic instruments to prevent implausible counterfactuals. Lab-in-the-loop learning then updates the model on informative experiments selected by active-learning criteria, enabling a virtuous cycle in which prediction, explanation, and validation proceed in lockstep.
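The encoder/decoder/manipulator decomposition described above can be sketched in a few lines. This is a toy illustration only, not an implementation from any cited work: the class name `ToyVirtualCell`, its methods, and all numeric weights are hypothetical, and fixed linear maps stand in for learned neural networks.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class ToyVirtualCell:
    """Hypothetical stand-in for an AIVC: linear encoder, linear decoders,
    and an additive manipulator acting in latent space."""
    encoder: Matrix                # shape: latent_dim x n_features
    decoders: Dict[str, Matrix]    # readout name -> (n_outputs x latent_dim)

    def encode(self, features: Vector) -> Vector:
        # Map input features into the shared latent state.
        return matvec(self.encoder, features)

    def decode(self, latent: Vector, readout: str) -> Vector:
        # Task-specific decoder, e.g. "expression" or "morphology".
        return matvec(self.decoders[readout], latent)

    def perturb(self, latent: Vector, shift: Vector, dose: float = 1.0) -> Vector:
        # Counterfactual modeled as a dose-scaled shift of the latent state
        # (an assumption made for illustration).
        return [z + dose * s for z, s in zip(latent, shift)]


cell = ToyVirtualCell(
    encoder=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],
    decoders={"expression": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]},
)
z = cell.encode([2.0, 1.0, 1.0])                      # observed latent state
z_cf = cell.perturb(z, shift=[-1.0, 0.5], dose=2.0)   # counterfactual state
baseline = cell.decode(z, "expression")
counterfactual = cell.decode(z_cf, "expression")
```

Comparing `baseline` with `counterfactual` is the toy analogue of reading out a virtual perturbation; a real system would additionally attach calibrated uncertainty to both.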
Importantly, the AIVC complements rather than replaces physical kidney cells. At the molecular scale, physical cells provide assay-specific readouts that anchor pathway truth, whereas the AIVC encodes them into harmonized latent states that integrate single-cell, spatial, and clinical context to interpolate unobserved conditions. At the cellular scale, biological heterogeneity and experimental constraints limit feasible perturbations, while the AIVC aggregates across donors and experiments to learn uncertainty-calibrated state manifolds for counterfactual dose-time-context testing. At the multicellular scale, tissues instantiate spatial neighborhoods and crosstalk; the AIVC formalizes these as interaction graphs to simulate how alternative therapies reshape niche dynamics before costly experiments.
Unique advantages
AIVC’s distinctive value lies in moving from exploratory description to predictive simulation [31]. Unified representations align heterogeneous omics into a coherent state space, enhancing transfer across tasks and enabling the capture of higher-order, cross-scale mechanisms. Virtual perturbation then offers a high-throughput in silico platform to prioritize hypotheses before wet-lab work [32].
Concurrently, community resources now provide the breadth needed to train and evaluate such models. The Arc Virtual Cell Atlas curates computation-ready observational and perturbational data from >300 million cells [33], including the Tahoe-100M interventional atlas (approximately 100 million single cells spanning approximately 60,000 drug-cell interactions across 50 cancer models and >1,100 compounds) [34]. These datasets are explicitly positioned to fuel AIVC development and benchmarking.
In this way, AIVC supports a “predict-explain-discover” workflow: forecast responses to genetic or pharmacologic inputs, attribute predictions to pathways or features in an interpretable manner, and surface testable hypotheses for targeted experiments—a cycle that accelerates mechanism finding and preclinical decision-making [27].
Why nephrology is a natural fit
Kidney diseases are paradigmatically multifactorial and heterogeneous, with trajectories shaped by interactions across the genome, transcriptome, proteome, metabolome, and spatial context—exactly the settings in which multiomics integration and data-driven representations add the most value [35,36]. Recent reviews and commentaries highlight how AI-enabled multiomics can sharpen subtype discovery [37], early detection, and treatment-response prediction, with implications for cost-effective precision care [38].
Mechanistic studies have already illustrated the benefits of integrated analyses that AIVCs can both learn from and extend. A 2025 C-PROBE analysis in JCI Insight jointly integrated renal transcriptomics with urine/plasma proteomics and metabolomics [39], identifying urinary proteins associated with CKD progression and convergent enrichment of complement/coagulation, cytokine-receptor interaction, and JAK/STAT signaling. These pathways are directly actionable for prediction and virtual perturbation.
The field also benefits from deep genetic-epigenetic priors. Integrative work has implicated DAB2 via compartment-specific expression quantitative trait loci, DACH1 via transcriptome-wide association study and functional validation, and APOL1 risk variants via extensive genetic and mechanistic evidence—knowledge that can be encoded as constraints or priors within AIVC to improve biological fidelity [40–42].
Two clinically salient arenas further illustrate the upside. First, transplant medicine demands individualized balancing of rejection risk and drug toxicity. An AIVC conditioned on donor-recipient genetics and immune-metabolic states could generate counterfactual trajectories under alternative immunosuppression schedules, with uncertainty guiding surveillance intensity. Second, immune-mediated glomerulonephritides and systemic diseases feature spatially localized injury; joint modeling of single-cell and spatial profiles allows virtual ablation of effector-target crosstalk to prioritize tractable axes for intervention. In both settings, simulated readouts can be harmonized with organoid assays and electronic health record (EHR)-anchored outcomes to tighten the loop between mechanism and care. Taken together, these nephrology-specific features create fertile ground for data-driven, patient-centered modeling—setting the stage for clinically focused applications.
Methodological pathway to build artificial intelligence virtual cell in nephrology
Building an AIVC for nephrology requires an end-to-end, purpose-built loop that links heterogeneous biomedical data to actionable, patient-level decisions via five tightly coupled stages: data, representation, intervention, evaluation, and deployment. On the data side, renal biopsies, organoid and microphysiological-system readouts, urine-plasma multiomics, high-content imaging, and longitudinal EHR traces must be harmonized through shared ontologies, batch correction, and semantic alignment so that single-cell and spatial assays can be embedded alongside clinical trajectories in a common learning corpus. This integrated foundation underpins stable “cell-state coordinates” learned through self-supervised or weakly supervised cross-modal training and has become a central tenet of recent single-cell integration and trajectory modeling work [43]. Within the representation and dynamics stage, the aim is to turn static snapshots into computable temporal processes. Deep latent-velocity models (e.g., LatentVelo) infer low-dimensional regulatory states that explain lineage-specific kinetics [44], improving trajectory inference over classical velocity assumptions, while neural-ODE–based models (e.g., scNODE) learn continuous-time flows that extrapolate gene expression to unobserved time points and mitigate distribution shift during prediction [45]. Optimal-transport formulations (e.g., Waddington-OT) further furnish a principled scaffold for reconstructing ancestor-descendant relationships and regulatory programs from time-course single-cell data, now extended with modern dynamical and computational refinements [46,47].
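To make the idea of a continuous-time latent flow concrete, the sketch below integrates a velocity field forward in time with explicit Euler steps. It is a minimal illustration under stated assumptions: `toy_velocity` is a hand-written, hypothetical field (in a neural-ODE model this function would be a trained network), and the function names are not taken from scNODE or any other cited tool.

```python
import math
from typing import Callable, List

Vector = List[float]


def integrate_latent_flow(
    z0: Vector,
    velocity: Callable[[Vector], Vector],
    t_end: float,
    n_steps: int = 1000,
) -> Vector:
    """Explicit Euler integration of dz/dt = velocity(z) from t=0 to t_end."""
    dt = t_end / n_steps
    z = list(z0)
    for _ in range(n_steps):
        dz = velocity(z)
        z = [zi + dt * di for zi, di in zip(z, dz)]
    return z


def toy_velocity(z: Vector) -> Vector:
    # Hypothetical field: the first latent coordinate (say, an injury program)
    # decays at unit rate; the second coordinate is stationary.
    return [-1.0 * z[0], 0.0]


# Extrapolate a latent state observed at t=0 to an unobserved time t=1.
z_future = integrate_latent_flow([1.0, 1.0], toy_velocity, t_end=1.0)
```

For this linear field the exact solution of the first coordinate is exp(-t), so the Euler estimate can be checked against `math.exp(-1.0)`; production models would typically use an adaptive ODE solver rather than fixed-step Euler.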
The intervention stage operationalizes AIVC’s distinctive value: counterfactual, dose- and timing-aware virtual perturbations. Variational and compositional generative models such as scGen and the Compositional Perturbation Autoencoder have demonstrated the ability to predict cellular responses across unseen dosages, cell types, time points, and even species, offering an engineering path from correlation to counterfactuals and combinatorial therapies [48,49]; recent perturbation predictors continue to expand this space with alternative variational autoencoder and diffusion-style designs tailored to single-cell data [50,51]. Rigorous evaluation then moves beyond internal accuracy to decision-focused validation: multisite external testing, calibration of uncertainty, and subgroup fairness auditing, all reported under updated guidance such as ‘Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-Artificial Intelligence (TRIPOD+AI)’ and ‘the Consolidated Standards of Reporting Trials-Artificial Intelligence (CONSORT-AI)’ extension to ensure that model development, validation, and trialing are transparent, reproducible, and auditable [52].
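The latent-arithmetic idea behind scGen-style perturbation prediction can be sketched as follows: estimate a perturbation direction as the difference between mean treated and mean control latent states in one context, then apply it to control cells from a context where the treated state was never observed. This is a simplified sketch, assuming cells are already encoded as latent vectors; the function names and the linear `dose` scaling are illustrative assumptions rather than the published method.

```python
from statistics import fmean
from typing import List

Vector = List[float]


def mean_vector(cells: List[Vector]) -> Vector:
    """Coordinate-wise mean over a set of latent cell states."""
    return [fmean(column) for column in zip(*cells)]


def perturbation_delta(control: List[Vector], treated: List[Vector]) -> Vector:
    """Latent shift attributed to the perturbation: mean(treated) - mean(control)."""
    mean_c, mean_t = mean_vector(control), mean_vector(treated)
    return [t - c for c, t in zip(mean_c, mean_t)]


def predict_treated(unseen_control: List[Vector], delta: Vector, dose: float = 1.0) -> List[Vector]:
    """Apply the learned shift to control cells from an unseen context.
    Linear dose scaling is an assumption made for illustration."""
    return [[z + dose * d for z, d in zip(cell, delta)] for cell in unseen_control]


# Toy latent states: context A has both conditions, context B has controls only.
control_a = [[0.0, 1.0], [2.0, 1.0]]
treated_a = [[3.0, 0.0], [3.0, 1.0]]
control_b = [[10.0, 4.0]]

delta = perturbation_delta(control_a, treated_a)
predicted_b = predict_treated(control_b, delta)
predicted_b_half_dose = predict_treated(control_b, delta, dose=0.5)
```

In practice the predicted latent states would be passed through a decoder to obtain gene-expression readouts, and predictions would be scored against held-out perturbation data.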
Finally, real-world deployment in nephrology—where data sovereignty and distribution shift are the norm—benefits from privacy-preserving, multi-institution collaboration and continual monitoring: federated learning studies in kidney imaging and AKI prediction show that centers can jointly train robust models without sharing raw data, pointing to compliant pathways for AIVC training and updating across geographies and populations [53]. In sum, AIVC construction is not model bricolage but the orchestration of standardized renal data resources, biologically faithful spatiotemporal representations, and counterfactual perturbation simulators within a governance and evaluation framework fit for clinical translation—so that single-cell mechanistic insight can be transduced into executable, individualized decisions in nephrology.
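One common aggregation rule in federated learning is federated averaging, in which each center trains locally and only parameter vectors, weighted by local sample size, are combined centrally. The sketch below shows that aggregation step alone. It is an assumption that this rule is representative; the cited kidney studies may use different schemes, and the function name and numbers are illustrative.

```python
from typing import List, Sequence


def federated_average(site_params: Sequence[Sequence[float]], site_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of per-site parameter vectors.
    Raw patient data never leave the sites; only parameters are shared."""
    if len(site_params) != len(site_sizes):
        raise ValueError("one sample size is required per site")
    total = sum(site_sizes)
    n_params = len(site_params[0])
    return [
        sum(params[i] * n for params, n in zip(site_params, site_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical centers with locally trained two-parameter models.
global_params = federated_average(
    site_params=[[1.0, 0.0], [3.0, 2.0]],
    site_sizes=[100, 300],
)
```

The larger center dominates the average, which is one reason subgroup and site-level performance should still be audited after aggregation.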
To enable effective AIVC construction, we anchor the data backbone on biopsy-based single-cell and spatial references from the Kidney Precision Medicine Project (KPMP) as benchmarks for cellular states and niches, complemented by Human Cell Atlas (HCA) kidney datasets for transfer and external stress-testing [54]. Disease-focused cohorts then supply paired molecular-clinical trajectories for outcome-linked modeling and validation, exemplified by the standardized biopsy/digital-pathology framework of NEPTUNE (Nephrotic Syndrome Study Network) [55]. To capture minimally invasive, time-resolved biology, urinary modalities are incorporated—most notably urinary single-cell RNA-seq (scRNA-seq), which mirrors injury-repair programs and supports longitudinal monitoring. Collection is prospective and FAIR (findable, accessible, interoperable, and reusable)-compliant with consent enabling linkage/re-use [56]; critically, paired sampling across tissue, urine, and plasma at clinically meaningful time points supports counterfactual learning and robust external validation.
Credibility and evaluation of artificial intelligence virtual cell in nephrology
Establishing whether AIVCs are competent and trustworthy demands a risk-proportionate evaluation that couples biological plausibility with clinical utility. We adopt a verification-validation-uncertainty perspective aligned with medical-device credibility frameworks: internal verification of code and numerical stability; biological validation against perturbation-response datasets when available; and explicit uncertainty quantification with calibration so that counterfactuals and risk estimates are interpretable at the bedside. Credibility should scale with intended use and decision risk, echoing the U.S. Food and Drug Administration (FDA)’s risk-informed framework—built on the ASME V&V 40, “Assessing Credibility of Computational Modeling Through Verification and Validation: Application to Medical Devices.”—for assessing computational models [57]. Accordingly, we target evidence along four axes: i) multisite external validation to quantify out-of-distribution generalization; ii) counterfactual validity tested on held-out perturbations with standardized effect-size metrics; iii) probabilistic calibration (e.g., reliability analyses) and decision-analytic value (e.g., net benefit); and iv) subgroup fairness audits across sex, ancestry, and disease stage. Reporting is harmonized to contemporary guidance—TRIPOD+AI for prediction studies [52], ‘Developmental and Exploratory Clinical Investigations of Decision-support systems driven by Artificial Intelligence (DECIDE-AI)’ for early/live evaluations of decision support [58], and ‘Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence (SPIRIT-AI)’/CONSORT-AI for AI trial protocols and reports—so development, validation, and trialing are transparent and reproducible [59].
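Two of the quantities named above, probabilistic calibration and decision-analytic net benefit, have simple definitions that can be written down directly. The sketch below computes an expected calibration error over equal-width probability bins and the net benefit at a chosen risk threshold, defined as TP/n - (FP/n) * pt/(1 - pt). The function names, bin count, and toy data are illustrative assumptions.

```python
from typing import List, Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def expected_calibration_error(y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10) -> float:
    """Weighted mean gap between predicted risk and observed event rate per bin."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(y_prob):
        # Equal-width bins; p == 1.0 is placed in the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append(i)
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(y_prob[i] for i in members) / len(members)
        mean_y = sum(y_true[i] for i in members) / len(members)
        ece += (len(members) / n) * abs(mean_p - mean_y)
    return ece


# Toy cohort of four patients (illustrative values only).
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.2, 0.6, 0.7]
nb = net_benefit(outcomes, risks, threshold=0.5)
ece = expected_calibration_error(outcomes, risks, n_bins=2)
```

Net benefit is usually reported across a range of thresholds (a decision curve) and compared with treat-all and treat-none strategies; the same two functions can be run per subgroup as part of a fairness audit.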
Clinically driven application scenarios
Across the care continuum, AIVCs are most useful when tethered to concrete decisions. Early detection and triage can be coupled to dynamic EHR predictors to propose mechanistic explanations for impending decline and to simulate preventive strategies. Recent externally validated kidney failure predictors exemplify how such signals can initialize a patient-specific virtual cell for counterfactual testing. With respect to treatment selection, in silico comparisons of candidate therapies provide a rationale for n-of-1 treatment choices in DKD and glomerulopathies. For safety, virtual nephrotoxicity screening de-risks the dose and combination space ahead of organoid or microphysiological testing. Finally, at the population level, AIVCs support trial emulation and enrichment by identifying molecularly responsive subgroups, reducing the number of failed studies and unnecessary exposure.
Mechanism mapping and molecular network reconstruction
Kidney diseases unfold through multilayered, dynamic cellular trajectories that traditional models struggle to capture with temporal continuity [37]. Trained on time-resolved and interventional datasets, AIVCs can infer state-transition dynamics and forecast yet-unobserved regimes, enabling two synergistic functions: reconstructing trajectories under genetic, chemical, or environmental perturbations and proposing causal hypotheses via virtual interventions that shrink the experimental search space [27].
AIVCs can integrate single-cell, spatial, and metabolic layers to simulate, for example, the transition from AKI to fibrosis, delineate metabolic reprogramming in DKD, or predict spatially constrained regulatory networks underlying cyst evolution in PKD [60–63]. Recent single-cell and spatial studies have mapped injury-repair trajectories, defined injury-specific microenvironments, and highlighted metabolic rewiring in DKD—empirical scaffolds that AIVCs can learn from and extend through virtual perturbation [64].
Individualized prediction of treatment response
The therapeutic response varies widely across patients—particularly in CKD, AKI, and transplant management—yet current tools to anticipate efficacy and toxicity remain limited. AIVCs address this gap by fusing a patient’s multiomics and clinical context into an individualized virtual cell, allowing in silico simulation of alternative interventions and estimation of benefit-risk before treatment. The feasibility of data-driven response prediction at scale has precedent outside nephrology: the HIV-TRePS system, trained on data from >250,000 patients worldwide, achieved approximately 80% accuracy in independent tests and was designed specifically to aid decisions in resource-limited settings [65].
Within nephrology, dynamic, real-world EHR-based models point in the same direction. KFDeep predicts CKD progression with an internal area under the receiver operating characteristic curve (AUROC) of 0.946 and an external AUROC of 0.805 and applies SHAP-style attribution to surface key clinical drivers, features that can be coupled to an AIVC to close the loop between clinical signals and mechanistic hypotheses [66].
Drug discovery and repurposing
Kidney drug development is hampered by high costs, low success rates, and nephrotoxicity risk [67]. By conducting large-scale in silico triage on virtual cells, AIVCs can prioritize candidates with favorable efficacy and safety profiles before wet-lab work, narrowing the funnel for animal studies and preclinical testing [68–70]. Methodologically, deep learning has already surpassed prior benchmarks in toxicity prediction (for example, the Tox21 challenge), and recent reviews summarize progress across acute and organ-specific endpoints—including nephrotoxicity and carcinogenicity—while emphasizing interpretability and multisource data integration [71,72].
Repurposing is equally tractable: by weaving network pharmacology and knowledge-graph reasoning into AIVCs, one can algorithmically surface disease-specific mechanism-drug matches [73]. Looking forward, the broader push toward programmable virtual humans and in silico trials—with emerging regulatory frameworks for computational model credibility—provides a translational runway for AIVC-guided discovery from molecular to systems scales.
Complementarity and synergy between artificial intelligence virtual cells and kidney organoids
As one of the most physiologically relevant 3D in vitro systems to date, kidney organoids recapitulate nephron-like architecture and retain aspects of filtration and secretion [74], enabling genetic disease modeling and studies of drug-induced AKI [75,76]. Recent adoption of automation and high-content imaging has further improved the throughput of phenotype screening and toxicology assessment, making organoids practical discovery tools rather than bespoke demonstrations [77–79].
However, important limitations remain. Organoids often exhibit incomplete maturation and lack a stable, perfusable vasculature, and self-organization during development introduces batch-to-batch variability that impedes scale and long-term disease modeling [80–82]. Flow or engineered endothelial niches can partially improve vascularization and maturation, and transplantation models increase endothelial survival, but consensus solutions for robust, uniform, long-lived kidney organoids are still evolving [83].
Here, AIVCs provide a complementary counterweight. Practically, organoids capture early nephron programs but remain variably mature and incompletely vascularized; AIVC closes this gap by projecting molecular→cellular→multicellular consequences of candidate interventions, prioritizing conditions most likely to rescue maturation or niche organization for targeted wet-lab validation. Built on multimodal omics and algorithmic control, AIVCs are inherently scalable and programmable: they can run high-throughput virtual perturbations, forecast state transitions, prioritize mechanistic hypotheses, and nominate intervention points that deserve wet-lab attention—the function of a “virtual instrument” at the earliest stage of experimental design [27]. As perturbational single-cell resources grow (for example, giga-scale interventional atlases and curated virtual-cell repositories), they supply the temporal and interventional diversity required to train and benchmark such models.
The more compelling prospect is a closed-loop partnership (Fig. 1). Organoids feed AIVCs with high-fidelity molecular and phenotypic readouts for calibration; AIVCs, in turn, optimize organoid construction and experimentation by predicting informative perturbations, doses, timepoints, and readouts—thereby increasing efficiency and reproducibility. Early studies have demonstrated that AI can enhance organoid image reconstruction, label-free recognition, and quality control, and even enable multilevel, high-speed 3D analyses that stabilize batch consistency [84,85]. As spatial transcriptomics and high-throughput single-cell datasets across organoid systems accumulate, this AIVC-organoid alliance can mature into a hybrid experimental platform with unprecedented precision for mechanism mapping, drug development, and personalized disease modeling [86].
Fig. 1. Overview of the AIVC–organoid closed loop for kidney research.
Multimodal inputs (genomics, RNA-seq, spatial, proteomics, organoid readouts, and electronic health record [EHR]) are fed into an AIVC engine that supports representation, virtual perturbation, and continual learning. A four-step organoid loop—simulate, experiment, validate, iterate—closes the model-to-bench cycle. The outputs include mechanistic insights, individualized response prediction, drug screening with nephrotoxicity alerts, and paths to digital twin/virtual trials. A foundation of trust and governance emphasizes transparency, external validation, fairness, privacy/compliance, and a regulatory sandbox. Created in BioRender. Kelai, W. (2025) https://BioRender.com/jzi77su.
AIVC, artificial-intelligence virtual cell; CT, computed tomography; MRI, magnetic resonance imaging; US, ultrasound.
To implement the loop in practice, we propose a simple operating procedure: i) initialize the AIVC with multiomics from tissue, urine and plasma together with prior knowledge; ii) use Bayesian optimization on the AIVC to select a small batch of perturbations (compound, dose, and timing) expected to maximally reduce mechanistic uncertainty; iii) execute these perturbations in standardized kidney organoids or microphysiological systems with prespecified readouts (imaging, scRNA-seq, and proteomics); and iv) update the AIVC with the new evidence, reestimate uncertainty and iterate. This design-of-experiments approach yields shrinking confidence intervals around causal hypotheses and provides a reproducible recipe for moving from exploratory screens to confirmatory assays.
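Step ii of the procedure above can be illustrated with a deliberately simplified acquisition rule. Instead of full Bayesian optimization, the sketch ranks candidate perturbations by the disagreement (variance) among an ensemble of model predictions and selects the most uncertain ones for the next organoid batch. The compound names, doses, time points, and predicted values are all hypothetical, and ensemble variance is only a proxy for expected reduction in mechanistic uncertainty.

```python
from statistics import pvariance
from typing import Dict, List, Tuple

# A candidate perturbation: (compound, dose in µM, exposure in hours).
Candidate = Tuple[str, float, int]


def select_batch(ensemble_predictions: Dict[Candidate, List[float]], batch_size: int) -> List[Candidate]:
    """Pick the candidates on which ensemble members disagree most.
    Ties are broken by candidate identity to keep selection reproducible."""
    scores = {cand: pvariance(preds) for cand, preds in ensemble_predictions.items()}
    ranked = sorted(scores, key=lambda cand: (-scores[cand], cand))
    return ranked[:batch_size]


# Predicted injury scores from three hypothetical ensemble members.
predictions = {
    ("cmpdA", 1.0, 24): [0.1, 0.1, 0.1],    # members agree: little to learn
    ("cmpdA", 10.0, 24): [0.2, 0.8, 0.5],   # strong disagreement
    ("cmpdB", 1.0, 48): [0.4, 0.5, 0.6],    # mild disagreement
}
next_experiments = select_batch(predictions, batch_size=2)
```

After the selected perturbations are run in organoids, the new readouts would be added to the training data, the ensemble refit, and the selection repeated; that refit-and-reselect cycle is what makes the loop iterative.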
While the complementary roles of organoids and AIVCs are becoming increasingly clear, genuine clinical translation will hinge on overcoming challenges that cut across data, methods, and ethics—a precondition for both scientific credibility and sustainable deployment in practice [87].
Major challenges and future directions
Despite the promise of AIVCs for mechanistic discovery, personalized care, and drug research and development, clinical translation must confront several intertwined obstacles. Foremost is data heterogeneity and reproducibility. Multiomics platforms differ in resolution and batch characteristics, missingness and sparsity are common, and modeling studies too often emphasize internal accuracy while neglecting robustness under external shifts [88]. Across health care AI, systematic reviews repeatedly show performance degradation in external validation, highlighting the gap between development cohorts and real-world deployment and underscoring the need for rigorous, transportable evaluation and updating procedures [89,90]. Moreover, distribution shifts (temporal, geographic, demographic, and workflow) can silently erode model reliability. Recent work in medical imaging has shown that monitoring performance alone is insufficient to detect harmful drift, motivating proactive drift detection and adaptive updating [91]. Beyond accuracy, fairness and bias remain central: AI systems can encode demographic shortcuts and yield uneven errors across subgroups, with equity implications for deployment in nephrology [92,93].
Several practical enablers deserve explicit attention. Harmonization of ontologies and data models (for cell types, assays, and outcomes) is a prerequisite for cross-study learning; without it, apparent gains are often spuriously driven by site effects. Privacy-preserving learning—federated, differentially private, or enclave-based—can broaden geographic and ancestry coverage under regulatory constraints, but its utility-privacy trade-offs must be quantified rather than assumed. Model stewardship should move beyond static performance reports to monitored deployment with prespecified drift thresholds and action plans, including human-in-the-loop overrides. Finally, fairness needs outcome-relevant auditing and remediation via data curation and domain adaptation—not merely post hoc reweighting—such that AIVCs benefit patients in low-resource settings as much as they do in high-income centers.
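Monitored deployment with prespecified drift thresholds can be sketched with a population stability index (PSI) computed on the binned distribution of an input feature or model score. The sketch below is one possible drift statistic among many. The threshold values 0.10 and 0.25 are conventional rules of thumb used here as assumptions, and a deployment would prespecify its own limits and action plan; the function names and action labels are hypothetical.

```python
import math
from typing import Sequence


def population_stability_index(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """PSI between two binned distributions given as fractions that sum to 1.
    Fractions are floored at eps to avoid taking the log of zero."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def drift_action(psi: float, warn: float = 0.10, act: float = 0.25) -> str:
    """Map a PSI value to a prespecified action (thresholds are assumptions)."""
    if psi >= act:
        return "retrain_or_review"   # escalate to human-in-the-loop review
    if psi >= warn:
        return "monitor_closely"
    return "ok"


reference_bins = [0.5, 0.5]   # score distribution at validation time
current_bins = [0.8, 0.2]     # score distribution observed in deployment
stable_psi = population_stability_index(reference_bins, reference_bins)
shifted_psi = population_stability_index(reference_bins, current_bins)
action = drift_action(shifted_psi)
```

Input-distribution statistics such as PSI complement, rather than replace, outcome-based performance monitoring, since either can miss drift that the other detects.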
A second barrier is clinical embedding. Many nephrology models are trained in single-center or idealized cohorts without fully accounting for data gaps, incomplete follow-up, or cross-institution variation. These factors contribute to performance collapse during multicenter validation and erode clinician trust. This is precisely why current reporting and appraisal frameworks (TRIPOD+AI for prediction studies; ‘Prediction model Risk Of Bias Assessment Tool-Artificial Intelligence [PROBAST+AI]’ for risk of bias) emphasize transparent methods, external validation, and applicability [52,94].
Third, governance and ethics must mature in parallel with methods. When training data are unrepresentative, models risk amplifying sex, race/ethnicity, or age biases. Global guidance now calls for lifecycle governance of AI, including transparency, postdeployment monitoring, and recourse. The World Health Organization (WHO) has issued ethics frameworks for AI in health and, more recently, specific guidance for large multimodal models, both of which are relevant to AIVC-style systems that combine diverse data types [95,96]. Finally, there is a regulatory credibility gap for “virtual experiments.” Without risk-informed standards, virtual clinical trials risk appearing precise yet being clinically uninformative. With respect to physics- and physiology-based models, regulators have converged on credibility frameworks (ASME V&V 40 and the FDA’s 2023 final guidance) that align model verification/validation with decision risk [57,97]; a similar discipline is needed as data-driven AIVCs enter the translational pipeline. In short, the field should pivot from “black-box accuracy” to multiscale, interpretable, causally informed modeling with systematic external validation, fairness assessment, and drift-aware maintenance.
Conclusions and calls to action
AIVCs are reshaping nephrology by reconstructing complex pathophysiology, forecasting individualized responses and, in tandem with organoids, closing a simulate-experiment-validate-iterate loop that accelerates translation. Realizing this vision requires coordinated, international action across data, standards, regulation, and the workforce.
First, build global, multicenter data-sharing and validation networks. Efforts such as the kidney-specific KPMP and the broader HCA provide blueprints for high-fidelity tissue atlases and open data that can calibrate and stress-test AIVCs [98,99]. Such platforms should expand with embedded external validation and equity metrics, operating under robust governance (e.g., General Data Protection Regulation and Health Insurance Portability and Accountability Act frameworks) to enable cross-regional integration of multiomics with clinical data while protecting privacy [100–102].
Second, codevelop regulatory standards tailored to AIVCs. Trial-reporting extensions (SPIRIT-AI/CONSORT-AI) [59,103,104] and development-reporting guidance (TRIPOD+AI, PROBAST+AI) [52,94] should be paired with credibility frameworks for in silico evidence, building on FDA and ASME precedents for computational models and on the European Medicines Agency’s lifecycle guidance for AI in medicines [105]. Together, these can underpin approval pathways for virtual trials and AI-supported decisions in nephrology. In parallel, regulators are piloting sandboxes to derisk real-world deployment; the European Union AI Act mandates that Member States establish AI regulatory sandboxes, and the United Kingdom Medicines and Healthcare products Regulatory Agency’s AI Airlock is already vetting AI-as-a-medical-device use-cases—models for domain-specific AIVC pilots [106,107].
Third, institutionalize cross-disciplinary collaboration and training. Professional societies (e.g., International Society of Nephrology, American Society of Nephrology) can convene consortia for shared benchmarks and prospective, multicenter evaluations of AIVC-augmented decision support, aligned with early-stage clinical evaluation guidance (DECIDE-AI) [58]. Finally, ethics and equity must be first-class requirements, ensuring participation and benefit for low- and middle-income countries (LMICs) and Global South settings, where kidney disease burden is high, rather than concentrating data and capacity in high-income regions alone. The WHO’s recent large multimodal model guidance offers concrete policy levers for equitable deployment (Fig. 2).
Figure 2. The challenges and future directions of AIVC in nephrology.
The center represents the AIVC engine. Hexagons on the left summarize key barriers: data heterogeneity (multicenter variation, ancestry mix), reproducibility threats (temporal drift, batch effects), methodological limits (uncalibrated uncertainty, domain shift/out-of-distribution [OOD]), and clinical embedding gaps (workflow integration, single-site evidence). Hexagons on the right outline solution pathways: external validation via multicenter benchmarking; prospective real-world datasets (RWD) and registries; privacy-preserving collaboration using federated learning and secure multiparty computation (SMPC); and trustworthy regulation aligned with TRIPOD+AI, PROBAST+AI, and SPIRIT-AI/CONSORT-AI. Connections indicate conceptual correspondence between barriers and remedies. Created in BioRender. Kelai, W. (2025) https://BioRender.com/eeysk4j.
AIVC, artificial-intelligence virtual cell; CONSORT-AI, Consolidated Standards of Reporting Trials-Artificial Intelligence; PROBAST+AI, Prediction model Risk Of Bias ASsessment Tool-Artificial Intelligence; SPIRIT-AI, Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence; TRIPOD+AI, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-Artificial Intelligence.
If these strategies take root, AIVCs can evolve into core infrastructures for precision nephrology, not only replaying cellular dynamics but also serving as a third space that fuses mechanistic insight with decision support. Over the next decade, three scientific priorities deserve focus. First, scaling should move from single-cell models to multiscale “virtual humans” that link gene-cell-tissue-organ dynamics; credible pathways already exist via model-informed development programs and risk-based credibility assessment. Second, regulatory sandboxes for the clinical use of AIVCs under health-sector regulators should be operationalized, creating safe transitional routes from pilot to practice. Third, virtual clinical trials should be coupled with real-world data to advance kidney digital twins that are sustainable and continuously learn from care [108].
Functionally, physical cells supply the mechanistic substrate and boundary conditions, whereas AIVCs provide a scale-bridging, counterfactual engine that turns measurement into individualized, testable decisions. With systematic progress on these fronts, AIVCs are poised not only to refine nephrology’s research toolkit but also to anchor a new era of predictive, personalized, and equitable kidney medicine.
Notes
Conflicts of interest
All authors have no conflicts of interest to declare.
Funding
This research was supported by grants from the National Natural Science Foundation of China (82370724), the Qingdao Key Health Discipline Development Fund, and the Qingdao Key Clinical Specialty Elite Discipline project.
Acknowledgments
We thank Shuang quan Su for assisting with figure visualization in BioRender. Some illustrations were created with BioRender.com.
Data sharing statement
No new data were generated or analyzed in this narrative review. All information supporting the findings of this study is contained within the article. For further information, please contact the corresponding author.
Authors’ contributions
Conceptualization, Methodology, Project administration, Visualization: YH, WJ
Formal analysis, Investigation: YH, YW, XQ
Funding acquisition, Supervision: WJ
Writing–original draft: YH
Writing–review & editing: All authors
All authors read and approved the final manuscript.
