Haya: The Saudi Journal of Life Sciences (SJLS)
Volume-11 | Issue-02 | 122-141
Review Article
Diagnostic AI Across the Life Sciences (2015–2025): A PRISMA-Scoping Review and Bibliometric Synthesis of External Validity, Calibration, Fairness, and Reproducibility
Sehar Rafique, Kashaf Chaudhary, Syed Haroon Haidar, Umar Rashid, Sohaib Usman
Published: Feb. 5, 2026
DOI: https://doi.org/10.36348/sjls.2026.v11i02.002
Abstract
Artificial intelligence (AI) is transforming diagnostic decision-making across the life sciences, yet the evidence base remains fragmented across human, veterinary, plant, environmental, and microbial domains. We conducted a PRISMA-ScR scoping review (protocol preregistered on OSF; details in Supplement) and bibliometric analysis covering 2015–2025. Searches in PubMed/MEDLINE, Scopus, Web of Science, and IEEE Xplore (plus arXiv/bioRxiv tagging) identified 28,541 records and 68 preprints; after de-duplication and dual screening, 689 primary studies met the inclusion criteria (42 preprints were analyzed descriptively but excluded from citation-based bibliometrics). Human medicine dominated the corpus (81.3%), followed by veterinary (6.2%), plant (5.1%), environmental (4.2%), and microbial diagnostics (3.2%). Modalities were led by medical imaging (65.0%), then omics (18.0%), time-series (8.1%), spectra (4.1%), text (2.9%), and eDNA (1.9%). Reported performance was high (median AUROC 0.94), but external validity and transparency were limited: only 28.0% of studies performed external validation, 9.0% used prospective designs, and 5.2% reported probability calibration. Reproducibility signals were weak (code availability 22.9%, data availability 18.0%, explicit preregistration rare), and fairness/bias assessments appeared in 7.0% of studies, concentrated in human health. Bibliometrics showed rapid year-on-year growth, with the United States (32.1%) and China (28.4%) leading output and collaborations. Trends indicate a shift from task-specific CNNs to multimodal and foundation-model approaches, with early gains from data fusion, but consistent gaps persist in leakage controls, calibration, subgroup reporting, and regulatory alignment. We recommend domain-aware, leakage-resistant splits; at least one independent, real-world evaluation; prevalence-aware metrics with calibration and decision-utility analysis; open datasheets and model cards; and federated/external benchmarking to probe generalization. These practices can convert impressive internal results into dependable, equitable diagnostics that work across clinics, farms, rivers, and labs.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© Copyright Scholars Middle East Publisher. All Rights Reserved.