The Global Landscape of Artificial Intelligence Randomized Clinical Trials
The full data behind the poster
This companion site expands every figure summarized on the printed EBM Live poster — full specialty, geographic, and journal breakdowns; the complete risk-of-bias and AI-reporting picture; the six CAMEO-AI domains; and a structured comparison against RoB 2, CONSORT-AI, and SPIRIT-AI. Use the tabs above, or scan the QR code from anywhere on the page.
Mixed-methods bibliometric analysis + systematic quality assessment. Multi-database search of PubMed, Embase & Scopus for English-language RCTs of ML/AI clinical interventions, Jan 2020–Aug 2025. After de-duplication and QC: N = 2,826 trials.
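The de-duplication step across the three database exports can be illustrated with a minimal sketch. The matching rule (DOI first, otherwise normalized title plus year), the field names, and the sample records are all assumptions made for illustration; the source does not describe the rule actually used.

```python
import re
from typing import Iterable


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_key(record: dict) -> str:
    """Prefer the DOI as the match key; fall back to normalized title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + normalize_title(record.get("title", "")) + "|" + str(record.get("year", ""))


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records, not drawn from the corpus.
records = [
    {"source": "PubMed", "doi": "10.1000/example.1", "title": "AI triage trial", "year": 2022},
    {"source": "Embase", "doi": "10.1000/EXAMPLE.1", "title": "AI Triage Trial.", "year": 2022},
    {"source": "Scopus", "doi": None, "title": "Deep learning for sepsis alerts", "year": 2023},
    {"source": "Embase", "doi": "", "title": "Deep-learning for sepsis alerts", "year": 2023},
]
unique_records = deduplicate(records)
print(len(unique_records), [r["source"] for r in unique_records])
```

A limitation of this simple rule is that a record with a DOI and a copy of the same record without one will not be matched, which is one reason a QC pass after de-duplication is still needed.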
Cochrane RoB 2 and the CONSORT-AI extension across the full corpus; the SPIRIT-AI extension among trials with protocol materials available; CAMEO-AI piloted descriptively on a stratified n=200 subsample.
AI RCT volume grew 2.5× from 2020 to 2025, but core AI-specific safeguards — external validation, fairness assessment, leakage controls — are documented in well under a third of trials.
Bibliometric details, full corpus (N = 2,826)
Click any specialty or country bar to reveal an example research theme drawn directly from the corpus. All figures below are computed from the underlying enriched dataset (PubMed + Embase + Scopus, Jan 2020–Aug 2025).
The International_Collab field (co-authors spanning >1 country among all listed authors) gives a lower rate of 30.7% (869/2,826); the 43% figure reflects the abstract's broader collaboration definition. Both are shown here for transparency.
BERTopic with transformer embeddings, run within specialty on free-text titles/abstracts, surfaced 2,600+ distinct micro-themes beyond coarse specialty labels. A representative sample:
Risk of bias, reporting completeness & AI-specific practices
Three established instruments were applied to the corpus. Each assesses a different stage of the trial lifecycle and was not designed with AI-specific validity threats in mind — the motivation for CAMEO-AI, detailed in the next tab.
of trials show high risk of bias or some concerns across the five RoB 2 domains.
- Randomization process
- Deviations from intended interventions
- Missing outcome data
- Measurement of the outcome
- Selection of the reported result
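As a rough sketch of how the five domain-level judgements roll up into an overall RoB 2 rating, and how a "high risk or some concerns" share could then be tallied, consider the following. It implements only the mechanical part of the RoB 2 guidance (any High gives High; otherwise any Some concerns gives Some concerns; otherwise Low). The guidance also allows reviewers to rate a result High when several domains raise concerns, and that judgement call is deliberately not automated here. The trial names and ratings are hypothetical.

```python
DOMAINS = (
    "Randomization process",
    "Deviations from intended interventions",
    "Missing outcome data",
    "Measurement of the outcome",
    "Selection of the reported result",
)
LOW, SOME, HIGH = "Low", "Some concerns", "High"


def overall_judgement(domain_judgements: dict) -> str:
    """Roll five domain-level judgements up to one overall rating."""
    values = []
    for domain in DOMAINS:
        value = domain_judgements[domain]  # raises KeyError if a domain is missing
        if value not in (LOW, SOME, HIGH):
            raise ValueError(f"Unexpected judgement for {domain}: {value!r}")
        values.append(value)
    if HIGH in values:
        return HIGH
    if SOME in values:
        return SOME
    return LOW


# Hypothetical ratings, listed in the same order as DOMAINS.
trials = {
    "trial_1": dict(zip(DOMAINS, [LOW, LOW, LOW, LOW, LOW])),
    "trial_2": dict(zip(DOMAINS, [LOW, SOME, LOW, LOW, LOW])),
    "trial_3": dict(zip(DOMAINS, [LOW, SOME, HIGH, LOW, LOW])),
}
overall = {name: overall_judgement(ratings) for name, ratings in trials.items()}

# Share of trials whose overall rating is anything other than Low.
not_low_share = sum(rating != LOW for rating in overall.values()) / len(overall)
print(overall, f"{not_low_share:.1%}")
```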
mean completeness against the 37 core CONSORT 2010 items plus 14 AI-specific extension items (11 extensions + 3 elaborations), applied across the full reported corpus.
AI-specific additions cover: intervention description & skills required, integration setting, input/output data handling, human–AI interaction, and error-case analysis.
mean completeness against SPIRIT 2013 plus 15 AI-specific protocol items (12 extensions + 3 elaborations), assessed among trials with protocol materials available.
AI-specific items were the most consistently under-reported items on both CONSORT-AI and SPIRIT-AI.
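One way to read "mean completeness" is as the unweighted share of checklist items a trial reports, averaged across trials. The sketch below assumes that definition, which the source does not state explicitly, and uses placeholder item identifiers and two hypothetical trials. Only the item counts (37 core plus 14 AI-specific) come from the text above.

```python
from statistics import mean

# Placeholder identifiers; only the counts (37 core, 14 AI-specific) come from the text.
CORE_ITEMS = [f"core_{i}" for i in range(1, 38)]
AI_ITEMS = [f"ai_{i}" for i in range(1, 15)]
ALL_ITEMS = CORE_ITEMS + AI_ITEMS


def completeness(reported: set, items: list) -> float:
    """Percentage of the given checklist items that the trial reports."""
    return 100 * sum(item in reported for item in items) / len(items)


# Hypothetical trials, each described by the set of items it reports.
trials = {
    "trial_A": set(CORE_ITEMS) | set(AI_ITEMS[:7]),       # all core items, half the AI items
    "trial_B": set(CORE_ITEMS[:30]) | set(AI_ITEMS[:2]),  # 30 core items, 2 AI items
}

mean_overall = mean(completeness(r, ALL_ITEMS) for r in trials.values())
mean_core = mean(completeness(r, CORE_ITEMS) for r in trials.values())
mean_ai = mean(completeness(r, AI_ITEMS) for r in trials.values())
print(f"overall {mean_overall:.1f}%, core {mean_core:.1f}%, AI-specific {mean_ai:.1f}%")
```

Scoring the core and AI-specific items separately is what makes a pattern like under-reported AI items visible, since a single overall percentage would hide it.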
The Critical Appraisal Method for Evaluating Outcomes in AI
CAMEO-AI is a 190-item exploratory framework spanning six domains, piloted descriptively on a stratified subsample of 200 trials. It is proposed as a complement to — not a replacement for — RoB 2, CONSORT-AI, and SPIRIT-AI. Click each domain to expand.
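A stratified subsample of 200 from a corpus of 2,826 could be drawn along the following lines. This is a sketch under stated assumptions: the source does not say which variable defined the strata or how the sample was allocated, so the stratum labels and sizes, the proportional allocation, and the largest-remainder rounding are all illustrative choices.

```python
import random


def proportional_allocation(stratum_sizes: dict, total_sample: int) -> dict:
    """Allocate the sample in proportion to stratum size (largest-remainder rounding)."""
    population = sum(stratum_sizes.values())
    quotas = {s: total_sample * n / population for s, n in stratum_sizes.items()}
    allocation = {s: int(q) for s, q in quotas.items()}  # round every quota down first
    leftover = total_sample - sum(allocation.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation


def stratified_sample(records_by_stratum: dict, total_sample: int, seed: int = 0) -> dict:
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sizes = {s: len(records) for s, records in records_by_stratum.items()}
    allocation = proportional_allocation(sizes, total_sample)
    return {s: rng.sample(records, allocation[s]) for s, records in records_by_stratum.items()}


# Hypothetical strata whose sizes sum to the corpus size of 2,826.
stratum_sizes = {"stratum_A": 600, "stratum_B": 500, "stratum_C": 426, "stratum_D": 1300}
allocation = proportional_allocation(stratum_sizes, 200)

# Integer ranges stand in for record identifiers.
records_by_stratum = {s: list(range(n)) for s, n in stratum_sizes.items()}
sample = stratified_sample(records_by_stratum, 200)
print(allocation, sum(len(drawn) for drawn in sample.values()))
```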
Marked AI-specific methodological variability — even among trials meeting conventional RoB 2 / CONSORT-AI thresholds. Recurrent gaps: data provenance, external & temporal validation, transparency / reproducibility documentation, subgroup & fairness analysis, and ethical-regulatory oversight. Domain- and overall-level numeric scores are summarized descriptively in the source study and are not reproduced here, as CAMEO-AI has not yet been independently validated.
CAMEO-AI vs. RoB 2, CONSORT-AI & SPIRIT-AI
A structural comparison of all four instruments, drawn from their original publications. CAMEO-AI is exploratory and not yet validated; it is shown here as a proposed complementary layer, not a substitute for the other three.
| Dimension | RoB 2 | CONSORT-AI | SPIRIT-AI | CAMEO-AI |
|---|---|---|---|---|
| Primary purpose | Risk-of-bias judgement for a trial result | Completeness of trial reporting | Completeness of trial protocol reporting | Broader methodological & translational-readiness appraisal |
| Trial stage assessed | Completed trial, results stage | Trial report, publication stage | Trial protocol, design / pre-registration stage | Spans protocol through post-deployment |
| Structure | 5 bias domains + signalling-question algorithm | 37 core CONSORT 2010 items + 14 AI items (11 ext. + 3 elab.) | SPIRIT 2013 items + 15 AI items (12 ext. + 3 elab.) | 190 items across 6 domains |
| Purpose-built for AI? | No — general-purpose | Yes | Yes | Yes |
| Output format | Low / Some concerns / High, per domain + overall | Item-by-item completeness checklist | Item-by-item completeness checklist | Descriptive domain & overall scores |
| Validation status | Cochrane-endorsed standard (2019) | Consensus-developed, EQUATOR-registered (2020) | Consensus-developed, EQUATOR-registered (2020) | Exploratory — not yet independently validated |
| Source | Sterne JAC et al., BMJ 2019 | Liu X et al., Nat Med / BMJ / Lancet Digital Health 2020 | Cruz Rivera S et al., Nat Med / BMJ / Lancet Digital Health 2020 | This study (EBM Live, Submission #56) |
Search strategy, screening & appraisal pipeline
A five-step pipeline took the corpus from multi-database search through quality appraisal.
- 43.4% of records (1,226 / 2,826) carried an explicit PubMed "Randomized Controlled Trial" publication-type tag; the remainder were identified via title/abstract and study-design screening.
- Full text was successfully retrieved for only 140 / 2,826 trials (the rest: download failed, no PDF available, or paywalled) — the binding constraint on protocol-level SPIRIT-AI assessment and deep manual verification.
- Citation/impact metrics (total citations, citations-per-year) were checked during analysis and found to contain implausible outliers likely reflecting a journal-level enrichment artifact; they are deliberately excluded from this site rather than presented as reliable findings.
- Specialty and topic labels were produced by LLM-assisted classification with stored per-record rationale, not manual chart review — appropriate for landscape-level bibliometrics, not for individual-trial clinical interpretation.
- English-language restriction may under-represent non-English AI trial activity, particularly from China and other non-Anglophone research-intensive countries.
- RoB 2 / CONSORT-AI / SPIRIT-AI scoring relied on automated/LLM-assisted extraction from available text, calibrated against manual spot-checks rather than full dual-reviewer adjudication.
- CAMEO-AI is exploratory, piloted on a single n=200 subsample, and has not undergone independent inter-rater reliability or external validation testing.
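The proportions quoted in the notes above can be checked with a few lines of arithmetic. The counts come from the text; the helper name `percent` is only an illustrative choice. The full-text share is not stated as a percentage in the notes, and works out to roughly 5% of the corpus.

```python
def percent(numerator: int, denominator: int, digits: int = 1) -> float:
    """Return numerator/denominator as a percentage rounded to `digits` decimals."""
    return round(100 * numerator / denominator, digits)


CORPUS_SIZE = 2826

rct_tagged = percent(1226, CORPUS_SIZE)  # records with the PubMed RCT publication-type tag
full_text = percent(140, CORPUS_SIZE)    # trials with full text successfully retrieved

print(f"RCT-tagged: {rct_tagged}%  full text retrieved: {full_text}%")
```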
Take-home message, citations & team
AI-RCT publication has grown 2.5× since 2020, but methodological and reporting quality have not kept pace — and the AI-specific safeguards that matter most for safe clinical translation (external validation, fairness assessment, transparency) remain the least documented. CAMEO-AI is offered as an exploratory starting point for closing that gap, alongside — not instead of — RoB 2, CONSORT-AI, and SPIRIT-AI.
1. Sterne JAC, Savović J, Page MJ, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.
2. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK; SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med / BMJ / Lancet Digit Health. 2020.
3. Cruz Rivera S, Liu X, Chan AW, Denniston AK, Calvert MJ; SPIRIT-AI and CONSORT-AI Working Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med / BMJ / Lancet Digit Health. 2020.
4. Durrani Z, Suhail Z, Faisal A, et al. CAMEO-AI: a critical appraisal method for evaluating outcomes in AI randomized clinical trials — a bibliometric analysis and exploratory quality framework. EBM Live Conference, Submission #56 (this study).
PubMed, Embase & Scopus search exports, enriched via publisher/indexing APIs for affiliations, open-access status, and journal metadata; LLM-assisted specialty/topic classification with stored rationale; BERTopic topic modeling within specialty.
Zahid Durrani¹ · Zubia Suhail² · Asima Faisal³ · Laeeq Malik¹ · Munazza Tayyab⁴ · Kheem Dharmani¹ · Zaryan Hasan⁷ · Sahar Fatima⁵ · Izhar Hasan³·⁶ ★
1 MD ACCES, Karachi, Pakistan · 2 Baqai Institute of Diabetology & Endocrinology, Karachi, Pakistan · 3 Dow University of Health Sciences, Karachi, Pakistan · 4 Rahbar Medical College, Lahore, Pakistan · 5 Ysbyty Gwynedd Hospital, Betsi Cadwaladr University Health Board, Wales, UK · 6 Hackensack Meridian School of Medicine, Nutley, NJ, USA · 7 MD ACCES, Princeton, NJ, USA
Presented at the EBM Live Conference · Submission #56. For correspondence regarding this companion site or the underlying dataset, please refer to the printed poster's listed corresponding author.