CAMEO‑AI · SUBMISSION #56
EBM Live Conference · Poster Presentation
AI Methodology & Evidence Quality

The Global Landscape of Artificial Intelligence Randomized Clinical Trials

A bibliometric analysis and systematic quality assessment using the exploratory Critical Appraisal Method for Evaluating Outcomes in AI (CAMEO-AI)
CAMEO-AI · PubMed + Embase + Scopus, Jan 2020–Aug 2025 · N = 2,826 randomized trials
Zahid Durrani¹ · Zubia Suhail² · Asima Faisal³ · Laeeq Malik¹ · Munazza Tayyab⁴ · Kheem Dharmani¹ · Zaryan Hasan⁷ · Sahar Fatima⁵ · Izhar Hasan³·⁶ ★ presenting author
1 MD ACCES, Karachi, Pakistan   2 Baqai Institute of Diabetology & Endocrinology, Karachi, Pakistan   3 Dow University of Health Sciences, Karachi, Pakistan   4 Rahbar Medical College, Lahore, Pakistan   5 Ysbyty Gwynedd Hospital, Betsi Cadwaladr University Health Board, Wales, UK   6 Hackensack Meridian School of Medicine, Nutley, NJ, USA   7 MD ACCES, Princeton, NJ, USA
Corpus
2,826 RCTs
AI / machine-learning randomized clinical trials, Jan 2020–Aug 2025 — up from 266 in 2020 to 660 in the first 8 months of 2025.
Risk of bias
68%
of trials show high risk of bias or some concerns on Cochrane RoB 2.
Exploratory tool
190-pt
CAMEO-AI framework, piloted descriptively on a stratified subsample of 200 trials.
What this page is

The full data behind the poster

This companion site expands every figure summarized on the printed EBM Live poster: full specialty, geographic, and journal breakdowns; the complete risk-of-bias and AI-reporting picture; the six CAMEO-AI domains; and a structured comparison against RoB 2, CONSORT-AI, and SPIRIT-AI.

Study design

Mixed-methods bibliometric analysis + systematic quality assessment. Multi-database search of PubMed, Embase & Scopus for English-language RCTs of ML/AI clinical interventions, Jan 2020–Aug 2025. After de-duplication and QC: N = 2,826 trials.

Quality instruments applied

Cochrane RoB 2 and the CONSORT-AI extension across the full corpus; the SPIRIT-AI extension among trials with protocol materials available; CAMEO-AI piloted descriptively on a stratified n=200 subsample.

Headline finding

AI RCT volume grew 2.5× from 2020 to 2025, but core AI-specific safeguards — external validation, fairness assessment, leakage controls — are documented in well under a third of trials.

04 Results · global landscape

Bibliometric details, full corpus (N = 2,826)

Each specialty bar is paired with an example research theme drawn directly from the corpus. All figures below are computed from the underlying enriched dataset (PubMed + Embase + Scopus, Jan 2020–Aug 2025).

AI RCT publication growth, by year
2020: 266 · 2021: 401 · 2022: 448 · 2023: 511 · 2024: 540 · 2025*: 660
2.5× growth from 2020 to 2025 (266 to 660 trials). *2025 covers Jan–Aug only, a partial year that already exceeds every prior full year. Eligibility: English-language RCTs of ML/AI clinical interventions indexed across PubMed, Embase & Scopus.
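The growth multiple can be checked directly from the yearly counts in the chart. The sketch below is illustrative arithmetic only: the 2.5× figure matches the observed Jan–Aug 2025 count divided by the 2020 count, while the linear annualization of the partial 2025 year is an assumption made here and not a figure reported by the study.

```python
# Yearly AI-RCT counts as shown in the chart (2025 covers Jan-Aug only).
counts = {2020: 266, 2021: 401, 2022: 448, 2023: 511, 2024: 540, 2025: 660}

# Observed multiple: partial-year 2025 count against full-year 2020.
observed_multiple = counts[2025] / counts[2020]

# Assumption (not from the study): scale 8 observed months linearly to 12.
annualized_2025 = counts[2025] * 12 / 8
annualized_multiple = annualized_2025 / counts[2020]

print(f"corpus total: {sum(counts.values())}")
print(f"observed 2020->2025 multiple: {observed_multiple:.2f}x")
print(f"annualized 2025 estimate: {annualized_2025:.0f} trials "
      f"({annualized_multiple:.2f}x)")
```

The yearly counts sum to the corpus size of 2,826, which is a quick consistency check on the chart values.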
Leading specialties (top 20 of 80, share of corpus)
Example theme: Lymph node metastasis prediction (4 trials) · 15.7% of corpus
Example theme: Robot-assisted gait training (7 trials) · 12.7% of corpus
Example theme: Smoking cessation interventions (4 trials) · 9.1% of corpus
Example theme: Atrial fibrillation screening (4 trials) · 6.4% of corpus
Example theme: Robot-assisted gait training (6 trials) · 4.8% of corpus
Example theme: Robot-assisted gait training in cerebral palsy (5 trials) · 4.0% of corpus
Example theme: Early warning systems in critical care (5 trials) · 3.7% of corpus
Example theme: Adenoma detection during colonoscopy (2 trials) · 3.6% of corpus
Example theme: Personalized nutrition for obesity (2 trials) · 3.5% of corpus
Example theme: Robot-assisted partial nephrectomy (3 trials) · 3.5% of corpus
Example theme: Neovascular age-related macular degeneration (14 trials — the single largest topic in the corpus) · 3.4% of corpus
Example theme: Robot-assisted spinal surgery (3 trials) · 3.3% of corpus
Example theme: COPD exacerbation prediction (3 trials) · 2.9% of corpus
Example theme: Drug repurposing for COVID-19 (3 trials) · 2.6% of corpus
Example theme: Uterine fibroid management (2 trials) · 2.4% of corpus
Example theme: Sarcopenia intervention strategies (2 trials) · 1.8% of corpus
Example theme: Age-specific hyperuricemia risk factors (1 trial) · 1.6% of corpus
Example theme: Risk stratification in CAR T-cell therapy (1 trial) · 1.6% of corpus
Example theme: Microvascular invasion in HCC (3 trials) · 1.5% of corpus
Example theme: Virtual patient modeling in rheumatoid arthritis (1 trial) · 1.1% of corpus
Includes anesthesiology, dentistry, dermatology, general medicine, emergency medicine, and 55 further specialties, each individually <1% of the corpus · 10.8% combined.
Remaining 60 specialties account for 304 trials (10.8%). Workflow / decision-support interventions — across all specialties — made up just 0.4% of the corpus.
Corresponding-author affiliation country (top 12)
United States · 663
China · 321
United Kingdom · 87
South Korea · 77
Italy · 72
Germany · 58
Canada · 57
Japan · 53
Australia · 48
Netherlands · 40
Turkey · 37
France · 30
Notable joint-affiliation bylines: China–US (37 trials), UK–US (33), Canada–US (29) — consistent with the 43% international co-authorship rate below.
International collaboration
43% — international co-authorship (abstract-reported rate)
57% — single-country authorship
Methodology note: a stricter byline-only coding of the International_Collab field (co-authors spanning >1 country among all listed authors) gives a lower rate of 30.7% (869/2,826) — the 43% figure reflects the abstract's broader collaboration definition. Both are shown here for transparency.
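A minimal sketch of the stricter byline-only coding described in the note above, assuming each record carries a list of author affiliation countries. The function names and the toy bylines are hypothetical and are not taken from the study's pipeline; only the 869/2,826 counts come from this page.

```python
def is_international(author_countries):
    """Byline-only coding: True when listed authors span more than one country."""
    distinct = {c.strip().lower() for c in author_countries if c and c.strip()}
    return len(distinct) > 1


def collaboration_rate(records):
    """Share of records whose byline spans more than one country."""
    flagged = sum(is_international(r) for r in records)
    return flagged / len(records)


# Toy bylines (hypothetical, not drawn from the corpus).
bylines = [
    ["China", "United States"],             # international
    ["United Kingdom", "united kingdom"],   # same country, different casing
    ["Canada"],                             # single author country
    ["Italy", "Germany", "Italy"],          # international
]
print(collaboration_rate(bylines))

# The byline-only rate quoted in the note, recomputed from its counts.
reported_rate = 869 / 2826
print(f"{reported_rate:.4f}")
```

Normalizing case and whitespace before counting distinct countries matters here, because affiliation strings from different databases rarely agree on formatting.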
Leading journals (top 20, by trial count)
1. Scientific Reports · 82
2. J. Neuroengineering & Rehabilitation · 58
3. J. Medical Internet Research · 52
4. PLoS One · 49
5. JAMA Network Open · 37
6. Trials · 32
7. BMC Med. Informatics & Decision Making · 31
8. BMC Medical Imaging · 29
9. BMJ Open · 29
10. medRxiv · 29
11. JCO Clinical Cancer Informatics · 22
12. Nature Medicine · 20
13. JMIR mHealth & uHealth · 18
14. Computational & Math. Methods in Medicine · 17
15. Nature Communications · 17
16. JAMIA · 16
17. Medicine · 16
18. Frontiers in Oncology · 15
19. JMIR Research Protocols · 15
20. Contrast Media & Molecular Imaging · 14
A long, fragmented publication tail across general-medicine, informatics, and specialty venues — no single journal dominates AI-RCT publishing.
Journal country of registration (top 5)
United States · 1,033
England · 1,019
Switzerland · 212
Netherlands · 157
Canada · 129
Distinct from author geography: this reflects where the publishing journal is legally registered (heavily weighted toward UK/US/Swiss/Dutch publishing houses), not where the research was conducted — see "corresponding-author affiliation country" above for true research geography.
Intervention / study type (full corpus)
Therapeutic / clinical intervention · 47.1%
Prognostic / risk · 25.1%
Diagnostic / screening · 16.3%
Procedure / surgical / technical · 4.4%
Other / unclear · 3.3%
Monitoring / follow-up · 3.2%
Workflow / decision support · 0.4%
Two additional micro-categories — Rehabilitation and Telemedicine / Patient Engagement (2 trials each, <0.1%) — are omitted from the chart for scale.
Open-access status
65.3% open access (1,826 of 2,795 with known status)
34.7% closed / subscription access
Topic modeling: example research themes by specialty

BERTopic with transformer embeddings, run within specialty on free-text titles/abstracts, surfaced 2,600+ distinct micro-themes beyond coarse specialty labels. A representative sample:

Robot-assisted gait training · Upper-limb rehabilitation post-stroke · Neovascular age-related macular degeneration · Lymph node metastasis prediction · Atrial fibrillation screening · Early warning systems in critical care · Smoking cessation interventions · Adenoma detection during colonoscopy · Robot-assisted partial nephrectomy · Precision radiotherapy planning
05 Results · methodological quality

Risk of bias, reporting completeness & AI-specific practices

Three established instruments were applied to the corpus. Each assesses a different stage of the trial lifecycle, and none was designed to appraise AI-specific validity threats across the full model lifecycle. That gap is the motivation for CAMEO-AI, detailed in the CAMEO-AI framework section.

Cochrane RoB 2
68%

of trials show high risk of bias or some concerns across the five RoB 2 domains.

  1. Randomization process
  2. Deviations from intended interventions
  3. Missing outcome data
  4. Measurement of the outcome
  5. Selection of the reported result
Domain-level breakdowns were not reported at the abstract stage; the 68% reflects the overall judgement (high-risk or some-concerns vs. low-risk) across all five domains combined.
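A minimal sketch of how an overall judgement can be derived from the five domain judgements and then summarized as a share of trials not rated low risk. It uses a simplified worst-domain rule; RoB 2 guidance also allows an overall "high" rating when several domains raise some concerns, and that reviewer judgement is not modelled here. The function names and toy trials are hypothetical.

```python
from collections import Counter

# Ordering of RoB 2 judgement levels, from best to worst.
LEVELS = {"low": 0, "some concerns": 1, "high": 2}


def overall_rob2(domains):
    """Simplified overall judgement: the worst of the five domain judgements."""
    if len(domains) != 5:
        raise ValueError("RoB 2 has five domains")
    return max(domains, key=LEVELS.__getitem__)


def share_not_low(trials):
    """Share of trials whose overall judgement is 'some concerns' or 'high'."""
    overall = Counter(overall_rob2(d) for d in trials)
    return (overall["some concerns"] + overall["high"]) / len(trials)


# Toy domain judgements (hypothetical, not corpus data).
trials = [
    ["low", "low", "low", "low", "low"],
    ["low", "some concerns", "low", "low", "low"],
    ["low", "low", "high", "low", "some concerns"],
    ["low", "low", "low", "low", "low"],
]
print(share_not_low(trials))
```

Under this rule a single non-low domain is enough to move a trial out of the low-risk group, which is why overall shares like the one above tend to be much higher than any single domain's rate.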
CONSORT-AI completeness
61%

mean completeness against the 37 core CONSORT 2010 items plus 14 AI-specific extension items (11 extensions + 3 elaborations), applied across the full reported corpus.

AI-specific additions cover: intervention description & skills required, integration setting, input/output data handling, human–AI interaction, and error-case analysis.
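As a rough illustration of what a mean completeness score involves, the sketch below scores each trial as the fraction of applicable checklist items it reports and then averages across trials. The item counts come from the text above; excluding not-applicable items from the denominator, and the toy scores, are assumptions made for illustration and not the study's documented scoring rule.

```python
CORE_ITEMS = 37   # core CONSORT 2010 items, as counted above
AI_ITEMS = 14     # 11 extensions + 3 elaborations
TOTAL_ITEMS = CORE_ITEMS + AI_ITEMS


def completeness(reported, not_applicable=0):
    """Fraction of applicable checklist items that a trial report addresses."""
    applicable = TOTAL_ITEMS - not_applicable
    if applicable <= 0 or not 0 <= reported <= applicable:
        raise ValueError("reported must lie between 0 and the applicable item count")
    return reported / applicable


def mean_completeness(scores):
    """Unweighted mean of per-trial completeness fractions."""
    return sum(scores) / len(scores)


# Toy trials as (items reported, items not applicable); hypothetical values.
toy_trials = [(40, 0), (28, 1), (25, 1)]
scores = [completeness(r, na) for r, na in toy_trials]
print(f"mean completeness: {mean_completeness(scores):.1%}")
```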

SPIRIT-AI completeness
54%

mean completeness against SPIRIT 2013 plus 15 AI-specific protocol items (12 extensions + 3 elaborations), assessed among trials with protocol materials available.

AI-specific items were the most consistently under-reported items on both CONSORT-AI and SPIRIT-AI.

AI quality practices: how often documented?
Prospective temporal validation · 31%
Data-leakage safeguards · 27%
External validation · 23%
Code availability · 18%
Algorithmic fairness assessment · 12%
Every core AI safeguard was documented in under one-third of trials, and fairness assessment in just 1 in 8. These five practices are not formally part of RoB 2, CONSORT-AI, or SPIRIT-AI; they were extracted descriptively to characterize the corpus and motivate CAMEO-AI.
Where appraisal gaps emerge across the AI model lifecycle
1 Data → 2 Model development → 3 Validation → 4 Deployment → 5 Monitoring
Conventional instruments (RoB 2, CONSORT-AI, SPIRIT-AI) concentrate on validation and deployment-stage reporting; data provenance and post-deployment monitoring (stages 1 and 5) are the most consistently under-examined stages.
06 CAMEO-AI framework

The Critical Appraisal Method for Evaluating Outcomes in AI

CAMEO-AI is a 190-item exploratory framework spanning six domains, piloted descriptively on a stratified subsample of 200 trials. It is proposed as a complement to, not a replacement for, RoB 2, CONSORT-AI, and SPIRIT-AI. The six domains are described below.

Whether the trial's eligibility criteria, comparator, randomization scheme, and outcome selection are appropriate for an AI-enabled intervention — including whether the comparator reflects real-world clinical workflow rather than an idealized baseline, and whether outcomes capture clinically meaningful endpoints rather than purely algorithmic performance metrics.
Provenance, representativeness, and documentation of the data used to train and evaluate the AI system — including population coverage, label quality, handling of missing or noisy inputs, and disclosure of data sources, consistent with the data-leakage and provenance gaps identified across the corpus.
Soundness of model selection, training procedure, hyperparameter tuning, and handling of class imbalance or confounding — the technical complement to RoB 2's randomization and outcome-measurement domains, adapted to algorithmic rather than purely statistical methods.
External and prospective temporal validation, calibration, and subgroup performance, directly probing the practices found to be documented in under one-third of trials corpus-wide (external validation 23%, temporal validation 31%).
Code, model, and data availability; versioning; and sufficiency of reporting for independent replication — mapping onto the corpus-wide code-availability rate of just 18%.
Algorithmic fairness assessment across demographic subgroups, informed-consent handling of AI-specific risks, and regulatory clearance status — the domain most sparsely documented in the corpus, with fairness assessment present in just 12% of trials.
CAMEO-AI pilot (n = 200, exploratory)

Marked AI-specific methodological variability — even among trials meeting conventional RoB 2 / CONSORT-AI thresholds. Recurrent gaps: data provenance, external & temporal validation, transparency / reproducibility documentation, subgroup & fairness analysis, and ethical-regulatory oversight. Domain- and overall-level numeric scores are summarized descriptively in the source study and are not reproduced here, as CAMEO-AI has not yet been independently validated.
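The page does not state which variable the n = 200 subsample was stratified on. As a sketch under that caveat, the example below assumes stratification by publication year and allocates the 200 slots proportionally, using the largest-remainder method so the allocation sums exactly to the target. The function name is hypothetical; the yearly counts are the ones reported in the growth chart.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate sample_size across strata in proportion to stratum size.

    Uses integer floor quotas plus largest-remainder top-up, so the
    result always sums to sample_size and avoids float rounding.
    """
    total = sum(strata_sizes.values())
    quotas = {k: sample_size * n // total for k, n in strata_sizes.items()}
    remainders = {k: sample_size * n % total for k, n in strata_sizes.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand the remaining slots to the strata with the largest remainders.
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[k] += 1
    return quotas


# Assumed strata: trials per publication year (counts from the growth chart).
trials_per_year = {2020: 266, 2021: 401, 2022: 448, 2023: 511, 2024: 540, 2025: 660}
allocation = proportional_allocation(trials_per_year, 200)
print(allocation)
print(sum(allocation.values()))
```

Any other stratification variable (specialty, intervention type, country) would drop into the same function by swapping the dictionary of stratum sizes.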

Framework comparison

CAMEO-AI vs. RoB 2, CONSORT-AI & SPIRIT-AI

A structural comparison of all four instruments, drawn from their original publications. CAMEO-AI is exploratory and not yet validated; it is shown here as a proposed complementary layer, not a substitute for the other three.

Dimension | RoB 2 | CONSORT-AI | SPIRIT-AI | CAMEO-AI
Primary purpose | Risk-of-bias judgement for a trial result | Completeness of trial reporting | Completeness of trial protocol reporting | Broader methodological & translational-readiness appraisal
Trial stage assessed | Completed trial, results stage | Trial report, publication stage | Trial protocol, design / pre-registration stage | Spans protocol through post-deployment
Structure | 5 bias domains + signalling-question algorithm | 37 core CONSORT 2010 items + 14 AI items (11 ext. + 3 elab.) | SPIRIT 2013 items + 15 AI items (12 ext. + 3 elab.) | 190 items across 6 domains
Purpose-built for AI? | No (general-purpose) | Yes | Yes | Yes
Output format | Low / Some concerns / High, per domain + overall | Item-by-item completeness checklist | Item-by-item completeness checklist | Descriptive domain & overall scores
Validation status | Cochrane-endorsed standard (2019) | Consensus-developed, EQUATOR-registered (2020) | Consensus-developed, EQUATOR-registered (2020) | Exploratory; not yet independently validated
Source | Sterne JAC et al., BMJ 2019 | Liu X et al., Nat Med / BMJ / Lancet Digital Health 2020 | Cruz Rivera S et al., Nat Med / BMJ / Lancet Digital Health 2020 | This study (EBM Live, Submission #56)
CAMEO-AI is designed to sit alongside — not replace — the other three: RoB 2 still judges internal validity, CONSORT-AI/SPIRIT-AI still govern reporting completeness, and CAMEO-AI adds an AI-lifecycle lens (data, methodology, validation, transparency, ethics) that none of the three was built to cover.
02 Methods

Search strategy, screening & appraisal pipeline

A five-step pipeline took the corpus from multi-database search through quality appraisal.

Pipeline
1. Multi-database search: PubMed, Embase & Scopus, Jan 2020–Aug 2025, English-language RCTs of ML/AI clinical interventions.
2. De-duplication & QC: Record-level de-duplication and quality control across the three source databases → 2,826 unique trials.
3. API enrichment: Bibliographic enrichment (affiliations, journal/open-access metadata, MeSH terms) via publisher and indexing APIs.
4. LLM-assisted classification: Specialty & topic tagging with stored rationale, zero-temperature prompting; BERTopic within-specialty topic modeling.
5. Quality appraisal: RoB 2 + CONSORT-AI across the full corpus · SPIRIT-AI among trials with protocols available · CAMEO-AI on an n=200 stratified subsample.
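The de-duplication step of the pipeline can be sketched as follows. This is a minimal illustration that assumes each exported record is a dictionary with hypothetical `doi`, `title`, `year`, and `source` fields; the matching rule (DOI first, normalized title plus year as a fallback) is a common approach and is not confirmed as the study's own method.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def record_key(record):
    """Prefer the DOI as identity; fall back to normalized title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
        return ("doi", doi)
    return ("title", normalize_title(record["title"]), record.get("year"))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = {}
    for rec in records:
        seen.setdefault(record_key(rec), rec)
    return list(seen.values())


# Toy export rows (hypothetical, not corpus data).
rows = [
    {"source": "PubMed", "doi": "10.1000/xyz123",
     "title": "AI-assisted colonoscopy: a randomized trial", "year": 2023},
    {"source": "Embase", "doi": "https://doi.org/10.1000/XYZ123",
     "title": "AI-assisted colonoscopy: a randomized trial", "year": 2023},
    {"source": "Scopus", "doi": None,
     "title": "Robot-assisted gait training after stroke", "year": 2022},
    {"source": "Embase", "doi": "",
     "title": "Robot-Assisted Gait Training After Stroke.", "year": 2022},
]
unique = deduplicate(rows)
print(len(unique), [r["source"] for r in unique])
```

One known limitation of this sketch: a record that carries a DOI in one database and lacks it in another will not be matched, so a real pipeline would likely need a second pass on titles plus manual QC.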
Screening & full-text access notes
  • 43.4% of records (1,226 / 2,826) carried an explicit PubMed "Randomized Controlled Trial" publication-type tag; the remainder were identified via title/abstract and study-design screening.
  • Full text was successfully retrieved for only 140 / 2,826 trials (the rest: download failed, no PDF available, or paywalled) — the binding constraint on protocol-level SPIRIT-AI assessment and deep manual verification.
  • Citation/impact metrics (total citations, citations-per-year) were checked during analysis and found to contain implausible outliers likely reflecting a journal-level enrichment artifact; they are deliberately excluded from this site rather than presented as reliable findings.
  • Specialty and topic labels were produced by LLM-assisted classification with stored per-record rationale, not manual chart review — appropriate for landscape-level bibliometrics, not for individual-trial clinical interpretation.
Limitations
  • English-language restriction may under-represent non-English AI trial activity, particularly from China and other non-Anglophone research-intensive countries.
  • RoB 2 / CONSORT-AI / SPIRIT-AI scoring relied on automated/LLM-assisted extraction from available text, calibrated against manual spot-checks rather than full dual-reviewer adjudication.
  • CAMEO-AI is exploratory, piloted on a single n=200 subsample, and has not undergone independent inter-rater reliability or external validation testing.
References & correspondence

Take-home message, citations & team

Take-home message

AI-RCT publication has grown 2.5× since 2020, but methodological and reporting quality have not kept pace — and the AI-specific safeguards that matter most for safe clinical translation (external validation, fairness assessment, transparency) remain the least documented. CAMEO-AI is offered as an exploratory starting point for closing that gap, alongside — not instead of — RoB 2, CONSORT-AI, and SPIRIT-AI.

References

1. Sterne JAC, Savović J, Page MJ, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

2. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK; SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med / BMJ / Lancet Digit Health. 2020.

3. Cruz Rivera S, Liu X, Chan AW, Denniston AK, Calvert MJ; SPIRIT-AI and CONSORT-AI Working Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med / BMJ / Lancet Digit Health. 2020.

4. Durrani Z, Suhail Z, Faisal A, et al. CAMEO-AI: a critical appraisal method for evaluating outcomes in AI randomized clinical trials — a bibliometric analysis and exploratory quality framework. EBM Live Conference, Submission #56 (this study).

Data source & enrichment

PubMed, Embase & Scopus search exports, enriched via publisher/indexing APIs for affiliations, open-access status, and journal metadata; LLM-assisted specialty/topic classification with stored rationale; BERTopic topic modeling within specialty.

Authors & affiliations

Zahid Durrani¹ · Zubia Suhail² · Asima Faisal³ · Laeeq Malik¹ · Munazza Tayyab⁴ · Kheem Dharmani¹ · Zaryan Hasan⁷ · Sahar Fatima⁵ · Izhar Hasan³·⁶

1 MD ACCES, Karachi, Pakistan · 2 Baqai Institute of Diabetology & Endocrinology, Karachi, Pakistan · 3 Dow University of Health Sciences, Karachi, Pakistan · 4 Rahbar Medical College, Lahore, Pakistan · 5 Ysbyty Gwynedd Hospital, Betsi Cadwaladr University Health Board, Wales, UK · 6 Hackensack Meridian School of Medicine, Nutley, NJ, USA · 7 MD ACCES, Princeton, NJ, USA

Correspondence

Presented at the EBM Live Conference · Submission #56. For correspondence regarding this companion site or the underlying dataset, please refer to the printed poster's listed corresponding author.
