Evidence before adjectives
Performance is more than a single headline metric. Caire’s evidence program is organized around technical validity, clinical utility, workflow impact, and ongoing real-world monitoring.
Evidence framework
Does the model perform on appropriately curated and independent data with prespecified endpoints?
Does performance hold across relevant sites, scanners, acquisition patterns, and patient subgroups?
Does the output improve clinician orientation or decision workflow without introducing harmful automation bias?
Does deployment change meaningful pathway intervals, workload, escalation reliability, or disposition?
Can intended users understand, access, acknowledge, and appropriately act on the output under pressure?
Can shifts in data, workflow, delivery, performance, and user behavior be detected and governed?
Validation blueprint
A credible evaluation separates algorithm performance from the performance of the deployed clinical system.
Specify population, modality, setting, user, output, clinical role, exclusions, and foreseeable misuse.
Prespecify endpoints, thresholds, reference standard, adjudication, missing-data handling, and subgroup analysis.
Evaluate across unseen institutions and relevant technical and demographic variation.
Measure what changes after deployment, including time intervals, alert burden, user behavior, and unintended consequences.
Review performance, delivery health, overrides, complaints, drift signals, and protocol changes under governance.
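As one illustration of measuring time intervals and delivery health after deployment, the sketch below summarizes time to notification from start and end timestamps, counts failed deliveries separately, and reports percentiles instead of a single average. The event structure, the function names `nearest_rank_percentile` and `summarize_latency`, and the choice of the nearest-rank percentile method are assumptions made for this example; they do not describe Caire's monitoring implementation.

```python
import math
from datetime import datetime, timedelta
from typing import Optional


def nearest_rank_percentile(sorted_values: list, pct: int) -> float:
    """Nearest-rank percentile of an ascending list, with pct in 1..100."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[rank - 1]


def summarize_latency(events: list) -> dict:
    """Summarize (start, end) timestamp pairs; end is None when delivery failed."""
    if not events:
        raise ValueError("no events")
    latencies = sorted(
        (end - start).total_seconds() for start, end in events if end is not None
    )
    failures = sum(1 for _, end in events if end is None)
    summary = {"n": len(events), "failure_rate": failures / len(events)}
    if latencies:  # percentiles are only defined when something was delivered
        for pct in (50, 90, 95):
            summary[f"p{pct}_seconds"] = nearest_rank_percentile(latencies, pct)
    return summary


# SYNTHETIC events for illustration only: four deliveries and one failure.
t0 = datetime(2024, 1, 1, 12, 0, 0)
end_10: Optional[datetime] = t0 + timedelta(seconds=10)
events = [
    (t0, end_10),
    (t0, t0 + timedelta(seconds=20)),
    (t0, t0 + timedelta(seconds=30)),
    (t0, t0 + timedelta(seconds=40)),
    (t0, None),  # delivery failure: counted, but excluded from percentiles
]
print(summarize_latency(events))
```

Reporting failures alongside the percentiles keeps undelivered notifications from silently disappearing from the latency distribution.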
Peer-reviewed research
Three published retrospective studies evaluate Caire ICH, covering both algorithm performance and AI-assisted physician interpretation. Results should be interpreted in the context of each study's design and population.
Emergency physician reader study
Five board-certified emergency physicians reviewed 532 non-contrast cranial CT scans before and after assistance from Caire ICH.
Read on PubMed →
The authors concluded that prospective research with larger cohorts is needed to understand effects on ED logistics and patient outcomes.
External validation study
External retrospective validation used 510 non-contrast head CT scans: 402 with intracranial hemorrhage (ICH) and 108 without.
Read on PubMed →
Radiologist reader study
Three board-certified radiologists reviewed 532 non-contrast head CT scans in a retrospective multi-reader, multi-case study, before and after Caire ICH assistance.
Read the full article →
Increases in sensitivity, specificity, and inter-reader agreement were not statistically significant in this study.
These studies were retrospective and used enriched datasets. They do not by themselves establish prospective clinical outcomes or performance in every deployment population. Study disclosures and author affiliations are available in each publication.
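One reason enriched datasets limit generalization is that predictive values depend on prevalence. The sketch below applies Bayes' rule to show how PPV and NPV shift between an enriched cohort and a lower-prevalence setting. The sensitivity and specificity values (0.90 each) and the 10% deployment prevalence are illustrative assumptions, not results from the studies above; only the 402/510 cohort composition comes from the external validation study described here. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) at a given prevalence, using Bayes' rule."""
    if not all(0.0 <= v <= 1.0 for v in (sensitivity, specificity, prevalence)):
        raise ValueError("all inputs must be in [0, 1]")
    tp = sensitivity * prevalence                  # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false-negative fraction
    tn = specificity * (1.0 - prevalence)          # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return ppv, npv


# ASSUMED operating point, for illustration only; not taken from any study.
SENS, SPEC = 0.90, 0.90

enriched_prevalence = 402 / 510  # cohort composition of the external validation set
assumed_local_prevalence = 0.10  # ASSUMPTION: hypothetical deployment prevalence

for label, prev in [
    ("enriched cohort", enriched_prevalence),
    ("assumed local", assumed_local_prevalence),
]:
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"{label}: prevalence={prev:.3f} PPV={ppv:.3f} NPV={npv:.3f}")
```

With the operating point held fixed, PPV is lower and NPV is higher at the lower assumed prevalence, which is why predictive values from an enriched cohort should not be carried over directly to a deployment population.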
Reporting discipline
| Measure | Why it matters | What must accompany it |
|---|---|---|
| Sensitivity / specificity | Characterizes finding-level discrimination | Confidence intervals, prevalence, reference standard, threshold, and cohort definition |
| Positive / negative predictive value | Reflects expected usefulness in the deployment population | Local prevalence and sampling design |
| Time to notification | Shows technical and delivery latency | Start/end timestamps, failures, and percentile distribution |
| Time to clinical action | Tests whether the full pathway changed | Workflow definition, comparator, adjustment, and clinical context |
| Subgroup performance | Surfaces uneven performance | Sample size, prespecified groups, uncertainty, and limitations |
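To make the first row of the table concrete, the sketch below reports sensitivity and specificity together with confidence intervals, using the Wilson score interval for a binomial proportion. The Wilson method is one common choice and is an assumption here; the source does not state which interval method any study used. The confusion-matrix counts are synthetic, and the function names `wilson_interval` and `sens_spec_with_ci` are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sens_spec_with_ci(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return point estimates with (lower, upper) bounds for each measure."""
    return {
        "sensitivity": (tp / (tp + fn), *wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), *wilson_interval(tn, tn + fp)),
    }


# SYNTHETIC counts for illustration only; not results from any study.
result = sens_spec_with_ci(tp=90, fn=10, tn=180, fp=20)
for name, (point, lower, upper) in result.items():
    print(f"{name}: {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The interval narrows as the number of cases grows, which is why a point estimate without its cohort size and interval says little about how stable the measure is.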
Research collaboration