somaCURA · Research Position · 2026

Beyond the AI Scribe

A research-grounded argument for physician-oriented clinical intelligence — and against the reduction of medical documentation to ambient transcription.

79%
Hybrid human-AI note
approval rate
vs 23% AI-only · Hack 2025
23%
Patients lacking any
clinical impression
n=432 · Egerton-Warburton 2025
44%
Hallucination reduction
with knowledge graphs
5 medical datasets · Chen 2025
0.90
F1 score for RAG-based
problem identification
n=5,118 utterances · Zhang 2025
The medical record is not a diary of what happened to a patient. It is a map of what a physician understood, decided, and defended. When we reduce it to transcription, we erase the reasoning it was meant to preserve. — somaCURA design thesis
Problem-Oriented A&P · Evidence Graphs · Fragment Routing · Deterministic Compilation · RAG-Enhanced Knowledge · Physician-in-the-Loop · Zero Ambient Capture
Structured abstract — Background, Methods, Findings, Conclusion
Background

Ambient AI scribes have been widely adopted as the default model for AI-assisted documentation. A growing body of evidence suggests this paradigm optimizes for speed of text production rather than quality of clinical reasoning.

Methods

Semantic search across peer-reviewed literature (2018–2026) via Scholar Gateway: 36 unique papers from 6 structured queries spanning ambient scribe efficacy, documentation-integrated decision support, problem-oriented records, diagnostic reasoning, knowledge-augmented systems, and physician-AI collaboration models. Findings mapped against somaCURA's shipping architecture (185,000 lines).

Findings

Hybrid physician-AI notes achieve 79% unedited approval vs. 23% for AI-only (Hack 2025, n=20, 10 blinded reviewers). Editing burden negates time savings for experienced physicians (Atiku 2026, 14 studies). 23% of encounters lack documented clinical impressions (Egerton-Warburton 2025, n=432). Graph-based knowledge retrieval outperforms naive RAG in diagnostic accuracy (He 2025). The "productivity paradox" absorbs scribe efficiency into higher volume, not reduced burnout (Goodson 2025).

Conclusion

The clinical note is a reasoning artifact, not a transcription byproduct. Systems that augment physician reasoning through structured problem tracking, deterministic evidence routing, and knowledge-augmented compilation are better aligned with both the evidence base and the original intent of the problem-oriented medical record.

Two Models of Clinical AI

The industry treats documentation as a transcription problem. The research says it's a reasoning problem. The distinction is not semantic — it determines what gets built and what gets lost.

The AI Scribe Model

1
Passive Listener
Records the conversation ambiently. The AI is a silent third party in the room that the patient didn't ask for.
2
Transcript → Note
LLM formats speech into clinical prose. No reasoning involved — word arrangement, not clinical judgment.
3
Doctor as Proofreader
Physician reviews AI-generated text for errors. Taylor (2024): "clinicians may struggle to proofread and correct inconsistencies in ambient AI scribe output."
4
No Clinical Context
No awareness of problem evolution, prior hospital days, lab trends, medication rationale, or disease trajectory.
5
Uninvested Third Party
System has no stake in clinical accuracy. Same generic tool for every specialty, every patient, every acuity level.

Physician-Oriented Intelligence

1
Active Collaborator
System tracks problems, routes evidence, maintains the clinical course across hospital days. It knows the patient.
2
Evidence → Reasoning → Note
Structured clinical assertions compiled into prose only at finalization. LLM is scoped per-problem, never unbound.
3
Doctor Directs Reasoning
The physician IS the reasoning engine. The system is structured clinical memory — not a prose generator pretending to think.
4
Deep Clinical Context
Longitudinal patient state, lab ontology, medication indications, prior A&P text, problem status evolution — all per-encounter.
5
Domain-Native Intelligence
35+ clinical calculators, evidence graphs, RAG-injected guidelines, problem-aware compilation. Built for inpatient medicine.
The Core Thesis

Ambient scribes treat documentation as a transcription problem to solve — the note as a record of what was said. somaCURA treats documentation as a clinical reasoning artifact to produce — the note as a structured argument for a care plan, grounded in evidence, organized by problem, and directed by the physician who owns the patient.

How Intelligence Replaces Transcription

Every component below is deliberate. LLM calls are bounded, scoped, and late. The majority of the pipeline is deterministic — reproducible, auditable, and fast.

1 · Fragment Input (<5ms)
Doctor feeds clinical observations incrementally throughout the shift.

2 · Clinical Router (deterministic)
Ontology-based routing to problems. No LLM. Lab mapping + keyword match.

3 · Evidence Graph (deterministic)
Labs, vitals, meds linked to problems. Same input = same graph. Always.

4 · Knowledge Retrieval (RAG)
Clinical guidelines + hospital protocols injected via ChromaDB vector search.

5 · Compile (LLM, scoped)
Per-problem prose from pre-routed context. 2-10s total. Physician reviews.

0
LLM calls during
fragment accumulation
<5ms
Deterministic routing
per fragment
~80%
Fragments routed
without any LLM
35+
Clinical calculators
embedded in pipeline
2-10s
Finalization time
(scoped LLM calls)
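The deterministic routing stage can be sketched in a few lines. This is an illustrative reduction, not somaCURA's code: the tables (`LAB_ONTOLOGY`, `KEYWORD_MAP`), the `Fragment` type, and `route_fragment` are hypothetical stand-ins for the production ontology in clinical_update_router.py.

```python
from dataclasses import dataclass

# Illustrative ontology tables; the real lab ontology is far larger.
LAB_ONTOLOGY = {
    "creatinine": "acute_kidney_injury",
    "bnp": "acute_heart_failure",
    "lactate": "sepsis",
}
KEYWORD_MAP = {
    "diuresis": "acute_heart_failure",
    "vancomycin": "sepsis",
}

@dataclass(frozen=True)
class Fragment:
    text: str  # raw clinical observation, e.g. "creatinine 2.1, up from 1.4"

def route_fragment(frag: Fragment) -> list[str]:
    """Deterministic routing: lab-name match first, then keyword match.

    No LLM call; the same fragment always routes to the same problems.
    """
    problems: list[str] = []
    for token in frag.text.lower().split():
        word = token.strip(",.;:")
        for table in (LAB_ONTOLOGY, KEYWORD_MAP):
            problem = table.get(word)
            if problem and problem not in problems:
                problems.append(problem)
    return problems

print(route_fragment(Fragment("creatinine 2.1 after diuresis")))
# -> ['acute_kidney_injury', 'acute_heart_failure']
```

Because routing is a pure table lookup, it is reproducible and auditable by construction: re-running the shift's fragments yields an identical evidence graph.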

Peer-Reviewed Evidence for This Approach

These are not cherry-picked papers: six structured queries returned 36 unique papers, and the convergence was not engineered — the literature arrived at the same conclusions that drove somaCURA's architecture.

Hybrid > AI-Only

Human-AI Notes Outperform Pure AI 3.4:1

"Hybrid notes written by an attending surgeon with GPT assistance received the highest 'as is' approval rate (79%), outperforming all other groups. GPT-only notes had the lowest approval rate (23%) and the highest incidence of both omissions and overdocumentation."
Design: Blinded multicentric study. 20 operative notes, 5 procedures, 10 reviewers (5 residents, 5 attendings). 8-domain scoring. Effect: 79% vs 23% approval (Cohen's d ≈ 0.6 on completeness).
Hack et al. (2025) · The Laryngoscope, 136(2), 605-615 · doi:10.1002/lary.70063
somaCURA: Doctor directs reasoning, system compiles. Never autonomous generation.
Clinical Reasoning

Documentation IS the Diagnostic Pause

"Accurate and thoughtful documentation of patient encounters is a cornerstone of sound clinical practice, offering time for a diagnostic pause and a thorough record of diagnostic reasoning, differential diagnoses, uncertainties and therapeutic interventions — which is not guaranteed when relying solely on human memory."
Design: Retrospective EMR review, n=432 randomly selected adults across 4 Australian EDs. Finding: 23.4% lacked any documented clinical impression (95% CI 20-28%). Diagnostic error contributes to 66.7% of severe adverse events.
Egerton-Warburton et al. (2025) · Emerg Med Australasia, 37(4) · doi:10.1111/1742-6723.70086
somaCURA: Note is a structured clinical argument. Problem detection prevents reasoning omissions.
The Original Vision

The POMR Was Always About Making Reasoning Visible

"Essential to the POMR was that patients had 'problems' which would be enumerated and defined to the best of a physician's ability. Rather than a comprehensive narrative, the POMR attempted to provide a scientific framework for clinical reasoning."
Context: Point-counterpoint on the future of the SOAP note. Reviews Larry Weed's 1960s Problem-Oriented Medical Record and its philosophical lineage through modern EHRs.
Rodman et al. (2023) · J Hosp Med, 18(10), 957-961 · doi:10.1002/jhm.13180
somaCURA: Problem-oriented A&P with per-problem evidence, status tracking, and clinical course evolution.
Anchoring Risk

Ambient Scribes Introduce Diagnostic Error

"Clinicians may struggle to proofread and correct inconsistencies in ambient AI scribe output. Excessive trust risks decreasing alertness for subtle clinical clues, introducing anchoring bias around AI differentials, or failing to recognize inconsistencies."
Context: Comprehensive review of AI integration in emergency medicine diagnostic workflows. Identifies automation bias, anchoring, and feedback loop risks specific to ambient documentation tools.
Taylor et al. (2024) · Acad Emerg Med, 32(3), 327-339 · doi:10.1111/acem.15066
somaCURA: Doctor feeds reasoning IN. System never generates unsupervised clinical assessments.
Structured Notes

Templates Beat Free Text on Every Metric

"Combined with a small educational intervention, structured note templates can make progress notes more accurate and succinct, make note writing more efficient, and be harnessed to improve quality metrics. Institutions should consider developing internal best practices for clinical documentation."
Context: Quality improvement study implementing standardized progress note templates. Demonstrated measurable improvements in documentation accuracy, conciseness, and adherence to ACP documentation standards.
Kahn et al. (2018) · J Hosp Med, 13(6), 378-382 · doi:10.12788/jhm.2898
somaCURA: progressNoteSpec defines section ordering, problem structure, density rules, lint validation.
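A note-lint pass of the kind progressNoteSpec implies can be sketched as follows; the section names, the density threshold, and `lint_note` are invented for illustration, not somaCURA's actual spec:

```python
# Hypothetical lint rules: required sections plus a per-problem word budget.
REQUIRED_SECTIONS = ["Subjective", "Objective", "Assessment & Plan"]
MAX_WORDS_PER_PROBLEM = 120

def lint_note(sections: dict[str, str]) -> list[str]:
    """Return a list of lint findings; an empty list means the note passes."""
    findings = []
    for name in REQUIRED_SECTIONS:
        if not sections.get(name, "").strip():
            findings.append(f"missing section: {name}")
    ap = sections.get("Assessment & Plan", "")
    if len(ap.split()) > MAX_WORDS_PER_PROBLEM:
        findings.append("A&P exceeds density limit")
    return findings

print(lint_note({"Subjective": "improved overnight", "Objective": "afebrile"}))
# -> ['missing section: Assessment & Plan']
```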
Knowledge Graphs

Graph-RAG Outperforms Naive LLMs in Diagnosis

"LLMs enhanced by [graph-based knowledge systems] outperformed naive RAG and raw models in diagnostic accuracy. This architecture enables real-time adaptation to guideline updates and robust fusion of structured and unstructured data."
Design: Knowledge graph integrating 5 international clinical guidelines with NLP-processed multimodal patient data. Finding: Specialist ratings favored graph-enhanced output over both raw LLM and standard RAG approaches.
He et al. (2025) · Med Research, 1(3), 412-423 · doi:10.1002/mdr2.70039
somaCURA: Evidence Graph + RAG ChromaDB + lab ontology = structured knowledge fusion at compile time.
Productivity Paradox

The Scribe Won't Fix What's Actually Broken

"It is premature to assert that AI tools will reduce physician burnout. AI scribes are offered as the solution where documentation is perceived as the problem — but studies show that efficiency gains may be absorbed by increased patient volume, nullifying the burnout relief."
Design: Systematic analysis of peer-reviewed AI scribe studies (OVID Medline, 2020+, 3,805 titles screened). Draws on EHR utilization research and health systems delivery literature to model second-order effects.
Goodson et al. (2025) · Learning Health Systems, 9(4) · doi:10.1002/lrh2.70013
somaCURA: Targets cognitive value — better reasoning, not just faster typing. The note should help you think.
RAG + Problem Mapping

Automated Problem Identification at F1=0.90

"GPT-4o-mini with RAG achieved F1-score of 0.90 for both problem and sign/symptom identification, using the Omaha System framework. RAG-enhanced LLMs effectively automated health problem identification from clinical conversations."
Design: 5,118 utterances from 22 clinician-patient encounters. Compared 3 LLMs (Llama, GPT-4o-mini, GPT-o3-mini) with varied RAG configurations. Best config: 1-utterance window, top_k=15, few-shot + chain-of-thought.
Zhang et al. (2025) · J Nursing Scholarship, 57(6), 1003-1011 · doi:10.1111/jnu.70039
somaCURA: Two-phase problem detection (Opus LLM + consolidation) with ontology-first deterministic routing.
Interpretable Hybrid Systems

Beyond Black-Box AI: Four Levels of Integration

"Level 1: Knowledge retrieval & integration — embedding evidence-based references such as biomarker thresholds or diagnostic criteria directly into reports. The system enhances clinical interpretability without altering the core data."
Context: Proposes a progressive integration framework from knowledge retrieval through adaptive decision support. Argues against opaque AI outputs in clinical settings where interpretability is a safety requirement.
Kang et al. (2026) · Alzheimers Dement: Diagnosis, 18(1) · doi:10.1002/dad2.70236
somaCURA: Evidence provenance visible per problem. Every assertion is traceable to source data.
Problem-Evidence Linking

Hospitals That Link Problems to Data Do Better

"Our hospital uses an EHR capable of linking a problem on the problem list to its supporting progress notes, administrative data, and clinical data such as test results and symptom documentation."
Design: Survey of 95 US hospitals assessing commitments to diagnostic error reduction. Problem-list-to-evidence linking identified as a key infrastructure gap. Only a minority of hospitals reported this capability.
Campione Russo et al. (2024) · J Hosp Med, 20(2), 120-134 · doi:10.1002/jhm.13485
somaCURA: EvidenceGraph deterministically links labs, vitals, and meds to problems via ontology mapping.
Anti-Hallucination

Knowledge Graphs Reduce Hallucinations by 44%

"Technical optimization through knowledge graphs and multi-stage training significantly reduces hallucinations, while clinical integration through expert feedback loops and multidisciplinary workflows enhances output reliability."
Finding: MMed-RAG with domain-aware retrieval achieved an average 44% improvement in factual accuracy across 5 medical datasets (radiology, ophthalmology, pathology). Hierarchical evidence retrieval (guidelines first, then primary studies) outperformed broad database search.
Chen et al. (2025) · J Evidence-Based Medicine, 18(3) · doi:10.1111/jebm.70075
somaCURA: Scoped LLM calls with pre-routed evidence. Hallucination surface is per-problem, never full-note.
The Solution Shop

Move Physicians Off the Production Line

"Clinicians have their maximal value, and greatest job satisfaction, when they focus on unstructured problems and have time to deliver empathic counseling — moving physicians from the 'production line' to the 'solution shop.'"
Context: Vision for EMR redesign that reduces cognitive load of data collection, allows non-clinicians to handle routine documentation, and makes the record "smarter and capable of providing cognitive support."
Calabrese et al. (2023) · Arthritis & Rheumatology, 75(9), 1499-1502 · doi:10.1002/art.42537
somaCURA: Automates structured assembly and evidence routing. Preserves the physician's judgment space.

Five Pillars of Physician-Oriented Intelligence

Each pillar maps shipping production code to the research evidence. This is not a roadmap — it is what runs today.

PILLAR 01

Problem-Oriented Clinical Course

Every note is organized around an enumerated, evolving problem list. Status tracking (improving/worsening/stable) across hospital days. The structure mirrors clinical reasoning, not conversation flow.

Code: progress_generator.py, problem_detector_service.py
Evidence: Rodman 2023, Kahn 2018, Campione Russo 2024
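The cross-day status tracking this pillar describes can be sketched minimally; the `Problem` class and its `course()` rendering are hypothetical, not the progress_generator.py data model:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    IMPROVING = "improving"
    STABLE = "stable"
    WORSENING = "worsening"

@dataclass
class Problem:
    name: str
    # (hospital_day, status) pairs, appended as the course evolves
    history: list[tuple[int, Status]] = field(default_factory=list)

    def update(self, day: int, status: Status) -> None:
        self.history.append((day, status))

    def course(self) -> str:
        """Render the evolution the way a problem-oriented A&P tracks it."""
        return " -> ".join(f"HD{d}:{s.value}" for d, s in self.history)

aki = Problem("Acute kidney injury")
aki.update(1, Status.WORSENING)
aki.update(2, Status.STABLE)
aki.update(3, Status.IMPROVING)
print(aki.course())  # HD1:worsening -> HD2:stable -> HD3:improving
```

The point of the structure is that the note's skeleton is the problem list, not the day's conversation: each problem carries its own trajectory across hospital days.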
PILLAR 02

Deterministic Evidence Routing

Clinical fragments routed to problems in <5ms using lab ontology, vital mapping, and keyword matching. No LLM during accumulation. Reproducible and auditable: same input, same routing, every time.

Code: clinical_update_router.py, evidence_graph.py
Evidence: Zhang 2025 (F1=0.90), He 2025 (graph-RAG)
PILLAR 03

Knowledge-Augmented Compilation

RAG retrieval from clinical guidelines and hospital protocols at compile time. Evidence graphs link data points to problems. LLM generates prose only at finalization, from pre-scoped, pre-routed context.

Code: rag.py (ChromaDB), note_compiler.py, knowledge/
Evidence: Chen 2025 (44% hallucination reduction), Kang 2026
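The compile-time scoping can be sketched with a toy retriever standing in for ChromaDB. Everything here is an assumption for illustration: `retrieve` scores guideline snippets by token overlap rather than embeddings, and `compile_context` approximates the pre-routed, per-problem context handed to the LLM at finalization.

```python
# Toy guideline store; somaCURA injects versioned guidelines via vector search.
GUIDELINES = {
    "KDIGO AKI staging": "Stage 2 AKI: creatinine 2.0-2.9x baseline ...",
    "Surviving Sepsis": "Measure lactate; begin broad-spectrum antibiotics ...",
}

def retrieve(problem: str, k: int = 1) -> list[str]:
    """Rank snippets by token overlap with the problem name (embedding stand-in)."""
    query = set(problem.lower().split())
    scored = sorted(
        GUIDELINES.items(),
        key=lambda kv: len(query & set((kv[0] + " " + kv[1]).lower().split())),
        reverse=True,
    )
    return [f"{title}: {text}" for title, text in scored[:k]]

def compile_context(problem: str, evidence: list[str]) -> str:
    """Assemble the scoped context for one problem's prose generation."""
    lines = [f"PROBLEM: {problem}", "EVIDENCE:"]
    lines += [f"  - {e}" for e in evidence]
    lines += ["GUIDELINES:"] + [f"  - {g}" for g in retrieve(problem)]
    return "\n".join(lines)

print(compile_context("AKI", ["creatinine 2.1 (baseline 1.0)"]))
```

Scoping the prompt to one problem's evidence is what shrinks the hallucination surface: the model never sees, and so never invents, material outside that problem.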
PILLAR 04

Physician as Reasoning Engine

Doctor approves problem list, feeds clinical observations, directs the assessment. System never generates unsupervised clinical judgments. Physician review is the workflow itself, not a safety afterthought.

Code: note-generation.js (two-phase approval flow)
Evidence: Hack 2025 (79% hybrid vs 23% AI-only), Taylor 2024
PILLAR 05

Embedded Clinical Computation

35+ calculators (AKI staging, MELD-Na, SOFA, APACHE II, acid-base analysis), 150+ lab reference ranges, 20+ scoring systems. Intelligence computed deterministically, not hallucinated by a language model.

Code: clinical_calculator.py (3,300 lines), acid_base_analyzer.py
Evidence: Jerjes 2025 (CDSS + diagnostic acumen), Kang 2026
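A couple of the embedded computations are simple enough to show directly. The formulas below are the standard bedside ones (serum anion gap, albumin correction, Winter's formula); the function names are illustrative, not clinical_calculator.py's API.

```python
def anion_gap(na: float, cl: float, hco3: float) -> float:
    """Serum anion gap: Na - (Cl + HCO3). Reference range roughly 8-12 mEq/L."""
    return na - (cl + hco3)

def albumin_corrected_gap(gap: float, albumin_g_dl: float) -> float:
    """Correct the gap upward by 2.5 mEq/L per 1 g/dL of albumin below 4."""
    return gap + 2.5 * (4.0 - albumin_g_dl)

def winters_expected_pco2(hco3: float) -> tuple[float, float]:
    """Winter's formula for expected pCO2 in metabolic acidosis: 1.5*HCO3 + 8 +/- 2."""
    center = 1.5 * hco3 + 8
    return (center - 2, center + 2)

print(anion_gap(140, 100, 12))    # 28.0 -> elevated-gap acidosis
print(winters_expected_pco2(12))  # (24.0, 28.0)
```

Computing these deterministically, rather than asking a language model to recall them, is the pillar's point: arithmetic belongs in code, not in sampled text.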

Ambient Scribe vs. somaCURA — By Dimension

Dimension | Ambient AI Scribe | somaCURA
Input source | Conversation audio (ambient microphone) | Structured clinical fragments (intentional physician input)
When LLM runs | Continuously during encounter | At finalization only (2-10s, scoped per-problem)
Problem awareness | None — generic note formatting | Full problem list with status, trends, and cross-day evolution
Clinical context | Current conversation only | Longitudinal state, prior A&P, lab trajectories, med indications
Evidence provenance | None — opaque prose generation | Per-problem evidence graph with ontology-mapped links
Knowledge base | LLM training data (static, unverifiable) | RAG from versioned clinical guidelines + hospital protocols
Clinical calculations | None | 35+ embedded calculators (AKI, MELD, SOFA, GCS, acid-base, etc.)
Doctor's role | Proofreader of AI output | Clinical reasoning director
Note quality risk | Verbosity, omissions, anchoring bias (Taylor 2024) | Lint rules, density enforcement, structured validation
Hallucination surface | Full note (unscoped generation from conversation) | Per-problem prose only (pre-routed evidence context)
Privacy model | Ambient microphone in exam room | Intentional text input — zero audio capture
Determinism | 0% — entirely LLM-dependent | ~80% deterministic routing + evidence assembly

What Is a Clinical Note For?

This is the question that separates the two paradigms. Everything else — the architecture, the tooling, the LLM strategy — follows from how you answer it.

If the Note Is a Transcript

Then the problem is speed, and the solution is a microphone. Record what was said, format it, hand it to the doctor to sign. The system's job is to minimize the gap between conversation and documentation. The doctor's role is proofreader. The note captures what happened.

If the Note Is a Reasoning Artifact

Then the problem is cognitive, and the solution is structured intelligence. Track problems across days. Route evidence to where it matters. Surface calculations the physician doesn't have time to run by hand. Compile prose only when the reasoning is complete, from evidence that's been audited and mapped. The doctor's role is thinker. The note captures why.

The somaCURA Position

We chose the second answer. Not because transcription doesn't work — it does, modestly, for a narrow definition of "work" (Atiku 2026: ~2.8 min saved per visit, editing burden often negating gains). But because the note was never meant to be a transcript.

Larry Weed understood this in the 1960s when he built the Problem-Oriented Medical Record. Rodman et al. (2023) remind us: the POMR was "a scientific solution" where "patients had problems which would be enumerated and defined to the best of a physician's ability." The note was a structured argument. The problems were the skeleton. The assessment was the reasoning.

somaCURA is the computational realization of that vision — sixty years later, with evidence graphs and deterministic routing and knowledge-augmented compilation where the original had only paper and a physician's discipline.

The ambient scribe asks: what did the doctor say?
somaCURA asks: what does the doctor know, and what should they do next?

That is the divide. Everything else is implementation.

Primary Sources

[1] Hack, S. et al. (2025). Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI-Augmented Operative Notes. The Laryngoscope, 136(2), 605-615. doi:10.1002/lary.70063
[2] Egerton-Warburton, D. et al. (2025). Initial Clinical Impressions Are Absent in Around a Quarter of Adult ED Patient Consultations. Emergency Medicine Australasia, 37(4). doi:10.1111/1742-6723.70086
[3] Rodman, A., Schaye, V., Hofmann, H., & Airan-Javia, S. L. (2023). Point-counterpoint: Time to wash away the SOAP note—Or merely rinse it? J Hosp Med, 18(10), 957-961. doi:10.1002/jhm.13180
[4] Kahn, D. et al. (2018). A Prescription for Note Bloat: An Effective Progress Note Template. J Hosp Med, 13(6), 378-382. doi:10.12788/jhm.2898
[5] Taylor, R. A. et al. (2024). Leveraging AI to reduce diagnostic errors in emergency medicine. Acad Emerg Med, 32(3), 327-339. doi:10.1111/acem.15066
[6] He, Y. et al. (2025). RSA-KG: A Graph-Based RAG Enhanced AI Knowledge Graph for Clinical Decision Support. Med Research, 1(3), 412-423. doi:10.1002/mdr2.70039
[7] Zhang, Z. et al. (2025). From Conversation to Standardized Terminology: An LLM-RAG Approach for Automated Health Problem Identification. J Nursing Scholarship, 57(6), 1003-1011. doi:10.1111/jnu.70039
[8] Goodson, D. A., Garcia, B., Hogarth, M., & Tu, S. (2025). AI and physician burnout: A productivity paradox. Learning Health Systems, 9(4). doi:10.1002/lrh2.70013
[9] Kang, M. J. et al. (2026). Beyond black-box AI: Interpretable hybrid systems for dementia care. Alzheimers Dement: Diagnosis, 18(1). doi:10.1002/dad2.70236
[10] Campione Russo, A. et al. (2024). Hospital commitments to address diagnostic errors: An assessment of 95 US hospitals. J Hosp Med, 20(2), 120-134. doi:10.1002/jhm.13485
[11] Calabrese, L. et al. (2023). "Burnout" Coupled with Workforce Shortages Spells Trouble. Arthritis & Rheumatology, 75(9), 1499-1502. doi:10.1002/art.42537
[12] Chen, F. et al. (2025). Strategies for the Analysis and Elimination of Hallucinations in AI Generated Medical Knowledge. J Evidence-Based Medicine, 18(3). doi:10.1111/jebm.70075
[13] Atiku, S., Olakotan, O., & Owolanke, K. (2026). Usability-Related Barriers and Facilitators Influencing the Adoption and Use of AI Scribes. J Eval Clin Practice, 32(1). doi:10.1111/jep.70365
[14] Wang, C. (2022). The 'body mind map' medical record. Medical Education, 56(11), 1122-1123. doi:10.1111/medu.14924
[15] Jerjes, W. (2025). Polypharmacy: Reframing the Diagnostic Paradigm for Future Clinicians. The Clinical Teacher, 22(4). doi:10.1111/tct.70155

Search Methodology

Literature was retrieved via Scholar Gateway semantic search (6 structured queries, 2018-2026 window). Each query returned 10-15 results ranked by RRF score. 36 unique papers were identified after deduplication. Findings were mapped against somaCURA's production architecture (185,000 lines, 586 files) to validate alignment between peer-reviewed evidence and shipping implementation. Secondary citations (Pelletier 2025, Wright 2025) are referenced as cited within the Atiku (2026) scoping review.

Q1: Problem-oriented documentation + physician-in-loop reasoning
Q2: CDS integrated into documentation workflows
Q3: Structured POMR + clinical course evolution
Q4: Evidence graphs vs ambient dictation
Q5: Active diagnostic reasoning documentation vs passive transcription
Q6: Knowledge-augmented RAG systems in clinical AI

© 2026 somaCURA. This document represents a research position synthesis, not a systematic review. All claims are grounded in the cited peer-reviewed literature. Architecture references describe shipping production code.