
The Medical AI Landscape


Two audiences: Everyday People and Medical Pros. The first list is the consumer-facing layer; the second covers the tools your doctor may already be using behind the scenes, which is why Evidence-to-Person Fit matters more now than it did a year ago.

Facts

  • Clinician-side: OpenEvidence accounted for 98.7% of searches across leading AI-enabled clinical reference tools, with traffic rising to ~1.59 million visits/month by June 2025. [1]
  • Patient-side: In a 2026 Microsoft analysis of over 500,000 health-related Copilot conversations, nearly one in five involved personal symptom assessment, test interpretation, or condition management. Microsoft explicitly notes that benchmark performance does not predict real-world reliability for high-stakes health questions. [2]

1. Everyday People

| Medical AI app | Best for | Why it stands out |
|---|---|---|
| ChatGPT Health (OpenAI, announced Jan 2026 · rollout still limited, not yet broadly available to the general public) | Patient prep + record-explaining (when you get access) | OpenAI’s planned general-public health surface inside ChatGPT. Designed to connect to Apple Health, MyFitnessPal, Weight Watchers, Peloton, AllTrails, Function, Instacart, and patient-portal medical records. Capabilities are hard to fully evaluate while rollout is gated; reports so far describe lab explanations and visit-prep, with the usual probabilistic-LLM caveats (overgeneralizing, missing patient-specific evidence). Not a clinician-grade evidence app. No B.S. Med runs as an MCP server inside ChatGPT today, available regardless of whether ChatGPT Health has rolled out to you. |
| Consensus | Quick literature-backed answers | AI search engine over 200M peer-reviewed papers. Better for “what does the literature say?” than bedside decision-making. |
| ChatGPT · Perplexity | General Q&A with web grounding | General-purpose. Not medical-specific; quality varies by query and by whether the user knows what to ask. |
| Elicit | Systematic reviews + evidence extraction | Best for researchers doing screening, data extraction, and evidence mapping. Not point-of-care clinical advice. |
| Examine.com | Supplement & nutrition evidence | Curated supplement and nutrition evidence database with quality ratings. |
| AskClara | Patient-facing health Q&A | Consumer-facing AI assistant for personal health questions. |
| Ada Health | Conversational symptom triage | AI symptom checker that asks personalized follow-up questions and returns ranked condition possibilities with care-routing suggestions. Strong on triage; not an evidence-search tool for treatment decisions. |
| Mayo Clinic AI | Trusted-institution AI search | Mayo Clinic’s AI-powered search and summaries over their long-standing patient-education library. High institutional trust; bounded to Mayo’s curated content rather than the open clinical-trial literature. |
| WebMD Symptom Checker | Mass-market symptom triage | The most familiar symptom checker for general consumers, with a large SEO and brand-recognition moat. Closer to a decision-tree than evidence-grounded AI, but still where many people start when something feels off. |
| UpToDate Expert AI / AI Labs (mainly for clinicians; listed here because individual subscriptions exist) | Curated clinical reference (clinician-tier pricing) | Long-established expert-curated clinical reference with a generative-AI layer on top. Primarily a clinician tool (most access is via institutional subscriptions), but individual subscriptions are available at clinician-tier pricing: $499+/year base, $600+/year with AI features. Editorial summaries rather than raw trial-level data. |
| No B.S. Med (this site · MCP in invite-only beta · public ChatGPT App in progress) | Grounding ChatGPT in trial-level facts; cross-checking the AI used in your care plan | Runs as an MCP server inside ChatGPT and Claude. Currently invite-only; a public ChatGPT App is in progress. Adds deterministic, patient-specific queries over clinical-trial participant details (eligibility, outcomes, harms) to whatever probabilistic answer your AI gives. Also useful for cross-checking the clinician-grade AI tools your doctor may have used in your care plan (see Medical Pros below). Free. |

2. Medical Pros

Tools in this section are gated to verified medical professionals — most require an NPI (the US clinician credential), so patients can’t sign up directly. For everyday people who want the same kind of evidence-grounded answers, No B.S. Med is the patient-side analog — clinical evidence delivered via ChatGPT and Claude, no clinician credential required.

| Medical AI app | Best for | Why it stands out |
|---|---|---|
| OpenEvidence | Fast clinician answers with citations | Built for doctors to look up evidence at the point of care, with citations. Access is gated to verified US clinicians (requires an NPI, the National Provider Identifier). Grounded in peer-reviewed medical literature plus NCCN cancer-care guidelines. |
| AMBOSS AI Mode / LiSA | Evidence-based clinical questions | Ranked #1 of 31 AI systems in the Stanford–Harvard NOHARM study for clinical-care safety. Curated US guidelines + drug database + AMBOSS knowledge base. |
| ChatGPT for Clinicians (OpenAI, Jan 2026 · free for verified US pros: physicians, NPs, PAs, pharmacists) | Verified-clinician general AI | Millions of peer-reviewed studies + clinical guidelines, with citations. HealthBench-evaluated. Supports custom GPTs. |
| DynaMedex / Dyna AI | Evidence grading + drug safety | EBSCO/DynaMed workflow integration. Recent KLAS recognition for point-of-care CDS. |
| Doximity GPT / DoxGPT | Physicians already inside Doximity | Verified-clinician network with AI-powered clinical reference / literature search inside the existing physician workflow. |
| ClinicalKey AI / Micromedex | Drug + clinical reference at point-of-care | Elsevier-curated clinical and drug-information resources with an AI layer. |
| Glass Health | Guideline-directed treatment plan drafting | Generates evidence-based treatment plans by searching the latest guidelines, evidence, and drug information across specialties. Treatment plans auto-adjust for patient-specific factors and screen for drug interactions. Clinician-only. |
| ReachRx | Pharmacy decision support | AI for pharmacist workflows and clinical-pharmacy decision support. |
| BastionGPT | HIPAA-compliant medical GPT | Privacy-first GPT for medical practices that need HIPAA-compliant LLM access. |

3. The eval layer — who decides if any of this is good?

Behind every consumer and clinician tool sits a layer most people never see: the benchmarks that decide whether medical AI is “good enough.” This is where claims like “the model performs at physician level” actually come from.

| In the eval / audit layer | What it is | Why it matters for trust |
|---|---|---|
| Benchmarks: HealthBench (OpenAI) · MedPI · MedPerf | Physician-written rubrics / simulated patient–AI encounters that decide whether a model is “good enough.” | Where “physician-level” claims come from, but the answer key and grader are themselves auditable (we found errors in HealthBench’s). |
| Exam-style benchmarks: MedQA · MedMCQA | Multiple-choice medical-exam questions. | Easy to score, far from real care; the field is moving past them. |
| Eval data, infra & independent leaderboards: Scale AI · Lumos · Turing · Vals AI | The expert humans + pipelines that produce the graded “ground truth” behind evals, plus third parties that benchmark models across finance, law, and healthcare. | An eval is only as good as the people and process behind its labels, and these players grade model answers, not the clinical evidence behind them. |
| Rubric-eval labs | Building prompt-grounded, physician-authored rubrics + failure-pattern analysis, beyond multiple-choice. | The frontier of how you even measure patient-facing medical AI. |
| Benchmark auditors: research + No B.S. Med | Audit the benchmarks themselves: their citations, ground truth, and grading. | A benchmark is only as trustworthy as its own answer key. This is our lane. |

The catch: a benchmark is only as trustworthy as its own answer key. Before we ask whether AI beats doctors on a benchmark, someone has to ask whether the benchmark’s own medical evidence holds up — which is exactly what we did to HealthBench (below).

Where No B.S. Med fits: verify, don’t trust

Most of the tools above try to answer medical questions. We do something narrower and more skeptical: we audit the claim. Given a doctor’s note, an insurance denial, or an AI answer, we check whether it’s actually grounded in published clinical evidence — and, just as importantly, whether that evidence applies to you (your age, conditions, and medications change the answer).
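To make the "does this evidence apply to you" step concrete, here is a minimal Python sketch of a person-to-trial fit check. It is an illustration only, not how No B.S. Med is implemented: the `TrialEligibility` and `Patient` types, the `fit_gaps` function, and the example criteria are all hypothetical, and real eligibility criteria are far richer than an age range plus two exclusion lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialEligibility:
    """Hypothetical, heavily simplified view of who a trial enrolled."""
    min_age: int
    max_age: int
    excluded_conditions: frozenset = frozenset()
    excluded_medications: frozenset = frozenset()


@dataclass(frozen=True)
class Patient:
    """Hypothetical patient profile with only the fields this sketch needs."""
    age: int
    conditions: frozenset = frozenset()
    medications: frozenset = frozenset()


def fit_gaps(patient: Patient, trial: TrialEligibility) -> list[str]:
    """List the reasons the trial population may not match this patient.

    An empty list means no mismatch was found on the fields checked here;
    it does not mean the evidence is guaranteed to apply.
    """
    gaps = []
    if not trial.min_age <= patient.age <= trial.max_age:
        gaps.append(
            f"age {patient.age} outside enrolled range "
            f"{trial.min_age}-{trial.max_age}"
        )
    # Sort the overlaps so the output order is deterministic.
    for condition in sorted(patient.conditions & trial.excluded_conditions):
        gaps.append(f"condition excluded from trial: {condition}")
    for medication in sorted(patient.medications & trial.excluded_medications):
        gaps.append(f"medication excluded from trial: {medication}")
    return gaps


# Made-up example values, not taken from any real trial.
trial = TrialEligibility(
    min_age=18,
    max_age=75,
    excluded_conditions=frozenset({"chronic kidney disease"}),
    excluded_medications=frozenset({"warfarin"}),
)
patient = Patient(
    age=82,
    conditions=frozenset({"chronic kidney disease", "hypertension"}),
    medications=frozenset({"metformin"}),
)

for gap in fit_gaps(patient, trial):
    print(gap)
```

The check is deterministic by design: the same patient and the same criteria always produce the same list of gaps, which is the property that separates an audit step from a probabilistic answer.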

It mirrors a broader idea in AI safety: don’t trust an AI’s output — have an independent checker verify it. No B.S. Med is that independent evidence-critic for medicine. We’re building it as a transparent, open audit layer, and we put our money where our mouth is by auditing the benchmarks the field uses to claim progress — starting with our red-team of OpenAI’s HealthBench.

More: the two audits medicine needs · Evidence-to-Person Fit · try it.


References

[1] Patel VR, Liu M, Jena AB. Public Interest in an AI-Enabled Clinical Decision Support Tool. JAMA Network Open, Nov 20, 2025.

[2] Costa-Gomes B, Tolmachev P, et al. (Microsoft AI). Public use of a generalist LLM chatbot for health queries. Nature Health, April 16, 2026.
