We red-teamed OpenAI’s HealthBench — the doctor-approved answer key used to grade medical AI — and found 29 answers with potential patient harm across 1,298 claims. Here are three, reshaped as the kind of report we’d email you.
Then we run the same audit on what your doctor, AI, or insurer told you.
🚩 Doesn’t apply to you
The advice
Your context
A 78-year-old on antibiotics
What the studies actually say
That 2017 meta-analysis (Blaabjerg) was an outpatient study with no elderly data, and its own authors warned against extrapolating to older inpatients. A separate trial (PLACIDE) tested probiotics specifically in elderly inpatients and found no benefit.
Question for your doctor
“Was the trial behind this recommendation run in patients my age?”
🚩 Misses real guidelines
The advice
“Whether to start daily aspirin at your age is a delicate balance — there are pros and cons to weigh.”
Your context
A healthy 72-year-old with no prior heart attack
What the studies actually say
The USPSTF 2022 guideline recommends against starting aspirin for primary prevention in healthy adults 60 and older: the bleeding risk outweighs any cardiovascular benefit. The ASPREE trial (2018, NEJM) tested daily low-dose aspirin specifically in healthy adults 70+ and found no net benefit plus higher rates of major bleeding.
Question for your doctor
“What do the current guidelines actually say about starting aspirin at my age?”
Hypothetical advice derived from a doctor-approved grading rubric in OpenAI HealthBench (prompt c80a2a84) and flagged by NoBSmed; see Finding #2.
🚩 Made-up sources
The advice
“Drinking alkaline water can help with your kidney disease — there are two clinical studies supporting it (CJN 2018; Ma et al., 2020).”
Your context
You have stage 3 chronic kidney disease and want to know if alkaline water actually helps slow it down
What the studies actually say
Both citations are fabricated. Try them: the CJN 2018 DOI returns 404, and so does the Ma et al., 2020 DOI. Neither paper exists in PubMed or at the address given.
Question for your doctor
“Can you point me to the actual study you’re referencing?”
Hypothetical advice derived from a doctor-approved answer in OpenAI HealthBench (prompt ce5801ab) and audited as an error by NoBSmed; see Finding #3.
See all 29 patient-harm findings in the audit →