Our Verdict
OpenEvidence has built impressive journal partnerships (NEJM, JAMA, Cochrane, NCCN, ClinicalKey) and is fast for routine bedside lookups. But independent published data shows 34–41% accuracy on subspecialty questions, same-question repeatability of 72–77%, and a business model funded by pharmaceutical advertising. Wysor is a subscription product with no ads, the freedom to use GPT-5, Claude, and Gemini directly, private knowledge bases for your own protocols, and a complete clinical workspace around the search.
Feature by Feature
Independent Accuracy Data
| Feature | Wysor | OpenEvidence |
|---|---|---|
| Subspecialty board accuracy (MedXpertQA pilot, medRxiv 2025) | Uses GPT-5, Claude, Gemini directly — published benchmarks in the 46–60%+ range on this dataset for frontier models | Deep Consult: 41%; standard OpenEvidence: 34% (Jolayemi & Hash, medRxiv 2025, n=100) |
| Same-question repeatability | Determined by underlying model settings; user controls temperature on supported models | 77% (OE) / 72% (Deep Consult) — i.e. ~1 in 4 subspecialty questions returned a different answer between runs |
| Open-ended clinical questions (Low et al.) | Multi-model, agentic; not in the cited study | 24% relevant answers; an agentic competitor (ChatRWD) scored 58% on the same set |
| Acknowledges uncertainty | Models can decline or flag low confidence on request | The MedXpertQA pilot authors report that neither mode said 'I don't know', even when its chosen answer was not among the listed options |
Business Model
| Feature | Wysor | OpenEvidence |
|---|---|---|
| Funding source | Subscription only — no advertising | Pharmaceutical and medical-device advertising (per OpenEvidence's published advertising policy) |
| Sponsored content in answers | None | Sponsored summaries from pharmaceutical manufacturers may appear alongside answers (per OpenEvidence's own description) |
| Editorial independence guarantee | No advertiser relationships to manage | OpenEvidence states ads and content are kept separate; this is a self-policed boundary, not a structural one |
Data Sources
| Feature | Wysor | OpenEvidence |
|---|---|---|
| PubMed / MEDLINE | 40M+ records, MeSH filters, citation-based reranking, retraction detection | Indexed via partner content |
| Premium journal partnerships | Open literature only | Strong — NEJM (1990+), JAMA + 12 specialty journals, Cochrane Systematic Reviews, NCCN guidelines, Wiley, Elsevier ClinicalKey AI |
| FDA drug labels + FAERS adverse events | 256K FDA labels, 20M FAERS reports — surfaced as a structured data source | Drug data limited to label snippets inside answers |
| RxNorm drug classification | 120K drug concepts — resolves brand, generic, ingredient, and dose form | Not integrated as a structured source |
| ICD-10 coding | Both ICD-10-CM (US) and international variants supported | ICD-10-CM only |
| International regulatory data (EMA, ATC) | Integrated for clinicians who cross borders or work with international pharma data | Not integrated — US-centric corpus |
Privacy & Data Handling
| Feature | Wysor | OpenEvidence |
|---|---|---|
| No training on your queries | Contractually guaranteed with every model provider — every plan | Standard privacy policy |
| Data Processing Agreement on request | Published DPA template; signed by default for paid plans | HIPAA-aligned for US clinicians; no public DPA template for international customers |
| Patient-context query handling | Treated as confidential by contract; no advertiser exposure | Queries inform an ad-funded platform serving sponsored content to the same user |
Clinical Workflow
| Feature | Wysor | OpenEvidence |
|---|---|---|
| Multi-model AI | GPT-5, Claude, Gemini, DeepSeek — switch within one chat | Single proprietary stack |
| Private knowledge bases | Upload guidelines, hospital SOPs, departmental protocols — answers cite from your own corpus | No private knowledge base |
| Email integration (Gmail + Outlook) | Draft referrals, patient summaries, colleague emails in the same workspace | Not available |
| On-device voice transcription | Dictate consult notes; audio never leaves the device | Not available |
| Bedside speed for trivial lookups | A few seconds via chat | Optimised for sub-15-second point-of-care answers |
Access
| Feature | Wysor | OpenEvidence |
|---|---|---|
| Eligibility | Open to anyone — clinicians, researchers, pharmacists, students, life-science teams | Free tier requires verified clinician status; primary verification path is US National Provider Identifier (NPI) |
| Non-US clinician access | Same product worldwide | Available outside the US, but the verification path is friction-heavy; reviewers report international users are often unable to access Pro features |
Two products with overlapping intent
OpenEvidence has earned its place in US clinical practice. Founded in 2023 by Daniel Nadler as part of the Mayo Clinic Platform Accelerate program, it built strong content partnerships — NEJM, the JAMA Network, Cochrane, NCCN, and Elsevier's ClinicalKey AI — and grew to roughly 400,000 verified-clinician users by mid-2025. Doctors use it to get a fast, well-sourced answer between patients, and for many routine questions it works.
Wysor is a different product solving an overlapping problem. It is a subscription AI workspace with medical search as one tool among many: chat across multiple frontier models, knowledge bases for your own protocols, email and voice integration. Where OpenEvidence is a single-purpose tool funded by pharmaceutical advertising, Wysor is a workspace funded by paying users.
This page compares the two using only verifiable sources: peer-reviewed accuracy studies, OpenEvidence's own published policies, and independent reviews. Where OpenEvidence is genuinely better — premium journal partnerships, point-of-care speed for US clinicians — the comparison says so.
What independent studies have found about OpenEvidence accuracy
Several independent studies — peer-reviewed papers and preprints — have evaluated OpenEvidence directly. The numbers are sobering relative to its marketing.
Jolayemi & Hash (medRxiv, 2025) ran the most rigorous benchmark to date. Two evaluators submitted 100 subspecialty board questions from the MedXpertQA dataset (a standardised benchmark designed to test medical reasoning beyond what USMLE-style questions can measure) to both OpenEvidence and Deep Consult. Results:
- Standard OpenEvidence: 34% accuracy
- Deep Consult: 41% accuracy
- Best frontier LLM on the same dataset (GPT-o1, per Zuo et al.): 46%
- Repeatability: 77% for OpenEvidence, 72% for Deep Consult — meaning roughly one question in four returned a different answer when re-run
For comparison, the authors note that pathologists interpreting breast biopsies for invasive cancer reach 89–92% intra- and inter-observer agreement. OpenEvidence's same-question consistency is lower than that.
Deep Consult also takes considerably longer. Median response time was 240 seconds versus 13 seconds for standard OpenEvidence, and Deep Consult averages 33 references per answer (vs 5 for standard), which the authors note increases verification burden materially.
Low et al. compared OpenEvidence with ChatRWD and three general LLMs on 50 open-ended clinical scenarios scored by nine physicians. ChatRWD reached 58% relevant answers; OpenEvidence reached 24%; general LLMs scored 2–10%. The paper's framing is that OpenEvidence performs well when published evidence already exists — a textbook RAG limitation — but struggles when the question requires synthesising real-world data.
Hurt et al. evaluated OpenEvidence on five common primary-care problems (hypertension, hyperlipidaemia, type 2 diabetes, depression, obesity). The clinical impact was minimal: it reinforced existing plans rather than modifying them. Useful for confirmation; less useful for changing minds.
Hajj et al. compared ChatGPT-4o with OpenEvidence on 15 questions about transcatheter tricuspid valve repair and replacement. Subject-matter experts rated ChatGPT-4o the more reliable answer source.
Patel et al. raised separate concerns about OpenEvidence's transparency, including the process by which articles are curated or excluded, the timeliness of indexed content, and clinical relevance.
The Jolayemi pilot also documented a failure mode worth flagging: in both modes, OpenEvidence sometimes returned an answer that was not among the listed multiple-choice options, and neither mode ever responded "I don't know." Confidently wrong, with confident citations, is a recognised pattern.
None of this means OpenEvidence is unusable. It does mean the marketing language — "PhD-level," "medical superintelligence," "100% on USMLE" — does not match the picture independent evaluators see when they test it on harder questions.
The pharmaceutical advertising business model
OpenEvidence is free for verified US clinicians. The reason it is free is published openly on their own website: it is funded by pharmaceutical and medical-device advertising. They operate ads.openevidence.com for advertisers, and their published advertising policy describes that sponsored summaries from pharmaceutical manufacturers may appear alongside answers.
OpenEvidence states that advertisements and clinical content are kept separate, and that advertisers cannot influence what the system retrieves or synthesises. This is a self-policed editorial boundary, and how much weight to give it is a judgement each clinician makes for themselves.
What is not in dispute is the underlying economics. Advertising rates for reaching verified prescribers can run from roughly $70 to over $1,000 CPM (cost per thousand impressions) — orders of magnitude above general consumer media. That economic reality is what makes the product free.
Wysor uses a subscription model. Users pay; pharmaceutical companies do not. There are no sponsored summaries and no advertiser relationships to manage. For a hospital procurement officer evaluating clinical AI tools, this is often a meaningful difference in the diligence file — and for an individual physician, it removes a category of question about whose interest is being served by which answer.
Where OpenEvidence's US-centric content matters
For US clinicians, OpenEvidence's content alignment with American guidelines is a feature: AHA, ACC, ADA, and FDA labelling are usually exactly what you want to see. For UK clinicians, it is friction.
Independent reviews note that OpenEvidence tends to cite American Heart Association guidelines and FDA approvals where a UK clinician would expect NICE recommendations and the British National Formulary. Drugs licensed in the US but not in the UK can surface in answers. Dose recommendations sometimes reflect US labelling rather than MHRA-approved labels. Cost-effectiveness logic from US payor systems does not map onto NHS prescribing.
Wysor's medical search is structurally broader. It pulls from:
- PubMed / MEDLINE — 40M+ primary records, MeSH filters, citation-based reranking, retraction detection
- FDA drug labels + FAERS — 256K labels and 20M adverse-event reports as a structured source, not a snippet
- RxNorm — drug brand, generic, ingredient, and dose-form resolution
- International regulatory data (EMA, ATC) — for clinicians who cross borders or work with international pharma
- ICD-10 — both ICD-10-CM and international variants
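For readers who want to sanity-check these claims, the FAERS and retraction data above come from free public sources. The sketch below is not Wysor's implementation — it is a minimal example that queries the public openFDA and NCBI E-utilities endpoints (field names per their published documentation) to pull top adverse-event counts for a drug and to check whether a PubMed record carries a retraction flag.

```python
# Minimal sketch (not Wysor's pipeline): query the public openFDA FAERS endpoint
# and NCBI E-utilities directly. Endpoints and field names per their public docs.
import json
import urllib.parse
import urllib.request


def top_faers_reactions(drug: str, limit: int = 10) -> list[tuple[str, int]]:
    """Most frequently reported adverse reactions for `drug` in FAERS."""
    params = urllib.parse.urlencode({
        "search": f'patient.drug.medicinalproduct:"{drug}"',
        "count": "patient.reaction.reactionmeddrapt.exact",  # aggregate server-side
        "limit": limit,
    })
    with urllib.request.urlopen(f"https://api.fda.gov/drug/event.json?{params}") as resp:
        results = json.load(resp)["results"]
    return [(r["term"], r["count"]) for r in results]


def is_retracted(pmid: str) -> bool:
    """True if PubMed lists 'Retracted Publication' among the record's publication types."""
    params = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?{params}"
    with urllib.request.urlopen(url) as resp:
        record = json.load(resp)["result"][pmid]
    return "Retracted Publication" in record.get("pubtype", [])


if __name__ == "__main__":
    for term, count in top_faers_reactions("warfarin", limit=5):
        print(f"{term}: {count}")
    print(is_retracted("12345678"))  # substitute any PMID of interest
```

Both endpoints allow light unauthenticated use, and the openFDA count query aggregates on the server, so none of the 20M raw FAERS reports need to be downloaded to reproduce a figure.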
OpenEvidence has deeper full-text access to a small set of premium journals (NEJM, JAMA, Cochrane, NCCN, ClinicalKey) via partnership. Wysor has broader structured regulatory and coding coverage. Different shapes; for UK clinicians, regulatory breadth usually matters more than premium-journal full text.
Privacy and data handling
Patient-context queries reveal protected information even when no name is attached. "75-year-old male on warfarin with new atrial fibrillation" can re-identify a patient in a small clinic. Where that query is sent, what is logged, and who else has access matters.
OpenEvidence is a US-infrastructure platform with a HIPAA-aligned posture for its primary market. Its business model is pharmaceutical advertising, which means clinical queries flow into a system that simultaneously serves sponsored content to the same user. OpenEvidence states ads and clinical content are kept separate; this is a self-policed boundary.
Wysor takes a different approach. Every model provider — Anthropic, OpenAI, Google, Mistral — has a signed DPA covering no-training guarantees and processor obligations. The DPA template and subprocessor list are public. There is no advertising layer for queries to flow into. For multi-physician procurement, having the contract on hand is usually what unblocks the purchase.
The workspace around the search
OpenEvidence is a single-purpose tool: one search box, one synthesis pane. It does that well. But a clinician's day is not just literature lookups.
Wysor wraps medical search inside a workspace:
- Multi-model chat — the same conversation can call medical search, then switch to GPT-5 for letter drafting, then to Claude for analysis of an uploaded patient summary, then to Gemini for a multimodal question
- Private knowledge bases — upload your hospital's antibiotic protocol, your departmental guidelines, your internal SOPs. The agent cites from your private corpus alongside published literature
- Email — Gmail and Outlook sync. Draft a referral letter referencing literature you just searched, without copy-paste
- On-device voice transcription — dictate consult notes, audio never leaves your device
- Browser extension — Chrome integration for working with hospital web tools
- Voice agent — patient booking and SMS confirmation flows for private practice
OpenEvidence will refer you back to your other tools for everything except the literature question.
Pricing
| | Wysor Plus | OpenEvidence |
|---|---|---|
| Price | €17.99/month | Free for verified US clinicians (NPI-based verification) |
| Funding model | Subscription | Pharmaceutical and medical-device advertising |
| Eligibility | Open to anyone | Clinician verification required |
| Structured medical data | PubMed + FDA + FAERS + RxNorm + EMA + ATC + ICD-10 | PubMed + NEJM + JAMA + Cochrane + NCCN + ClinicalKey partnerships |
| Other AI models | GPT-5, Claude, Gemini, DeepSeek | Single proprietary stack |
| Knowledge bases | 5 included | Not available |
| Email management | Full Gmail + Outlook | Not available |
| Voice transcription | On-device | Not available |
| Verification required | None | NPI-based clinician verification |
For verified US clinicians who only need literature lookup, OpenEvidence's free tier is hard to beat on price. For UK and other non-US clinicians who hit the NPI verification wall, anyone uncomfortable with pharmaceutical-funded clinical AI, or anyone who wants more than literature search in one place — Wysor Plus at €17.99/month is the realistic comparison.
Who should choose which
Choose OpenEvidence if you:
- Are a US clinician with an NPI and want a free literature search tool
- Need fast, one-shot answers between patients
- Primarily want NEJM- and JAMA-backed evidence synthesis
- Are comfortable with a pharmaceutical-advertising-funded clinical platform
- Do not need workflow tools beyond literature lookup
Choose Wysor if you:
- Want a clinical AI that does not show pharmaceutical sponsored content
- Are a UK clinician (or any non-US clinician) who hits the NPI verification wall
- Want the option to use GPT-5, Claude, Gemini, and other frontier models directly — and to compare their answers
- Need structured FDA, FAERS, RxNorm, EMA, ATC, or ICD-10 data alongside PubMed
- Want the search inside a workspace that also handles email, dictation, and your private guideline corpus
- Want contractual privacy guarantees with a published DPA, not just a privacy policy
- Are uncomfortable with the accuracy and repeatability numbers in the published independent evaluations
Start using Wysor today
Try Wysor free — no clinician verification required. The free tier includes medical search across all integrated databases, five AI models, and one knowledge base for your own protocols.
When you need the full toolset — multiple knowledge bases, every model, email and voice integration — Wysor Plus at €17.99/month is less than a single journal subscription, paid for entirely by users rather than advertisers. Contractual DPAs and no training on your queries are included on every plan.
References
The accuracy claims about OpenEvidence on this page are drawn from the following published sources:
- Jolayemi & Hash. "The accuracy and repeatability of OpenEvidence on complex medical subspecialty scenarios: a pilot study." medRxiv preprint, 2025.
- Zuo et al. "MedXpertQA: a benchmark for evaluating medical reasoning in LLMs." 2025.
- Low et al. "Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems." 2025.
- Hurt et al. Study on OpenEvidence and primary care decision-making.
- Hajj et al. ChatGPT-4o vs OpenEvidence in structural heart disease clinical decision support.
- Patel et al. Commentary on OpenEvidence transparency and curation.
- OpenEvidence published advertising policy: openevidence.com/policies/advertising
- Independent UK clinician review series, iatroX, 2025.
Frequently Asked Questions
How accurate is OpenEvidence?
A pilot study (Jolayemi & Hash, medRxiv 2025) tested OpenEvidence and Deep Consult on 100 subspecialty board questions from the MedXpertQA dataset. Standard OpenEvidence scored 34%; Deep Consult scored 41%. The leading frontier LLM (GPT-o1) scored 46% on the same dataset (Zuo et al.). In an earlier study (Low et al.), OpenEvidence produced relevant answers for only 24% of open-ended clinical questions, while a competitor (ChatRWD) scored 58% on the same set.
Is OpenEvidence an AI agent?
OpenEvidence's founder describes the standard product as 'an ensemble of specialised models... trained exclusively on peer-reviewed medical literature' — which the medRxiv authors interpret as retrieval-augmented generation (RAG). The newer Deep Consult mode adds clarifying questions and visible reasoning steps, similar to ChatGPT or Perplexity Deep Research. Calling this 'agentic' is generous; calling it autonomous multi-step clinical reasoning is not supported by the published accuracy data.
Why is OpenEvidence free?
OpenEvidence is funded by pharmaceutical and medical-device advertising (per its own published advertising policy). The platform's own documentation describes that sponsored summaries from pharmaceutical manufacturers may appear alongside answers, with OpenEvidence stating that ads and content are kept separate. This is a self-policed editorial boundary. Wysor takes no advertising revenue and runs on subscriptions, so this question does not arise.
Can UK and other non-US clinicians use OpenEvidence?
Technically yes, but the verification path is built around the US National Provider Identifier (NPI). Independent reviews (iatroX, 2025) report that UK clinicians often cannot complete the verification flow for Pro features. The underlying content is also US-centric — citing AHA guidelines and FDA approvals where a UK clinician would expect NICE recommendations and the BNF, and sometimes recommending drugs not licensed in the UK. Wysor is the same product worldwide with no clinician-verification gate.
What does OpenEvidence genuinely do better?
Three things, fairly. First, journal partnerships — full-text access to NEJM, JAMA, Cochrane, NCCN, and Elsevier's ClinicalKey AI is a real moat for evidence synthesis. Second, response time for one-shot point-of-care questions is excellent. Third, for verified US clinicians the free tier is genuinely free. If those three things outweigh the accuracy, advertising, and content-bias concerns for your practice, OpenEvidence is the right tool.
Ready to try a better AI workspace?
Get access to all major AI models with real privacy guarantees. Free to start, no credit card required.
Try Wysor Free

Editorial note: This comparison was created by the Wysor team. All feature and pricing information reflects publicly available data as of April 2026. Features, pricing, and policies may have changed since publication. We recommend verifying details on OpenEvidence's official website before making a decision.