Data sources
Every API, dataset, and rules file Proofbase consults — what each one is for, what it isn't, whether it requires a key, and which claim categories route to it.
How sources are chosen
We detect the rough category of your claim (politics, health, science, legal, finance, tech, celebrity, general) and prioritise the sources most likely to have relevant evidence. All claims also run through the general pool — Google Fact Check Tools, GDELT, Wikipedia, and curated RSS feeds.
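The routing described above can be pictured as a lookup table. The following is a minimal sketch, not Proofbase's actual code: the source identifiers, `GENERAL_POOL`, `CATEGORY_SOURCES`, and `sources_for` are hypothetical names, and the table simply mirrors the "Routed for" lines on this page. The optional web/news adapters are left out.

```python
# Illustrative routing table derived from the "Routed for" lines on this page.
# All identifiers here are hypothetical, chosen only for the example.
GENERAL_POOL = ["google-fact-check", "gdelt", "wikipedia", "rss-feeds"]

CATEGORY_SOURCES = {
    "science-research": ["arxiv", "crossref", "pubmed", "openalex",
                         "semantic-scholar", "github", "stack-exchange"],
    "health-medical": ["pubmed", "crossref", "openalex", "semantic-scholar"],
    "technology": ["arxiv", "semantic-scholar", "hacker-news",
                   "github", "stack-exchange"],
    "legal-court": ["courtlistener"],
    "celebrity-viral": ["reddit"],
}


def sources_for(category: str) -> list[str]:
    """Return category-specific sources first, then the general pool."""
    prioritised = CATEGORY_SOURCES.get(category, [])
    # Every claim also runs through the general pool; skip duplicates.
    return prioritised + [s for s in GENERAL_POOL if s not in prioritised]
```

A category with no dedicated sources (for example `general`) falls through to the general pool alone.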
If a source fails, times out, or blocks our request, we show it transparently on the result page rather than hiding it.
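One way to make failures visible is to turn every adapter outcome into a status record instead of letting exceptions vanish. The sketch below assumes four status labels ("ok", "error", "timeout", "blocked"); `SourceReport`, `BlockedError`, and `run_adapter` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


class BlockedError(Exception):
    """Hypothetical: raised by an adapter when the upstream refuses the request."""


@dataclass
class SourceReport:
    source: str
    status: str                      # "ok", "error", "timeout", or "blocked"
    items: list = field(default_factory=list)
    detail: str = ""


def run_adapter(name: str, fetch: Callable[[], list]) -> SourceReport:
    """Call one adapter and always return a report the result page can show."""
    try:
        return SourceReport(name, "ok", fetch())
    except TimeoutError as exc:
        return SourceReport(name, "timeout", detail=str(exc))
    except BlockedError as exc:
        return SourceReport(name, "blocked", detail=str(exc))
    except Exception as exc:  # anything else is reported as a plain error
        return SourceReport(name, "error", detail=str(exc))
```

Because every call yields a `SourceReport`, a failed source still appears in the results with its status rather than being dropped.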
All consulted sources (16)
- Google Fact Check Tools API | Key required | Fact-checker | factchecktools.googleapis.com/v1alpha1/claims:search
What it does: Indexes fact-check articles published by Politifact, Snopes, FactCheck.org, AFP, Reuters Fact Check, Lead Stories, and dozens of other IFCN-verified outlets. Drives the supports/disputes/mixed verdict label.
What it does NOT do: Skipped silently when no FACTCHECK_API_KEY is configured. Fact-checks rated for a different phrasing of the claim are demoted to 'related' (not used as a verdict).
Routed for: All claim categories.
- GDELT 2.0 DOC API | No key | News index | api.gdeltproject.org/api/v2/doc/doc
What it does: Real-time global news index. Tens of thousands of outlets across 65+ languages, indexed within minutes of publication.
What it does NOT do: Only the last ~30 days. Cannot assert stance — every GDELT item is labeled 'related'.
Routed for: All categories; runs on every check.
- Wikimedia REST + MediaWiki Search | No key | Encyclopedia | en.wikipedia.org/api/rest_v1/
What it does: English Wikipedia article search + summary endpoints. Background context — who, what, when, where.
What it does NOT do: Community-edited. Always orientation, never a primary source. Cannot assert stance.
Routed for: All claim categories.
- arXiv API | No key | Pre-print archive | export.arxiv.org/api/query
What it does: Open archive of preprints in physics, mathematics, computer science, quantitative biology, statistics, and economics.
What it does NOT do: NOT peer-reviewed at time of posting. Treat as preliminary; check the final journal version before citing.
Routed for: science-research, technology.
- Crossref | No key | Scholarly metadata | api.crossref.org/works
What it does: DOI registration metadata for academic publications: title, abstract, authors, journal, year. Polite-pool requests via mailto.
What it does NOT do: Metadata only — indexing does not equal peer review or endorsement.
Routed for: science-research, health-medical.
- PubMed (NCBI E-utilities) | Key optional | Biomedical literature | eutils.ncbi.nlm.nih.gov/entrez/eutils/
What it does: US National Library of Medicine biomedical literature index — clinical trials, case reports, systematic reviews.
What it does NOT do: Does not host full text. PubMed indexes citations and abstracts only; the full text is usually on the publisher's site (often paywalled).
Routed for: health-medical, science-research.
- OpenAlex | No key | Open scholarly graph | api.openalex.org/works
What it does: Open scholarly knowledge graph — works, authors, institutions, citations. Includes retraction status.
What it does NOT do: Indexing does not equal endorsement. Retracted works are flagged in the evidence card.
Routed for: science-research, health-medical.
- Semantic Scholar | Key optional | Scholarly discovery | api.semanticscholar.org/graph/v1/paper/search
What it does: Searches Semantic Scholar's academic graph for papers, abstracts, authors, venues, citation counts, and open-access links.
What it does NOT do: Academic discovery only. A paper result is not proof that a claim is settled.
Routed for: science-research, health-medical, technology.
- CourtListener | Key optional | Court opinions | courtlistener.com/api/rest/v4/search/
What it does: Free Law Project search across US federal and state court opinions. Optional COURTLISTENER_API_KEY for higher rate limits.
What it does NOT do: Does not summarize legal outcomes — open the opinion itself for the holding.
Routed for: legal-court.
- Hacker News (Algolia) | No key | Tech discussion | hn.algolia.com/api/v1/search
What it does: Full-text search over Hacker News story submissions and discussions.
What it does NOT do: User-submitted aggregator. Popularity is not credibility. Discussion quality varies wildly.
Routed for: technology.
- GitHub | Key optional | Technical provenance | api.github.com/search/issues + /search/repositories
What it does: Searches public issues and repositories for technical claims, security discussions, provenance, and open-source evidence.
What it does NOT do: Developer discussion is not automatically authoritative. Popular repositories can still be wrong.
Routed for: technology, science-research.
- Stack Exchange | No key | Technical Q&A | api.stackexchange.com/2.3/search/advanced
What it does: Searches Stack Overflow questions for technical context and implementation evidence.
What it does NOT do: Accepted answers can be outdated. Treat as context, not a primary source.
Routed for: technology, science-research.
- Reddit (public JSON) | No key | Social discussion | reddit.com/search.json
What it does: Anonymous public search across subreddits — best-effort.
What it does NOT do: Reddit blocks most public bots; this adapter often reports 'blocked' rather than 'error'. No editorial oversight. Always 'related'.
Routed for: celebrity-viral.
- Curated RSS feeds | No key | Newsroom & gov bulletins | Curated reputable feeds (AP, NPR, BBC, PBS, FactCheck.org, PolitiFact, Snopes, CDC, FDA, NIH, NASA, BLS, Census, Eurostat, FTC, SEC)
What it does: Pulls fresh items from a hand-curated list of newsroom and government RSS feeds. 10-minute in-memory cache. Filters items against the user's query.
What it does NOT do: Headlines only — never treated as a verdict, even for fact-checker feeds (the verdict still needs the Google Fact Check API match).
Routed for: All categories (general fallback) + politics-news + health-medical + finance-business.
- Optional web/news APIs | Key optional | Optional expansion | Brave Search, NewsData.io, Mediastack
What it does: Adapters are wired and reported transparently when keys are present.
What it does NOT do: Core Proofbase does not require paid APIs. Optional search snippets are context until opened and verified.
Routed for: General, politics, finance, technology, celebrity/viral, and broad discovery.
- Local source rules | No key | Reputation database | data/source-rules.json (bundled)
What it does: ~80 hand-curated publisher entries: category, base quality score (0–100), warning flags, preferred use, editorial track-record notes.
What it does NOT do: Does not decide truth automatically. Unknown domains fall back to TLD heuristics (.gov / .edu trusted, .info / .xyz penalized).
Routed for: All checks — fuels the Source Quality Score and verdict-confidence breakdown.
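The fallback for unknown domains in the local source rules can be sketched as follows. This is an illustration under stated assumptions, not the bundled logic: the page says only that .gov and .edu are trusted and .info and .xyz are penalized, so the neutral score of 50, the size of the adjustments, and the `"score"` key in the rules mapping are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical numbers; only the direction of each adjustment comes from the page.
NEUTRAL_SCORE = 50
TLD_ADJUSTMENT = {"gov": 20, "edu": 20, "info": -20, "xyz": -20}


def base_quality(url: str, rules: dict[str, dict]) -> int:
    """Look up a curated score, or fall back to a TLD heuristic."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in rules:
        # Curated entry wins; the "score" key is an assumed field name.
        return rules[host]["score"]
    tld = host.rsplit(".", 1)[-1]
    score = NEUTRAL_SCORE + TLD_ADJUSTMENT.get(tld, 0)
    return max(0, min(100, score))  # keep within the 0-100 range
```

A curated entry always takes precedence, so the heuristic only applies to publishers missing from the rules file.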
What we do NOT use
- No paid APIs are required. Core evidence comes from free/public endpoints; the optional web/news adapters run only when their keys are configured.
- No language model is used to invent verdicts, quotes, or sources. The stance comes only from real fact-checker output.
- No iframe embedding of arbitrary external sites.
- No bypassing of paywalls or login walls.
- No recursive crawling. URL extraction is single-GET, SSRF-protected.
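As an illustration of what an SSRF guard before a single GET might check, here is a minimal sketch. It is an assumption about the general technique, not Proofbase's implementation, and `is_safe_url` is a hypothetical name. It accepts only http/https URLs whose host resolves exclusively to globally routable addresses.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Reject URLs that point at loopback, private, or link-local addresses."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(
            parts.hostname, parts.port or 443, proto=socket.IPPROTO_TCP
        )
    except (socket.gaierror, ValueError):
        return False  # unresolvable host or malformed port
    for info in infos:
        # Drop any IPv6 zone suffix (e.g. "%eth0") before parsing.
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        if not address.is_global:
            return False
    return bool(infos)
```

A fuller guard would also connect to the exact address it vetted and refuse to follow redirects, since a hostname can resolve differently between the check and the request.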