Transparency

Data sources

Every API, dataset, and rules file Proofbase consults — what each one is for, what it isn't, whether it requires a key, and which claim categories route to it.

How sources are chosen

We detect the rough category of your claim (politics, health, science, legal, finance, tech, celebrity, or general) and prioritize the sources most likely to hold relevant evidence. Every claim also runs through the general pool: Google Fact Check Tools, GDELT, Wikipedia, and the curated RSS feeds.
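A rough Python sketch of this routing step follows. The category-to-source table is transcribed from the "Routed for" lines on this page (the key-gated optional web/news adapters are left out). The keyword lists and the `detect_category` / `route` helpers are hypothetical stand-ins, because this page does not say how categories are actually detected.

```python
import re

# Sources every claim passes through, per "How sources are chosen".
GENERAL_POOL = ["google_factcheck", "gdelt", "wikipedia", "rss"]

# Transcribed from the "Routed for" lines; optional key-gated adapters omitted.
CATEGORY_SOURCES: dict[str, list[str]] = {
    "science-research": ["arxiv", "crossref", "pubmed", "openalex",
                         "semantic_scholar", "github", "stackexchange"],
    "health-medical": ["pubmed", "crossref", "openalex", "semantic_scholar", "rss"],
    "technology": ["arxiv", "semantic_scholar", "hackernews", "github", "stackexchange"],
    "legal-court": ["courtlistener"],
    "celebrity-viral": ["reddit"],
    "politics-news": ["rss"],
    "finance-business": ["rss"],
    "general": [],
}

# Hypothetical keyword detector; the real classifier is not described on this page.
CATEGORY_KEYWORDS: dict[str, set[str]] = {
    "health-medical": {"vaccine", "cancer", "virus", "drug", "covid"},
    "science-research": {"study", "research", "scientists", "physics", "climate"},
    "legal-court": {"court", "judge", "lawsuit", "ruling", "indicted"},
    "technology": {"ai", "software", "chip", "app", "github"},
    "finance-business": {"stock", "inflation", "market", "bank", "tariff"},
    "politics-news": {"election", "senator", "congress", "president", "vote"},
    "celebrity-viral": {"celebrity", "viral", "actor", "singer", "tiktok"},
}


def detect_category(claim: str) -> str:
    """Pick the category whose keywords overlap the claim most; ties keep the earlier one."""
    words = set(re.findall(r"[a-z0-9]+", claim.lower()))
    best, best_hits = "general", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def route(claim: str) -> tuple[str, list[str]]:
    """Category-specific sources first, then the general pool, without duplicates."""
    category = detect_category(claim)
    ordered = list(CATEGORY_SOURCES[category])
    for source in GENERAL_POOL:
        if source not in ordered:
            ordered.append(source)
    return category, ordered
```

Putting the category-specific sources first expresses the prioritization, while appending the general pool guarantees that no claim skips it.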

If a source fails, times out, or blocks our request, we show it transparently on the result page rather than hiding it.
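One way to surface failures instead of silently dropping them is to wrap every adapter so it always returns a status. This sketch assumes hypothetical async adapters and a hypothetical `BlockedError` that an adapter raises when the upstream refuses the request. Treating HTTP 403/429 as "blocked" (as with Reddit's 'blocked' rather than 'error' state) is also an assumption, not Proofbase's documented mapping.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


class BlockedError(Exception):
    """Hypothetical: raised by an adapter when the upstream refuses the request."""


@dataclass
class SourceResult:
    source: str
    status: str  # "ok", "timeout", "blocked", or "error"
    items: list = field(default_factory=list)
    detail: Optional[str] = None


Adapter = Callable[[], Awaitable[list]]


def classify_http_status(code: int) -> str:
    """Assumed mapping: 403/429 mean the upstream is refusing us, not a bug on our side."""
    if 200 <= code < 300:
        return "ok"
    if code in (403, 429):
        return "blocked"
    return "error"


async def run_source(name: str, fetch: Adapter, timeout_s: float) -> SourceResult:
    try:
        items = await asyncio.wait_for(fetch(), timeout=timeout_s)
        return SourceResult(name, "ok", items)
    except asyncio.TimeoutError:
        return SourceResult(name, "timeout", detail=f"no response within {timeout_s}s")
    except BlockedError as exc:
        return SourceResult(name, "blocked", detail=str(exc))
    except Exception as exc:  # any other failure is still reported, never hidden
        return SourceResult(name, "error", detail=repr(exc))


async def run_all(adapters: dict[str, Adapter], timeout_s: float = 5.0) -> list[SourceResult]:
    # Adapters run concurrently; a slow source cannot hold up the others past timeout_s.
    return list(await asyncio.gather(*(run_source(n, f, timeout_s) for n, f in adapters.items())))
```

Every adapter yields exactly one `SourceResult`, so the result page can list each source with its status even when it returned nothing.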

All consulted sources (16)

  • Google Fact Check Tools API
    Key required · Fact-checker
    factchecktools.googleapis.com/v1alpha1/claims:search

    What it does: Indexes fact-check articles published by PolitiFact, Snopes, FactCheck.org, AFP, Reuters Fact Check, Lead Stories, and dozens of other IFCN-verified outlets. Its ratings drive the supports/disputes/mixed verdict label.

    What it does NOT do: Requires FACTCHECK_API_KEY; when no key is configured, this source is skipped silently. A fact-check that rates a differently worded claim is demoted to 'related' and is not used for the verdict.

    Routed for: All claim categories.
  • GDELT 2.0 DOC API
    No key · News index
    api.gdeltproject.org/api/v2/doc/doc

    What it does: Real-time global news index. Tens of thousands of outlets across 65+ languages, indexed within minutes of publication.

    What it does NOT do: Covers only roughly the last 30 days. Cannot assert stance, so every GDELT item is labeled 'related'.

    Routed for: All categories — runs on every check.
  • Wikipedia REST API
    No key
    en.wikipedia.org/api/rest_v1/

    What it does: English Wikipedia article search and summary endpoints, used for background context (who, what, when, where).

    What it does NOT do: Community-edited, so it is used for orientation only and never as a primary source. Cannot assert stance.

    Routed for: All claim categories.
  • arXiv API
    No key · Preprint archive
    export.arxiv.org/api/query

    What it does: Open archive of preprints in physics, mathematics, computer science, quantitative biology, statistics, and economics.

    What it does NOT do: Preprints are not peer-reviewed when posted. Treat them as preliminary and check the final journal version before citing.

    Routed for: science-research, technology.
  • Crossref
    No key · Scholarly metadata
    api.crossref.org/works

    What it does: DOI registration metadata for academic publications: title, abstract (when the publisher deposits one), authors, journal, year. Requests include a mailto contact so they are served from Crossref's polite pool.

    What it does NOT do: Metadata only. Being indexed does not imply peer review or endorsement.

    Routed for: science-research, health-medical.
  • PubMed (NCBI E-utilities)
    Key optional · Biomedical literature
    eutils.ncbi.nlm.nih.gov/entrez/eutils/

    What it does: US National Library of Medicine biomedical literature index — clinical trials, case reports, systematic reviews.

    What it does NOT do: Returns citations and abstracts only; the full text usually lives on the publisher's site and is often paywalled.

    Routed for: health-medical, science-research.
  • OpenAlex
    No key · Open scholarly graph
    api.openalex.org/works

    What it does: Open scholarly knowledge graph — works, authors, institutions, citations. Includes retraction status.

    What it does NOT do: Indexing does not imply endorsement. Retracted works are flagged in the evidence card.

    Routed for: science-research, health-medical.
  • Semantic Scholar
    Key optional · Scholarly discovery
    api.semanticscholar.org/graph/v1/paper/search

    What it does: Searches Semantic Scholar's academic graph for papers, abstracts, authors, venues, citation counts, and open-access links.

    What it does NOT do: Academic discovery only. A paper result is not proof that a claim is settled.

    Routed for: science-research, health-medical, technology.
  • CourtListener
    Key optional · Court opinions
    courtlistener.com/api/rest/v4/search/

    What it does: Free Law Project search across US federal and state court opinions. An optional COURTLISTENER_API_KEY raises the rate limits.

    What it does NOT do: Does not summarize legal outcomes — open the opinion itself for the holding.

    Routed for: legal-court.
  • Hacker News (Algolia)
    No key · Tech discussion
    hn.algolia.com/api/v1/search

    What it does: Full-text search over Hacker News story submissions and discussions.

    What it does NOT do: User-submitted aggregator. Popularity is not credibility. Discussion quality varies wildly.

    Routed for: technology.
  • GitHub
    Key optional · Technical provenance
    api.github.com/search/issues + /search/repositories

    What it does: Searches public issues and repositories for technical claims, security discussions, provenance, and open-source evidence.

    What it does NOT do: Developer discussion is not automatically authoritative. Popular repositories can still be wrong.

    Routed for: technology, science-research.
  • Stack Exchange
    No key · Technical Q&A
    api.stackexchange.com/2.3/search/advanced

    What it does: Searches Stack Overflow questions for technical context and implementation evidence.

    What it does NOT do: Accepted answers can be outdated. Treat as context, not a primary source.

    Routed for: technology, science-research.
  • Reddit (public JSON)
    No key · Social discussion
    reddit.com/search.json

    What it does: Best-effort anonymous search across public subreddits.

    What it does NOT do: Reddit blocks most public bots; this adapter often reports 'blocked' rather than 'error'. No editorial oversight. Always 'related'.

    Routed for: celebrity-viral.
  • Curated RSS feeds
    No key · Newsroom & gov bulletins
    Curated reputable feeds (AP, NPR, BBC, PBS, FactCheck.org, PolitiFact, Snopes, CDC, FDA, NIH, NASA, BLS, Census, Eurostat, FTC, SEC)

    What it does: Pulls fresh items from a hand-curated list of newsroom and government RSS feeds, caches each feed in memory for 10 minutes, and keeps only the items that match the user's query.

    What it does NOT do: Headlines only — never treated as a verdict, even for fact-checker feeds (the verdict still needs the Google Fact Check API match).

    Routed for: All categories (general fallback), plus politics-news, health-medical, and finance-business.
  • Optional web/news APIs
    Key optional · Optional expansion
    Brave Search, NewsData.io, Mediastack

    What it does: Adapters for these services are wired in; each runs only when its API key is present, and its status is reported transparently like any other source.

    What it does NOT do: Core Proofbase does not require paid APIs. Snippets from these searches are treated as context until the linked page is opened and verified.

    Routed for: general, politics-news, finance-business, technology, celebrity-viral, and broad discovery queries.
  • Local source rules
    No key · Reputation database
    data/source-rules.json (bundled)

    What it does: ~80 hand-curated publisher entries: category, base quality score (0–100), warning flags, preferred use, editorial track-record notes.

    What it does NOT do: Does not decide truth automatically. Unknown domains fall back to TLD heuristics (.gov / .edu trusted, .info / .xyz penalized).

    Routed for: All checks — fuels the Source Quality Score and verdict-confidence breakdown.
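The request shapes for several adapters in the list above can be sketched with the standard library alone. The parameter and header names (`query`, `key`, `mode`, `timespan`, `mailto`, `api_key`, `fields`, `x-api-key`, `Authorization: Token`, `site`, `tags`) are the public APIs' own; the helper names and the specific values (page sizes, a `30d` GDELT window chosen to match the ~30-day coverage noted above, the contact address) are assumptions, not Proofbase's actual code. Keys are passed in as arguments rather than read from configuration.

```python
from typing import Optional
from urllib.parse import urlencode


def google_factcheck_url(query: str, api_key: Optional[str]) -> Optional[str]:
    """None means 'skip this source': the API cannot be called without a key."""
    if not api_key:
        return None
    params = {"query": query, "languageCode": "en", "pageSize": 10, "key": api_key}
    return "https://factchecktools.googleapis.com/v1alpha1/claims:search?" + urlencode(params)


def gdelt_url(query: str) -> str:
    # ArtList returns matching articles; timespan limits the search window.
    params = {"query": query, "mode": "ArtList", "format": "json",
              "timespan": "30d", "maxrecords": 50}
    return "https://api.gdeltproject.org/api/v2/doc/doc?" + urlencode(params)


def crossref_url(query: str, contact_email: str) -> str:
    # Including mailto routes the request to Crossref's "polite" pool.
    params = {"query": query, "rows": 5, "mailto": contact_email}
    return "https://api.crossref.org/works?" + urlencode(params)


def pubmed_esearch_url(query: str, api_key: Optional[str] = None) -> str:
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": 5}
    if api_key:  # optional: an NCBI key raises the allowed request rate
        params["api_key"] = api_key
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)


def semantic_scholar_request(query: str, api_key: Optional[str] = None) -> tuple[str, dict[str, str]]:
    params = {"query": query, "limit": 5,
              "fields": "title,abstract,year,venue,citationCount,openAccessPdf"}
    headers = {"x-api-key": api_key} if api_key else {}  # key is optional
    return "https://api.semanticscholar.org/graph/v1/paper/search?" + urlencode(params), headers


def courtlistener_request(query: str, api_key: Optional[str] = None) -> tuple[str, dict[str, str]]:
    params = {"q": query, "type": "o"}  # "o" selects case-law opinions
    headers = {"Authorization": f"Token {api_key}"} if api_key else {}
    return "https://www.courtlistener.com/api/rest/v4/search/?" + urlencode(params), headers


def hn_url(query: str) -> str:
    return "https://hn.algolia.com/api/v1/search?" + urlencode({"query": query, "tags": "story"})


def stackexchange_url(query: str) -> str:
    params = {"q": query, "site": "stackoverflow", "order": "desc", "sort": "relevance"}
    return "https://api.stackexchange.com/2.3/search/advanced?" + urlencode(params)
```

Keeping URL construction separate from the HTTP client makes each request easy to log and to show alongside the source's status.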
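For OpenAlex, each work record carries an `is_retracted` field, which is what makes retraction flagging possible. A minimal sketch of searching works and turning one work into an evidence card follows; the card's shape and the helper names are hypothetical, and the 'related' stance simply follows this page's rule that stance never comes from a scholarly index.

```python
from urllib.parse import urlencode


def openalex_search_url(query: str, per_page: int = 5) -> str:
    return "https://api.openalex.org/works?" + urlencode({"search": query, "per-page": per_page})


def openalex_card(work: dict) -> dict:
    """Map one OpenAlex work object to a hypothetical evidence-card dict."""
    card = {
        "title": work.get("display_name") or "(untitled)",
        "year": work.get("publication_year"),
        "doi": work.get("doi"),
        "cited_by": work.get("cited_by_count", 0),
        "stance": "related",  # indexing is not endorsement; stance never comes from here
        "warnings": [],
    }
    if work.get("is_retracted"):
        card["warnings"].append("retracted")
    return card
```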
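The RSS adapter's 10-minute in-memory cache and query filter can be sketched as a small time-to-live cache plus a term match. The `fetch` callable, the item shape, and the "any query term of three or more characters" matching rule are assumptions; this page does not say how items are actually matched against the query.

```python
import re
import time
from typing import Callable

FeedItem = dict[str, str]  # assumed shape: {"title": ..., "summary": ..., "link": ...}


class FeedCache:
    """In-memory cache with a fixed time-to-live per feed URL (10 minutes by default)."""

    def __init__(self, fetch: Callable[[str], list[FeedItem]], ttl_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical: downloads and parses one feed
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: dict[str, tuple[float, list[FeedItem]]] = {}

    def get(self, feed_url: str) -> list[FeedItem]:
        now = self._clock()
        cached = self._store.get(feed_url)
        if cached is not None and now - cached[0] < self._ttl_s:
            return cached[1]  # still fresh: no network request
        items = self._fetch(feed_url)
        self._store[feed_url] = (now, items)
        return items


def filter_items(items: list[FeedItem], query: str) -> list[FeedItem]:
    """Keep items whose title or summary contains at least one query term of 3+ characters."""
    terms = {t for t in re.findall(r"[a-z0-9]+", query.lower()) if len(t) >= 3}
    kept = []
    for item in items:
        text = f"{item.get('title', '')} {item.get('summary', '')}".lower()
        if any(term in text for term in terms):
            kept.append(item)
    return kept
```

Injecting the clock keeps the expiry logic testable without waiting ten real minutes.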
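The lookup order described for the bundled rules file (a curated publisher entry first, then TLD heuristics for unknown domains) can be sketched as follows. The two-entry JSON excerpt, its field names, the baseline of 50, and the per-TLD adjustments are all hypothetical placeholders; the real data/source-rules.json schema and scores are not reproduced here.

```python
import json
from urllib.parse import urlparse

# Hypothetical two-entry excerpt; the real file holds ~80 curated publishers.
RULES = json.loads("""
{
  "apnews.com": {"category": "wire service", "score": 90, "flags": []},
  "example-tabloid.com": {"category": "tabloid", "score": 35, "flags": ["sensational"]}
}
""")

BASELINE = 50  # assumed score for a domain with no rule and no TLD signal
TLD_ADJUST = {".gov": 20, ".edu": 15, ".info": -15, ".xyz": -20}  # assumed deltas


def source_quality(url: str, rules: dict = RULES) -> dict:
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    labels = host.split(".")
    # Try the full host, then each parent domain (news.example.com -> example.com).
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            entry = rules[candidate]
            return {"domain": candidate, "score": entry["score"],
                    "flags": entry["flags"], "basis": "rule"}
    # Unknown domain: fall back to the TLD heuristic, clamped to 0-100.
    for tld, delta in TLD_ADJUST.items():
        if host.endswith(tld):
            return {"domain": host, "score": min(100, max(0, BASELINE + delta)),
                    "flags": [], "basis": "tld"}
    return {"domain": host, "score": BASELINE, "flags": [], "basis": "unknown"}
```

Returning the `basis` alongside the score lets the verdict-confidence breakdown show whether a score came from a curated entry or only from a heuristic.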

What we do NOT use

  • No paid APIs are required. All core evidence comes from free/public endpoints; the optional web/news adapters run only when keys are configured.
  • No language model is used to invent verdicts, quotes, or sources. The stance comes only from real fact-checker output.
  • No iframe embedding of arbitrary external sites.
  • No bypassing of paywalls or login walls.
  • No recursive crawling. URL extraction makes a single GET request and is protected against server-side request forgery (SSRF).
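A common core of SSRF protection is to resolve the host before fetching and refuse anything that is not a publicly routable address. This page does not describe Proofbase's actual guard; the helper name `vetted_addresses` and its injectable `resolver` parameter are hypothetical. For the check to hold, the single GET should then connect to one of the vetted addresses instead of resolving the name again, and should either not follow redirects or re-run the check on each redirect target; otherwise DNS rebinding can slip past it.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def vetted_addresses(url: str, resolver=socket.getaddrinfo) -> set:
    """Resolve the URL's host and refuse it unless every address is publicly routable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no host")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    addresses = set()
    for *_, sockaddr in resolver(parsed.hostname, port, type=socket.SOCK_STREAM):
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.version == 6 and ip.ipv4_mapped:
            ip = ip.ipv4_mapped  # ::ffff:10.0.0.1 must be judged as 10.0.0.1
        if not ip.is_global:  # loopback, private, link-local (cloud metadata), etc.
            raise ValueError(f"refusing non-public address {ip}")
        addresses.add(ip)
    return addresses
```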