Frontier closed-source. Open-weights running on private EU-resident GPUs. Sovereign deployments from Beijing to Riyadh. Each one knows something different about your catalogue — and each one will lie about it differently. We test all of them, weekly, in the languages they were trained in.
Full OpenRouter catalogue, refreshed daily. Every model appearance, retirement and pricing change is logged for AI Act compliance.
Deterministic audit panel. 62 frontier models calibrated per vertical, fixed seeds, quarterly overlap for continuous drift curves.
Subset of the curated panel for the free probe. 6 / 342 models queried in parallel. Same architecture, sized for 90 seconds.
Every figure below is auto-derived from the live catalogue. When we add a model, these update.
Anyone can write a prompt. The hard part is reaching a model that won't answer requests from your IP range, on infrastructure that's auditable, in the language it was trained in. We do that for you, quietly, every week.
Doubao, ERNIE, Spark and GigaChat aren't sold to European IP ranges. We hold the partner contracts and the resident-egress paths.
Llama 405B, Hunyuan 389B and DeepSeek 671B don't fit on a laptop. We run them on EU-resident H200/B200 capacity — auditable for seven years.
A new SOTA model ships somewhere on the planet roughly every six days. Our intake team adds it to the catalogue inside a week.
Each model is queried in English plus its native language family. A Chinese model lies differently when prompted in Mandarin versus English.
Half of frontier models live in closed beta for months before public release. Our research access lets us audit them while your buyers can't yet name them.
Four categories. Each row is a real audit job we already run for someone every week. Hover over any row to see the access mechanism; tap a category to read what kind of leak signal it's tuned for.
Closed-source labs whose APIs sit in front of every B2B copilot in Europe. The names your buyers already trust — and the ones whose hallucinations carry the most weight.
Open-weight models we run on EU-resident GPU infrastructure. No telemetry, no rate limits, full audit reproducibility for seven years — exactly what AI Act Article 53(1)(d) asks for.
Chinese, Korean and Japanese deployments. Each one has been trained on different industrial corpora: OEM catalogues and ETIM appear, but so do GB / JIS / KS standards your team has never thought to audit. Most of these APIs heavily restrict access from European IPs.
The long tail your buyers don't know exists — and where the most surprising leaks come from. Some are state-backed, some are EU-funded research, some are scrappy startups. All of them are training on data your competitors fed them.
Every reference in your catalogue is run against every model on every axis. The cross-product is what lets us isolate where the failure lives — a Chinese model that lies in Mandarin but tells the truth in English, a US model that's accurate today but drifting fast, a sovereign Arabic model that knows your competitor's OE-numbers but not yours.
Each model is audited in English plus its native language (zh, ko, ja, ar, ru, hi, pt, fr, de, fi). Hallucination rates differ by 2–3× across languages on the same SKU.
Prompt templates calibrated per industry — OEM catalogues / OE-numbers for automotive, ETIM for electrotechnical, ATA chapters for aerospace.
Identification, cross-references, applications, specs and procedural depth scored separately so the band you fail in is visible.
Same prompts re-run quarterly. Drift tracked per (model, vertical, capability) cell to catch obsolete data getting promoted into model knowledge.
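For the technically curious, the cross-product and the per-cell drift described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production pipeline: the function names, the sample reference `REF-001`, the model labels and the hallucination rates are all hypothetical.

```python
from itertools import product


def audit_cells(references, models, languages, capabilities):
    """Enumerate every (reference, model, language, capability) combination.

    Hypothetical helper: each tuple stands for one audit query to run.
    """
    return list(product(references, models, languages, capabilities))


def drift_by_cell(previous, current):
    """Change in hallucination rate per (model, vertical, capability) cell.

    Only cells present in both quarterly runs are compared.
    """
    shared = previous.keys() & current.keys()
    return {cell: round(current[cell] - previous[cell], 4) for cell in sorted(shared)}


# Illustrative inputs only: 1 reference x 2 models x 2 languages x 2 capabilities.
cells = audit_cells(
    ["REF-001"],
    ["model-a", "model-b"],
    ["en", "zh"],
    ["identification", "specs"],
)

# Made-up hallucination rates for two consecutive quarterly runs.
q1 = {
    ("model-a", "automotive", "specs"): 0.10,
    ("model-b", "automotive", "specs"): 0.20,
}
q2 = {
    ("model-a", "automotive", "specs"): 0.25,
    ("model-b", "automotive", "specs"): 0.20,
}
drift = drift_by_cell(q1, q2)
```

In this toy run, `cells` holds eight combinations, and `drift` shows `model-a` getting worse on automotive specs while `model-b` holds steady. Keeping each cell as a plain tuple key is one simple way to make a single failing band visible instead of averaging it away.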
Free probe on 1 of your references, against 6 / 342 models live, in 90 seconds. No account required to start.