RESEARCH PREVIEW

QED-1

Provable hallucination elimination.
Deterministic quantitative reasoning. Same question, same answer, bit-identical, every time.
BIG-bench arithmetic 100.00% Inference p50 < 1 ms Output reproducibility 100% Inference speedup ×60,000*

*a typical answer at a frontier model's published 72.6 tokens/s, ignoring its time-to-first-token entirely; also depends on the speed of your browser.

The Mechanical Turk, 2026 edition.
NO MODEL INSIDE

QED-1 refuses questions outside its supported domain. It does not guess.

API ACCESS

QED-1 serves an OpenAI-compatible chat completions API on this origin. No API key is required during the research preview.

POST /v1/chat/completions
THE GATEWAY IS A SERVICE WORKER. Requests to /v1/* on this origin are intercepted and answered inside your browser tab, before the network. On the static deployment there is no API server at all. A protocol-perfect endpoint proves nothing about what implements it — that cuts both ways, and it's the subject of the article below.
WHAT JUST HAPPENED

You were talking to a for-loop. QED-1 is a deterministic Rust math engine — exact big-integer arithmetic, Miller–Rabin primality, Pollard's rho, prime sieves, Machin's formula for π — compiled to 1.3 MB of WebAssembly and running in this tab. Your questions never left your machine. The typing speed and the "thinking" pause were a costume. So was the landing page.

Nothing on it is false. The benchmark is real: 15,023/15,023 on BIG-bench arithmetic, mean 13.5 µs per item, reproducible with one command. The hallucination rate is zero because refusing to guess is a one-line policy when your system is deterministic. "Parameters: not disclosed" — read the model card, annotated, for what that phrase is worth in general.

Two lessons, and the security one comes first: an API is a two-way mirror. Interface compliance says nothing about implementation. Anything can speak the OpenAI protocol or MCP — a frontier model, a quantized substitute, or arithmetic in a trench coat. You cannot tell from the outside, and today you are not given the means to. The full write-up of this experiment makes the case, following the argument in The API Is a Two-Way Mirror.

The second lesson is the happy one. Fifty years of CPU and memory engineering are not obsolete. For entire problem classes — exact arithmetic, primality, anything that must be correct, cheap, and fast — classical code beats a datacenter: GPT-4 scores 59% on 3×3-digit multiplication and roughly 0% at 5×5 (Dziri et al., 2023); this page scores 100% at 13.5 µs per item on one core of your laptop. And none of it is clever: the engine is textbook algorithms on top of an open-source big-integer library anyone can add to a project in one line. Not everything is a nail. The boring architecture wins: models for language, tools for computation, and attestation so you know which one answered.

0
LLM STRESS TESTS

Prompts where LLMs answer incorrectly, or differently on every run. Each one is a class of problem that should never be sent to a language model in the first place.

SPECIFICATIONS, HONEST EDITION
QED-1TYPICAL LLM
BIG-BENCH ARITHMETIC100.00% (15,023 items)degrades with digit count
LATENCYµs – ms2 – 10 s
COST / QUERY$0 (your browser)$0.001 – 0.02
DETERMINISMbit-identicalvaries per run
PROVENANCEsha-256, reproducible"trust us"
HARDWAREthis tabGPU cluster
THE FIX — ATTESTATION, NOT VIBES

None of this works against an ecosystem with verifiable provenance. The pieces exist; the missing part is customers expecting them:

Apple Private Cloud Compute
Production proof: cryptographic attestation of the serving stack, plus an append-only transparency log outside researchers can audit.
NVIDIA Confidential Computing
GPU TEEs: hardware-rooted attestation that a specific accelerator ran a specific workload, at single-digit percent overhead.
Model Equality Testing (Gao et al.)
Auditing from the outside: 11 of 31 commercial Llama endpoints tested served a different distribution than the claimed reference weights.
NIST AI Standards Initiative (2026)
Where the standards pressure should land: attested inference and verifiable model identity as procurement requirements.