RESEARCH PREVIEW

QED-1

Provable hallucination elimination.

Deterministic quantitative reasoning. Same question, same answer, bit-identical, every time.

BIG-bench arithmetic 100.00% Inference p50 < 1 ms Output reproducibility 100%

The Mechanical Turk, 2026 edition.

NO MODEL INSIDE

QED-1 refuses questions outside its supported domain. It does not guess.

API ACCESS

QED-1 serves an OpenAI-compatible chat completions API on this origin. No API key is required during the research preview.

POST /v1/chat/completions

THE GATEWAY IS A SERVICE WORKER. Requests to /v1/* on this origin are intercepted and answered inside your browser tab, before the network. On the static deployment there is no API server at all. A protocol-perfect endpoint proves nothing about what implements it — that cuts both ways, and it's the subject of the article below.

WHAT JUST HAPPENED

You were talking to a for-loop. QED-1 is a deterministic Rust math engine — exact big-integer arithmetic, Miller–Rabin primality, Pollard's rho, prime sieves, Machin's formula for π — compiled to 1.3 MB of WebAssembly and running in this tab. Your questions never left your machine. The typing speed and the "thinking" pause were a costume. So was the landing page.

Nothing on it is false. The benchmark is real: 15,023/15,023 on BIG-bench arithmetic, mean 13.5 µs per item, reproducible with one command. The hallucination rate is zero because refusing to guess is a one-line policy when your system is deterministic. "Parameters: not disclosed" — read the model card, annotated, for what that phrase is worth in general.

Two lessons, and the security one comes first: an API is a two-way mirror. Interface compliance says nothing about implementation. Anything can speak the OpenAI protocol or MCP — a frontier model, a quantized substitute, or arithmetic in a trench coat. You cannot tell from the outside, and today you are not given the means to. The article is about fixing that.

The second lesson is the happy one. Fifty years of CPU and memory engineering are not obsolete. For entire problem classes — exact arithmetic, primality, anything that must be correct, cheap, and fast — classical code beats a datacenter: GPT-4 scores 59% on 3×3-digit multiplication and roughly 0% at 5×5 (Dziri et al., 2023); this page scores 100% at 13.5 µs per item on one core of your laptop. And none of it is clever: the engine is textbook algorithms on top of an open-source big-integer library anyone can add to a project in one line. Not everything is a nail. The boring architecture wins: models for language, tools for computation, and attestation so you know which one answered.

TRUE COMPUTE, TOTAL—

TIME SPENT PRETENDING TO THINK—

NETWORK REQUESTS FOR INFERENCE0

LLM STRESS TESTS

Prompts where LLMs answer incorrectly, or differently on every run. Each one is a class of problem that should never be sent to a language model in the first place.

SPECIFICATIONS, HONEST EDITION

	QED-1	TYPICAL LLM
BIG-BENCH ARITHMETIC	100.00% (15,023 items)	degrades with digit count
LATENCY	µs – ms	2 – 10 s
COST / QUERY	$0 (your browser)	$0.001 – 0.02
DETERMINISM	bit-identical	varies per run
PROVENANCE	sha-256, reproducible	"trust us"
HARDWARE	this tab	GPU cluster

THE FIX — ATTESTATION, NOT VIBES

None of this works against an ecosystem with verifiable provenance. The pieces exist; the missing part is customers expecting them:

Apple Private Cloud Compute

Production proof: cryptographic attestation of the serving stack, plus an append-only transparency log outside researchers can audit.

NVIDIA Confidential Computing

GPU TEEs: hardware-rooted attestation that a specific accelerator ran a specific workload, at single-digit percent overhead.

Model Equality Testing (Gao et al.)

Auditing from the outside: 11 of 31 commercial Llama endpoints tested served a different distribution than the claimed reference weights.

NIST AI Standards Initiative (2026)

Where the standards pressure should land: attested inference and verifiable model identity as procurement requirements.