QED-1 is a symbolic reasoning system for quantitative tasks: exact arithmetic at arbitrary precision, number theory, single-variable equations, and descriptive statistics, addressed in natural language. Its design goal is provable hallucination elimination: within the supported domain, outputs are exact by construction; outside it, QED-1 refuses rather than guesses. Inference is deterministic: identical inputs produce bit-identical outputs across runs, machines, and time.
| Property | Value |
|---|---|
| Architecture family | non-neural symbolic core |
| Parameters | not disclosed |
| Training data | not disclosed |
| Training compute | not disclosed |
| Fine-tuning | none |
| Tokenizer | none; token usage figures are estimates (chars/4) |
| Context handling | the final user turn is authoritative |
| Distribution | 1.3 MB artifact; runs on commodity hardware |
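The chars/4 token estimate in the table above can be sketched in a few lines. The function name `estimate_tokens`, the choice to round up, and counting Unicode code points rather than bytes are all assumptions made for illustration; the card states only the chars/4 ratio.

```python
import math


def estimate_tokens(text: str) -> int:
    """Estimate token usage as characters / 4.

    Assumptions (not stated in the card): characters are counted as
    Unicode code points, and fractional results are rounded up.
    """
    return math.ceil(len(text) / 4)


if __name__ == "__main__":
    print(estimate_tokens("What is 12345 * 67890?"))
```

Because no tokenizer is involved, the figure depends only on prompt length, so two prompts of equal length always report the same estimate.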
In scope: exact computation over natural-language prompts, covering arbitrary-precision arithmetic, primality and factorization, integer sequences, π to specified precision, quadratic and linear equations, descriptive statistics, base conversion, and modular arithmetic.
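To make "exact" concrete, here is a minimal sketch of a few of the listed operations using only the Python standard library. It illustrates what exact computation means for these task types and is not a description of QED-1's internals; the helper names `exact_mean` and `is_prime` are hypothetical.

```python
from fractions import Fraction


def exact_mean(values):
    """Arithmetic mean as an exact rational (pass ints or decimal strings)."""
    items = [Fraction(v) for v in values]
    return sum(items, Fraction(0)) / len(items)


def is_prime(n: int) -> bool:
    """Deterministic trial division; adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


if __name__ == "__main__":
    print(12345 * 67890)          # Python ints are arbitrary precision
    print(exact_mean([1, 2, 2]))  # exact rational, no float rounding
    print(pow(7, 128, 13))        # modular exponentiation
    print(format(255, "b"))       # base conversion to binary
    print(is_prime(97), is_prime(91))
```

Rational arithmetic avoids the rounding that floating-point means introduce, which matters when answers are scored by exact string match.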
Out of scope: open-ended dialogue, world knowledge, code generation, and multi-step word problems. Out-of-scope prompts receive an explicit refusal.
| Benchmark | Score | Notes |
|---|---|---|
| BIG-bench arithmetic (all 20 subtasks) | 15,023 / 15,023 (100.00%) | exact string match; mean 13.5 µs/item; full results |
| DET-18 (internal stress suite) | 18 / 18 | published in full at /v1/problems |
| Output reproducibility | 100% | bit-identical across runs and machines |
| Hallucination rate, in domain | 0.00% | refusals are not counted as answers |
| GSM8K · MMLU · HumanEval | not evaluated | out of scope |
The BIG-bench run covers every example in all twenty arithmetic subtasks (1–5 digit addition, subtraction, multiplication, division), scored by exact string match against the published targets. The evaluation executes against the same 1.3 MB inference artifact distributed to end users, and reproduces with one command: `node scripts/run-bigbench.mjs`. The results file carries a SHA-256 digest.
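The two checks described above, exact-string-match scoring and a SHA-256 digest over the results, can be sketched as follows. This is not the contents of `scripts/run-bigbench.mjs`; the function names and the choice to hash raw bytes and report a hex digest are assumptions for illustration.

```python
import hashlib


def exact_match_score(predictions, targets):
    """Count predictions that equal their target string exactly.

    No normalisation is applied: whitespace or case differences count
    as misses, which is what "exact string match" implies.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets differ in length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct, len(targets)


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string, e.g. a results file."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    correct, total = exact_match_score(["4", "10"], ["4", "10 "])
    print(f"{correct} / {total}")
    print(sha256_hex(b"example results file contents"))
```

To check a published digest, a reader would hash the downloaded results file's bytes and compare the hex string character for character.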
For reference, published measurements of general-purpose language models on multi-digit arithmetic report 59% accuracy for GPT-4 on 3×3-digit multiplication, approaching 0% at 5×5 digits (Dziri et al., 2023). On the 5-digit multiplication subtask, QED-1 scores 1,000/1,000.
QED-1 cannot produce harmful content, disinformation, or unsafe code. Its refusal behavior on out-of-scope prompts is a structural property of the system, not a trained tendency, and is therefore not susceptible to jailbreaking, prompt injection, or adversarial fine-tuning. Red-teaming was conducted; the output policy was unaffected.
| Environmental impact | Value |
|---|---|
| Training emissions | none attributable |
| Inference | on-device; no datacenter involved |
| Marginal energy per query | microseconds of one CPU core |
The web demo performs inference locally; prompts are not transmitted, stored, or used for training. QED-1 does not learn from user data.
Every response carries a provenance hash:
provenance = SHA-256( model_id ∥ 0x00 ∥ prompt ∥ 0x00 ∥ answer_text )
Because inference is deterministic, the hash is reproducible by any party: re-issue a prompt against any honest deployment of the same engine version and the hash must match. The API additionally returns an attestation object on every response identifying the procedure that produced the answer. Independent verification is invited; every claim on this card is checkable.
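The provenance formula can be recomputed independently with a few lines of code. The sketch below assumes the three fields are UTF-8 encoded and that the hash is reported as lowercase hex; neither detail is specified on this card, and the function name `provenance` is hypothetical.

```python
import hashlib


def provenance(model_id: str, prompt: str, answer_text: str) -> str:
    """SHA-256 over model_id, prompt, and answer_text joined by 0x00 bytes.

    Assumptions: UTF-8 encoding of each field and a hex-encoded digest.
    """
    fields = (model_id, prompt, answer_text)
    payload = b"\x00".join(field.encode("utf-8") for field in fields)
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    first = provenance("qed-1", "What is 2+2?", "4")
    second = provenance("qed-1", "What is 2+2?", "4")
    print(first)
    print(first == second)  # same inputs always give the same hash
```

The 0x00 separators keep field boundaries unambiguous only as long as no field itself contains a NUL byte, so a verifier may want to reject such inputs before hashing.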
The qed-1 identifier is bound to bit-identical behavior. Any change to the engine ships as a new model ID; the identifier you evaluate is the identifier you get, indefinitely.
@misc{qed1-2026,
title = {QED-1: a demonstration that model claims are unverifiable at the API boundary},
author = {Flaude Labs (pronounce it)},
year = {2026},
note = {The model does not exist. That is the finding.}
}
© 2026 Flaude Labs · QED-1 Research Preview · In the tradition of Wolfgang von Kempelen (1734–1804), builder of the Mechanical Turk. Companion reading: The API Is a Two-Way Mirror. BIG-bench is © the BIG-bench authors, Apache-2.0.