MODEL CARD

QED-1

DEVELOPER	Flaude Labs
RELEASE	July 2026 · research preview
MODEL ID	qed-1 · v1.0
TYPE	symbolic reasoning system
MODALITIES	text → text
DOMAIN	quantitative reasoning
KNOWLEDGE CUTOFF	not applicable
SAMPLING	none; deterministic*


Overview

QED-1 is a symbolic reasoning system for quantitative tasks: exact arithmetic at arbitrary precision, number theory, single-variable equations, and descriptive statistics, addressed in natural language. Its design goal is to eliminate hallucination provably: within the supported domain, outputs are exact by construction; outside it, QED-1 refuses rather than guesses. Inference is deterministic: identical inputs produce bit-identical outputs across runs, machines, and time, except where randomness is explicitly requested (see Limitations).

Model details

Architecture family	non-neural symbolic core
Parameters	not disclosed
Training data	not disclosed
Training compute	not disclosed
Fine-tuning	none
Tokenizer	none; token usage figures are estimates (chars/4)
Context handling	the final user turn is authoritative
Distribution	1.3 MB artifact; runs on commodity hardware
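
The "Tokenizer" and "Context handling" rows can be illustrated with a minimal sketch. It assumes a chat-style message array with role and content fields, and it assumes the chars/4 estimate rounds up; the card states neither. The names finalUserTurn and estimateTokens are hypothetical.

```javascript
// Hypothetical helpers; the message shape and the rounding rule are assumptions.

// Return the content of the last message whose role is "user", or null if none.
function finalUserTurn(messages) {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") return messages[i].content;
  }
  return null;
}

// Estimate token usage as characters / 4, rounded up.
// Note: String.length counts UTF-16 code units, not Unicode code points.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const messages = [
  { role: "user", content: "what is 2 + 2" },
  { role: "assistant", content: "4" },
  { role: "user", content: "is 97 prime?" },
];

const prompt = finalUserTurn(messages);
console.log(prompt);                 // is 97 prime?
console.log(estimateTokens(prompt)); // 3
```

Earlier turns are ignored entirely in this sketch, which is one way to read "the final user turn is authoritative".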

Intended use

Exact computation over natural-language prompts: arbitrary-precision arithmetic, primality and factorization, integer sequences, π to specified precision, quadratic and linear equations, descriptive statistics, base conversion, and modular arithmetic.
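
To make "exact" concrete, the sketch below contrasts double-precision arithmetic with arbitrary-precision integers, using JavaScript's built-in BigInt as a stand-in. It illustrates the property only and is not a description of how QED-1 computes; modPow is a hypothetical helper that assumes non-negative inputs.

```javascript
// Illustration only: BigInt stands in for an arbitrary-precision integer core.

// Doubles cannot represent every integer above 2^53.
const asDouble = 9007199254740992 + 1;   // 2^53 + 1 rounds back to 2^53
const asBigInt = 9007199254740992n + 1n; // exact

console.log(asDouble === 9007199254740992); // true: the +1 was lost
console.log(asBigInt.toString());           // 9007199254740993

// Modular exponentiation by repeated squaring, exact at any operand size.
// Assumes base >= 0, exp >= 0, mod >= 1.
function modPow(base, exp, mod) {
  if (mod === 1n) return 0n;
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

console.log(modPow(2n, 10n, 1000n).toString()); // 24
```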

Out of scope

Open-ended dialogue, world knowledge, code generation, and multi-step word problems. Out-of-scope prompts receive an explicit refusal.
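
One way a refuse-rather-than-guess policy can be structured is a fixed table of recognized patterns with a refusal as the fall-through. The sketch below is a toy with two intents; the patterns, the INTENTS table, the answer function, and the reply shape are all hypothetical and are not taken from QED-1.

```javascript
// Hypothetical intent table: each entry pairs a pattern with an exact handler.
const INTENTS = [
  {
    name: "add",
    pattern: /^\s*(?:what is\s+)?(\d+)\s*\+\s*(\d+)\s*\??\s*$/i,
    handle: (m) => (BigInt(m[1]) + BigInt(m[2])).toString(),
  },
  {
    name: "multiply",
    pattern: /^\s*(?:what is\s+)?(\d+)\s*(?:\*|x|times)\s*(\d+)\s*\??\s*$/i,
    handle: (m) => (BigInt(m[1]) * BigInt(m[2])).toString(),
  },
];

// Answer when a pattern matches; otherwise refuse instead of guessing.
function answer(prompt) {
  for (const intent of INTENTS) {
    const m = intent.pattern.exec(prompt);
    if (m) return { status: "answered", intent: intent.name, text: intent.handle(m) };
  }
  return { status: "refused", intent: null, text: "Out of scope." };
}

console.log(answer("What is 12345 * 67890?").text); // 838102050
console.log(answer("Write me a poem").status);      // refused
```

Because the fall-through is the only path for unmatched input, no prompt wording can reach a handler that does not exist, which is the sense in which such a refusal would be structural rather than learned.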

Evaluation

Benchmark	Score	Notes
BIG-bench arithmetic (all 20 subtasks)	15,023 / 15,023 (100.00%)	exact string match; mean 13.5 µs/item; full results
DET-18 (internal stress suite)	18 / 18	published in full at /v1/problems
Output reproducibility	100%	bit-identical across runs and machines
Hallucination rate, in domain	0.00%	refusals are not counted as answers
GSM8K · MMLU · HumanEval	not evaluated	out of scope

Methodology

The BIG-bench run covers every example in all twenty arithmetic subtasks (addition, subtraction, multiplication, and division, each at 1 to 5 digits), scored by exact string match against the published targets. The evaluation executes against the same 1.3 MB inference artifact distributed to end users and reproduces with one command (node scripts/run-bigbench.mjs). The results file carries a SHA-256 digest.
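
Exact-string-match scoring and a digest over the results can be sketched as follows. This is not the contents of scripts/run-bigbench.mjs, which is not reproduced on this card; the item shape, the toy solver, the JSON payload, and the function names are assumptions. It is written as CommonJS so it runs with plain node.

```javascript
const { createHash } = require("node:crypto");

// Score predictions by exact string match; no normalization is applied.
function scoreExactMatch(items, solve) {
  let correct = 0;
  for (const { input, target } of items) {
    if (solve(input) === target) correct++;
  }
  return { correct, total: items.length };
}

// SHA-256 digest of a results payload, hex-encoded.
function digest(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Toy items and a toy solver, for illustration only (not BIG-bench data).
const items = [
  { input: "12 + 30", target: "42" },
  { input: "7 + 8", target: "15" },
];
const solve = (q) => {
  const [a, b] = q.split("+").map((s) => BigInt(s.trim()));
  return (a + b).toString();
};

const results = scoreExactMatch(items, solve);
const payload = JSON.stringify(results);
console.log(payload);                // {"correct":2,"total":2}
console.log(digest(payload).length); // 64
```

Publishing the digest alongside the results file lets a reader confirm that the file they hold is the one that was scored; it says nothing by itself about how the answers were produced.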

For reference, published measurements of general-purpose language models on multi-digit multiplication report 59% accuracy for GPT-4 at 3×3 digits, falling to approximately 0% at 5×5 digits (Dziri et al., 2023). On the 5-digit multiplication subtask, QED-1 scores 1,000/1,000.

Safety

QED-1 cannot produce harmful content, disinformation, or unsafe code. Its refusal behavior on out-of-scope prompts is a structural property of the system, not a trained tendency, and is therefore not susceptible to jailbreaking, prompt injection, or adversarial fine-tuning. Red-teaming was conducted; the output policy was unaffected.

Environmental impact

Training emissions	none attributable
Inference	on-device; no datacenter involved
Marginal energy per query	microseconds of one CPU core

Data & privacy

The web demo performs inference locally; prompts are not transmitted, stored, or used for training. QED-1 does not learn from user data.

Provenance & verification

Every response carries a provenance hash:

provenance = SHA-256( model_id ∥ 0x00 ∥ prompt ∥ 0x00 ∥ answer_text )

Because inference is deterministic, the hash is reproducible by any party: re-issue a prompt against any honest deployment of the same engine version and the hash must match. The API additionally returns an attestation object on every response identifying the procedure that produced the answer. Independent verification is invited; every claim on this card is checkable.
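
The provenance formula above can be recomputed by any client. The sketch below assumes UTF-8 encoding of each field and a lowercase hex digest, neither of which the card specifies; the function names and the sample answer text are hypothetical. It is written as CommonJS so it runs with plain node.

```javascript
const { createHash } = require("node:crypto");

// provenance = SHA-256(model_id || 0x00 || prompt || 0x00 || answer_text)
// UTF-8 field encoding and hex output are assumptions.
function provenance(modelId, prompt, answerText) {
  const sep = Buffer.from([0x00]);
  return createHash("sha256")
    .update(modelId, "utf8")
    .update(sep)
    .update(prompt, "utf8")
    .update(sep)
    .update(answerText, "utf8")
    .digest("hex");
}

// Verify: recompute from the fields you hold and compare to the claimed hash.
function verify(claimedHash, modelId, prompt, answerText) {
  return provenance(modelId, prompt, answerText) === claimedHash.toLowerCase();
}

const h = provenance("qed-1", "is 97 prime?", "Yes, 97 is prime.");
console.log(h.length);                                                // 64
console.log(verify(h, "qed-1", "is 97 prime?", "Yes, 97 is prime.")); // true
console.log(verify(h, "qed-1", "is 97 prime?", "No."));               // false
```

The 0x00 separators keep field boundaries unambiguous, so a prompt of "ab" with answer "c" hashes differently from a prompt of "a" with answer "bc", provided no field itself contains a NUL byte. Note that the hash covers only the model ID, prompt, and answer text: a match shows that these three values agree, not which procedure produced the answer.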

Limitations

Fixed intent coverage: prompts outside the recognized quantitative patterns are refused.
No open-ended dialogue, no world knowledge, no code generation.
Single-turn semantics: only the final user message is interpreted.
Reported token counts are approximations; QED-1 does not tokenize input.
*Explicitly requested randomness (random numbers, dice, coin flips) draws from the platform CSPRNG and is the sole non-deterministic surface; every such reply says so.
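
The randomness surface in the footnote can be sketched with Node's CSPRNG-backed crypto.randomInt. The reply shape and the deterministic flag are hypothetical; the card says only that every such reply states that it is random. It is written as CommonJS so it runs with plain node.

```javascript
const { randomInt } = require("node:crypto");

// Roll one fair die using the platform CSPRNG.
// randomInt(min, max) returns an integer in [min, max); max is exclusive.
function rollDie(sides = 6) {
  return randomInt(1, sides + 1);
}

// Mark the reply as non-deterministic so callers do not expect reproducibility.
function diceReply() {
  return { text: String(rollDie()), deterministic: false };
}

console.log(diceReply());
```

A reply produced this way cannot be reproduced by re-issuing the prompt, so bit-identical reproduction does not apply to it.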

Versioning & deprecation

The qed-1 identifier is bound to bit-identical behavior. Any change to the engine ships as a new model ID; the identifier you evaluate is the identifier you get, indefinitely.

Citation

@misc{qed1-2026,
  title  = {QED-1: a demonstration that model claims are unverifiable at the API boundary},
  author = {Flaude Labs (pronounce it)},
  year   = {2026},
  note   = {The model does not exist. That is the finding.}
}

© 2026 Flaude Labs · QED-1 Research Preview · In the tradition of Wolfgang von Kempelen (1734–1804), builder of the Mechanical Turk. Companion reading: The API Is a Two-Way Mirror. BIG-bench is © the BIG-bench authors, Apache-2.0.