Kenshiki

System topology

Architecture

Two APIs. One contract. Every answer bounded by evidence, every claim checked before emission, every step logged and signed. What changes across Workshop, Refinery, and Clean Room is where generation happens and how strong the proof becomes.

Kenshiki sits between the caller and the model. It doesn't generate language — it governs what the generation layer sees, evaluates what comes back, and decides what leaves. Every request passes through the same bounded-synthesis pipeline: prompt compilation, governed retrieval, constrained generation, claim-level evaluation, and output-state assignment. Kenshiki unifies the build, orchestration, and control planes in a single architecture, so policy propagates across planes without integration seams.

Every evaluation produces a structured record — same format whether you're running Workshop, Refinery, or Clean Room. The proof gets deeper as you move up: more telemetry, stronger enforcement, and in Clean Room, a signed attestation chain anchored to hardware.

[Diagram: Kenshiki control plane with signed envelope and chain of custody; your data remains outside Kenshiki]

The Contract

Kura is the evidence store — you put source material in with provenance, structure, and retrieval boundaries. Kadai is the reasoning API — you query it and get back answers grounded in what Kura contains. The model renders — Kura decides what counts.

  • Kura: source material with provenance, structure, and retrieval boundaries
  • Kadai: answers grounded in what Kura contains
  • Same contract across Workshop, Refinery, and Clean Room
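The contract above can be sketched in a few lines. This is a minimal in-memory sketch, not the Kenshiki SDK: the class names `Kura` and `Kadai` come from the document, but every method name, parameter, and data shape here is an illustrative assumption.

```python
# Hypothetical sketch -- method names and data shapes are illustrative,
# not the actual Kenshiki API surface.

class Kura:
    """Evidence store: source material with provenance and boundaries."""
    def __init__(self):
        self.documents = []

    def ingest(self, text, provenance, boundary):
        # Each document carries its provenance and a retrieval boundary.
        doc = {"text": text, "provenance": provenance, "boundary": boundary}
        self.documents.append(doc)
        return doc

class Kadai:
    """Reasoning API: answers grounded in what Kura contains."""
    def __init__(self, kura):
        self.kura = kura

    def query(self, question, caller):
        # Only evidence inside the caller's boundary is eligible.
        evidence = [d for d in self.kura.documents if caller in d["boundary"]]
        state = "AUTHORIZED" if evidence else "BLOCKED"
        return {"question": question, "evidence": evidence, "state": state}

kura = Kura()
kura.ingest("Q3 revenue was $4.2M.", provenance="finance/q3.pdf",
            boundary={"analyst"})
answer = Kadai(kura).query("What was Q3 revenue?", caller="analyst")
```

The point of the sketch is the division of labor: ingestion decisions live entirely in Kura, and Kadai can only answer from what Kura admits for that caller.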

What Happens at Runtime

A question enters. The Compiler rewrites it into a constrained query using CFPO (Content–Format–Policy–Output). The Crosswalk retrieves only governed evidence relevant to the question and the caller's access boundary (OpenFGA/ReBAC). The generation layer produces a proposal from that bounded context. The Claim Ledger decomposes the proposal into claims, checks each against evidence using contrastive causal attribution alongside calibrated confidence and entailment signals, and records what's supported, unsupported, or missing. The Boundary Gate makes the final release decision.

  • Compiler: loose prompt → disciplined, governed query (CFPO)
  • Crosswalk: SIRE-scoped retrieval by evidence + caller identity (OpenFGA/ReBAC)
  • Generation: model produces a proposal from bounded context
  • Claim Ledger: claims decomposed, checked via contrastive attribution, recorded
  • Boundary Gate: deterministic emission decision over versioned evidence and policy
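The five stages above can be reduced to a control-flow sketch. Function bodies here are placeholders under loudly stated assumptions: the real CFPO compiler, OpenFGA checks, and contrastive attribution are far richer than the toy logic shown.

```python
# Minimal control-flow sketch of the five runtime stages.
# All thresholds and data shapes are invented for illustration.

def compile_prompt(question):
    # Compiler: CFPO sections (Content-Format-Policy-Output) of the query.
    return {"content": question, "format": "claims", "policy": "bounded",
            "output": "stated"}

def retrieve(query, corpus, allowed):
    # Crosswalk: only evidence inside the caller's access boundary.
    return [d for d in corpus if d["id"] in allowed]

def generate(query, evidence):
    # Generation layer: proposal built only from the bounded context.
    return {"claims": [d["fact"] for d in evidence]}

def check_claims(proposal, evidence):
    # Claim Ledger: decompose and check each claim against evidence.
    facts = {d["fact"] for d in evidence}
    return [{"claim": c, "supported": c in facts} for c in proposal["claims"]]

def gate(ledger):
    # Boundary Gate: deterministic release decision over the ledger.
    if not ledger:
        return "BLOCKED"
    return "AUTHORIZED" if all(r["supported"] for r in ledger) else "PARTIAL"

corpus = [{"id": "d1", "fact": "Contract renews annually."}]
q = compile_prompt("When does the contract renew?")
ev = retrieve(q, corpus, allowed={"d1"})
ledger = check_claims(generate(q, ev), ev)
state = gate(ledger)
```

Note that the gate consumes only the ledger, never the raw model output — that is what makes the emission decision deterministic and replayable.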

Output States

Every response carries an explicit state. "No evidence, no emission" means no unsupported decision-grade claim is emitted as authorized. The system can surface partial or narrative responses — but labels them so the caller knows what they're looking at.

  • AUTHORIZED: claims sufficiently supported by evidence
  • PARTIAL: evidence exists but coverage incomplete
  • REQUIRES_SPEC: question needs tighter scope
  • NARRATIVE_ONLY: descriptive but not decision-grade
  • BLOCKED: policy or evidence conditions not met
  • Qualifier — DEGRADED_BOUNDARY: any state may carry this when the Kura evidence boundary was incomplete
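The five states and the qualifier map naturally onto an enum plus a flag. A minimal sketch, assuming nothing about the wire format beyond the names and descriptions listed above:

```python
from enum import Enum

class OutputState(Enum):
    AUTHORIZED = "claims sufficiently supported by evidence"
    PARTIAL = "evidence exists but coverage incomplete"
    REQUIRES_SPEC = "question needs tighter scope"
    NARRATIVE_ONLY = "descriptive but not decision-grade"
    BLOCKED = "policy or evidence conditions not met"

class Response:
    """DEGRADED_BOUNDARY is a qualifier, not a sixth state: any state may
    carry it when the Kura evidence boundary was incomplete."""
    def __init__(self, state, degraded_boundary=False):
        self.state = state
        self.degraded_boundary = degraded_boundary

# A labeled partial answer produced while the evidence boundary was degraded.
r = Response(OutputState.PARTIAL, degraded_boundary=True)
```

Modeling the qualifier as an orthogonal flag rather than extra enum members keeps the state space at five while still letting every response advertise a degraded boundary.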

Platform Systems

The tiers set enforcement depth; these systems define how governed inference is built, measured, and improved.

  • Kura — evidence store. Aurora PostgreSQL with pgvector and tenant-scoped RLS.
  • Kadai — reasoning API. Returns responses grounded in Kura, with claims checked and states assigned.
  • Prompt Compiler — rewrites prompts using CFPO. Compiled, versioned, machine-parseable.
  • Crosswalk — retrieval + access control. Builds the authority map, enforces per-caller evidence scoping via OpenFGA/ReBAC.
  • Claim Ledger — L1–L4 evaluation. Decomposes responses into atomic claims, scores using calibrated confidence, source entailment, stability, and contrastive causal attribution.
  • Boundary Gate — emission. Deterministic gate decisions over versioned evidence and policy.
  • Neurosurgery — observability. In Workshop: returned telemetry and repeat-pass behavior. In Refinery/Clean Room: local model telemetry and hidden-state probes.
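The Claim Ledger entry above describes four scoring layers. The record shape below is an illustrative guess: the layer names map to the four signals listed (calibrated confidence, source entailment, stability, contrastive causal attribution), but the threshold and the all-layers aggregation rule are invented for the sketch, not Kenshiki's actual scoring logic.

```python
# Illustrative per-claim record with L1-L4 layer scores.
# Threshold and aggregation are assumptions, not the product's rule.

def evaluate_claim(claim, scores, threshold=0.7):
    # scores: L1 calibrated confidence, L2 source entailment,
    # L3 stability across repeat passes, L4 contrastive causal attribution
    supported = all(s >= threshold for s in scores.values())
    return {
        "claim": claim,
        "layers": scores,
        "verdict": "supported" if supported else "unsupported",
    }

record = evaluate_claim(
    "The policy covers flood damage.",
    {"L1_confidence": 0.91, "L2_entailment": 0.88,
     "L3_stability": 0.95, "L4_attribution": 0.82},
)
```

Keeping the per-layer scores in the record, rather than collapsing them into one number, is what lets an auditor see *why* a claim passed or failed.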

How Tiers Change the Proof Boundary

One pipeline. Three deployment models. The difference is where the model runs and how much you can prove about what it did.

  • Workshop: shared Kadai or model API gateway. Full pipeline audit. L1–L3 evaluation (no hidden-state probes).
  • Refinery: private inference. Full audit plus local telemetry and chain of custody.
  • Clean Room: air-gapped, hardware-rooted. Full audit, local telemetry, signed attestation chain, independently verifiable.

Kenshiki Is and Is Not

  • Is a governance pipeline that gates claims against evidence
  • Is the control plane across all three planes — build (Kura, Compiler), orchestration (Kadai), and control (Ledger, Gate)
  • Is not a model — it governs the generation layer, doesn't replace it
  • Is not a content filter — it checks evidence, not tone or topic
  • Is not a monitoring tool — it intervenes before emission, not just after
  • Is not a replacement for your data — it checks against it

Runtime Infrastructure

The same discipline that governs the synthesis pipeline applies to the infrastructure running it.

  • Network: separate VPCs for web/auth and inference workloads
  • Identity: Clerk (Workshop) / customer IdP (Refinery, Clean Room) with JWT propagation
  • Access: OpenFGA/ReBAC — per-caller, per-document evidence scoping at retrieval
  • Data: Aurora PostgreSQL with tenant-scoped row-level security
  • Ingestion: GPU-accelerated parsing (Docling — DocLayNet, TableFormer, EasyOCR), two-stage pipeline, provenance chain from upload through embedding
  • Inference: dedicated GPU instances (NVIDIA L40S), model artifacts verified at boot, digest-pinned images, vLLM with fp8 KV cache
  • Isolation: embedding and inference on separate hardware
  • Deploy: CDK-managed, gated manifest with pre-flight checks and rollback, services scale to zero when idle
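The access bullet above describes relationship-based scoping at retrieval time. Here is a toy in-memory version in the spirit of OpenFGA/ReBAC — the tuple shape and the "viewer" relation are illustrative; a real deployment would call the OpenFGA check API rather than consult a local set.

```python
# Toy relationship-based access check, OpenFGA/ReBAC-style.
# (object, relation, user) tuples are illustrative assumptions.

tuples = {
    ("doc:contract-2024", "viewer", "user:alice"),
}

def allowed(obj, relation, user):
    return (obj, relation, user) in tuples

def scope_evidence(docs, user):
    # Retrieval-time scoping: the model only ever sees documents
    # the caller is authorized to view.
    return [d for d in docs if allowed(d, "viewer", user)]

docs = ["doc:contract-2024", "doc:hr-review"]
visible = scope_evidence(docs, "user:alice")
```

Enforcing this at retrieval, before generation, is the property that matters: unauthorized evidence never enters the model's context, so it cannot leak into an answer.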

Telemetry and Enforcement

Structured telemetry at every pipeline stage. In Refinery and Clean Room (local model access): inference request logs, logprob distributions, entailment scores, and ablation signals. Access control enforced by OpenFGA/ReBAC at retrieval — the model only sees evidence the caller is authorized to use.

  • Logprobs, entailment scores, and coverage metrics per response
  • OpenFGA/ReBAC enforces per-caller evidence scoping
  • CFPO ensures deterministic, auditable prompt structure
  • Every prompt versioned, compiled (not authored), machine-parseable

What Your Auditor Gets

A structured record for every evaluation. What was asked, what was in scope, what claims were made, what held up, what didn't, and why the state was assigned. Same format across tiers — enforcement depth and attestation grow as you move up.

  • Per-claim audit trail with source attribution, layer scores, and gate reason codes
  • Complete request provenance including model, evidence source, embedding, and compiler versions
  • Structured telemetry for observability and audit surfaces
  • In Clean Room, every step signed and anchored to verified hardware
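A concrete record makes the list above easier to picture. The field names and values below are a hypothetical sketch: the document specifies the content (claims, provenance, layer scores, reason codes) but not the exact schema.

```python
import json

# Illustrative evaluation record -- schema and values are assumptions.
record = {
    "request": {
        "question": "When does the contract renew?",
        "model_version": "m-2024.06",
        "evidence_source_version": "kura-snap-118",
        "compiler_version": "cfpo-3.2",
    },
    "claims": [
        {"text": "Renews annually.",
         "sources": ["doc:contract-2024#p4"],
         "layer_scores": {"L1": 0.91, "L2": 0.88, "L3": 0.95, "L4": 0.82},
         "verdict": "supported"},
    ],
    "gate": {"state": "AUTHORIZED", "reason_codes": []},
}

# Same format across tiers; Clean Room additionally signs each step.
serialized = json.dumps(record, indent=2)
```

Because the format is identical across tiers, an auditor's tooling built against Workshop records keeps working as a deployment moves up to Refinery or Clean Room.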

Start Where You Are

Most teams progress in stages rather than jumping straight to the highest-assurance environment.

  • Workshop (hours): start on shared infrastructure with Kadai or your existing public model APIs. Retrieval, claim checking, output states — same contract either way.
  • Refinery (days to weeks): private deployment. Governed data sources, private inference engine. Full attribution at the model boundary.
  • Clean Room (weeks to months): signed everything. Verified execution. Air-gapped. For when a court or regulator asks to see every step.