Kenshiki Labs

System topology

Architecture

Every governed AI evaluation produces a structured record — same format whether you run Workshop, Refinery, or Clean Room.

The Kenshiki Labs platform architecture is a three-plane runtime contract — build, orchestration, and control — that produces a structured per-decision record on every governed AI response. The same audit chassis runs across all deployment tiers; what changes between Workshop, Refinery, and Clean Room is not the contract but the depth: more telemetry, stronger enforcement, and in Clean Room a signed attestation chain anchored to hardware. This is the canonical end-to-end specification operators integrate against.

[Diagram: Kenshiki Labs control plane (signed envelope, chain of custody); your data (outside Kenshiki Labs)]

Prepare

Before the model does anything, the request is bound to identity and to an approved evidence boundary. The Prompt Sanitizer authenticates the caller and propagates the access scope. Kura scopes retrieval to the governance corpus, with access controlled at the claim level. The model never sees data the caller is not authorized to use.

  • Identity binding via OpenFGA/ReBAC at the entry point
  • Retrieval scoped to the governance corpus and the caller's access boundary
  • SIRE exclusion gate purges out-of-scope chunks before they reach the model
  • The model is constrained by what reaches it, not by what comes back
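
The scoping rule in this phase can be sketched as a filter that runs before any text reaches the model. This is a minimal illustration under stated assumptions, not Kenshiki's implementation: `Chunk`, `exclusion_gate`, and `allowed_ids` are hypothetical names, and the `allowed_ids` set stands in for the result of an authorization check rather than calling any real OpenFGA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    corpus: str
    text: str


def exclusion_gate(chunks, allowed_ids, governed_corpus):
    """Keep only chunks that are in the governed corpus AND readable by the caller."""
    return [
        c for c in chunks
        if c.corpus == governed_corpus and c.chunk_id in allowed_ids
    ]


chunks = [
    Chunk("c1", "governance", "Policy A applies to vendors."),
    Chunk("c2", "governance", "Restricted board memo."),
    Chunk("c3", "scratch", "Unreviewed notes."),
]

# Hypothetical caller: authorized for c1 and c3, but c3 is outside the corpus.
visible = exclusion_gate(chunks, allowed_ids={"c1", "c3"}, governed_corpus="governance")
print([c.chunk_id for c in visible])
```

Both conditions must hold: `c2` is dropped because the caller lacks access, and `c3` is dropped because it sits outside the governed corpus even though the caller can read it.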

Propose

The model is allowed to do useful reasoning and generation — but only inside the approved boundary. The Compiler rewrites the request into a CFPO-ordered prompt contract (Content–Format–Policy–Output) using five deterministic passes. Generation receives bounded context, not raw corpus access. Kenshiki is not making the model smaller; it is constraining the execution environment around it.

  • CFPO ordering matches model attention behavior
  • Five-pass deterministic rewrite — same input always produces the same contract
  • Generation receives only authorized evidence
  • The model stays whole — execution is what changes
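
The determinism property above can be illustrated with a small sketch. It assumes a contract is four named sections emitted in CFPO order, and it collapses the rewrite into a single normalization step; the source does not describe the five passes, so they are not reproduced here. `compile_contract` and `contract_digest` are hypothetical names.

```python
import hashlib

# Section order named in the text: Content, Format, Policy, Output.
CFPO_ORDER = ("content", "format", "policy", "output")


def compile_contract(sections: dict[str, str]) -> str:
    """Emit sections in CFPO order with whitespace normalized."""
    missing = [name for name in CFPO_ORDER if name not in sections]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    blocks = []
    for name in CFPO_ORDER:
        body = " ".join(sections[name].split())  # collapse whitespace runs
        blocks.append(f"[{name.upper()}]\n{body}")
    return "\n\n".join(blocks)


def contract_digest(contract: str) -> str:
    """Content hash that can serve as a version identifier for the contract."""
    return hashlib.sha256(contract.encode("utf-8")).hexdigest()


# Same request, supplied in a different key order and with stray whitespace.
a = compile_contract({
    "output": "Return JSON with a claims array.",
    "policy": "Cite only supplied evidence.",
    "format": "One claim per sentence.",
    "content": "Which vendors   fall under Policy A?",
})
b = compile_contract({
    "content": "Which vendors fall under Policy A?",
    "format": "One claim per sentence.",
    "policy": "Cite only supplied evidence.",
    "output": "Return JSON with a claims array.",
})
print(contract_digest(a) == contract_digest(b))
```

Because the output depends only on the normalized section bodies and the fixed order, cosmetic differences in the input do not change the compiled contract or its digest.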

Prove

Every decision is written to an immutable claim ledger with provenance. The Ledger decomposes the response into atomic claims and verifies each at L1–L4 depth (token confidence, source entailment, multi-draw stability, hidden-state probes). The Gate assigns one of five output states deterministically. ARBV produces signed Boundary Evidence Records that auditors can independently replay.

  • L1–L4 claim evaluation, depth varies by tier
  • Five output states — AUTHORIZED, PARTIAL, REQUIRES_SPEC, NARRATIVE_ONLY, BLOCKED
  • Signed Boundary Evidence Records, replayable by auditors and partners
  • The artifact becomes the system of record; the model stays whole
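
To make "signed and replayable" concrete, the sketch below signs a record over a canonical serialization and lets a verifier recompute the signature. The source does not say how Boundary Evidence Records are signed, so HMAC-SHA256 with a shared key is an assumption chosen to keep the example within the standard library; a deployment where auditors verify independently would more plausibly use public-key signatures. The record fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys so equal records yield identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking timing information during comparison.
    return hmac.compare_digest(sign_record(record, key), signature)


key = b"demo-key-not-for-production"
record = {
    "claim": "Policy A applies to vendors.",
    "state": "AUTHORIZED",
    "layers": {"L1": 0.97, "L2": 0.91, "L3": 0.88},  # illustrative scores
}
signature = sign_record(record, key)

# Any change to the record after signing invalidates the signature.
tampered = {**record, "state": "BLOCKED"}
print(verify_record(record, signature, key), verify_record(tampered, signature, key))
```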

The Contract

Kura is the evidence store — you put source material in with provenance, structure, and retrieval boundaries. Kadai is the reasoning API — you query it and get back answers grounded in what Kura contains. The model renders — Kura decides what counts.

  • Kura: source material with provenance, structure, and retrieval boundaries
  • Kadai: answers grounded in what Kura contains
  • Same contract across Workshop, Refinery, and Clean Room

What Happens at Runtime

A question enters. The Compiler rewrites it into a constrained query using CFPO (Content–Format–Policy–Output). The Crosswalk retrieves only governed evidence relevant to the question and the caller's access boundary (OpenFGA/ReBAC). The generation layer produces a proposal from that bounded context. The Claim Ledger decomposes the proposal into claims, checks each against evidence using contrastive causal attribution alongside calibrated confidence and entailment signals, and records what's supported, unsupported, or missing. The Boundary Gate makes the final release decision.

  • Compiler: loose prompt → disciplined, governed query (CFPO)
  • Crosswalk: SIRE-scoped retrieval by evidence + caller identity (OpenFGA/ReBAC)
  • Generation: model produces a proposal from bounded context
  • Claim Ledger: claims decomposed, checked via contrastive attribution, recorded
  • Boundary Gate: deterministic emission decision over versioned evidence and policy
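
The stage order above can be sketched as a chain of functions over a shared state. Every stage body here is a toy stand-in: whitespace normalization replaces the CFPO rewrite, a reader set replaces OpenFGA/ReBAC, a fixed list replaces the model, and exact-match lookup replaces contrastive attribution and entailment. All names are hypothetical; only the ordering reflects the text.

```python
def compiler(state):
    # Stand-in for the CFPO rewrite: normalize the question only.
    state["query"] = " ".join(state["question"].split())
    return state


def crosswalk(state):
    # Keep only evidence the caller is allowed to read.
    state["evidence"] = [e for e in state["corpus"] if state["caller"] in e["readers"]]
    return state


def generation(state):
    # Stand-in for the model: one claim per evidence item, plus one claim
    # that has no backing evidence.
    state["proposal"] = [e["text"] for e in state["evidence"]] + ["Unbacked assertion."]
    return state


def claim_ledger(state):
    # Record, per claim, whether the bounded evidence supports it.
    known = {e["text"] for e in state["evidence"]}
    state["ledger"] = [{"claim": c, "supported": c in known} for c in state["proposal"]]
    return state


def boundary_gate(state):
    # Release only claims the ledger marked as supported.
    state["released"] = [row["claim"] for row in state["ledger"] if row["supported"]]
    return state


STAGES = (compiler, crosswalk, generation, claim_ledger, boundary_gate)


def run(question, caller, corpus):
    state = {"question": question, "caller": caller, "corpus": corpus}
    for stage in STAGES:
        state = stage(state)
    return state


corpus = [
    {"text": "Policy A applies to vendors.", "readers": {"alice", "bob"}},
    {"text": "Board memo is restricted.", "readers": {"bob"}},
]
result = run("  Which policy   applies to vendors? ", "alice", corpus)
print(result["released"])
```

The unsupported claim still appears in the ledger, marked `supported: False`; it is withheld at the gate rather than silently discarded.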

Output States

Every response carries an explicit state. "No evidence, no emission" means no unsupported decision-grade claim is emitted as authorized. The system can surface partial or narrative responses — but labels them so the caller knows what they're looking at.

  • AUTHORIZED: claims sufficiently supported by evidence
  • PARTIAL: evidence exists but coverage incomplete
  • REQUIRES_SPEC: question needs tighter scope
  • NARRATIVE_ONLY: descriptive but not decision-grade
  • BLOCKED: policy or evidence conditions not met
  • Qualifier — DEGRADED_BOUNDARY: any state may carry this when the Kura evidence boundary was incomplete
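
One way to model the five states and the qualifier is an enum plus a flag on the decision. The decision rules in `decide` are assumptions made for illustration: the source names the states and says the gate is deterministic, but it does not publish the rules or their precedence.

```python
from dataclasses import dataclass
from enum import Enum


class OutputState(Enum):
    AUTHORIZED = "AUTHORIZED"
    PARTIAL = "PARTIAL"
    REQUIRES_SPEC = "REQUIRES_SPEC"
    NARRATIVE_ONLY = "NARRATIVE_ONLY"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class GateDecision:
    state: OutputState
    degraded_boundary: bool = False  # a qualifier on any state, not a sixth state


def decide(supported, total, *, policy_ok=True, well_scoped=True,
           decision_grade=True, boundary_complete=True):
    """Map claim counts and flags to a state. Precedence is hypothetical."""
    if not policy_ok:
        state = OutputState.BLOCKED
    elif not well_scoped:
        state = OutputState.REQUIRES_SPEC
    elif not decision_grade:
        state = OutputState.NARRATIVE_ONLY
    elif total == 0 or supported == 0:
        state = OutputState.BLOCKED  # no evidence, no emission
    elif supported == total:
        state = OutputState.AUTHORIZED
    else:
        state = OutputState.PARTIAL
    return GateDecision(state, degraded_boundary=not boundary_complete)


print(decide(3, 3))
print(decide(2, 3, boundary_complete=False))
```

Keeping the qualifier as a separate field means a caller can branch on the state alone and still see when the evidence boundary was incomplete.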

Platform Systems

The tiers define enforcement depth. These systems define how governed inference is built, measured, and improved.

  • Kura — evidence store. Aurora PostgreSQL with pgvector and tenant-scoped RLS.
  • Kadai — reasoning API. Returns responses grounded in Kura, with claims checked and states assigned.
  • Prompt Compiler — rewrites prompts using CFPO. Compiled, versioned, machine-parseable.
  • Crosswalk — retrieval + access control. Builds the authority map, enforces per-caller evidence scoping via OpenFGA/ReBAC.
  • Claim Ledger — L1–L4 evaluation. Decomposes responses into atomic claims, records confidence signals, source entailment, stability, and contrastive causal attribution.
  • Boundary Gate — emission. Deterministic gate decisions over versioned evidence and policy.
  • Neurosurgery — observability. In Workshop: returned telemetry and repeat-pass behavior. In Refinery/Clean Room: local model telemetry and hidden-state probes.

How Tiers Change the Assurance Boundary

One pipeline. Three deployment models. The difference is where the model runs and how much runtime evidence you have about what it did.

  • Workshop: shared Kadai or model API gateway. Full pipeline audit. L1–L3 evaluation (no hidden-state probes).
  • Refinery: private inference. Full audit plus local telemetry and chain of custody.
  • Clean Room: air-gapped, hardware-rooted. Full audit, local telemetry, signed attestation chain, and strong support for third-party review.
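
The tier differences can be written down as data. In this sketch the layer labels follow the order in which the four checks are listed earlier on this page. Giving Refinery and Clean Room all four layers is an inference from the statement that hidden-state probes are available there; the text states only that Workshop stops at the third layer. `TierProfile` and `checks_for` are hypothetical names.

```python
from dataclasses import dataclass

# Layer labels, in the order the checks are listed in the text.
LAYER_NAMES = {
    "L1": "token confidence",
    "L2": "source entailment",
    "L3": "multi-draw stability",
    "L4": "hidden-state probes",
}


@dataclass(frozen=True)
class TierProfile:
    layers: tuple[str, ...]
    local_telemetry: bool
    signed_attestation: bool


TIERS = {
    "Workshop": TierProfile(("L1", "L2", "L3"), False, False),
    "Refinery": TierProfile(("L1", "L2", "L3", "L4"), True, False),
    "Clean Room": TierProfile(("L1", "L2", "L3", "L4"), True, True),
}


def checks_for(tier: str) -> list[str]:
    """Return the human-readable checks that run in a given tier."""
    return [LAYER_NAMES[layer] for layer in TIERS[tier].layers]


for name in TIERS:
    print(name, checks_for(name))
```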

What Kenshiki Labs Is and Is Not

  • Is a governance pipeline that gates claims against evidence
  • Is the control plane across all three planes — build (Kura, Compiler), orchestration (Kadai), and control (Ledger, Gate)
  • Is not a model — it governs the generation layer, doesn't replace it
  • Is not a content filter — it checks evidence, not tone or topic
  • Is not a monitoring tool — it intervenes before emission, not just after
  • Is not a replacement for your data — it checks against it

Runtime Infrastructure

The same infrastructure discipline that applies to the synthesis pipeline also applies to the systems running it.

  • Network: separate VPCs for web/auth and inference workloads
  • Identity: Clerk (Workshop) / customer IdP (Refinery, Clean Room) with JWT propagation
  • Access: OpenFGA/ReBAC — per-caller, per-document evidence scoping at retrieval
  • Data: Aurora PostgreSQL with tenant-scoped row-level security
  • Ingestion: GPU-accelerated parsing (Docling — DocLayNet, TableFormer, EasyOCR), two-stage pipeline, provenance chain from upload through embedding
  • Inference: dedicated GPU instances (NVIDIA L40S), model artifacts verified at boot, digest-pinned images, vLLM with fp8 KV cache
  • Isolation: embedding and inference on separate hardware
  • Deploy: CDK-managed, gated manifest with pre-flight checks and rollback, services scale to zero when idle
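
"Model artifacts verified at boot" can be illustrated with a digest check against a pinned value. This is a generic sketch, assuming SHA-256 and a pinned hex digest supplied by the deployment manifest; the source does not specify the hash algorithm or where the pinned digest is stored, and the file name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_digest: str) -> None:
    """Raise if the artifact on disk does not match the pinned digest."""
    actual = sha256_file(path)
    if actual != pinned_digest:
        raise RuntimeError(f"digest mismatch for {path.name}: {actual}")


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "weights.bin"
    artifact.write_bytes(b"example model weights")
    pinned = hashlib.sha256(b"example model weights").hexdigest()

    verify_artifact(artifact, pinned)  # matches, so no exception

    artifact.write_bytes(b"tampered weights")
    try:
        verify_artifact(artifact, pinned)
        tamper_detected = False
    except RuntimeError:
        tamper_detected = True

print(tamper_detected)
```

Failing closed at boot, rather than logging a warning, keeps an unverified model from ever serving a request.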

Telemetry and Enforcement

Structured telemetry at every pipeline stage. In Refinery and Clean Room (local model access): inference request logs, logprob distributions, entailment scores, and ablation signals. Access control enforced by OpenFGA/ReBAC at retrieval — the model only sees evidence the caller is authorized to use.

  • Logprobs, entailment scores, and coverage metrics per response
  • OpenFGA/ReBAC enforces per-caller evidence scoping
  • CFPO ensures deterministic, auditable prompt structure
  • Every prompt versioned, compiled (not authored), machine-parseable
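
As an illustration of how per-response metrics might be derived from raw signals, the sketch below turns token logprobs into a single confidence figure and computes claim coverage. Both formulas are assumptions: the geometric mean of token probabilities is one common summary of logprobs, and the source does not say how Kenshiki aggregates them or defines coverage.

```python
import math


def mean_token_probability(logprobs: list[float]) -> float:
    """Geometric mean of token probabilities, given natural-log logprobs."""
    if not logprobs:
        raise ValueError("no tokens")
    return math.exp(sum(logprobs) / len(logprobs))


def coverage(supported_claims: int, total_claims: int) -> float:
    """Fraction of claims backed by evidence; 0.0 when there are no claims."""
    return supported_claims / total_claims if total_claims else 0.0


# Three tokens with probabilities 0.9, 0.8, and 0.5.
confidence = mean_token_probability([math.log(0.9), math.log(0.8), math.log(0.5)])
print(round(confidence, 3), coverage(3, 4))
```

The geometric mean penalizes a single low-probability token more than an arithmetic mean would, which suits a signal meant to flag weak spots in a response.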

What Your Auditor Gets

A structured record for every evaluation: what was asked, what was in scope, what claims were made, what held up, what didn't, and why the state was assigned. The format is the same across tiers; enforcement depth and attestation grow as you move up.

  • Per-claim audit trail with source attribution, layer scores, and gate reason codes
  • Complete request provenance including model, evidence source, embedding, and compiler versions
  • Structured telemetry for observability and audit surfaces
  • In Clean Room, every step signed and anchored to verified hardware
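
The shape of such a record can be sketched as nested dataclasses serialized to JSON. The field names and all values below are hypothetical, chosen to mirror the items listed above (per-claim attribution, layer scores, reason codes, component versions); the source does not publish the actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ClaimEntry:
    text: str
    sources: list[str]
    layer_scores: dict[str, float]
    supported: bool


@dataclass
class AuditRecord:
    question: str
    state: str
    reason_codes: list[str]
    versions: dict[str, str]
    claims: list[ClaimEntry] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for diffing and hashing.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


record = AuditRecord(
    question="Which vendors fall under Policy A?",
    state="PARTIAL",
    reason_codes=["COVERAGE_INCOMPLETE"],  # illustrative code, not a real one
    versions={
        "model": "example-model-v1",
        "evidence_source": "example-corpus-2024-01",
        "embedding": "example-embed-v2",
        "compiler": "example-compiler-0.3",
    },
    claims=[
        ClaimEntry("Vendor X is covered.", ["doc-17"], {"L1": 0.95, "L2": 0.9}, True),
        ClaimEntry("Vendor Y is covered.", [], {"L1": 0.6, "L2": 0.2}, False),
    ],
)
print(record.to_json())
```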

Start Where You Are

Most teams progress in stages rather than jumping straight to the highest-assurance environment.

  • Workshop (hours): start on shared infrastructure with Kadai or your existing public model APIs. Retrieval, claim checking, output states — same contract either way.
  • Refinery (days to weeks): private deployment. Governed data sources, private inference engine. Full attribution at the model boundary.
  • Clean Room (weeks to months): signed everything. Attested execution. Air-gapped. For when a court or regulator asks to inspect every step.