Kenshiki Labs

System topology

Architecture

Every governed AI evaluation produces a structured record — same format whether you run Workshop, Refinery, or Clean Room.

The Kenshiki Labs platform architecture is a three-plane runtime contract — build, orchestration, and control — that produces a structured per-decision record on every governed AI response. The same audit chassis runs across all deployment tiers; what changes between Workshop, Refinery, and Clean Room is not the contract but the depth: more telemetry, stronger enforcement, and in Clean Room a signed attestation chain anchored to hardware. This is the canonical end-to-end specification operators integrate against.


[Diagram: the Kenshiki Labs control plane (signed envelope, chain of custody) and your data, which stays outside Kenshiki Labs]

The Contract

Kura is the evidence store — you put source material in with provenance, structure, and retrieval boundaries. Kadai is the reasoning API — you query it and get back answers grounded in what Kura contains. The model renders — Kura decides what counts.

  • Kura: source material with provenance, structure, and retrieval boundaries
  • Kadai: answers grounded in what Kura contains
  • Same contract across Workshop, Refinery, and Clean Room
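A minimal sketch of this contract, assuming a toy in-memory store. The names EvidenceRecord, Kura.put, Kura.retrieve, and kadai_query are hypothetical and are not the Kenshiki Labs API; keyword overlap stands in for real retrieval.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvidenceRecord:
    doc_id: str
    text: str
    source: str    # provenance: where the material came from
    boundary: str  # retrieval boundary, e.g. a tenant or collection


@dataclass
class Kura:
    """Toy evidence store (hypothetical): holds records and decides what counts."""
    records: list[EvidenceRecord] = field(default_factory=list)

    def put(self, record: EvidenceRecord) -> None:
        self.records.append(record)

    def retrieve(self, question: str, boundary: str) -> list[EvidenceRecord]:
        # Keyword overlap is a placeholder for real retrieval.
        terms = set(question.lower().split())
        return [
            r for r in self.records
            if r.boundary == boundary and terms & set(r.text.lower().split())
        ]


def kadai_query(kura: Kura, question: str, boundary: str) -> dict:
    """Answer only from what Kura returns; with no evidence, return no answer."""
    evidence = kura.retrieve(question, boundary)
    if not evidence:
        return {"answer": None, "citations": []}
    return {
        "answer": " ".join(r.text for r in evidence),
        "citations": [r.doc_id for r in evidence],
    }


kura = Kura()
kura.put(EvidenceRecord("doc-1", "Retention period is seven years", "policy.pdf", "tenant-a"))
grounded = kadai_query(kura, "What is the retention period?", "tenant-a")
out_of_scope = kadai_query(kura, "What is the retention period?", "tenant-b")
```

The same question yields a cited answer inside the tenant-a boundary and nothing for tenant-b, because the store, not the answering function, determines what is available.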

What Happens at Runtime

A question enters. The Compiler rewrites it into a constrained query using CFPO (Content–Format–Policy–Output). The Crosswalk retrieves only governed evidence relevant to the question and the caller's access boundary (OpenFGA/ReBAC). The generation layer produces a proposal from that bounded context. The Claim Ledger decomposes the proposal into claims, checks each against evidence using contrastive causal attribution alongside calibrated confidence and entailment signals, and records what's supported, unsupported, or missing. The Boundary Gate makes the final release decision.

  • Compiler: loose prompt → disciplined, governed query (CFPO)
  • Crosswalk: SIRE-scoped retrieval by evidence + caller identity (OpenFGA/ReBAC)
  • Generation: model produces a proposal from bounded context
  • Claim Ledger: claims decomposed, checked via contrastive attribution, recorded
  • Boundary Gate: deterministic emission decision over versioned evidence and policy
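The stage order above can be sketched as a chain of plain functions. Every function body here is a stand-in chosen for illustration: the generation step returns a fixed string, and substring matching replaces the contrastive attribution, confidence, and entailment signals the Claim Ledger actually records. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    claim: str
    supported: bool


def compile_query(question: str) -> dict:
    # Compiler stand-in: wrap the loose prompt in the four CFPO sections.
    return {
        "content": question.strip(),
        "format": "short answer",
        "policy": "cite governed evidence only",
        "output": "claims",
    }


def crosswalk(query: dict, corpus: dict[str, set[str]], caller: str) -> list[str]:
    # Crosswalk stand-in: keep only passages the caller may read.
    return [text for text, readers in corpus.items() if caller in readers]


def generate(query: dict, evidence: list[str]) -> str:
    # Generation stand-in: a fixed proposal; a real model would run here.
    return "Backups run nightly. Backups are encrypted."


def claim_ledger(proposal: str, evidence: list[str]) -> list[LedgerEntry]:
    # Ledger stand-in: one claim per sentence, supported if found in evidence.
    claims = [c.strip() for c in proposal.split(".") if c.strip()]
    return [LedgerEntry(c, any(c.lower() in e.lower() for e in evidence)) for c in claims]


def boundary_gate(ledger: list[LedgerEntry]) -> bool:
    # Gate stand-in: release only when every claim is supported.
    return bool(ledger) and all(entry.supported for entry in ledger)


corpus = {
    "Ops runbook: backups run nightly at 02:00.": {"alice", "bob"},
    "Security memo: backups are encrypted at rest.": {"alice"},
}


def run(question: str, caller: str) -> tuple[list[LedgerEntry], bool]:
    query = compile_query(question)
    evidence = crosswalk(query, corpus, caller)
    proposal = generate(query, evidence)
    ledger = claim_ledger(proposal, evidence)
    return ledger, boundary_gate(ledger)


alice_ledger, alice_released = run("How are backups handled?", "alice")
bob_ledger, bob_released = run("How are backups handled?", "bob")
```

Both callers get the same proposal, but bob cannot read the security memo, so the encryption claim has no evidence in his scope and the gate withholds release.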

Output States

Every response carries an explicit state. "No evidence, no emission" means that no unsupported decision-grade claim is ever emitted as authorized. The system can still surface partial or narrative responses, but it labels them so the caller knows what they are looking at.

  • AUTHORIZED: claims sufficiently supported by evidence
  • PARTIAL: evidence exists but coverage incomplete
  • REQUIRES_SPEC: question needs tighter scope
  • NARRATIVE_ONLY: descriptive but not decision-grade
  • BLOCKED: policy or evidence conditions not met
  • Qualifier — DEGRADED_BOUNDARY: any state may carry this when the Kura evidence boundary was incomplete
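One way a caller might model these labels is as an enum plus a separate qualifier flag. This is a sketch only: the LabeledResponse type, the label format, and the rule that a degraded boundary disqualifies a response from automated use are assumptions of this example, not documented behavior.

```python
from dataclasses import dataclass
from enum import Enum


class OutputState(Enum):
    AUTHORIZED = "AUTHORIZED"
    PARTIAL = "PARTIAL"
    REQUIRES_SPEC = "REQUIRES_SPEC"
    NARRATIVE_ONLY = "NARRATIVE_ONLY"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class LabeledResponse:
    state: OutputState
    degraded_boundary: bool = False  # qualifier, valid alongside any state
    text: str = ""

    def label(self) -> str:
        # Hypothetical display format for state plus qualifier.
        suffix = "+DEGRADED_BOUNDARY" if self.degraded_boundary else ""
        return self.state.value + suffix

    def safe_to_act(self) -> bool:
        # Assumption of this sketch: a caller acts automatically only on
        # AUTHORIZED responses whose evidence boundary was complete.
        return self.state is OutputState.AUTHORIZED and not self.degraded_boundary


clean = LabeledResponse(OutputState.AUTHORIZED, text="Clause 4 applies.")
degraded = LabeledResponse(OutputState.AUTHORIZED, degraded_boundary=True)
partial = LabeledResponse(OutputState.PARTIAL)
```

Keeping the qualifier as its own field, rather than a sixth state, mirrors the description above: any of the five states can carry it.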

Platform Systems

The tiers define enforcement depth. These systems define how governed inference is built, measured, and improved.

  • Kura — evidence store. Aurora PostgreSQL with pgvector and tenant-scoped RLS.
  • Kadai — reasoning API. Returns responses grounded in Kura, with claims checked and states assigned.
  • Prompt Compiler — rewrites prompts using CFPO. Compiled, versioned, machine-parseable.
  • Crosswalk — retrieval + access control. Builds the authority map, enforces per-caller evidence scoping via OpenFGA/ReBAC.
  • Claim Ledger — L1–L4 evaluation. Decomposes responses into atomic claims, records confidence signals, source entailment, stability, and contrastive causal attribution.
  • Boundary Gate — emission. Deterministic gate decisions over versioned evidence and policy.
  • Neurosurgery — observability. In Workshop: returned telemetry and repeat-pass behavior. In Refinery/Clean Room: local model telemetry and hidden-state probes.
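To illustrate what "compiled, versioned, machine-parseable" could look like for a CFPO prompt, the sketch below serializes the four sections to canonical JSON and derives a version identifier from a SHA-256 digest. The function name, the field names, and the use of a truncated digest as the version are assumptions for illustration, not the Prompt Compiler's actual format.

```python
import hashlib
import json


def compile_cfpo(content: str, fmt: str, policy: str, output: str) -> dict:
    """Build a CFPO prompt and derive a version id from its canonical form."""
    sections = {
        "content": content.strip(),
        "format": fmt.strip(),
        "policy": policy.strip(),
        "output": output.strip(),
    }
    # Canonical JSON (sorted keys, fixed separators) makes the digest reproducible.
    canonical = json.dumps(sections, sort_keys=True, separators=(",", ":"))
    version = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return {"version": version, "sections": sections, "canonical": canonical}


a = compile_cfpo("Summarize clause 4", "bullets", "evidence only", "claims with citations")
b = compile_cfpo("  Summarize clause 4 ", "bullets", "evidence only", "claims with citations")
c = compile_cfpo("Summarize clause 5", "bullets", "evidence only", "claims with citations")
```

Here a and b differ only in surrounding whitespace and receive the same version, while c changes the content and receives a different one, so an audit record can pin exactly which compiled prompt was used.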

How Tiers Change the Assurance Boundary

One pipeline. Three deployment models. The difference is where the model runs and how much runtime evidence you have about what it did.

  • Workshop: shared Kadai or model API gateway. Full pipeline audit. L1–L3 evaluation (no hidden-state probes).
  • Refinery: private inference. Full audit plus local telemetry and chain of custody.
  • Clean Room: air-gapped, hardware-rooted. Full audit, local telemetry, signed attestation chain, and strong support for third-party review.

Kenshiki Labs Is and Is Not

  • Is a governance pipeline that gates claims against evidence
  • Is a control layer that spans all three planes: build (Kura, Compiler), orchestration (Kadai), and control (Claim Ledger, Boundary Gate)
  • Is not a model — it governs the generation layer, doesn't replace it
  • Is not a content filter — it checks evidence, not tone or topic
  • Is not a monitoring tool — it intervenes before emission, not just after
  • Is not a replacement for your data — it checks against it

Runtime Infrastructure

The same discipline that applies to the synthesis pipeline applies to the infrastructure running it.

  • Network: separate VPCs for web/auth and inference workloads
  • Identity: Clerk (Workshop) / customer IdP (Refinery, Clean Room) with JWT propagation
  • Access: OpenFGA/ReBAC — per-caller, per-document evidence scoping at retrieval
  • Data: Aurora PostgreSQL with tenant-scoped row-level security
  • Ingestion: GPU-accelerated parsing (Docling — DocLayNet, TableFormer, EasyOCR), two-stage pipeline, provenance chain from upload through embedding
  • Inference: dedicated GPU instances (NVIDIA L40S), model artifacts verified at boot, digest-pinned images, vLLM with fp8 KV cache
  • Isolation: embedding and inference on separate hardware
  • Deploy: CDK-managed, gated manifest with pre-flight checks and rollback, services scale to zero when idle
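As an illustration of "model artifacts verified at boot", the sketch below streams a file through SHA-256 and refuses to continue if the digest differs from a pinned value. The function names and the throwaway file are hypothetical; how the platform actually stores and checks pinned digests is not described here.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_digest: str) -> None:
    """Raise before serving if the artifact does not match its pinned digest."""
    actual = sha256_of(path)
    if actual != pinned_digest:
        raise RuntimeError(f"digest mismatch for {path.name}: got {actual}")


# Demo with a throwaway file standing in for model weights.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "weights.bin"
    weights.write_bytes(b"example weights")
    pinned = hashlib.sha256(b"example weights").hexdigest()
    verify_artifact(weights, pinned)  # matches, so no exception is raised
    weights.write_bytes(b"tampered weights")
    try:
        verify_artifact(weights, pinned)
        tamper_detected = False
    except RuntimeError:
        tamper_detected = True
```

Failing closed at boot means a modified artifact never reaches the point of serving a request.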

Telemetry and Enforcement

Every pipeline stage emits structured telemetry. In Refinery and Clean Room, where the model runs locally, that includes inference request logs, logprob distributions, entailment scores, and ablation signals. OpenFGA/ReBAC enforces access control at retrieval, so the model only sees evidence the caller is authorized to use.

  • Logprobs, entailment scores, and coverage metrics per response
  • OpenFGA/ReBAC enforces per-caller evidence scoping
  • CFPO ensures deterministic, auditable prompt structure
  • Every prompt versioned, compiled (not authored), machine-parseable
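Per-caller evidence scoping can be pictured with a toy relationship check. The sketch below is loosely modeled on ReBAC-style (user, relation, object) tuples and does not call the OpenFGA SDK or reproduce its API; the tuples, names, and single level of team indirection are all assumptions made for illustration.

```python
# Relationship tuples in the ReBAC style: (subject, relation, object).
TUPLES = {
    ("user:ana", "member", "team:legal"),
    ("team:legal#member", "viewer", "doc:contract-7"),
    ("user:ben", "viewer", "doc:handbook"),
}


def can_view(user: str, doc: str) -> bool:
    """True if the user is a direct viewer or a member of a viewer team."""
    if (user, "viewer", doc) in TUPLES:
        return True
    teams = {obj for (subj, rel, obj) in TUPLES if subj == user and rel == "member"}
    return any((f"{team}#member", "viewer", doc) in TUPLES for team in teams)


def scope_evidence(user: str, candidates: list[str]) -> list[str]:
    """Filter retrieval candidates before anything reaches the model."""
    return [doc for doc in candidates if can_view(user, doc)]


candidates = ["doc:contract-7", "doc:handbook"]
ana_scope = scope_evidence("user:ana", candidates)
ben_scope = scope_evidence("user:ben", candidates)
```

Because the filter runs at retrieval, a document the caller cannot view is never placed in the model's context, rather than being redacted from the output afterwards.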

What Your Auditor Gets

A structured record for every evaluation. What was asked, what was in scope, what claims were made, what held up, what didn't, and why the state was assigned. Same format across tiers — enforcement depth and attestation grow as you move up.

  • Per-claim audit trail with source attribution, layer scores, and gate reason codes
  • Complete request provenance including model, evidence source, embedding, and compiler versions
  • Structured telemetry for observability and audit surfaces
  • In Clean Room, every step signed and anchored to verified hardware
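The idea of a signed, ordered record can be sketched with a keyed hash chain: each step is signed together with the previous signature, so altering any step invalidates that link. This is an illustration under stated assumptions. HMAC with an in-memory key stands in for hardware-anchored signing, and the step fields and the reason code ALL_SUPPORTED are hypothetical, not the platform's record schema.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # hypothetical; stands in for a hardware-held key


def sign_step(prev_signature: str, step: dict) -> str:
    """Sign a step together with the previous signature to form a chain."""
    payload = prev_signature + json.dumps(step, sort_keys=True)
    return hmac.new(SIGNING_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def build_chain(steps: list[dict]) -> list[dict]:
    chain, prev = [], ""
    for step in steps:
        prev = sign_step(prev, step)
        chain.append({"step": step, "signature": prev})
    return chain


def verify_chain(chain: list[dict]) -> bool:
    prev = ""
    for link in chain:
        expected = sign_step(prev, link["step"])
        if not hmac.compare_digest(expected, link["signature"]):
            return False
        prev = link["signature"]
    return True


record = build_chain([
    {"stage": "compiler", "compiler_version": "v1"},
    {"stage": "claim_ledger", "claim": "Backups run nightly", "supported": True},
    {"stage": "boundary_gate", "state": "AUTHORIZED", "reason_code": "ALL_SUPPORTED"},
])
intact = verify_chain(record)
record[1]["step"]["supported"] = False  # tamper with a middle step
tampered = verify_chain(record)
```

Verification succeeds on the untouched record and fails once the middle step is edited, which is the property a reviewer relies on when inspecting a chain step by step.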

Start Where You Are

Most teams progress in stages rather than jumping straight to the highest-assurance environment.

  • Workshop (hours): start on shared infrastructure with Kadai or your existing public model APIs. Retrieval, claim checking, output states — same contract either way.
  • Refinery (days to weeks): private deployment. Governed data sources, private inference engine. Full attribution at the model boundary.
  • Clean Room (weeks to months): signed everything. Attested execution. Air-gapped. For when a court or regulator asks to inspect every step.