Kenshiki Labs

Evidence store

Kura

Stores what counts as real. SIRE-tagged evidence corpus with deterministic identity, retrieval boundaries, and provenance for every chunk.

Kura is the Prepare API: the write side of the evidence boundary, which turns authoritative source material into governed retrieval context before the model ever sees it. Every request enters through the Prompt Sanitizer, which authenticates the caller and binds identity via OpenFGA/ReBAC before anything else fires.

Kura ingests sources through a two-stage pipeline. GPU-accelerated parsing (Docling with DocLayNet layout analysis, TableFormer for tables, EasyOCR for images) is followed by CPU-side enrichment that attaches SIRE identity (Subject, Included, Relevant, Excluded), clause IDs, normative markers, and provenance. Every chunk carries a SHA-256 source hash and an HMAC-SHA-256 watermark for tamper-evident verification without database access.

Retrieval is hybrid (pgvector + tsvector ranking) and enforces the SIRE exclusion gate before any chunk reaches the model. Tenant-scoped row-level security is live today; caller-specific OpenFGA/ReBAC retrieval enforcement is the next boundary. Corpus Explorer is the inspection surface where users verify the underlying text behind any cited clause, querying the same governed Elasticsearch alias the runtime retrieval layer uses.

Without Kura, every downstream decision is an assertion without evidence. The Compiler cannot scope what the model sees. The Ledger has nothing to check claims against. The Gate has no basis. No evidence, no grounded answer.

How Kura becomes governed model context

Read this left to right. Source material enters the evidence boundary, becomes policy-bearing chunks, and only then becomes bounded retrieval context. Kura stops at that handoff. It does not generate the final answer or verify claims; it makes governed evidence and identity available for downstream orchestration.

Kura Evidence Lifecycle
Source material becomes governed retrieval context before the model sees anything.
Step 1 of 5: Source
Step 2 of 5: Extract
Step 3 of 5: Enrich
Step 4 of 5: Store
Step 5 of 5: Retrieve

Why Kura Exists

Standard RAG retrieves whatever is nearest in embedding space and hands it to the model. No authority boundary, no provenance, no access control, no way to inspect what the model was allowed to see. Governed inference requires a governed evidence boundary.

  • RAG without authority boundaries is retrieval, not governance
  • The model must not see evidence the caller cannot access
  • Post-generation scoring cannot fix what was never in scope
  • Every claim in the Ledger traces back to a specific chunk with provenance

What Kura Does

Transforms source documents into a queryable, tamper-evident knowledge base. Every chunk carries provenance from upload through embedding.

  • SHA-256 source hash, idempotent upsert, version-aware change detection
  • Section-aware chunking on heading boundaries with merge for undersized chunks
  • HMAC-SHA-256 watermarks per chunk — verification without database access
  • Embedding via text-embedding-3-large (512d Matryoshka)
  • Tenant provenance on every row, enforced by CHECK constraints
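
The source hash and per-chunk watermark described above can be sketched with the Python standard library. This is a minimal illustration, not Kura's implementation: the message layout (source hash, chunk index, chunk text), the key handling, and every function name here are assumptions. It shows why verification needs only the key and the chunk itself, with no database lookup.

```python
import hashlib
import hmac
from typing import Optional


def source_hash(source_bytes: bytes) -> str:
    """SHA-256 of the raw source file, as a hex string."""
    return hashlib.sha256(source_bytes).hexdigest()


def needs_reingest(stored_hash: Optional[str], source_bytes: bytes) -> bool:
    """Hypothetical change check: re-ingest only if the content hash differs."""
    return stored_hash != source_hash(source_bytes)


def watermark(key: bytes, src_hash: str, chunk_index: int, chunk_text: str) -> str:
    """HMAC-SHA-256 over an assumed message layout: hash, index, then text."""
    message = f"{src_hash}:{chunk_index}:".encode("utf-8") + chunk_text.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, src_hash: str, chunk_index: int, chunk_text: str, claimed: str) -> bool:
    """Recompute the watermark and compare in constant time."""
    expected = watermark(key, src_hash, chunk_index, chunk_text)
    return hmac.compare_digest(expected, claimed)
```

Anyone holding the key can call verify on a chunk in isolation; changing the text, the index, or the source hash produces a mismatch.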

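Section-aware chunking with a merge step for undersized chunks might look like the sketch below. It assumes Markdown-style headings and a character-count threshold, and it merges a short section into the preceding chunk; Kura's actual heading detection, size measure, and merge direction are not stated in this document, so treat all three as illustrative choices.

```python
import re

# Assumption: headings look like Markdown ATX headings ("# Title", "## Sub").
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_by_heading(text: str, min_chars: int = 200) -> list[str]:
    """Split on heading lines, then fold undersized sections into the previous chunk."""
    sections: list[list[str]] = []
    for line in text.splitlines():
        # Start a new section at every heading (and at the very first line).
        if HEADING.match(line) or not sections:
            sections.append([])
        sections[-1].append(line)

    chunks: list[str] = []
    for section in sections:
        body = "\n".join(section).strip()
        if not body:
            continue
        if chunks and len(body) < min_chars:
            # Undersized: merge into the preceding chunk instead of emitting it alone.
            chunks[-1] = chunks[-1] + "\n\n" + body
        else:
            chunks.append(body)
    return chunks
```

An undersized first section has nothing to merge into, so this sketch emits it unchanged.
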
Pre-loaded Regulatory Corpus

Every Kenshiki Labs environment ships with a governed evidence base covering major AI governance standards, compliance frameworks, and industry-specific regulatory guidance — 2,200+ chunks, pre-tagged with SIRE identity and relationship mappings. Each framework is mapped through the Ontic Compliance Catalog to enforceable obligations, so the SIRE gate knows which evidence must exist before a governed request can proceed. Governed inference works on day one. Add your own documents on top.

  • EU AI Act (Regulation 2024/1689)
  • EU GDPR
  • HIPAA Administrative Simplification
  • PCI DSS 4.0.1
  • ISO/IEC 27001:2022 — Information Security
  • ISO/IEC 42001:2023 — AI Management System
  • ISO/IEC 23894:2023 — AI Risk Management
  • NIST AI Risk Management Framework 1.0
  • NIST AI 600-1 — Generative AI Profile
  • NIST Cybersecurity Framework 2.0
  • AICPA Trust Services Criteria (SOC 2)
  • DOJ Evaluation of Corporate Compliance Programs
  • 28 industry verticals: Financial Services, Healthcare, Defense & Intelligence, Government, Legal, Energy, Life Sciences, Education, and more
  • Ontic Compliance Catalog maps each framework to abstract obligations — the SIRE gate enforces them before retrieval

SIRE Identity System

SIRE (Subject, Included, Relevant, Excluded) is deterministic identity metadata embedded in source frontmatter during ingestion. It defines what each source covers, relates to, and must never answer. Only Excluded enforces — the other three inform discovery.

  • Subject: anchors the source to a domain (e.g., soc_2_trust_services_criteria, eu_ai_act)
  • Included: enriches search with covered terminology (e.g., 'conformity assessment', 'cardholder data')
  • Relevant: maps cross-source topology (e.g., ISO 27001 → SOC 2; NIST AI RMF → EU AI Act)
  • Excluded: hard boundary (e.g., SOC 2 excludes 'sox', 'gaap', 'hipaa')
  • Exclusion gate purges matching chunks at retrieval — case-insensitive, word-boundary match
  • SIRE proposals generated by keyword frequency scan, then manually curated before application
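
A case-insensitive, word-boundary exclusion gate can be sketched as follows. The document does not say which text the Excluded terms are matched against, so this sketch assumes the gate compares the caller's query with each chunk's source-level Excluded list; the Chunk type and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    subject: str
    excluded: list[str] = field(default_factory=list)  # Excluded terms of the source


def matches_excluded(query: str, excluded: list[str]) -> bool:
    """True if any excluded term appears in the query as a whole word, ignoring case."""
    if not excluded:
        return False
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in excluded) + r")\b"
    return re.search(pattern, query, flags=re.IGNORECASE) is not None


def exclusion_gate(query: str, chunks: list[Chunk]) -> list[Chunk]:
    """Drop every chunk whose source must never answer this query."""
    return [c for c in chunks if not matches_excluded(query, c.excluded)]
```

With the SOC 2 example above, a query mentioning "HIPAA" purges SOC 2 chunks, while a word that merely contains an excluded term (such as "redsox") does not trigger the gate, because the match requires word boundaries.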

How to use Kura

POST /v2/documents with your source files. Kura parses, chunks, embeds, and tags them. Retrieval happens automatically when Kadai processes a governed request — or call the retrieval API directly. The same API works across all three tiers.

  • Ingest: POST /v2/documents with PDF, DOCX, JSON, Markdown, YAML, or CSV. Kura handles extraction, SIRE tagging, chunking, and embedding.
  • Retrieve: GET /v2/documents to list. Retrieval for governed responses is automatic through Kadai. Kura scopes by the caller boundary and source identity.
  • Workshop: Kura runs on shared Kenshiki Labs Aurora PostgreSQL with pgvector. Ingest via REST API from anywhere. Pre-loaded regulatory corpus available on day one.
  • Refinery: Kura runs inside your private deployment. Ingestion endpoints are internal to your VPC. Evidence stays inside your boundary.
  • Clean Room: Kura runs on a local Aurora-compatible database inside the air gap. Documents are ingested via secure media transfer, with no network path.
  • Same ingestion API, same SIRE tagging, same retrieval interface, same ReBAC access control — the caller code does not change between tiers
  • Full API reference: /articles/governed-intelligence-api
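
An ingestion call could be assembled along these lines. Only the POST /v2/documents path comes from this page; the base URL, the bearer-token header, and the choice to send the file as a raw request body are assumptions made for illustration, so check the full API reference for the actual request shape.

```python
import urllib.request


def build_ingest_request(base_url: str, token: str, body: bytes,
                         content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v2/documents.

    Assumptions, not confirmed by the source: bearer-token auth and a raw
    file body. The real API may expect multipart form data instead.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/documents",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": content_type,
        },
    )


# Hypothetical endpoint; per the tier list above, only the base URL should
# differ between Workshop, Refinery, and Clean Room.
req = build_ingest_request(
    "https://kura.example.invalid", "demo-token", b"# Policy\n", "text/markdown"
)
```

Sending the request is a single call to urllib.request.urlopen(req); it is left out here because the endpoint above is a placeholder.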

Sanitizer

Every governed request enters through the Prompt Sanitizer. Before Kura scopes evidence or the Compiler builds the prompt contract, the Sanitizer authenticates the caller, binds their identity via OpenFGA/ReBAC, and propagates the access boundary downstream. No anonymous request reaches the runtime.

  • Authenticates the caller — Clerk for Workshop, customer IdP for Refinery and Clean Room
  • Binds caller identity to the request via OpenFGA/ReBAC
  • Propagates the access boundary to Kura retrieval and downstream telemetry
  • Every audit record traces back to the identity established here

Corpus Explorer

Corpus Explorer is the inspection surface for the governed corpus. When a Kadai response cites a clause, this is where you read the underlying text. It queries the same retrieval index the runtime uses, so what you see is what the system saw when it answered.

  • Full-text and facet search over the pre-loaded regulatory corpus and your own ingested documents
  • Clause-level citations resolve back to the exact retrieval anchor
  • Authority tier and provenance metadata visible on every hit
  • Reader-only — does not paraphrase, blend, or interpret

Who this is for

Corpus engineers

Data stewards who curate, version, and maintain authoritative source collections inside the evidence boundary. They are responsible for ingestion, SIRE tagging, and evidence quality.

Every downstream system

Compiler draws evidence for zone mapping. Ledger checks claims against it. Gate relies on it for emission policy. Kadai returns answers bounded by what Kura contains.

Kura, the governed evidence store, is the evidence boundary. SIRE (Subject, Included, Relevant, Excluded) identity tags scope what each source covers and what it must never answer. Every chunk carries provenance chains, SHA-256 hashes, and HMAC-SHA-256 watermarks. Compiler (the prompt-assembly engine), Ledger (the integrity-protected inference audit trail), and Gate (the emission policy boundary) all depend on Kura as the source of governed evidence.