Kenshiki Labs

Regulatory Corpus Search

Corpus Explorer

Search the regulatory corpus the model is bound to. Inspect the exact source, version, and clause behind every Kadai response — without leaving the system.

Corpus Explorer is the verification surface for the Kura evidence boundary. Operators and auditors search the SIRE-tagged regulatory corpus — by source, version, jurisdiction, or clause — to verify that a cited reference exists, says what the answer claims, and came from the regulation the citation names. The burden of verification moves from the reader to the system; replay is one click, not a third-party PDF and a string-match by eye.

Without Corpus Explorer: users have to trust that a cited clause actually exists, actually says what the answer claims it says, and actually came from the regulation the citation names. There is no replayable way to verify a citation without leaving the system, opening a third-party PDF, and string-matching by eye. The burden of verification falls on the reader, and in practice it rarely happens.

What Corpus Explorer does

Corpus Explorer exposes the live regulatory index that backs governed retrieval. You enter a query, optionally pin filters (authority tier, source regulation, node type), and get back a ranked list of structural nodes — articles, paragraphs, points, definition blocks — with their anchors, canonical citations, and governance metadata. Every hit links to a chunk view that shows the full text plus the graph of provisions that reference it.

  • Full-text + facet search over the AI governance corpus: currently the canonical five regimes (EU AI Act, GDPR, ISO 42001, ISO 23894, ISO 27001), expanding as new frameworks are ingested (NIST AI RMF, DoD RAI, state AI laws, ISO 5338/25059/24029, OECD principles, sector-specific guidance)
  • Pinned filters for authority tier, source regulation, and node type
  • Clause citations (`regulation/eu_ai_act/article/article-6`) that resolve back to the exact retrieval anchor
  • Chunk-level inspection with paragraph, point, and definition block structure preserved
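
As a minimal sketch of how a clause citation could be split back into its parts, the snippet below parses the example citation shown above. The `Anchor` type, the `parse_anchor` helper, and the four-segment `namespace/source/node_type/node_id` shape are assumptions inferred from that single example, not a documented Kenshiki format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """Parsed form of a clause citation (hypothetical structure)."""
    namespace: str   # e.g. "regulation"
    source: str      # e.g. "eu_ai_act"
    node_type: str   # e.g. "article"
    node_id: str     # e.g. "article-6"


def parse_anchor(citation: str) -> Anchor:
    """Split a four-segment citation path into its parts.

    Assumes the namespace/source/node_type/node_id shape of the example
    citation; any other shape is rejected rather than guessed at.
    """
    parts = citation.strip().split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"unexpected citation shape: {citation!r}")
    return Anchor(*parts)


anchor = parse_anchor("regulation/eu_ai_act/article/article-6")
print(anchor.source, anchor.node_type, anchor.node_id)
```

Rejecting unknown shapes keeps the sketch honest: deeper paths for paragraphs or points may well exist, but their format is not shown here.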

Who this is for

Compliance and policy analysts

Use Corpus Explorer to audit which regulatory provisions ground a given claim, search across the full corpus by clause or keyword, and verify that the system's citations match the source text.

Tenant users reviewing a governed answer

Click a citation in a Kadai response and land on the exact chunk with clause path, authority tier, and facet context, so "trust, but verify" takes seconds instead of hours.

Search the ground truth that grounds every governed answer.

Corpus Explorer is Kenshiki’s reader for the regulatory corpus behind governed retrieval. Every Kadai answer that cites a clause points at a node in this index. Corpus Explorer lets operators, analysts, and end users read that node directly — in its canonical structure, with its authority tier, and with the neighbouring provisions it belongs to.

Most AI search tools blur the line between source and synthesis. They retrieve a chunk, paraphrase it, and ship the paraphrase as if it were the source. That is fine for a consumer assistant. It is not fine for a system that has to defend its citations to compliance, legal, or a regulator.

The critical principle: the corpus is the authority, not the summary.

Kadai synthesises. The Claim Ledger verifies. The Gate releases. Corpus Explorer is the part of the control plane that lets a human open the box and read the primary text — the same text the runtime read when it produced the answer.


The Problem: Citations Without Verification

Most AI systems today produce citations that are hard to verify:

  • The cited page does not exist, or has moved.
  • The paraphrase diverges from the source in ways a non-specialist cannot spot.
  • The clause number is correct but belongs to a different regulation.
  • The retrieval layer returned a neighbouring provision and the model stitched it to the wrong clause.
  • The citation looks precise but was manufactured by the model.

For consumer chat, this is tolerable. For a system that an operator has to defend in a model-risk review, it is disqualifying.

The real question is not:

Did the model cite a source?

The real question is:

Can I open the source, right now, and see the exact text the system saw?

Corpus Explorer is Kenshiki’s answer.


Kenshiki’s Answer: Read What the Retriever Read

Corpus Explorer queries the exact Elasticsearch index that the governed retrieval layer queries at runtime. No separate export. No shadow copy. No reshaping for human consumption. The hits a user sees in Corpus Explorer are the hits the runtime would have returned for the same query, filtered through the same authorized view.
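
To make the "same index, same authorized view" idea concrete, here is a hedged sketch of a search request body using the standard Elasticsearch `bool` query with `must` and `filter` clauses. The `build_search_body` helper, the field names (`text`, `source_regulation`, `authority_tier`), and the choice to model the authorized view as a list of permitted source regulations are all assumptions for illustration; the real mapping and authorization model may differ.

```python
from typing import Any, Optional


def build_search_body(
    query: str,
    authorized_sources: list[str],
    pinned: Optional[dict[str, str]] = None,
    size: int = 10,
) -> dict[str, Any]:
    """Build a request body in which authorization is a filter, not a post-step."""
    # Authorized view first: only sources the caller may retrieve (assumed model).
    filters: list[dict[str, Any]] = [
        {"terms": {"source_regulation": authorized_sources}}
    ]
    # Pinned filters (e.g. authority tier, node type) narrow the result set further.
    for field_name, value in (pinned or {}).items():
        filters.append({"term": {field_name: value}})
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"text": query}}],  # scored full-text clause
                "filter": filters,                     # unscored, pre-retrieval
            }
        },
    }


body = build_search_body(
    "high-risk classification",
    authorized_sources=["eu_ai_act", "gdpr"],
    pinned={"authority_tier": "binding_law"},
)
print(body["query"]["bool"]["filter"])
```

Putting the authorized view in the `filter` clause means unauthorized nodes never enter the ranked result set, which is the property the paragraph above describes.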

Every node in the index carries:

  1. A stable anchor (`regulation/eu_ai_act/article/article-6`), content-addressed and reindex-safe.
  2. Structural metadata — chapter, article, paragraph, point — so the clause path is explicit, not implied.
  3. An authority tier — binding law, normative guidance, reference material — so users know how much weight the text carries.
  4. A source regulation — so cross-corpus results are never silently blended.
  5. Neighbouring edges — parent anchor, referenced anchors, referring anchors — so navigating context is a click, not a search.
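
The five attributes above could be pictured as a single record. The sketch below is one possible shape, not the actual index schema: the `CorpusNode` class, its field names, the tier value `binding_law`, and the parent anchor are hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CorpusNode:
    """Illustrative node record; field names are assumptions, not the real mapping."""
    anchor: str                                # 1. stable anchor
    source_regulation: str                     # 4. source regulation
    authority_tier: str                        # 3. authority tier
    chapter: Optional[str] = None              # 2. structural metadata
    article: Optional[str] = None
    paragraph: Optional[str] = None
    point: Optional[str] = None
    parent_anchor: Optional[str] = None        # 5. neighbouring edges
    referenced_anchors: tuple[str, ...] = ()
    referring_anchors: tuple[str, ...] = ()

    def clause_path(self) -> str:
        """Render the explicit clause path from whichever levels are set."""
        levels = [
            ("chapter", self.chapter),
            ("article", self.article),
            ("paragraph", self.paragraph),
            ("point", self.point),
        ]
        return " > ".join(f"{name} {value}" for name, value in levels if value)


node = CorpusNode(
    anchor="regulation/eu_ai_act/article/article-6",
    source_regulation="eu_ai_act",
    authority_tier="binding_law",  # hypothetical tier label
    chapter="III",
    article="6",
    parent_anchor="regulation/eu_ai_act/chapter/chapter-iii",  # hypothetical
)
print(node.clause_path())  # chapter III > article 6
```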

Instead of saying:

The system cited Article 6 of the EU AI Act.

Corpus Explorer enables a stronger claim:

Here is the exact anchor, the full clause text, the paragraph and point it belongs to, its authority tier, and every provision that references it — all pulled from the same index the runtime used at answer time.

That is the difference between a citation and a verifiable reference.


From Query to Verified Reference

  1. Query
  2. Authorized View Filter (pre-retrieval)
  3. Elasticsearch `corpus_units_current_aigov`
  4. Ranked Hits + Facet Aggregations
  5. Unit Detail (text + clause path + canonical group)
  6. Stable Unit ID → Paste into Kadai / Brief / Runbook

Corpus Explorer turns regulatory search from a PDF hunt into an evidence-generating workflow.

The query defines the scope. The authorized view filters what the caller can see. Elasticsearch returns ranked structural nodes. Facet aggregations show distribution across corpora, tiers, and node types. Chunk detail opens the primary text. The anchor is a stable, pasteable reference.
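
The flow described above can be imitated in a few lines of plain Python. This is a toy stand-in, not the Elasticsearch implementation: the `UNITS` records, their placeholder text and tier labels, the non-EU anchors, and the term-overlap scoring are all invented for illustration.

```python
from collections import Counter

# Placeholder units; text and metadata are illustrative, not real corpus content.
UNITS = [
    {"anchor": "regulation/eu_ai_act/article/article-6", "source": "eu_ai_act",
     "tier": "binding_law", "text": "placeholder risk classification text"},
    {"anchor": "regulation/gdpr/article/article-5", "source": "gdpr",
     "tier": "binding_law", "text": "placeholder data protection text"},
    {"anchor": "standard/iso_42001/clause/clause-6", "source": "iso_42001",
     "tier": "normative_guidance", "text": "placeholder risk management text"},
]


def search(query, units, allowed_sources):
    """Filter by authorized view, rank by term overlap, then count facets."""
    # Step 1: the authorized view is applied before any ranking happens.
    visible = [u for u in units if u["source"] in allowed_sources]
    # Step 2: naive ranking by the number of shared query terms.
    terms = set(query.lower().split())
    scored = [(len(terms & set(u["text"].lower().split())), u) for u in visible]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    hits = [unit for score, unit in ranked if score > 0]
    # Step 3: facet distributions over the hits, per source and per tier.
    facets = {
        "source": Counter(h["source"] for h in hits),
        "tier": Counter(h["tier"] for h in hits),
    }
    return hits, facets


hits, facets = search("risk classification", UNITS, {"eu_ai_act", "iso_42001"})
print([h["anchor"] for h in hits])
print(dict(facets["source"]))
```

The first hit's anchor is the pasteable reference from the last step; the GDPR unit never appears because it sits outside the caller's authorized view in this example.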


What Corpus Explorer Covers

The index currently spans the canonical AI governance regimes — and expands as new frameworks come into force.

Live today:

  • EU AI Act (Regulation 2024/1689) — chapters, articles, paragraphs, points, and definition blocks.
  • GDPR (Regulation 2016/679) — data subject rights, controller obligations, and cross-border transfer provisions.
  • ISO/IEC 42001 — AI management system standard.
  • ISO/IEC 23894 — AI-specific risk management technical baseline.
  • ISO/IEC 27001 — information security management system controls.

Ingest in progress / queued:

  • NIST AI RMF 1.0 and 2.0 — US federal and enterprise AI risk management baseline.
  • DoD Responsible AI Strategy & Implementation Pathway — federal contracting and grant traceability requirements.
  • SEC and FTC AI guidance and enforcement releases — the source material behind every “AI washing” enforcement action.
  • ISO/IEC 5338, 25059, 24029 — AI lifecycle, quality, and robustness standards.
  • State AI laws — California, Colorado, New York, Illinois, Texas and the others as they pass.
  • OECD AI Principles, Council of Europe AI Convention, UK AISI frameworks, Singapore Model AI Governance Framework, Canada AIDA, China algorithmic-recommendation and generative AI rules.
  • Sector-specific frameworks — FDA AI/ML guidance, EU DORA, automotive UN R155/R156, financial-sector AI guidance.

The crosswalk graph between these corpora compounds non-linearly. Edges scale with cross-regime pairs, not with single-regime nodes, so adding NIST to a graph that already contains EU AI Act and ISO 42001 creates new edges to both — and every subsequent corpus does the same. The current edge count is a floor that grows with every ingest cycle.
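
A short worked example of the pair arithmetic: with n corpora there are n(n-1)/2 cross-regime pairs, and each new corpus adds one pair per corpus already present. The sketch counts pairs only; the actual number of crosswalk edges depends on how many provisions map across each pair, which is not stated here. The `regime_pairs` helper is a hypothetical name.

```python
from itertools import combinations

LIVE = ["EU AI Act", "GDPR", "ISO/IEC 42001", "ISO/IEC 23894", "ISO/IEC 27001"]


def regime_pairs(regimes):
    """Return every unordered pair of regimes that could carry crosswalk edges."""
    return list(combinations(regimes, 2))


before = regime_pairs(LIVE)
after = regime_pairs(LIVE + ["NIST AI RMF"])
new_pairs = [pair for pair in after if "NIST AI RMF" in pair]

print(len(before), len(after), len(new_pairs))  # 10 15 5
```

Going from five corpora to six raises the pair count from 10 to 15, so the sixth corpus alone opens five new places for edges to form.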

Every node is indexed by the same pipeline, with the same provenance fields, the same authority tier, and the same SIRE-tagged retrieval boundary that governs runtime retrieval at answer time.


Built for Verifiable Grounding

Corpus Explorer is designed for environments where every citation in a governed answer must be traceable to primary text:

  • Compliance review — pull the exact text behind a cited obligation without leaving the governance surface.
  • Model-risk audits — confirm that retrieval is returning the right corpus, the right authority tier, and the right clause for a given query class.
  • Operator triage — when an end user disputes a governed answer, land on the source in seconds.
  • Policy drafting — search the corpus by clause to find neighbouring provisions that should be considered together.

What Corpus Explorer Does — and Does Not — Claim

Corpus Explorer does

  • Expose the exact same Elasticsearch index the runtime retrieval layer uses.
  • Return structural hits with stable anchors, authority tiers, and clause citations.
  • Respect the authorized view — callers only see chunks their role is permitted to retrieve.
  • Surface facet distributions so users understand the shape of results, not just the top hits.
  • Preserve source regulation boundaries — a cross-corpus query still returns per-corpus grouping.

Corpus Explorer does not

  • Paraphrase, summarize, or rewrite regulatory text.
  • Merge chunks from different regulations into a single synthesized answer.
  • Bypass the authorized view or return chunks the caller’s role is not permitted to see.
  • Replace legal interpretation — it surfaces primary text, not legal advice.
  • Assert that the corpus is a complete enumeration of applicable law — it is the indexed set.

Corpus Explorer is not a legal research engine.

It is the verification surface behind every governed answer — designed so that “trust, but verify” takes seconds.


The Kenshiki Difference

Most AI systems treat the corpus as a detail — something the retrieval layer hides behind a paraphrase. Kenshiki treats the corpus as a product surface.

The same Elasticsearch index the runtime queries is the index the operator searches. The same authority tier the model sees is the tier the analyst sees. The same anchors the system cites are the anchors the user can paste into the next prompt.

That is what Corpus Explorer provides:

A direct, verifiable reader for the regulatory ground truth behind every governed answer.

Not a generic search box. Not a paraphrase of the law. Not a shadow copy that drifts from the runtime view.

Primary regulatory text — continuously indexed, consistently authorized, and always inspectable.