Kenshiki Labs

Assurance layer

Adversarial Resilience & Boundary Verification (ARBV)

Continuous adversarial testing of AI authorization boundaries. Signed Boundary Evidence Records partners and auditors can replay independently.

ARBV (Adversarial Resilience & Boundary Verification) is the assurance layer that turns AI governance from policy assertion into continuously tested, cryptographically verifiable boundary enforcement. Every adversarial test run produces a signed Boundary Evidence Record — independently replayable by auditors, regulators, and partners — proving the boundary held under semantic, retrieval, model, and policy-level pressure.

Without ARBV, AI governance remains a claim buyers have to trust. Red-team reports, dashboards, and policy text may be useful, but they do not produce independently replayable evidence that the system will not authorize what it should not authorize.

What ARBV evidences

ARBV turns governance from written policy into an evidence-generating system. The policy defines the boundary, formal checks validate it, semantic adversaries attack it, the Claim Authorization Architecture enforces it, and Boundary Evidence Records preserve what happened.

  • Formal authorization invariants for allowed and forbidden states
  • Adversarial tests for semantic, retrieval, model, and policy pressure
  • Release-blocking severity classes for dangerous authorization flips
  • Signed, replayable evidence, so that selected results can be independently checked
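As an illustration of release-blocking severity classes, here is a minimal sketch of flip classification. It borrows the five gate output states and the EXCLUDED→AUTHORIZED critical flip named in the ledger's Boundary Gate row; the `Severity` names, the `Flip` record, and the `classify` and `release_blocked` helpers are hypothetical and are not the ARBV implementation.

```python
from dataclasses import dataclass
from enum import Enum


class GateState(Enum):
    # The five gate output states listed in the ledger.
    AUTHORIZED = "AUTHORIZED"
    UNAUTHORIZED = "UNAUTHORIZED"
    ABSTAIN = "ABSTAIN"
    EXCLUDED = "EXCLUDED"
    DEGRADED = "DEGRADED"


class Severity(Enum):
    # Hypothetical severity classes, for illustration only.
    NONE = 0
    REVIEW = 1
    CRITICAL = 2


# The ledger names EXCLUDED -> AUTHORIZED as a critical flip.
# Any further pairs added here would be an assumption.
CRITICAL_FLIPS = {(GateState.EXCLUDED, GateState.AUTHORIZED)}


@dataclass(frozen=True)
class Flip:
    case_id: str
    baseline: GateState   # gate decision without adversarial pressure
    attacked: GateState   # gate decision for the adversarial variant


def classify(flip: Flip) -> Severity:
    """Map a baseline/attacked decision pair to a severity class."""
    if flip.baseline is flip.attacked:
        return Severity.NONE
    if (flip.baseline, flip.attacked) in CRITICAL_FLIPS:
        return Severity.CRITICAL
    return Severity.REVIEW


def release_blocked(flips) -> bool:
    """A release is blocked if any observed flip is critical."""
    return any(classify(f) is Severity.CRITICAL for f in flips)
```

In a CI gate built along these lines, `release_blocked` would run over the full set of adversarial cases and fail the build on the first critical flip, while REVIEW-level flips are only surfaced for human review.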

Who this is for

Governance and platform teams

define policy invariants, run adversarial boundary tests, review release-blocking flips, and hand signed evidence records to auditors and partners.

Buyers, auditors, and model-risk reviewers

receive a replayable evidence trail showing which boundaries were tested, which model/policy/retrieval versions were in scope, and whether critical authorization escapes were contained.

ARBV — Adversarial Resilience & Boundary Verification — is Kenshiki Labs’ assurance layer for deterministic authorization boundaries. It formally specifies what the system may authorize, attacks those boundaries, blocks dangerous flips, and produces replayable Boundary Evidence Records.

The Assurance Ledger

The four-pillar ARBV claim (formally specified, adversarially attacked, hardware-attested, independently replayable) is not a marketing assertion. It is the bar every authorization boundary in the Kenshiki stack is measured against. Not all boundaries meet all four pillars today. The ones that don’t are on the roadmap, and every status below is either backed by a commit hash or paired with a target milestone.

Legend: ✅ live and verifiable · 🟡 partial / in progress · ⏳ roadmap target named · ❌ not yet scoped

Boundary × Pillar coverage

| Boundary | Formally specified | Adversarially attacked | Hardware-attested | Independently replayable |
|---|---|---|---|---|
| AuthorizedCorpusView: pre-filter for governed retrieval | 🟡 primitive shipped (aws-goober@5fd1c7b+dc2a82c); not yet wired into /v2/search retrieval path (RFC-005 T1A-5) | ⏳ §6.4 symmetry conformance suite (10K (user, query) pairs) scoped; not yet in CI | ❌ not scoped | ⏳ gated on T1A-5 integration; primitive itself is deterministic and hashable |
| /v2/search retrieval (RFC-004 structure-first) | ✅ five-stage EvidencePacket spec (RFC-004 structure-first-retrieval.yaml, ~4300 lines) | 🟡 ARBV loop described in protocol; suite not release-blocking in CI | ❌ not scoped | ✅ integrity-protected EvidencePacket (HMAC-SHA256) with request_id; outcome=refused is a first-class recorded success |
| /v2/chat governed orchestration | ✅ RFC-004 T2-A/B/C/D done; orchestrator result carries verified/confidence/attestation envelope | 🟡 adversarial testing manual today; release-blocking CI gate on roadmap | ❌ software-attested (Ed25519) but not bound to platform TEE | ✅ signed response with request_id; attestation chain present when gates passed |
| Oracle chunk retrieval (Aurora oracle_chunks) | ✅ every row carries rebac_tuples JSONB (Zanzibar-style), content_tuples, manifest_digest, chunk_digest, embedding_digest | ❌ adversarial testing against retrieval not yet scoped | 🟡 Ed25519 manifest signatures (manifest_sig_alg, manifest_key_id); software-attested key material, not TEE-attested | chunk_digest + manifest_digest + embedding_digest per row; third party can recompute against pinned manifest |
| ReBAC policy layer (Layer 1 capability-split) | ✅ 4 roles + 3 principals committed aws-llm-content@9d182c5 + lagrange-glass@0c121c08 | ✅ 9/9 negative boundary tests pass (writer cannot read, reader cannot escalate, validator cannot write) on every ES bootstrap | ❌ not scoped | 🟡 test matrix is re-runnable but not externally signed |
| Claim Ledger emit | ⏳ spec live; emit implementation being audited against spec | ⏳ release-blocking ARBV suite targeted | ❌ not scoped | ⏳ Ed25519 per-entry signing targeted |
| Boundary Gate (Kenshiki Gate) | 🟡 five output states defined (AUTHORIZED, UNAUTHORIZED, ABSTAIN, EXCLUDED, DEGRADED); formal invariants in progress | ⏳ flip-detection suite scoped (critical flips EXCLUDED→AUTHORIZED release-blocking) | ❌ not scoped | 🟡 gate decisions present in response envelope but not yet a signed per-decision artifact |
| Portal auth callback (Keycloak/OIDC) | assertVerifiedEmailClaim + realm-mismatch guard + portal-principal resolution (PR #79, landed on main at e083950) | ✅ test coverage for verified-email rejection, realm mismatch, happy path | ❌ hardware attestation not scoped (software signature verification only) | ❌ not currently signing the authentication event for external replay |
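To make "independently replayable" concrete, the sketch below shows two checks a reviewer might run: recomputing a chunk digest against a pinned manifest, and verifying an HMAC-SHA256 tag on an evidence packet. The field names chunk_digest, manifest_digest, and request_id come from the ledger; everything else is an assumption for illustration, including hex SHA-256 over raw chunk bytes, the `mac` field, and canonical JSON serialization. HMAC verification also assumes the verifier holds the shared key.

```python
import hashlib
import hmac
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_chunk_row(row: dict, chunk_bytes: bytes, pinned_manifest_digest: str) -> bool:
    """Recompute the chunk digest and check the row against a pinned manifest.

    Assumption: digests are hex-encoded SHA-256 over the raw chunk bytes.
    """
    chunk_ok = hmac.compare_digest(sha256_hex(chunk_bytes), row["chunk_digest"])
    manifest_ok = hmac.compare_digest(row["manifest_digest"], pinned_manifest_digest)
    return chunk_ok and manifest_ok


def packet_mac(body: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON form of the packet body.

    Assumption: sorted keys and compact separators define the canonical form.
    """
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_packet(packet: dict, key: bytes) -> bool:
    """Check the hypothetical `mac` field against the rest of the packet."""
    body = {k: v for k, v in packet.items() if k != "mac"}
    return hmac.compare_digest(packet_mac(body, key), packet.get("mac", ""))
```

Both checks use constant-time comparison. A refused request verifies the same way as a successful one, which matches the ledger's treatment of outcome=refused as a recorded success.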

Current known gaps

These are called out explicitly in the corpus-spec §15.2 gap analysis and tracked in RFC-005:

  • G1: AuthorizedCorpusView is a primitive, not yet the retrieval pre-filter. Today’s ReBAC filter in gates.py runs post-retrieval, which leaks total_hits counts for unauthorized documents; spec §6.1 explicitly forbids this. The fix (T1A-5) wires the primitive into structure_first.py and removes the post-filter. This is not a canary blocker in single-tenant mode, where the post-filter is semantically equivalent to the pre-filter; the fix ships alongside or before Tier 3 so that symmetry becomes testable.
  • G2: the SearchAudit durable table is missing. Goober emits request-level structured logs to CloudWatch, but there is no queryable audit table with authorized_view_hash + result_chunk_ids per request. Tracked for Tier 3.
  • G3: cursor-signed PITs are not implemented. /v2/search returns the full response in one shot, makes no use of ES _pit, and has no HMAC-protected cursor bound to (user_id, authorized_view_hash, corpus_version_hash). Tracked for Tier 3.
  • G4: the citation graph store is missing. No graph store exists (§1.4 CrossRef), and oracle_chunks has no cross_refs column. Tracked for M2.
  • Hardware attestation: not yet scoped across any boundary. The ARBV protocol documents attestation as a future property; no current boundary is TEE-bound. The target environment is a Clean Room deployment with Nitro Enclave attestation of the policy version and retrieval binary. This is the column most likely to stay ❌ through the Tier 1 build.
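The G1 leak can be shown with a toy in-memory model. This is a sketch under stated assumptions, not the code in gates.py or structure_first.py: the `Doc` type, the substring matcher, and both search functions are hypothetical. It only shows why counting hits before authorization exposes the existence of documents the caller cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_users: frozenset


def matches(doc: Doc, query: str) -> bool:
    # Toy matcher: case-insensitive substring search.
    return query.lower() in doc.text.lower()


def search_post_filter(corpus, user: str, query: str) -> dict:
    """Retrieve first, authorize afterwards (the pattern G1 describes)."""
    hits = [d for d in corpus if matches(d, query)]
    total_hits = len(hits)  # counted before authorization, so it leaks
    visible = [d.doc_id for d in hits if user in d.allowed_users]
    return {"total_hits": total_hits, "results": visible}


def search_pre_filter(corpus, user: str, query: str) -> dict:
    """Build the authorized view first, then retrieve only within it."""
    view = [d for d in corpus if user in d.allowed_users]
    hits = [d for d in view if matches(d, query)]
    return {"total_hits": len(hits), "results": [d.doc_id for d in hits]}
```

With two matching documents of which the caller may read one, the post-filter variant reports a total of 2 while returning a single result, and the pre-filter variant reports 1. When every document is visible to the caller, the two variants return identical output, consistent with the single-tenant note in G1.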

How to verify

Every commit hash in this ledger resolves to a public or customer-accessible repository. Gaps are tracked in:

  • docs/rfc-004/structure-first-retrieval.yaml (AGORA retrieval protocol)
  • docs/corpus-spec.mdx §15–§16 (gap analysis + dependency-ordered build plan)
  • RFC-005 T1A-5 (AuthorizedCorpusView wiring)

If a commit hash in this table does not resolve, or a “live” claim is not actually live in the referenced repository, the claim is false. The ledger is meant to be self-correcting: a false claim is exposed in public, not in marketing review.

Update cadence

The ledger is reviewed at every release cut. A boundary that moves from 🟡 to ✅ requires a PR that updates this table alongside the implementation change. A boundary that regresses (rare, but possible during refactors) has its row downgraded the same day, with the commit that caused the regression named.

Last reviewed: 2026-04-24. Next review: next release cut.