Kenshiki Labs

Research Paper

Evidence Contracts: A Primitive for Non-Repudiable AI Agent Systems

Existing AI governance artifacts — model cards, policy files, audit logs — are descriptive: they describe a system or record events after the fact. The evidence contract is a primitive: a signed, append-only authorization record that binds one specific AI action to its bounded evidence set, declared policy state, principal context, and verifiable lineage. Evidence contracts make consequential AI actions institutionally legible — reconstructable on demand, bounded by what was authorized, and materially harder to repudiate after emission.


Stephen Fishburn, Kenshiki Labs

Abstract

AI governance regimes increasingly require organizations to document model-driven behavior, but existing approaches — model cards, policy files, evaluation artifacts, and logs — are descriptive rather than evidentiary. We introduce the evidence contract: a signed, append-only assertion that a specific AI system action was authorized under a specific policy state, by a specific principal, over a bounded evidence context. Evidence contracts compose into an AI chain of custody, a runtime substrate for what we call deterministic admissibility: the property that a system output can be reconstructed from a tamper-evident chain of authorized evidence transformations without requiring trust in the model’s latent state.

We ground the proposal in a production governance envelope already emitted by Kenshiki Labs’ runtime. That envelope records request identity, integrity-gate status, model and build metadata, tenant attestation, completion hashes, and signature material. Kenshiki’s runtime additionally enforces identity- and relationship-aware evidence boundaries using Keycloak for principal identity and OpenFGA for tuple-based authorization over documents and chunks. These mechanisms demonstrate that the operational substrate for evidence contracts already exists in production. But the current artifact remains a proto-contract: it preserves execution and integrity better than it preserves typed authorization semantics. In particular, it lacks uniform action typing, explicit evidence-boundary serialization, and parent-child chaining across semantically consequential stages.

We define the evidence contract primitive, specify composition rules for verifiable contract chains, derive a worked example from a live production trace, and position the primitive against provenance systems, policy-engine decision logs, transparent logs, and ex ante governance documentation. We also provide a field-mapping appendix showing that each formal contract field is either already present in production or trivially derivable from existing runtime state. We argue that evidence contracts are the missing layer between runtime governance telemetry and institutional accountability for AI-mediated action.


1. Introduction

AI governance today operates in two weakly connected modes. The first is ex ante documentation: organizations produce policies, model cards, risk registers, control narratives, and management-system artifacts. The second is ex post observation: organizations retain logs, metrics, evaluation outputs, and incident records that help them infer what likely happened after the fact. Both matter. Neither suffices when AI outputs support regulated decisions, operational actions, or other institutionally consequential acts.

What is missing is a runtime primitive that binds a specific action to bounded authority.

This paper introduces that primitive: the evidence contract. An evidence contract is a signed, append-only assertion that a system action occurred under a declared principal, a declared policy state, and a bounded evidence context. Unlike ordinary observability telemetry, it is not merely descriptive. Unlike ex ante documentation, it does not speak at the class level of “the system.” It binds a specific action instance.

The proposal is not purely hypothetical. Kenshiki’s production runtime already emits signed governance envelopes containing request identity, timestamps, integrity-gate status, model and build metadata, completion hashes, tenant attestation, and signature material. The runtime also already enforces identity- and relationship-aware evidence boundaries through Keycloak-backed identity, OpenFGA authorization, document identity, and chunk-level tuple scoping. These artifacts show that high-assurance AI systems can already produce much of the substrate required for action-level governance. The remaining step is semantic formalization: to turn governance telemetry into typed authorization records that can compose across multi-stage workflows.

The central claim of the paper is:

High-assurance AI requires not merely observability, but non-repudiable authorization artifacts for semantically consequential actions.

We use the term deterministic admissibility in a narrow technical-governance sense: the property that an output can be reconstructed from a tamper-evident chain of authorized evidence transformations. This paper does not fully resolve the legal admissibility of AI outputs under evidentiary doctrine. That broader legal argument is deferred. Here, the claim is narrower: evidence contracts support institutional auditability and reconstruction by shifting trust from opaque model state to the integrity of an authorization chain.

Contributions

This paper makes five contributions.

  1. It defines evidence contracts as a typed authorization primitive distinct from observability telemetry and ex ante governance documentation.
  2. It specifies composition rules that make chains of contracts verifiable across retrieval, filtering, verification, and response emission.
  3. It introduces deterministic admissibility as a technical-governance property supported by such chains.
  4. It grounds the proposal in a production governance envelope already emitted by Kenshiki Labs, distinguishing proto-contracts from formal contracts.
  5. It derives a worked example from a live production trace and provides a field-mapping appendix identifying the remaining semantic gaps required to make runtime governance artifacts institutionally legible.

2. From Governance Telemetry to Evidence Contracts

The distinction between governance telemetry and evidence contracts is foundational.

A governance telemetry artifact records execution facts: that a request occurred, that a model answered, that a gate passed or failed, that a signed envelope was emitted. Kenshiki’s existing artifacts already do this well. A request-scoped record can include request ID, timestamps, response text, model identity, build SHA, latency measurements, gate results, completion hash, signing metadata, chain status, and tenant attestation. Verified response forms can also include evidence items, fallback reasons, access-decision metadata, and integrity-gate results.

An evidence contract adds what governance telemetry does not yet reliably preserve: a typed statement of who was authorized to do what, over which bounded evidence set, under which policy state, producing which outcome, with which lineage relation to prior actions.

A useful analogy is the difference between a packet capture and a signed financial transaction. Both record something real. Only one directly expresses an institutionally legible authorization event. Evidence contracts are therefore not better logs. They are the formalization layer that turns runtime governance into action-level accountable records.

In Kenshiki’s case, that formalization layer sits on top of mechanisms already in production: Keycloak-backed user identity, document identity, OpenFGA relationship authorization, and chunk-level tuple storage for user attributes and evidence scoping. What the system computes internally is already close to a contract-grade authorization state. The missing step is to emit that state in a normalized, signed, chainable form.


3. Related Work

Evidence contracts sit adjacent to several established lines of work, but are not reducible to any one of them.

Provenance models. W3C PROV and related provenance systems describe derivation and lineage between entities, activities, and agents. They are the closest conceptual relative. But provenance models are primarily descriptive. They do not, by themselves, bind an action to a policy state, typed outcome semantics, or an authorization decision over an admissibility boundary. Evidence contracts therefore extend provenance from “what derived from what” to “what action was authorized under what bounded authority.”

Model documentation artifacts. Model cards, datasheets, and related documentation practices improve transparency and governance at the class level. They describe systems, datasets, limitations, and intended use. They do not bind individual runtime actions to evidence and policy. Evidence contracts operate at the opposite granularity: the action instance rather than the system class.

Governance frameworks and regulatory regimes. Frameworks such as NIST AI RMF, ISO 42001, and recordkeeping obligations in emerging AI regulation call for documentation, traceability, and accountability. But they do not specify a cryptographic primitive that makes runtime governance non-repudiable. Evidence contracts aim to fill this gap by supplying an action-level artifact that such frameworks can point to rather than merely imply.

Audit-log integrity and transparent logs. Work on secure audit logging, history trees, certificate transparency, Sigstore, and in-toto demonstrates how tamper-evident records and attestations can be chained and verified. These are close technical cousins. But in-toto, for example, is centered on software supply-chain provenance, while evidence contracts are centered on runtime authorization over evidence-bearing AI actions. The shared concern is integrity; the difference is semantic scope.

Policy engines and decision logs. Systems such as OPA or Cedar can emit decision logs showing whether a policy check passed. These logs are valuable but typically remain localized decision artifacts. Evidence contracts differ by making the decision itself a typed node in a larger chain of custody spanning principal identity, retrieval scope, evidence filtering, synthesis permissions, and response emission.

The gap, then, is not absence of provenance, signatures, or policy logs as such. It is absence of a unifying runtime artifact that combines these into a bounded authorization record for AI-mediated action.


4. Defining the Evidence Contract Primitive

An evidence contract is a signed, append-only assertion that a semantically consequential AI-system action was authorized under a specific policy state and bounded evidence context.

At minimum, a contract contains:

  • Principal: user, service, tenant, delegated actor, and purpose.
  • Action: a normalized namespaced action type such as prompt.compile, evidence.retrieve, evidence.filter, claim.verify, gate.decide, answer.emit, or fallback.emit.
  • Resource scope: document IDs, chunk IDs, or other governed resources implicated in the action.
  • Evidence boundary: candidate evidence, admitted evidence, excluded evidence, and completeness state.
  • Authorization provenance: decision engine, policy basis, relation or attribute class, and versioned policy model.
  • Execution context: model identity, build ID, request lineage, and relevant runtime scope.
  • Outcome: authorized, filtered, denied, fallback, escalated, authorized_degraded_boundary, or other typed results.
  • Integrity metadata: hash, signature, key ID, timestamp, parent reference, and chain position.
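The minimal field set above can be sketched as a typed record. The sketch below is illustrative, not a normative schema: the class name, field names, and use of plain dicts for nested blocks are assumptions made for compactness, standing in for whatever serialization a production runtime would emit.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EvidenceContract:
    """Illustrative minimal evidence contract (a sketch, not a normative schema)."""
    contract_id: str
    principal: dict          # user/service/tenant identity, delegation, purpose
    action_type: str         # normalized namespaced type, e.g. "evidence.retrieve"
    resource_scope: dict     # document IDs, chunk IDs
    evidence_boundary: dict  # candidate/admitted/excluded evidence, completeness
    authorization: dict      # decision engine, policy basis, versioned policy model
    execution_context: dict  # model identity, build ID, request lineage
    outcome: str             # typed result, e.g. "authorized", "fallback"
    parent_digest: Optional[str]   # digest of the parent contract; None for genesis
    signature: Optional[str] = None  # attached after the body is serialized and signed
```

Freezing the dataclass mirrors the append-only intent: once constructed and signed, a contract instance is never mutated, only superseded by a child contract.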

4.1 Proto-Contract vs Formal Contract

Kenshiki’s current governance envelope is best understood as a proto-contract. It already records request identity, timestamps, gate results, model/build identity, completion hash, signature presence, chain status, tenant attestation, and policy metadata. Its governed response schemas additionally model evidence items, fallback behavior, access-decision metadata, and authorization context.

A formal evidence contract extends this proto-contract in three ways:

  1. It makes action semantics explicit and normalized.
  2. It serializes evidence boundaries and authorization provenance as first-class fields.
  3. It links stage-specific contracts into a verifiable parent-child chain.

This distinction is central to the paper. The claim is not that evidence contracts already exist fully formed in production. The claim is that the production substrate already exists, and the paper formalizes the missing semantics.

4.2 Composition Rules

A chain of evidence contracts is valid only if its contracts satisfy explicit composition rules.

Hash-link integrity. For a chain C = ⟨c_1, …, c_n⟩, each contract c_{i+1} must reference the cryptographic digest of c_i, unless c_{i+1} is a genesis contract in a new chain segment.

Evidence monotonicity. A child contract’s admitted evidence boundary must be a subset of, or an explicitly transformed derivative of, its parent’s authorized evidence scope. If evidence is excluded or filtered, the contract must record the operation and basis.

Policy consistency. Contracts in a chain must reference a coherent policy-bundle version unless an explicit re-authorization contract marks a transition to a new policy state.

Principal propagation. A child contract must either inherit authority from the parent contract’s outcome or explicitly record a delegated or re-authorized principal state.

Typed outcome determinacy. Each contract must emit a single normalized outcome code. Ambiguous disjunctions are disallowed.

The rules above describe linear chains. Fan-out and fan-in patterns — common in agent workflows — generalize naturally: a child contract may reference multiple parent digests, and composition rules apply pairwise to each parent relation. Full treatment of DAG-shaped contract graphs is deferred.

These rules turn “chain of custody” from metaphor into primitive.
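The first two rules are directly mechanizable. The sketch below checks hash-link integrity and a simplified form of evidence monotonicity (subset only, ignoring explicitly transformed derivatives) over a linear chain of dict-shaped contracts; the field names and SHA-256-over-canonical-JSON digest are assumptions, not the production wire format.

```python
import hashlib
import json

def digest(contract: dict) -> str:
    # Canonical serialization: sorted keys, fixed separators, so the digest
    # is insensitive to key order and whitespace.
    blob = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def verify_chain(chain: list[dict]) -> bool:
    """Check hash-link integrity and (simplified) evidence monotonicity."""
    for parent, child in zip(chain, chain[1:]):
        # Hash-link integrity: the child must reference the parent's digest.
        if child.get("parent_digest") != digest(parent):
            return False
        # Evidence monotonicity (simplified): admitted evidence never grows.
        if not set(child.get("admitted_chunks", [])) <= set(parent.get("admitted_chunks", [])):
            return False
    return True
```

Because each link commits to the full parent record, any post-hoc mutation of an intermediate contract changes its digest and invalidates every downstream link.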


5. Worked Example: What the Envelope Records Today and What the Contract Adds

Consider a production response to the user query, “What governance controls are active?” The current trace records a unique request ID, response text, model identity, provider/runtime metadata, gate outcome, response-mode headers, completion hash, signing metadata, and tenant attestation. The answer text itself identifies the semantically relevant runtime stages: prompt compilation, evidence retrieval, claim verification, and gate decision.

The system therefore already records:

  • that a request occurred,
  • that a response was emitted,
  • which model/build produced it,
  • which integrity and SLO gates fired,
  • and that a signed envelope exists.

What the evidence contract adds is a typed, stage-specific authorization chain:

  • prompt.compile
  • evidence.retrieve
  • claim.verify
  • gate.decide
  • answer.emit

For this same request, the current envelope shows no KB match and a passing integrity gate. A contract chain would make those semantics explicit. The retrieval contract would state that no governed KB evidence was admitted. The gate contract would state that the response remained authorized under the applicable fallback policy despite that absence. The emission contract would then bind the final answer to that explicit authorization state rather than merely to a response hash.

This becomes even clearer in Kenshiki’s governed response schemas. There, a verified answer may carry evidence items and an access decision; a filtered answer may include allowed content and policy-basis metadata; a fallback response may explicitly state that evidence was filtered out. Those structures already suggest the right contract outcomes: authorized, authorized_filtered, and fallback_filtered_out. Evidence contracts simply normalize those outcomes and bind them into a signed chain.
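The five-stage chain for this request could then be assembled mechanically, each stage contract carrying a typed outcome and the digest of its parent. The sketch below echoes the trace described above (empty admitted-evidence set, authorized fallback path) but its field values and layout are illustrative, not a reproduction of the production envelope.

```python
import hashlib
import json

def digest(contract: dict) -> str:
    blob = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

# Stage types from the worked example, each with its typed outcome.
STAGES = [
    ("prompt.compile",    "authorized"),
    ("evidence.retrieve", "authorized"),  # no governed KB evidence admitted
    ("claim.verify",      "authorized"),
    ("gate.decide",       "authorized"),  # fallback policy permits emission
    ("answer.emit",       "authorized"),
]

def build_chain(request_id: str) -> list[dict]:
    chain: list[dict] = []
    parent_digest = None
    for action_type, outcome in STAGES:
        contract = {
            "request_id": request_id,
            "action": action_type,
            "outcome": outcome,
            "admitted_chunks": [],           # empty on this trace: no KB match
            "parent_digest": parent_digest,  # None marks the genesis contract
        }
        chain.append(contract)
        parent_digest = digest(contract)
    return chain
```

The point of the sketch is the shape, not the values: each stage becomes a node that downstream audit can verify independently, rather than a line in an undifferentiated log.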


6. Threat Model

Evidence contracts are valuable only insofar as they defend against real failure modes. The relevant threat model includes at least six classes of attack or breakdown.

Insider tampering. An operator may alter logs, delete unfavorable records, or rewrite evidence scopes after the fact. Evidence contracts defend against silent mutation through signatures, append-only linkage, and hash verification, but still require secure storage and retention controls.

Model substitution. A system may claim one model identity while serving another. Contracts can bind declared model identity and build metadata, but verification of actual runtime execution may require orthogonal attestation mechanisms.

Policy rollback or re-labeling. A system may sign an action under one policy state and later claim a different governing version. Contracts mitigate this only if policy-bundle identifiers are immutable and retrievable.

Replay and timestamp manipulation. A valid contract may be replayed in a different request context or with distorted time semantics. Parent references, request scope, and nonces reduce but do not eliminate this threat.

Signing-key compromise. If the signing key is compromised, contracts may retain syntactic validity while losing trust. Hardware-backed custody, key rotation, and transparency logs remain necessary complements.

Collusion across principals. A signing service and tenant actor may collude to authorize an invalid action. Evidence contracts do not solve collusion by themselves; they make collusive paths more attributable.

The point is not that evidence contracts solve all assurance problems. It is that they make a broad class of integrity and accountability failures detectable and reviewable in ways current runtime telemetry does not.


7. Reference Implementation: Identity, Authorization, and Bounded Evidence

Kenshiki’s implementation is strongest when described not as a bulleted feature list, but as an end-to-end governed path.

A user authenticates through Keycloak, establishing principal identity and session context. The request then enters a governed runtime in which evidence access is evaluated against document and chunk identity through OpenFGA. User attributes are represented as tuples at the chunk level, allowing retrieval scope to be constrained not merely by coarse document membership but by relationship- and attribute-aware evidence boundaries. The system can then emit a verified answer, a filtered answer, or a fallback depending on what the policy permits, while also producing a signed governance envelope that binds request identity, model/build state, gate outcomes, and completion integrity.

What exists today is therefore already close to a contract-grade system:

  • identity-aware principals,
  • relationship-aware resource authorization,
  • chunk-level admissibility logic,
  • evidence-aware response forms,
  • and signed runtime envelopes.

What is still missing is not runtime governance itself. It is compact, signed, chainable emission of the authorization state the system already computes.

That is why evidence contracts are the right next abstraction. They do not replace Keycloak, OpenFGA, document identity, or chunk-level tuples. They formalize their outputs into a verifiable record of bounded authorization.
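The chunk-level boundary can be illustrated with a toy relationship store. This is a Zanzibar-style sketch in plain Python, not the OpenFGA API: the tuple shape, the "viewer" relation, and the example identifiers are all assumptions made for illustration.

```python
# Relationship tuples: (subject, relation, object). A production system stores
# and evaluates these in OpenFGA; this in-memory set only illustrates the
# admissibility check that produces an evidence boundary.
TUPLES = {
    ("user:alice", "viewer", "document:handbook"),
    ("user:alice", "viewer", "chunk:handbook#12"),
    ("user:bob",   "viewer", "document:handbook"),
}

def partition_evidence(subject: str, candidates: list[str]) -> tuple[list[str], list[str]]:
    """Partition candidate chunks into admitted and excluded for one principal."""
    admitted = [c for c in candidates if (subject, "viewer", c) in TUPLES]
    excluded = [c for c in candidates if c not in admitted]
    return admitted, excluded
```

The admitted/excluded partition this function returns is exactly the state an evidence contract's evidence_boundary block would serialize, together with the relation that justified each admission.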


8. Deterministic Admissibility

We define deterministic admissibility narrowly.

An output o is deterministically admissible under a contract chain C = ⟨c_1, …, c_n⟩ iff:

  1. each c_i verifies under its declared signature and key identifier;
  2. each non-genesis c_{i+1} references the cryptographic digest of its parent contract;
  3. each c_i's evidence boundary is consistent with the outcome of its parent contract or with an explicit re-authorization transition;
  4. each c_i's policy state is retrievable at its declared version;
  5. the final contract in the chain authorizes emission of o.

This is an auditability property, not a truth guarantee.

A perfectly reconstructed chain may still culminate in an incorrect answer. Evidence contracts do not prove semantic correctness. They prove that the path by which an answer became authorized is tamper-evident, reconstructable, and bounded by declared evidence and policy. That distinction matters. It prevents the paper from overclaiming and clarifies why evidence contracts are a governance primitive rather than a universal solution to AI reliability.
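Most of the definition reduces to a mechanical check. The sketch below verifies conditions (1), (2), (4), and (5) for a linear chain, eliding the evidence-boundary consistency of condition (3) for brevity. It uses HMAC-SHA256 as a stand-in for the production Ed25519 signatures, and models policy retrievability as membership in a registry; the key ID, policy-bundle version, and field names are assumptions.

```python
import hashlib
import hmac
import json

KEYS = {"harness-key-01": b"demo-secret"}   # key_id -> signing key (HMAC stand-in)
POLICY_REGISTRY = {"policybundle/v7"}       # policy versions retrievable at audit time

def digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def sign(body: dict, key_id: str) -> str:
    return hmac.new(KEYS[key_id], digest(body).encode(), hashlib.sha256).hexdigest()

def admissible(chain: list[dict], output_hash: str) -> bool:
    parent_digest = None
    for c in chain:
        body, sig = c["body"], c["sig"]
        # (1) signature verifies under the declared key identifier
        if not hmac.compare_digest(sig, sign(body, body["key_id"])):
            return False
        # (2) each contract references its parent's digest (None for genesis)
        if body["parent_digest"] != parent_digest:
            return False
        # (4) policy state is retrievable at its declared version
        if body["policy_bundle"] not in POLICY_REGISTRY:
            return False
        parent_digest = digest(body)
    # (5) the final contract authorizes emission of this exact output
    last = chain[-1]["body"]
    return last["action"] == "answer.emit" and last.get("completion_hash") == output_hash
```

Note what the function does not check: nothing here evaluates whether the emitted answer is true. It establishes only that the authorization path to the output verifies, which is precisely the auditability-not-truth distinction drawn above.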


9. Example Contract

A normalized contract derived from the existing runtime might look like this:

{
  "contract_id": "ec_de41c5c2_answer_emit_v1",
  "chain_position": "genesis",
  "principal": {
    "tenant_id": "<redacted-uuid>",
    "subject_type": "assistant_service",
    "acting_as": null,
    "purpose": "governed response emission",
    "idp": "keycloak"
  },
  "action": {
    "type": "answer.emit",
    "request_id": "de41c5c2-4348-467b-a8e0-ed73cff06c23"
  },
  "resource_scope": {
    "documents": [],
    "chunks": []
  },
  "authorization": {
    "engine": "openfga",
    "decision": "authorized",
    "policy_basis": [],
    "policy_bundle": "policybundle/<redacted-version>"
  },
  "evidence_boundary": {
    "candidate_chunks": [],
    "admitted_chunks": [],
    "excluded_chunks": [],
    "completeness": "no_governed_kb_match"
  },
  "execution_context": {
    "model_id": "Qwen/Qwen3-14B",
    "build_sha": "5dfbf9a9d2f29ace10f0957b7ae4f569106de902"
  },
  "outcome": {
    "status": "authorized",
    "response_mode": "answer",
    "completion_hash": "1c7e40765e396560d0b9cad243c9b67f4d88b4925d9b17f87745d71efd764ab7"
  },
  "integrity": {
    "sig_alg": "ed25519",
    "key_id": "harness-key-01",
    "timestamp": "2026-04-18T02:48:16.429105+00:00",
    "chain_status": "verified"
  }
}

This example is partly drawn from live fields and partly normalized to reflect semantics already present in the surrounding system. That is precisely the point of the paper’s proto-contract framing.
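For such a record to chain, its digest must be computed over a canonical serialization; key-order or whitespace drift would otherwise break every downstream parent reference. A minimal sketch follows. Real deployments would use a specified canonicalization such as RFC 8785 JCS; json.dumps with sorted keys is a stand-in here.

```python
import hashlib
import json

def contract_digest(contract: dict) -> str:
    """SHA-256 over a canonical JSON form: sorted keys, fixed separators, UTF-8."""
    canonical = json.dumps(
        contract,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same logical contract, serialized with different key order,
# must yield an identical digest.
a = {"contract_id": "ec_1", "outcome": {"status": "authorized"}}
b = {"outcome": {"status": "authorized"}, "contract_id": "ec_1"}
assert contract_digest(a) == contract_digest(b)
```

The digest of one contract becomes the parent reference of the next, which is what makes chain_position and integrity.parent_digest verifiable rather than merely declared.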


10. Limitations

Evidence contracts do not prove source truth, semantic correctness, fairness, or legitimacy. They do not replace evaluation, oversight, or human judgment. They do not eliminate the need for secure key custody, transparency logs, or robust identity infrastructure.

What they do provide is narrower and still important: a runtime primitive for proving that an AI-mediated action was authorized under bounded evidence and policy and can be reconstructed later without collapsing into trust in the model’s opaque interior.


11. Conclusion

Kenshiki’s current runtime already emits signed governance envelopes and already enforces identity- and relationship-aware evidence boundaries through Keycloak, OpenFGA, document identity, and chunk-level tuple scoping. Those facts move the paper out of theory-land. The system already computes most of the hard governance state. What remains is to emit that state as typed, composable authorization records.

That is the role of evidence contracts.

They convert runtime governance from “we logged what happened” into “we can prove what was authorized.” In that form, they provide the missing substrate between AI observability and institutional accountability.


Appendix A. Field Mapping from Production Telemetry to Formal Evidence Contract

The table below demonstrates that each formal contract field is either already present in Kenshiki’s runtime artifacts or trivially derivable from existing system state. Rows marked Gap identify semantics that must be emitted explicitly to move from proto-contract to formal contract.

Current production field / state | Proto-contract role | Formal contract field | Status
--- | --- | --- | ---
requestId / request_id | Request identity | action.request_id | Present
conversationId / messageId | Message lineage | execution_context.message_lineage | Derivable
startedAt, completedAt | Temporal anchoring | integrity.timestamp, execution_context.started_at | Present
assistant response text | Emitted action payload | outcome.emitted_content_ref or payload hash | Derivable
gate: PASS | Gate outcome | outcome.status / authorization.decision | Present
integrity gate status | Integrity decision | authorization.integrity_gate | Present
SLO gate status | Advisory runtime state | authorization.slo_state | Present
model ID | Runtime execution identity | execution_context.model_id | Present
build SHA | Runtime build identity | execution_context.build_sha | Present
provider/runtime metadata | Execution context | execution_context.provider | Present
completion hash | Artifact integrity | outcome.completion_hash | Present
system prompt hash | Prompt provenance | execution_context.prompt_hash | Present
signing key ID / sig alg | Signature metadata | integrity.key_id, integrity.sig_alg | Present
tenant attestation | Tenant scoping | principal.tenant_id / attestation block | Present
Keycloak identity state | Principal identity | principal.subject, principal.idp | Derivable
authorization_context.subject | Acting principal | principal.subject | Present in schema
authorization_context.acting_as | Delegation | principal.acting_as | Present in schema
authorization_context.purpose | Purpose binding | principal.purpose | Present in schema
OpenFGA allow/filter/deny result | Authorization decision | authorization.decision | Derivable
relation / tuple basis | Decision provenance | authorization.policy_basis | Derivable
OpenFGA model/store version | Policy provenance | authorization.engine_version | Gap
document IDs | Resource scope | resource_scope.documents | Derivable
chunk IDs | Fine-grained resource scope | resource_scope.chunks | Derivable
candidate chunks | Retrieval universe | evidence_boundary.candidate_chunks | Gap
admitted chunks | Admissible evidence set | evidence_boundary.admitted_chunks | Derivable
excluded chunks | Exclusion boundary | evidence_boundary.excluded_chunks | Gap
fallback reason | Outcome typing | outcome.status / outcome.reason | Present
response mode | Response classification | outcome.response_mode | Present
evidence items | Evidence payload | evidence_boundary.evidence_items | Present
evidence hash / version ID | Immutable evidence binding | evidence_boundary.evidence_hashes | Present when available
policy mode | Coarse policy state | authorization.policy_mode | Present
policy bundle/version | Exact rule state | authorization.policy_bundle | Gap
parent hash reference | Contract linkage | integrity.parent_digest | Gap
chain position | Position in contract graph | integrity.chain_position | Gap
typed action names | Stage semantics | action.type | Gap
completeness / degraded boundary state | Boundary quality | evidence_boundary.completeness | Gap

The highest-value missing fields are therefore not raw governance data, but explicit serialization of semantics the system already computes: typed action names, parent references, policy-bundle versions, candidate versus excluded chunk sets, and completeness state for bounded evidence.


Appendix B. Citation Placeholders

  • Provenance systems and W3C PROV
  • Model Cards and Datasheets
  • NIST AI RMF, ISO 42001, EU AI Act recordkeeping
  • Secure audit logging, history trees, certificate transparency
  • in-toto, Sigstore, transparent attestations
  • OPA, Cedar, and policy decision logs
  • Zanzibar-style ReBAC and OpenFGA lineage

These should be resolved into a full reference list before SSRN submission.