Strict Assurance Checklist
Plain language: This is the reviewer checklist for deciding whether a strict report and its sibling runtime manifest can be accepted as assurance evidence.
Overview
| Aspect | Details |
|---|---|
| Purpose | Reviewer checklist for accepting strict assurance evidence. |
| Audience | Maintainers, release reviewers, CI gate owners. |
| Contract scope | Current strict assurance behavior, claim set invarlock-weight-edit-regression-v1, report v1. |
| Source of truth | src/invarlock/core/assurance_contract.py, src/invarlock/reporting/verify_contract.py, docs/assurance/14-trust-model.md. |
Use this checklist before accepting a strict report as assurance evidence. When a checkbox cannot be ticked, see Failure Examples for the matching non-pass shape and Troubleshooting for numbered error codes.
Quick Start
```bash
invarlock verify --assurance strict reports/eval/evaluation.report.json
```
A successful exit from this command confirms the report and manifest checks that can be machine-checked from the submitted evidence. The remaining items call for reviewer judgment about policy allowances and bundle contents.
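CI gate owners can wrap the Quick Start command in a small gate. The Python sketch below assumes that a successful verify run exits with status 0 and that any failure exits non-zero. The helper names `strict_verify_command` and `strict_verify_passes` are hypothetical, and the `run` parameter exists only so the gate logic can be exercised without invarlock installed.

```python
import subprocess
from typing import Callable


def strict_verify_command(report_path: str) -> list[str]:
    """Build the strict verify invocation from Quick Start as an argv list."""
    return ["invarlock", "verify", "--assurance", "strict", report_path]


def strict_verify_passes(
    report_path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True only when the verifier exits with status 0.

    `run` defaults to subprocess.run; it is a parameter only so the gate
    logic can be tested without invarlock installed (hypothetical helper).
    """
    # check=False: a non-zero exit is reported as False instead of raising.
    completed = run(strict_verify_command(report_path), check=False)
    return completed.returncode == 0
```

In a CI job, `sys.exit(0 if strict_verify_passes("reports/eval/evaluation.report.json") else 1)` propagates the verdict. The gate trusts only the exit status; the reviewer-confirmed items below still need a human.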
Machine-Checked Command Surface
- `invarlock evaluate` ran with `--assurance strict` or the default strict mode.
- `--profile` was `ci` or `release`.
- `--tier` was `balanced` or `conservative`.
- Runtime execution was container-backed.
- Unverified provenance was not allowed.
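A reviewer can re-check the recorded flags from the original `invarlock evaluate` command line. This sketch assumes the recorded command is available as an argv list, parses only `--assurance`, `--profile`, and `--tier` with `argparse`, treats an absent `--assurance` as the default strict mode, and ignores every other argument. The function name is hypothetical. Container-backed execution and the unverified-provenance setting are not visible here and still need separate confirmation.

```python
import argparse
from typing import Sequence

ALLOWED_PROFILES = {"ci", "release"}
ALLOWED_TIERS = {"balanced", "conservative"}


def check_evaluate_argv(argv: Sequence[str]) -> list[str]:
    """Check recorded evaluate flags against this checklist (hypothetical helper)."""
    # allow_abbrev=False so a prefix such as "--prof" is not silently accepted.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Assumption: an absent --assurance flag means the default strict mode.
    parser.add_argument("--assurance", default="strict")
    parser.add_argument("--profile")
    parser.add_argument("--tier")
    # parse_known_args puts every other evaluate argument in the second value.
    args, _other = parser.parse_known_args(list(argv))

    failures = []
    if args.assurance != "strict":
        failures.append(f"--assurance is {args.assurance!r}, expected 'strict'")
    if args.profile not in ALLOWED_PROFILES:
        failures.append(f"--profile is {args.profile!r}, expected 'ci' or 'release'")
    if args.tier not in ALLOWED_TIERS:
        failures.append(f"--tier is {args.tier!r}, expected 'balanced' or 'conservative'")
    return failures
```

`parse_known_args` keeps unrelated evaluate flags from raising errors. A malformed command, such as `--tier` with no value, still makes `argparse` exit, which a gate should treat as a failure.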
Reviewer-Confirmed Policy Context
- Network and remote-code allowances were reviewed and recorded.
- The original evaluate command and staged bundle contents match the release/review intent.
Guard Chain
- The observed guard chain is exactly `invariants -> spectral -> rmt -> variance -> invariants`.
- No guard evidence is missing; the single `invariants` evidence block covers both the pre and post invariant stages in the current report contract.
- No guard was skipped, duplicated outside the canonical chain, or marked monitor-only for a pass.
- Unsupported guard/model statuses are explicit and block assurance.
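The guard chain rules reduce to an exact sequence comparison plus a per-guard status check. In the sketch below, `observed` is the ordered list of guard names and `statuses` maps each guard name to a status string. Both input shapes and the status value `"pass"` are assumptions, not the report's documented schema. A single `invariants` status covers both invariant stages, matching the single evidence block in the list above.

```python
# Canonical order from this checklist; "invariants" appears at both ends.
CANONICAL_CHAIN = ("invariants", "spectral", "rmt", "variance", "invariants")


def check_guard_chain(observed: list[str], statuses: dict[str, str]) -> list[str]:
    """Return failures for the guard-chain rules (hypothetical input shapes)."""
    failures = []
    # Exact comparison catches skipped, reordered, and duplicated guards alike.
    if tuple(observed) != CANONICAL_CHAIN:
        failures.append("guard chain is not " + " -> ".join(CANONICAL_CHAIN))
    # One status per distinct guard; a single "invariants" entry covers both stages.
    for guard in sorted(set(CANONICAL_CHAIN)):
        status = statuses.get(guard)
        if status is None:
            failures.append(f"{guard}: evidence missing")
        elif status != "pass":
            # Monitor-only, skipped, or unsupported statuses all block a pass.
            failures.append(f"{guard}: status {status!r} blocks assurance")
    return failures
```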
Guard Fallback Policy
- Numeric measurement fallbacks are recorded as diagnostics or events; a neutral fallback value alone is not acceptable evidence.
- Spectral estimator failures, non-tensor weights, non-finite weights, and quantized-weight skips include structured `spectral_sigma_fallback_*` diagnostics.
- RMT correction failures are emitted as `rmt_correct_failed` error events and do not silently erase the original outlier.
- Variance guard preparation/finalization failures fail closed unless an explicit monitor-only policy is recorded in the report.
- Reviewer-facing reports expose fallback diagnostics under the relevant guard result, and strict assurance blocks unsupported or degraded guard states.
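The fallback rules amount to one principle: a fallback is acceptable only when its structured evidence is present. The sketch below checks the three guards listed above. Every field it reads (`fallback_count`, `diagnostics[].code`, `correction_failed`, `events[].type`, `error`, `status`, `monitor_only_policy`) is a hypothetical name; only the `spectral_sigma_fallback_*` pattern and the `rmt_correct_failed` event name come from this checklist. `fnmatchcase` matches the `*` in the diagnostic pattern the same way it is written.

```python
from fnmatch import fnmatchcase


def check_fallback_evidence(guards: dict) -> list[str]:
    """Flag guard fallbacks that lack the structured evidence this checklist requires.

    Hypothetical schema: every key read here is an illustrative name, not the
    report's documented field layout.
    """
    failures = []

    spectral = guards.get("spectral", {})
    codes = [d.get("code", "") for d in spectral.get("diagnostics", [])]
    if spectral.get("fallback_count", 0) > 0 and not any(
        fnmatchcase(code, "spectral_sigma_fallback_*") for code in codes
    ):
        failures.append("spectral: fallback without spectral_sigma_fallback_* diagnostic")

    rmt = guards.get("rmt", {})
    event_types = {e.get("type") for e in rmt.get("events", [])}
    if rmt.get("correction_failed") and "rmt_correct_failed" not in event_types:
        failures.append("rmt: correction failure not emitted as rmt_correct_failed")

    variance = guards.get("variance", {})
    # Fail closed: an error must yield status "fail" unless a monitor-only policy is recorded.
    if (
        variance.get("error")
        and not variance.get("monitor_only_policy")
        and variance.get("status") != "fail"
    ):
        failures.append("variance: failure did not fail closed")

    return failures
```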
Metrics And Windows
- Final and baseline paired arrays have equal lengths.
- Window match fraction is `1.0`.
- Window overlap fraction is `0.0`.
- `ratio_vs_baseline` equals the exponentiated paired delta log-loss.
- `display_ci` equals `exp(ci)` for paired ppl-like metrics.
- Bootstrap coverage satisfies the selected tier floor.
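The metric identities can be recomputed from the paired per-window log-loss arrays. This sketch assumes the paired delta is the unweighted mean of per-window `final - baseline` log-loss and that `ci` is the confidence interval on that delta; if the report weights windows by token count, the mean must be weighted the same way. Parameter names are hypothetical, and the bootstrap coverage check is omitted because the tier floors are not stated here.

```python
import math


def check_paired_metrics(
    final_logloss: list[float],
    baseline_logloss: list[float],
    ratio_vs_baseline: float,
    ci: tuple[float, float],
    display_ci: tuple[float, float],
    window_match_fraction: float,
    window_overlap_fraction: float,
    rel_tol: float = 1e-9,
) -> list[str]:
    """Return checklist failures for paired ppl-like metrics (sketch)."""
    if not final_logloss or len(final_logloss) != len(baseline_logloss):
        return ["paired arrays are empty or have unequal lengths"]

    failures = []
    if window_match_fraction != 1.0:
        failures.append(f"window match fraction is {window_match_fraction}, expected 1.0")
    if window_overlap_fraction != 0.0:
        failures.append(f"window overlap fraction is {window_overlap_fraction}, expected 0.0")

    # Assumption: paired delta = unweighted mean of per-window (final - baseline).
    deltas = [f - b for f, b in zip(final_logloss, baseline_logloss)]
    mean_delta = sum(deltas) / len(deltas)
    if not math.isclose(ratio_vs_baseline, math.exp(mean_delta), rel_tol=rel_tol):
        failures.append("ratio_vs_baseline != exp(paired delta log-loss)")

    # display_ci should be the delta CI mapped through exp(), bound by bound.
    if not all(
        math.isclose(shown, math.exp(bound), rel_tol=rel_tol)
        for shown, bound in zip(display_ci, ci)
    ):
        failures.append("display_ci != exp(ci)")
    return failures
```

Exponentiating a log-loss difference gives a perplexity ratio, which is why both the point estimate and each CI bound pass through `exp`.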
Provenance
- `runtime.manifest.json` is present and verified.
- Runtime image provenance is digest-pinned or explicitly marked non-assurance.
- Tokenizer hash and provider digest match the baseline/subject contract.
- Policy digest and resolved policy are present in the report.
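Several provenance items are simple comparisons. The sketch below treats an image reference as digest-pinned when it ends in `@sha256:` followed by 64 lowercase hex characters (the OCI digest form), compares `tokenizer_hash` and `provider_digest` between baseline and subject, and checks that `policy_digest` and `resolved_policy` are present. All dictionary keys and the `non_assurance` flag are hypothetical names. Verifying `runtime.manifest.json` itself is left to `invarlock verify`.

```python
import re

# OCI digest form: "<name>@sha256:" followed by exactly 64 lowercase hex characters.
_DIGEST_PINNED = re.compile(r"@sha256:[0-9a-f]{64}\Z")


def is_digest_pinned(image_ref: str) -> bool:
    return _DIGEST_PINNED.search(image_ref) is not None


def check_provenance(
    image_ref: str,
    baseline: dict,
    subject: dict,
    report: dict,
    non_assurance: bool = False,
) -> list[str]:
    """Return checklist failures; all dictionary keys here are hypothetical."""
    failures = []
    if not is_digest_pinned(image_ref) and not non_assurance:
        failures.append("runtime image is neither digest-pinned nor marked non-assurance")
    for key in ("tokenizer_hash", "provider_digest"):
        # A missing value on either side counts as a mismatch.
        if baseline.get(key) is None or baseline.get(key) != subject.get(key):
            failures.append(f"{key}: baseline and subject differ")
    for key in ("policy_digest", "resolved_policy"):
        if not report.get(key):
            failures.append(f"{key}: missing from report")
    return failures
```

A tag such as `:latest` can be re-pointed after review, while a digest names exactly one image, which is why only the digest form passes without an explicit non-assurance marker.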
Report Verdict
- Top-level `assurance.mode` is `strict`.
- Generated report has `assurance.verdict` set to `pending_verifier`.
- Generated report has `assurance.report_local_verdict` set to `pass`.
- Generated report has `assurance.verified_assurance_verdict` set to `pending`.
- `assurance.fallback_fields_used` is `false`.
- `assurance.runtime_provenance_verified` is `false` before verifier confirmation.
- `assurance.blocking_reasons` is empty.
- `invarlock verify --assurance strict` exits successfully and reports `results[*].verification.runtime_provenance.status = "verified"`.
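The report-side verdict fields can be checked directly from the loaded JSON. This sketch reads each dotted path as nested objects, which is an assumption about how the paths map onto the file, and compares values by type as well as equality so that, for example, `0` is not accepted for `false`. `all_results_provenance_verified` assumes the verifier's results are available as a dict with a `results` list; that shape mirrors the `results[*]` path above but is not a documented output format. All helper names are hypothetical.

```python
from typing import Any

# Report fields and the exact values the checklist expects before verification.
EXPECTED_FIELDS: dict[str, Any] = {
    "assurance.mode": "strict",
    "assurance.verdict": "pending_verifier",
    "assurance.report_local_verdict": "pass",
    "assurance.verified_assurance_verdict": "pending",
    "assurance.fallback_fields_used": False,
    "assurance.runtime_provenance_verified": False,
    "assurance.blocking_reasons": [],
}

_MISSING = object()


def get_path(doc: Any, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; return _MISSING if absent."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def check_report_verdict(report: dict) -> list[str]:
    failures = []
    for path, expected in EXPECTED_FIELDS.items():
        actual = get_path(report, path)
        if actual is _MISSING:
            failures.append(f"{path}: missing")
        # Compare types too, so 0 is not accepted in place of False.
        elif type(actual) is not type(expected) or actual != expected:
            failures.append(f"{path}: expected {expected!r}, got {actual!r}")
    return failures


def all_results_provenance_verified(verify_output: dict) -> bool:
    """True when every verifier result reports runtime provenance as verified."""
    results = verify_output.get("results", [])
    return bool(results) and all(
        get_path(r, "verification.runtime_provenance.status") == "verified"
        for r in results
    )
```

To use it, load the report with `json.load` and pass the resulting dict to `check_report_verdict`; an empty list means every report-side item matches. An empty `results` list is treated as not verified rather than vacuously passing.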
Related Documentation
- Trust Model — Strict pass scope
- Assurance Case Overview — Claims, evidence, and tests
- Runtime Provenance Guide — Manifest requirements
- Failure Examples — Common non-pass shapes
- Troubleshooting — Numbered error codes
- Reports Reference — Full v1 schema
- One Run Lifecycle — Where each gate runs