Research Note · Verification · Assurance
Fail-Closed Verification for Weight-Edit Evaluation
A verifier is only useful if it rejects incomplete evidence. InvarLock's verification path is designed to stop stronger claims when the evidence bundle is missing or inconsistent.
Research Note: a clean result should require a complete bundle
Highlights
- Verification is only meaningful if incomplete evidence produces rejection, not a softer-looking pass.
- InvarLock's verify path checks more than schema shape: it also enforces pairing, ratio math, and runtime provenance expectations.
- The runtime manifest matters because container-backed evaluation outputs are supposed to travel with their execution evidence.
An easy way to make a verification story look better than it deserves is to let incomplete evidence degrade gracefully. A report is missing a sidecar. A baseline artifact cannot be found. Pairing is weaker than expected. Instead of rejecting the claim, the system simply keeps going and leaves a cleaner-looking result than the evidence supports.
That is the behavior a serious evaluation boundary should avoid.
Fail-closed verification keeps the claim boundary narrow. If the evidence bundle is incomplete or inconsistent, the stronger claim should stop there rather than being turned into a reassuring summary.
What “Fail Closed” Means Here
In this context, fail closed does not mean “crash on everything.” It means that verification should refuse to validate a report bundle when the conditions required for the public claim are not present.
The CLI reference is explicit about this posture. invarlock verify is not described as a convenience formatter. It verifies report JSONs against schema, pairing math, profile gates, and the required runtime provenance for container-backed evaluation outputs. In CI and Release profiles, that path is intentionally strict.
That strictness is the point. If a result depends on paired evaluation, deterministic provenance, and container-backed execution, then those conditions should not become optional at the last step.
What The Verifier Actually Checks
The reports reference gives a compact view of the verification path:
- schema
- pairing
- ratio math
- measurement contracts
- runtime provenance
That list matters because it separates “the file parsed” from “the evidence is valid enough to support the claim.”
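The distinction in the list above can be sketched in a few lines of Python. This is a minimal illustration, not InvarLock's implementation: the field names (baseline_ids, subject_ids, baseline_metric, subject_metric, ratio) and the function names are hypothetical, and only two of the five checks are shown. The point is the shape: a missing field is treated as a failure, never as a skipped check.

```python
import math
from typing import Callable, Optional

Report = dict
Check = Callable[[Report], Optional[str]]  # returns a failure reason, or None


def check_pairing(report: Report) -> Optional[str]:
    # Hypothetical fields: IDs of the examples evaluated on each side.
    baseline_ids = report.get("baseline_ids")
    subject_ids = report.get("subject_ids")
    if not baseline_ids or not subject_ids:
        return "pairing: example IDs missing"
    if baseline_ids != subject_ids:
        return "pairing: baseline and subject evaluated different examples"
    return None


def check_ratio_math(report: Report) -> Optional[str]:
    # Hypothetical fields: a reported ratio plus the two metrics behind it.
    try:
        baseline = float(report["baseline_metric"])
        subject = float(report["subject_metric"])
        reported = float(report["ratio"])
    except (KeyError, TypeError, ValueError):
        return "ratio math: required fields missing or non-numeric"
    if baseline == 0:
        return "ratio math: baseline metric is zero"
    if not math.isclose(subject / baseline, reported, rel_tol=1e-9):
        return "ratio math: reported ratio does not match recomputed ratio"
    return None


def verify(report: Report, checks: list[Check]) -> list[str]:
    """Run every check; any non-empty result means the report is rejected."""
    failures = []
    for check in checks:
        reason = check(report)
        if reason is not None:
            failures.append(reason)
    return failures
```

A report that parses as valid JSON but lacks the pairing or ratio fields returns two failure reasons here, which is the difference between "the file parsed" and "the evidence supports the claim."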
The assurance overview reinforces the same idea from the claim side. It ties public claims to runtime enforcement and observability, not to prose alone. Pairing, bootstrap sanity, deterministic evaluation, and primary-metric contracts are all written as enforceable conditions rather than advisory notes.
Why The Runtime Manifest And Baseline Report Matter
Fail-closed verification is not only about the main report file.
The public docs repeatedly say that container-backed evaluations emit runtime.manifest.json next to evaluation.report.json, and that the two files should be archived and verified together. The artifact-layout reference makes the same point operationally: baseline and subject reports, the evaluation report, and the runtime manifest belong together because verification depends on that evidence surface staying intact.
This keeps the evidence bundle intact for later verification. If the report travels without the runtime provenance sidecar or without the baseline material needed for pairing checks, the system should not quietly pretend the bundle is still complete.
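A bundle completeness check of this kind could look roughly like the following Python sketch. The two file names come from the docs quoted above; the function names are hypothetical, and the names of the baseline and subject report files are not given here, so they are left to the caller through the required parameter rather than guessed.

```python
from pathlib import Path

# File names taken from the text above; a real bundle would also include
# baseline and subject reports, whose names are not specified here.
REQUIRED_FILES = ("evaluation.report.json", "runtime.manifest.json")


def missing_bundle_files(bundle_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in bundle_dir."""
    root = Path(bundle_dir)
    missing = []
    for name in required:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def bundle_is_complete(bundle_dir, required=REQUIRED_FILES):
    # Fail closed: a nonexistent directory or any missing file means "no".
    return not missing_bundle_files(bundle_dir, required)
```

Treating an empty file the same as a missing one is a design choice in this sketch: a zero-byte sidecar carries no provenance, so it should not count as evidence.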
How Hard-Abort Errors Protect The Claim Boundary
The troubleshooting guide states the fail-closed posture in operational terms.
Pairing errors like E001, digest mismatches like E002 or E006, and verification failures such as E601 are not described as minor quality warnings in CI and Release. They are hard-abort conditions. The CLI reference says the same thing at the command level: exit code 3 is reserved for hard aborts when evidence would be invalid.
That detail matters. A verification command that can distinguish “this report parsed” from “this report is valid enough to gate a release” is enforcing a meaningful boundary. A command that normalizes missing or contradictory evidence into a softer pass is mostly doing presentation work.
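As a rough illustration of that boundary, the mapping from error codes to exit status might be sketched as below. The error codes and exit code 3 are the ones named above; everything else is an assumption made for the example, including the lowercase profile spellings, the function name, and the use of exit code 1 as a placeholder for failures that are not hard aborts.

```python
HARD_ABORT_CODES = {"E001", "E002", "E006", "E601"}  # codes named in the text
STRICT_PROFILES = {"ci", "release"}  # assumed spelling of the profile names

EXIT_OK = 0
EXIT_FAILURE = 1      # placeholder for non-hard-abort failures (assumption)
EXIT_HARD_ABORT = 3   # reserved for hard aborts, per the CLI reference


def exit_code_for(error_codes, profile):
    """Map collected error codes to a process exit status for a profile."""
    if not error_codes:
        return EXIT_OK
    strict = profile.lower() in STRICT_PROFILES
    if strict and HARD_ABORT_CODES & set(error_codes):
        return EXIT_HARD_ABORT
    # Errors never map to success; they only differ in how loudly they fail.
    return EXIT_FAILURE
```

Note that no branch turns a non-empty error list into exit code 0. A CI job that gates on the exit status therefore cannot read contradictory evidence as a pass.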
What Fail-Closed Verification Still Does Not Do
This is still a narrow claim.
Fail-closed verification does not prove the model is good. It does not replace human review. It does not guarantee deployment security or broader governance hygiene. What it does is narrower: it protects the evidence boundary around the specific public claim InvarLock is making.
That is enough. A verifier does not need to solve the whole problem to be useful. It needs to stop weak evidence from being mistaken for strong evidence.
Claim Surface
The practical structure is:
- evaluation emits evaluation.report.json
- container-backed paths also emit runtime.manifest.json
- verification checks schema, pairing, math, and contract consistency
- CI and Release profiles hard-abort when required evidence is missing or mismatched
That is what “fail closed” should mean here: stronger conclusions require complete evidence.
Limitations
- This post explains public verification semantics; it does not add a new empirical study.
- Fail-closed verification improves evidence integrity without proving the broader system is secure.
- The companion figure is a method sketch, not a system-wide certification badge.
Further Reading
- InvarLock docs hub
- Assurance Case Overview
- CLI Reference
- Reports Reference
- Artifact Layout
- Troubleshooting
Source Docs On GitHub