Research Note · Verification · Assurance

Fail-Closed Verification for Weight-Edit Evaluation

A verifier is only useful if it rejects incomplete evidence. InvarLock's verification path is designed to stop stronger claims when the evidence bundle is missing or inconsistent.

April 20, 2026

InvarLock Team

Research Note: a clean result should require a complete bundle

Highlights

Verification is only meaningful if incomplete evidence produces rejection, not a softer-looking pass.
InvarLock's verify path checks more than schema shape: it also enforces pairing, ratio math, and runtime provenance expectations.
The runtime manifest matters because container-backed evaluation outputs are supposed to travel with their execution evidence.

One of the easiest ways to make a verification story look stronger than it really is: let incomplete evidence degrade gracefully. A report is missing a sidecar. A baseline artifact cannot be found. Pairing is weaker than expected. Instead of rejecting the claim, the system simply keeps going and produces a cleaner-looking result than the evidence supports.

That is the behavior a serious evaluation boundary should avoid.

Fail-closed verification keeps the claim boundary narrow. If the evidence bundle is incomplete or inconsistent, the stronger claim should stop there rather than being turned into a reassuring summary.

What “Fail Closed” Means Here

In this context, fail closed does not mean “crash on everything.” It means that verification should refuse to validate a report bundle when the conditions required for the public claim are not present.

The CLI reference is explicit about this posture. invarlock verify is not described as a convenience formatter. It verifies report JSONs against schema, pairing math, profile gates, and the required runtime provenance for container-backed evaluation outputs. In CI and Release profiles, that path is intentionally strict.

That strictness is the point. If a result depends on paired evaluation, deterministic provenance, and container-backed execution, then those conditions should not become optional at the last step.

What The Verifier Actually Checks

The reports reference gives a compact view of the verification path:

schema
pairing
ratio math
measurement contracts
runtime provenance

That list matters because it separates “the file parsed” from “the evidence is valid enough to support the claim.”
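The difference between "parsed" and "valid" can be sketched in a few lines. The following is a minimal illustration, not InvarLock's implementation: the field names (baseline_metric, subject_metric, ratio), the check functions, and the assumption that the ratio is the subject metric divided by the baseline metric are all hypothetical. The point it shows is structural: every check runs, a check that cannot run counts as a failure, and any failure rejects the whole report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    ok: bool
    detail: str = ""


class VerificationError(Exception):
    """Raised when at least one check fails; carries every failure."""

    def __init__(self, failures):
        self.failures = list(failures)
        names = ", ".join(f.name for f in self.failures)
        super().__init__(f"verification failed: {names}")


def check_schema(report):
    # Hypothetical required fields, for illustration only.
    required = ("baseline_metric", "subject_metric", "ratio")
    missing = [key for key in required if key not in report]
    return (not missing, f"missing keys: {missing}" if missing else "")


def check_ratio_math(report, tol=1e-9):
    # Assumption: the reported ratio should equal subject / baseline.
    expected = report["subject_metric"] / report["baseline_metric"]
    ok = abs(report["ratio"] - expected) <= tol
    return (ok, "" if ok else f"reported {report['ratio']}, recomputed {expected}")


CHECKS = (("schema", check_schema), ("ratio math", check_ratio_math))


def verify_fail_closed(report, checks=CHECKS):
    results = []
    for name, check in checks:
        try:
            ok, detail = check(report)
        except Exception as exc:
            # A check that cannot run is a failure, never a skipped step.
            ok, detail = False, f"check could not run: {type(exc).__name__}"
        results.append(CheckResult(name, ok, detail))
    failures = [r for r in results if not r.ok]
    if failures:
        raise VerificationError(failures)
    return results
```

A report that parses but carries a ratio inconsistent with its own metrics passes the schema check and is still rejected, which is the separation the list above describes.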

The assurance overview reinforces the same idea from the claim side. It ties public claims to runtime enforcement and observability, not to prose alone. Pairing, bootstrap sanity, deterministic evaluation, and primary-metric contracts are all written as enforceable conditions rather than advisory notes.

Why The Runtime Manifest And Baseline Report Matter

Fail-closed verification is not only about the main report file.

The public docs repeatedly say that container-backed evaluations emit runtime.manifest.json next to evaluation.report.json, and that they should be archived and verified together. The artifact-layout reference makes the same point operational: baseline and subject reports, the evaluation report, and the runtime manifest belong together because verification depends on that evidence surface staying intact.

Archiving these files together keeps the evidence bundle intact for later verification. If the report travels without the runtime provenance sidecar or without the baseline material needed for pairing checks, the system should not quietly pretend the bundle is still complete.
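A completeness check of this kind might look like the sketch below. Only the two file names evaluation.report.json and runtime.manifest.json come from the docs described above; the function names and the extra_required parameter (a stand-in for baseline and subject report files, whose actual names are not given here) are assumptions made for illustration.

```python
from pathlib import Path

# File names taken from the public docs as described in this post.
REQUIRED_FILES = ("evaluation.report.json", "runtime.manifest.json")


def missing_evidence(bundle_dir, extra_required=()):
    """Return the required file names that are absent from bundle_dir."""
    bundle = Path(bundle_dir)
    required = REQUIRED_FILES + tuple(extra_required)
    return [name for name in required if not (bundle / name).is_file()]


def require_complete_bundle(bundle_dir, extra_required=()):
    """Refuse to continue when any part of the evidence bundle is missing."""
    missing = missing_evidence(bundle_dir, extra_required)
    if missing:
        raise FileNotFoundError(
            "incomplete evidence bundle, missing: " + ", ".join(missing)
        )
```

The design choice worth noting is that the function raises instead of returning a partial result, so a caller cannot accidentally verify a report that has lost its sidecar.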

How Hard-Abort Errors Protect The Claim Boundary

The troubleshooting guide states the fail-closed posture in operational terms.

Pairing errors like E001, digest mismatches like E002 or E006, and verification failures such as E601 are not described as minor quality warnings in CI and Release. They are hard-abort conditions. The CLI reference says the same thing at the command level: exit code 3 is reserved for hard aborts when evidence would be invalid.

That detail matters. A verification command that can distinguish “this report parsed” from “this report is valid enough to gate a release” is enforcing a meaningful boundary. A command that normalizes missing or contradictory evidence into a softer pass is mostly doing presentation work.
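For a pipeline that consumes the verifier's exit status, the same boundary can be kept on the calling side. This is a hedged sketch: exit code 3 as the hard-abort code comes from the CLI reference as described above, while treating 0 as success follows the usual process convention and is an assumption here, as is the release_gate helper itself.

```python
HARD_ABORT_EXIT = 3  # hard abort, per the CLI reference described above
SUCCESS_EXIT = 0     # assumption: conventional success status


def release_gate(exit_code):
    """Map a verifier exit code to (allowed, reason); only success passes."""
    if exit_code == SUCCESS_EXIT:
        return True, "verified"
    if exit_code == HARD_ABORT_EXIT:
        return False, "hard abort: evidence would be invalid"
    # Any other status is also treated as a rejection, never as a soft pass.
    return False, f"verification did not pass (exit code {exit_code})"
```

The gate has no branch that turns a non-success status into a warning, so an unrecognized exit code blocks the release just as a hard abort does.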

What Fail-Closed Verification Still Does Not Do

This is still a narrow claim.

Fail-closed verification does not prove the model is good. It does not replace human review. It does not guarantee deployment security or broader governance hygiene. What it does is narrower: it protects the evidence boundary around the specific public claim InvarLock is making.

That is enough. A verifier does not need to solve the whole problem to be useful. It needs to stop weak evidence from being mistaken for strong evidence.

Claim Surface

The practical structure is:

evaluation emits evaluation.report.json
container-backed paths also emit runtime.manifest.json
verification checks schema, pairing, ratio math, measurement contracts, and runtime provenance
CI and Release profiles hard-abort when required evidence is missing or mismatched

That is what “fail closed” should mean here: stronger conclusions require complete evidence.

Limitations

This post explains public verification semantics; it does not add a new empirical study.
Fail-closed verification improves evidence integrity without proving the broader system is secure.
The companion figure is a method sketch, not a system-wide certification badge.
