
What Evidence Packs Still Do Not Prove

Ink/charcoal doodle: a verified evidence pack stops at a scope boundary while smaller unanswered question cards sit outside it.

Even a strict, signed evidence pack with a PASS result is only strong packaging of evidence. It does not answer every scientific, safety, or deployment question a reader might wish it answered.

InvarLock Team

Evidence Note: stronger packaging should narrow overclaim, not invite it

Highlights

  • Evidence packs are strong at portability, integrity, and package-level re-verification.
  • They are not universal proof of representativeness, safety, or deployment readiness.
  • Saying that plainly makes the evidence story stronger, not weaker.

The June evidence-pack post, "Evidence Packs, Not Screenshots," argued for portable evidence over screenshot-style presentation. The archive synthesis then showed how that portability fits into a re-checkable model-edit decision.

That argument still stands.

This follow-up matters because strong packaging can create a second kind of risk: readers start attributing guarantees to the pack that the pack never actually made.

The public InvarLock docs promise less than those attributed guarantees, and the blog should stay aligned with them.

What Evidence Packs Really Secure

The evidence-pack docs make a real promise. A pack can carry reports, a reviewer summary, a manifest, checksums, a package-native signature bundle, and a package-native verification path. Current package-native verification expects a signed manifest.signature.json, validates checksum binding and digest-backed references, checks checksums.sha256, and runs nested report verification. Strict mode adds no-extra-files discipline and strict report assurance.
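
To make the checksum and no-extra-files ideas concrete, here is a minimal Python sketch. It is not InvarLock's verifier. It assumes checksums.sha256 uses the common sha256sum text layout (a hex digest, two spaces, then a relative path), and the names read_checksums and check_pack are hypothetical. Signature verification, digest-backed references, and nested report verification are deliberately left out.

```python
import hashlib
from pathlib import Path


def read_checksums(pack_dir: Path) -> dict[str, str]:
    """Parse checksums.sha256, ASSUMING sha256sum-style '<hex>  <path>' lines."""
    entries = {}
    for line in (pack_dir / "checksums.sha256").read_text().splitlines():
        if not line.strip():
            continue
        digest, _, rel_path = line.partition("  ")
        entries[rel_path.strip()] = digest.strip().lower()
    return entries


def check_pack(pack_dir: Path, strict: bool = False) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    expected = read_checksums(pack_dir)

    # Every listed file must exist and hash to the recorded digest.
    for rel_path, digest in expected.items():
        target = pack_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != digest:
            problems.append(f"digest mismatch: {rel_path}")

    if strict:
        # Illustrative no-extra-files rule (an assumption, not the real policy):
        # only listed files plus the checksum and signature files may be present.
        allowed = set(expected) | {"checksums.sha256", "manifest.signature.json"}
        for path in sorted(pack_dir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(pack_dir).as_posix()
                if rel not in allowed:
                    problems.append(f"unexpected file: {rel}")

    return problems
```

Calling check_pack(Path("my-pack"), strict=True) on an unpacked bundle returns an empty list only when every listed file matches its digest and nothing unlisted is present. Note what the sketch never inspects: whether the reports inside answer the question a reviewer actually cares about.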

That is meaningful. It means the evidence is portable, structured, and checkable after the original run environment is gone.

It is worth saying this first because the limitations only make sense if the genuine strengths are kept visible.

What Evidence Packs Do Not Prove

The assurance case overview gives the cleanest boundary.

InvarLock's scope is intentionally about regression risk from weight edits under a chosen baseline, configuration, and documented tiers. It is explicitly not a universal claim about content safety, alignment, prompt-level attacks, deployment hardening, or arbitrary generalization outside the documented support surface.

An evidence pack does not widen that scope. It packages and verifies evidence inside that scope. It does not silently answer questions that the underlying assurance case declares out of scope.

Why Underlying Report Quality Still Matters

Even a strong pack depends on what it contains.

If the bundled reports are narrow, underpowered, or mismatched to the question a reviewer cares about, packaging them well does not change that fact. A strict evidence pack can show that the bundle is intact and re-verifiable. It cannot turn a weak underlying evaluation into a strong scientific result.

This is the central conceptual distinction: package integrity is not the same thing as claim sufficiency.

Why This Distinction Improves Credibility

There is a clear upside to stating the limit plainly.

If a team acts as though evidence packs prove everything, the next skeptical reader will find the boundary anyway and trust the rest less. If the team says up front that evidence packs secure portability, integrity, and re-verification inside a narrow scope, the evidence story becomes more durable.

This is one of the places where restraint is a direct credibility gain.

Practical Reading Rule

When you receive an evidence pack, ask two separate questions:

  • What does the pack let me verify about integrity, provenance, and report structure?
  • What does the underlying evaluation still leave unproven?

Those questions belong together, but they are not the same question.
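
One way to keep the two questions apart in practice is to record their answers in separate fields. The following Python sketch is purely illustrative: PackReading and its field names are hypothetical and are not part of any InvarLock tooling.

```python
from dataclasses import dataclass, field


@dataclass
class PackReading:
    """Hypothetical reviewer note that keeps the two questions separate."""

    # Question 1: what the pack let the reviewer verify.
    verified: list[str] = field(default_factory=list)
    # Question 2: what the underlying evaluation still leaves unproven.
    unproven: list[str] = field(default_factory=list)

    def verdict(self) -> str:
        if not self.verified:
            return "integrity not established"
        if self.unproven:
            return "intact pack, open questions remain"
        return "intact pack, no open questions recorded"


reading = PackReading(
    verified=["checksums match", "manifest signature valid"],
    unproven=["representativeness of the evaluation data"],
)
print(reading.verdict())  # intact pack, open questions remain
```

Because the two lists are independent, a fully verified pack can still carry a non-empty unproven list, which is the point of asking both questions.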

Limitations

Sources