Synthesis
The Minimum Evidence Surface for Trustworthy Weight-Edit Results
A trustworthy weight-edit result needs more than a benchmark delta. It needs a bounded claim, an exactly paired comparison, and verification that rejects incomplete evidence.
Updates from the InvarLock project, including release changes, CLI updates, docs changes, and practical notes on how to read or ship evaluation evidence.
Latest post: April 27, 2026, across 8 active tags.
Release
InvarLock 0.8.0 moves the public bundle surface to evidence packs, pins docs to versioned release paths, and makes container-vs-host runtime provenance explicit across evaluate and verify.
Research Note
A verifier is only useful if it rejects incomplete evidence. InvarLock's verification path is designed to stop stronger claims when the evidence bundle is missing or inconsistent.
Release
InvarLock 0.7.2 simplifies the public release surface around immutable source tags plus the PyPI wheel and sdist, with docs and verification gates aligned around that path.
Research Note
A model-edit benchmark number is only as strong as the comparison behind it. Pairing makes the comparison inspectable.
Release
InvarLock 0.7.1 makes wheel-only verify/report workflows first-class, ships a public contract bundle, and tightens supply-chain and release-validation gates.
Release
InvarLock 0.7.0 adds first-class GPT-OSS support, pilot Ministral 3 8B/14B presets, and a CUDA-capable attested runtime path for GPU hosts.
Research Note
A narrow claim can be stronger than a broad one. InvarLock is about auditable regression risk from weight edits, not general model safety.
Release
InvarLock 0.6.0 adds a shipped Gemma 4 E2B text lane, phase-1 multimodal evaluation, and a unified `--assurance attested|trusted-local` workflow.
Release
InvarLock 0.5.1 adds a push-gated tiny attested smoke lane, a scheduled GPT-2 canary lane, and package-native Ed25519 proof-pack signatures.
Release
InvarLock 0.5.0 adds offline release-verification bundles, package-native proof-pack verification, and a simplified public CLI centered on evaluate, verify, and report.
Release
InvarLock 0.4.0 stabilizes contracts around policies, proof packs, and evaluation provenance while tightening verification, CI, and coverage enforcement.
Release
Split-module coverage thresholds now protect critical CLI/reporting paths while config, plugin, report, overhead, and observability edge cases fail closed more reliably.
Release
A focused hardening release: safer AWQ plugin discovery, stronger quantization clipping behavior, and broader report-schema acceptance for edge payloads.
Release
Proof packs add new showcase and evidence artifacts, while CI and release flows become more deterministic and easier to validate repeatedly.
Release
A stability-focused release: cleaner report output, safer offline proof-pack flows, and CI/test hardening after the report rename.
Release
A terminology reset (report/evaluate), stricter proof-pack verification, and a clean upgrade path for Hugging Face Transformers v5.
Release
Adapters move to role-based routing, proof packs become easier to inspect (v2 layout), and reporting output gets a readability upgrade.
Release
Reports now record and enforce estimator measurement contracts under CI/release profiles, and proof pack suites can cleanly split calibration vs execution.
Release
Proof packs gain a deterministic bash test suite and better runtime helpers, window selection becomes stable/offline, and perplexity runs get safer around bad token IDs.
Release
CI/release baseline pairing is fail-closed (pairing evidence is required), and adapters reduce peak memory during retries via chunked snapshot/restore.
Release
Token-weighted paired bootstrap lands across the pipeline, strictness toggles expand, and CI/release pairing expectations become explicit and enforceable.
Release
`invarlock calibrate` arrives, determinism utilities mature, and regression harness + golden tracking help prevent silent policy drift.
Release
Fixes a GPU memory leak during reload fallback, hardens B200 scripts, and adds practical controls for acceptance ranges and overhead measurement.
Release
First-class quantization metadata, safer device movement across quantized models, auto-routing based on checkpoint info, and major test coverage expansion.
Release
The initial public release on GitHub and PyPI: core evaluate pipeline, guard chain, schema v1, and the first docs/CLI surface.
Announcement
A quick introduction to InvarLock: evaluate LLM weight edits with statistical guarantees and auditable proof packs.