Release
Token-weighted paired statistics and stricter release gates
Token-weighted paired bootstrap lands across the pipeline, strictness toggles expand, and CI/release pairing expectations become explicit and enforceable.
Release: InvarLock 0.3.3 - Paired bootstrap, strictness toggles, and clearer failures
Highlights
- Token-weighted paired Δlog-loss bootstrap support (core + primary metric + variance guard).
- Window pairing enforcement becomes more explicit (overlap/duplicates/mismatch detection).
- Strictness toggles and report metadata improvements for clearer evaluation outcomes.
0.3.3 tightens the statistical backbone of paired evaluation. The paired Δlog-loss bootstrap work isn't just a "numbers" change. It makes drift conclusions more faithful to what was actually evaluated: token-weighted and paired, not loosely aggregated.
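To make "token-weighted and paired" concrete, here is a minimal sketch of a paired percentile bootstrap over evaluation windows. It is not InvarLock's implementation. The function names, the per-window inputs (mean log-loss per window for the baseline and the edited model, plus a token count per window), and the percentile interval are all assumptions for illustration. Pairing means each resample draws the same window index for both models. Token weighting means a long window counts proportionally more than a short one.

```python
import random
from typing import Sequence


def token_weighted_delta(
    base_logloss: Sequence[float],
    edit_logloss: Sequence[float],
    tokens: Sequence[int],
) -> float:
    """Token-weighted mean of per-window Δlog-loss (edited minus baseline).

    Assumes each log-loss value is a per-token mean for its window, so
    weighting by token count recovers total Δlog-loss per evaluated token.
    """
    total = sum(tokens)
    return sum(t * (e - b) for b, e, t in zip(base_logloss, edit_logloss, tokens)) / total


def paired_bootstrap_ci(
    base_logloss: Sequence[float],
    edit_logloss: Sequence[float],
    tokens: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Return (point estimate, lower, upper) for the token-weighted paired Δlog-loss.

    Windows are resampled as pairs: one index selects the baseline value,
    the edited value, and the token count together, so the per-window
    correlation between the two models is preserved.
    """
    n = len(tokens)
    if not (len(base_logloss) == len(edit_logloss) == n):
        raise ValueError("baseline and edited windows must be paired one-to-one")
    rng = random.Random(seed)
    point = token_weighted_delta(base_logloss, edit_logloss, tokens)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample window indices with replacement
        stats.append(
            token_weighted_delta(
                [base_logloss[i] for i in idx],
                [edit_logloss[i] for i in idx],
                [tokens[i] for i in idx],
            )
        )
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi
```

A positive Δ means the edited model has higher log-loss (it got worse). Resampling pairs rather than each run independently is what keeps the interval tight when both models find the same windows hard.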
It also makes CI/release expectations blunt and explicit: perfect pairing, non-overlapping windows, and coverage floors aren't "best effort" anymore; they're enforced. That's a theme in this release: fewer fuzzy edges, more things you can confidently point to.
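The enforced expectations above can be pictured as a fail-closed gate. The sketch below is hypothetical: the `Window` type, the function names, and the "minimum paired windows" form of the coverage floor are assumptions for illustration, not InvarLock's actual schema or checks. It collects every duplicate, overlap, and baseline/edited mismatch, then refuses to proceed if any problem was found.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Window:
    """Hypothetical evaluation window: an ID plus a half-open token span."""
    window_id: str
    start: int
    end: int  # exclusive


def pairing_violations(
    baseline: Sequence[Window],
    edited: Sequence[Window],
    min_windows: int,
) -> list[str]:
    """Return every pairing problem found; an empty list means the check passes."""
    problems: list[str] = []

    for label, windows in (("baseline", baseline), ("edited", edited)):
        ids = [w.window_id for w in windows]
        if len(ids) != len(set(ids)):
            problems.append(f"{label}: duplicate window IDs")
        # After sorting by start, any overlap shows up between some consecutive pair.
        spans = sorted((w.start, w.end) for w in windows)
        for (s0, e0), (s1, e1) in zip(spans, spans[1:]):
            if s1 < e0:
                problems.append(f"{label}: windows [{s0},{e0}) and [{s1},{e1}) overlap")

    base_map = {w.window_id: (w.start, w.end) for w in baseline}
    edit_map = {w.window_id: (w.start, w.end) for w in edited}
    shared = set(base_map) & set(edit_map)
    if set(base_map) != set(edit_map):
        problems.append("window IDs differ between baseline and edited runs")
    for wid in sorted(shared):
        if base_map[wid] != edit_map[wid]:
            problems.append(f"window {wid}: spans differ between runs")

    # Assumed coverage floor: a minimum number of exactly paired windows.
    if len(shared) < min_windows:
        problems.append(f"coverage floor not met: {len(shared)} paired windows < {min_windows}")
    return problems


def enforce_pairing(
    baseline: Sequence[Window], edited: Sequence[Window], min_windows: int
) -> None:
    """Fail closed: raise instead of evaluating on imperfectly paired windows."""
    problems = pairing_violations(baseline, edited, min_windows)
    if problems:
        raise ValueError("; ".join(problems))
```

Reporting every violation at once, rather than stopping at the first, is a deliberate choice here: a failed gate should tell you everything that needs fixing before the next run.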
And when things do go wrong, reports carry better context (including evaluation soft-fail metadata), which helps turn failures into something you can diagnose instead of something you just re-run blindly.
For the immutable release record, read the tagged CHANGELOG.md for v0.3.3.