Release

Stronger paired stats + stricter controls when you need them

Ink/charcoal doodle: paired bars pass through stricter controls.

Token-weighted paired bootstrap lands across the pipeline, strictness toggles expand, and CI/release pairing expectations become explicit and enforceable.

InvarLock Team

Release: InvarLock 0.3.3 — More explainable failures (and fewer mysteries)

Highlights

  • Token-weighted paired Δlog-loss bootstrap support (core + primary metric + variance guard).
  • Window pairing enforcement becomes more explicit (overlap/duplicates/mismatch detection).
  • Strictness toggles and report metadata improvements for clearer evaluation outcomes.

0.3.3 tightens the statistical backbone of paired evaluation. The paired Δlog-loss bootstrap is more than a “numbers” change: it makes drift conclusions more faithful to what was actually evaluated (token-weighted and paired, not loosely aggregated).
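To make "token-weighted and paired" concrete, here is a minimal sketch of a percentile bootstrap over paired windows. It is not InvarLock's implementation or API: the function name, its parameters, the "baseline vs. edited" framing, and the sign convention (edited minus baseline) are all assumptions made for illustration.

```python
import random


def token_weighted_paired_bootstrap(base_logloss, edited_logloss, token_counts,
                                    n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for the token-weighted mean paired delta log-loss.

    Hypothetical helper, not part of InvarLock. Index i is one evaluation
    window that was scored by both runs; token_counts[i] is its weight.
    """
    if not (len(base_logloss) == len(edited_logloss) == len(token_counts)):
        raise ValueError("paired inputs must have equal length")
    if not token_counts or min(token_counts) <= 0:
        raise ValueError("every window needs a positive token count")

    # Pair first: one delta per window (assumed convention: edited - baseline).
    deltas = [e - b for b, e in zip(base_logloss, edited_logloss)]
    n = len(deltas)

    def weighted_mean(indices):
        indices = list(indices)
        total = sum(token_counts[i] for i in indices)
        return sum(deltas[i] * token_counts[i] for i in indices) / total

    # Resample whole windows with replacement, so each delta keeps its weight.
    rng = random.Random(seed)
    stats = sorted(
        weighted_mean(rng.randrange(n) for _ in range(n))
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * (n_resamples - 1))]
    hi = stats[int((1 - alpha / 2) * (n_resamples - 1))]
    return weighted_mean(range(n)), lo, hi


point, lo, hi = token_weighted_paired_bootstrap(
    base_logloss=[2.10, 1.95, 2.40],
    edited_logloss=[2.12, 1.95, 2.47],
    token_counts=[512, 128, 1024],
)
print(f"delta log-loss = {point:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

The point of resampling windows rather than the two runs independently is that the pairing survives every resample, and the token weights keep a short window from counting as much as a long one.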

It also makes CI/release expectations blunt and explicit: perfect pairing, non-overlapping windows, and coverage floors are no longer “best effort”; they are enforced. That is a theme of this release: fewer fuzzy edges, more things you can confidently point to.
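As a rough illustration of what enforcing those expectations can look like, the sketch below checks two runs for duplicates, pairing mismatches, overlapping windows, and a coverage floor. Everything in it is hypothetical: the function name, the representation of windows as IDs with half-open token spans, and a floor expressed as a minimum window count are assumptions, not InvarLock's actual checks or configuration.

```python
def check_pairing(baseline_ids, edited_ids, spans, min_windows):
    """Return a list of human-readable pairing problems (empty means pass).

    Hypothetical helper, not part of InvarLock.
    baseline_ids / edited_ids: window IDs in evaluation order for each run.
    spans: dict mapping window ID -> (start, end) half-open token offsets.
    min_windows: assumed coverage floor on the number of paired windows.
    """
    problems = []

    # Duplicates: a window evaluated twice would be double-counted.
    for name, ids in (("baseline", baseline_ids), ("edited", edited_ids)):
        dupes = sorted({w for w in ids if ids.count(w) > 1})
        if dupes:
            problems.append(f"{name}: duplicate windows {dupes}")

    # Perfect pairing: same windows, same order, in both runs.
    if list(baseline_ids) != list(edited_ids):
        unpaired = sorted(set(baseline_ids) ^ set(edited_ids))
        problems.append(f"pairing mismatch (unpaired windows: {unpaired})")

    # Overlap: after sorting by start offset, checking neighbours is enough.
    paired = sorted(set(baseline_ids) & set(edited_ids), key=lambda w: spans[w])
    for prev, cur in zip(paired, paired[1:]):
        if spans[cur][0] < spans[prev][1]:
            problems.append(f"overlap between windows {prev} and {cur}")

    # Coverage floor: too few paired windows makes the comparison unreliable.
    if len(paired) < min_windows:
        problems.append(f"coverage {len(paired)} below floor {min_windows}")

    return problems


spans = {1: (0, 10), 2: (10, 20), 3: (15, 30)}
for problem in check_pairing([1, 2, 3], [1, 2], spans, min_windows=3):
    print(problem)
```

Returning every problem instead of stopping at the first one fits the release's goal of explainable failures: a CI job can fail once and still report all the reasons.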

And when things do go wrong, reports carry better context (including evaluation soft-fail metadata), so a failure becomes something you can diagnose rather than something you re-run blindly.

For more details, see CHANGELOG.md.
