Deterministic evidence packs and safer perplexity runs
Evidence packs gain a deterministic bash test suite and better runtime helpers, window selection becomes stable/offline, and perplexity runs get safer around bad token IDs.
Release: InvarLock 0.3.5 - Offline window selection, runtime helpers, and token-ID guards
Highlights
- Evidence pack bash test suite + runtime helpers for capturing artifacts during long runs.
- WikiText-2 window stratification switches to a deterministic offline byte-level n-gram scorer.
- Perplexity sanitizes out-of-range token IDs, while B200 defaults reduce queue and cache friction.
0.3.5 is a trust-the-machinery release. Evidence packs now have their own bash test suite with deterministic command mocks and optional coverage checks, which is exactly the kind of unglamorous work that prevents subtle breakage later. Runtime helpers and pack build/verify helpers also make it easier to capture the right artifacts during long runs without improvising ad hoc scripts.
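The post does not show how the suite mocks commands. One common way to get deterministic command mocks is a PATH shim: a stub executable with canned output is placed ahead of the real tool on `PATH`. The suite itself is bash; the sketch below shows the same idea in Python, assuming a POSIX shell is available. The helper names (`make_mock`, `run_with_mocks`) and the `fake-gpu-query` command are hypothetical and not part of InvarLock.

```python
import os
import shlex
import stat
import subprocess
import tempfile


def make_mock(bin_dir, name, stdout, exit_code=0):
    """Write a stub executable that prints fixed output and exits with a fixed code."""
    path = os.path.join(bin_dir, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\n")
        # shlex.quote keeps the canned output literal, even if it contains quotes.
        fh.write("printf '%s\\n' " + shlex.quote(stdout) + "\n")
        fh.write(f"exit {int(exit_code)}\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def run_with_mocks(command, bin_dir):
    """Run a shell command with bin_dir first on PATH so stubs shadow real tools."""
    env = dict(os.environ)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", os.defpath)
    return subprocess.run(
        ["sh", "-c", command], env=env, capture_output=True, text=True
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical tool name, used only for illustration.
        make_mock(tmp, "fake-gpu-query", "GPU 0: mock-device")
        result = run_with_mocks("fake-gpu-query", tmp)
        print(result.returncode, result.stdout.strip())
```

Because the stub's output and exit code are fixed, a test that depends on the command behaves the same on a laptop and on a GPU node.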
On evaluation stability: window stratification for WikiText-2 moves to a deterministic offline byte-level n-gram scorer. That keeps window selection consistent across model families and avoids implicit downloads, two things that matter a lot when you’re trying to make runs comparable.
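To make the idea concrete, here is a minimal sketch of what a deterministic, offline byte-level n-gram scorer could look like. It is not InvarLock's implementation: the function names, the add-one smoothing, the trigram default, and the rank-based bucketing are all assumptions chosen for illustration. It uses only the standard library, so nothing is downloaded and no tokenizer is involved.

```python
import math
from collections import Counter


def byte_ngrams(text, n=3):
    """Count overlapping n-grams over the UTF-8 bytes of text."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def score_window(window, reference, n=3):
    """Mean negative log-probability of the window's n-grams under the
    reference counts, with add-one smoothing (an illustrative choice)."""
    grams = byte_ngrams(window, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    ref_total = sum(reference.values())
    vocab = len(reference) + 1
    nll = 0.0
    for gram, count in grams.items():
        p = (reference.get(gram, 0) + 1) / (ref_total + vocab)
        nll -= count * math.log(p)
    return nll / total


def stratify(windows, n_strata=2, n=3):
    """Split window indices into n_strata buckets by score rank.
    Ties are broken by window index, so the result is reproducible."""
    reference = Counter()
    for w in windows:
        reference.update(byte_ngrams(w, n))
    scored = sorted((score_window(w, reference, n), i) for i, w in enumerate(windows))
    strata = [[] for _ in range(n_strata)]
    for rank, (_, idx) in enumerate(scored):
        strata[rank * n_strata // len(scored)].append(idx)
    return strata


if __name__ == "__main__":
    sample = [
        "the cat sat on the mat",
        "zzzz qqqq xxxx",
        "the dog sat on the log",
        "the cat sat on the log",
    ]
    print(stratify(sample, n_strata=2))
```

Since the score depends only on the raw bytes of each window, two runs with different models see the same strata as long as the input text is identical.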
Perplexity evaluation is more defensive too: it masks out-of-range token IDs rather than letting them trigger device-side asserts. B200 workflows get four changes: dynamic scheduling becomes the only validation path; dependency promotion is centralized to reduce queue lock contention; generated configs skip slow CPU spectral calibration by default; and Hugging Face caches move under the work directory so they don't fill small root partitions. The old INVARLOCK_SCORES_BATCH_SIZE variable is removed because the new window scorer no longer batches on device.
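The token-ID guard can be sketched in plain Python. This is an illustration of the masking idea, not InvarLock's code: `sanitize_token_ids` and `perplexity` are hypothetical names, and the `-100` sentinel is borrowed from the default `ignore_index` of PyTorch's cross-entropy loss. Any ID outside `[0, vocab_size)` is replaced by the sentinel and skipped when averaging the loss, instead of being used as an index.

```python
import math

# Assumption: -100 mirrors PyTorch's default ignore_index for cross-entropy.
IGNORE_INDEX = -100


def sanitize_token_ids(token_ids, vocab_size, ignore_index=IGNORE_INDEX):
    """Replace IDs outside [0, vocab_size) with ignore_index.
    Returns the sanitized list and the number of masked positions."""
    sanitized = []
    n_masked = 0
    for tok in token_ids:
        if 0 <= tok < vocab_size:
            sanitized.append(tok)
        else:
            sanitized.append(ignore_index)
            n_masked += 1
    return sanitized, n_masked


def perplexity(log_probs, labels, ignore_index=IGNORE_INDEX):
    """exp of the mean negative log-likelihood over unmasked positions.
    log_probs[t][v] is the natural-log probability of token v at position t."""
    nll = 0.0
    count = 0
    for row, label in zip(log_probs, labels):
        if label == ignore_index:
            continue  # masked position: contributes nothing to the loss
        nll -= row[label]
        count += 1
    if count == 0:
        raise ValueError("no valid tokens left after masking")
    return math.exp(nll / count)


if __name__ == "__main__":
    vocab_size = 4
    uniform = [math.log(1.0 / vocab_size)] * vocab_size
    labels, masked = sanitize_token_ids([0, 7, 3, -1], vocab_size)
    print(labels, masked)
    print(perplexity([uniform] * 4, labels))
```

Returning the masked count alongside the sanitized labels makes it easy to log how many positions were dropped, so a tokenizer or dataset mismatch still shows up in the run output rather than being silently absorbed.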
For the immutable release record, read the tagged CHANGELOG.md for v0.3.5.