Deterministic evidence packs and safer perplexity runs

Evidence packs gain a deterministic bash test suite and better runtime helpers, window selection becomes stable and fully offline, and perplexity runs handle bad token IDs more safely.

January 2, 2026

InvarLock Team

Release: InvarLock 0.3.5 - Offline window selection, runtime helpers, and token-ID guards

Highlights

Evidence pack bash test suite + runtime helpers for capturing artifacts during long runs.
WikiText-2 window stratification switches to a deterministic offline byte-level n-gram scorer.
Perplexity sanitizes out-of-range token IDs, while B200 defaults reduce queue and cache friction.

0.3.5 is a trust-the-machinery release. Evidence packs now have their own bash test suite with deterministic command mocks and optional coverage checks, which is exactly the kind of unglamorous work that prevents subtle breakage later. Runtime helpers and pack build/verify helpers also make it easier to capture the right artifacts during long runs without improvising ad hoc scripts.
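To make "deterministic command mocks" concrete, here is a minimal sketch of the general PATH-shim technique, written in Python for illustration. It is not the InvarLock suite itself, which is bash; the helper names `install_mock` and `run_with_mocks`, the choice of `nvidia-smi` as the mocked command, and the canned output are all assumptions made for this example.

```python
import os
import stat
import subprocess
import tempfile
from pathlib import Path


def install_mock(mock_dir: Path, name: str, stdout: str, exit_code: int = 0) -> Path:
    """Write a tiny shell script that always prints `stdout` and exits with `exit_code`.

    Hypothetical helper: `stdout` must not contain single quotes in this sketch.
    """
    script = mock_dir / name
    script.write_text(f"#!/bin/sh\nprintf '%s\\n' '{stdout}'\nexit {exit_code}\n")
    # Mark the script executable so the shell can run it by name.
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return script


def run_with_mocks(command: str, mock_dir: Path) -> subprocess.CompletedProcess:
    """Run `command` through sh with the mock directory placed first on PATH."""
    env = dict(os.environ)
    env["PATH"] = f"{mock_dir}{os.pathsep}{env.get('PATH', '')}"
    return subprocess.run(["sh", "-c", command], env=env, capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    mock_dir = Path(tmp)
    # The mock shadows any real binary of the same name for the duration of the test.
    install_mock(mock_dir, "nvidia-smi", "MOCK-GPU, 81920 MiB")
    first = run_with_mocks("nvidia-smi --query-gpu=name,memory.total", mock_dir)
    second = run_with_mocks("nvidia-smi --query-gpu=name,memory.total", mock_dir)
```

Because the mock ignores its arguments and the host hardware, both runs return the same canned output, which is what lets a test suite assert on exact strings without depending on the machine it runs on.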

On evaluation stability: window stratification for WikiText-2 moves to a deterministic offline scorer. That keeps window selection consistent across model families and avoids implicit downloads, both of which matter a lot when you’re trying to make runs comparable.
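As a rough illustration of what a deterministic, offline, byte-level n-gram scorer can look like, the sketch below scores each window by the average rarity of its byte trigrams and then picks windows from evenly sized score strata. This is an assumption-laden toy, not InvarLock's scorer: the function names, the trigram size, the rarity formula, and the stratum count are all hypothetical.

```python
import math
from collections import Counter


def byte_ngrams(text: str, n: int = 3):
    """Return the byte-level n-grams of `text` (UTF-8 encoded)."""
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def build_counts(windows, n: int = 3) -> Counter:
    """Count n-grams across all candidate windows; no model or download involved."""
    counts = Counter()
    for window in windows:
        counts.update(byte_ngrams(window, n))
    return counts


def score_window(text: str, counts: Counter, total: int, n: int = 3) -> float:
    """Mean negative log relative frequency of the window's n-grams (higher = rarer)."""
    grams = byte_ngrams(text, n)
    if not grams:
        return 0.0
    return sum(-math.log(counts[g] / total) for g in grams) / len(grams)


def stratified_pick(windows, per_stratum: int = 1, strata: int = 3, n: int = 3):
    """Pick window indices from each score stratum, breaking ties by index."""
    counts = build_counts(windows, n)
    total = sum(counts.values())
    # Sorting on (score, index) makes the ranking independent of dict or hash order.
    ranked = sorted(
        range(len(windows)),
        key=lambda i: (score_window(windows[i], counts, total, n), i),
    )
    picked = []
    for s in range(strata):
        lo = s * len(ranked) // strata
        hi = (s + 1) * len(ranked) // strata
        picked.extend(ranked[lo:hi][:per_stratum])
    return sorted(picked)


windows = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "zqxj vkwp",
    "the cat sat on the log",
    "quantum flux",
    "the the the the",
]
selection = stratified_pick(windows)
```

Since the score depends only on the raw bytes of the text, the same windows are selected regardless of which tokenizer or model family is being evaluated, which is the property the release describes.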

Perplexity evaluation is more defensive too: out-of-range token IDs are masked rather than left to trigger device-side asserts. On B200 workflows, dynamic scheduling becomes the only validation path, dependency promotion is centralized to reduce queue lock contention, generated configs avoid slow CPU spectral calibration by default, and Hugging Face caches move under the work directory to avoid filling small root partitions. The old INVARLOCK_SCORES_BATCH_SIZE variable is removed because the new scorer no longer batches on device.
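The token-ID guard described above can be sketched in plain Python. This is a minimal illustration of the masking idea, not InvarLock's implementation: `sanitize_labels` and `perplexity` are hypothetical names, and the `-100` sentinel is an assumption that mirrors the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import math

IGNORE_INDEX = -100  # assumed sentinel for positions excluded from the loss


def sanitize_labels(token_ids, vocab_size: int):
    """Replace IDs outside [0, vocab_size) with IGNORE_INDEX instead of indexing with them."""
    return [t if 0 <= t < vocab_size else IGNORE_INDEX for t in token_ids]


def perplexity(log_probs, token_ids, vocab_size: int) -> float:
    """Compute perplexity over valid positions only.

    log_probs[i] holds the per-token log-probabilities at position i.
    """
    labels = sanitize_labels(token_ids, vocab_size)
    nll, count = 0.0, 0
    for row, label in zip(log_probs, labels):
        if label == IGNORE_INDEX:
            continue  # masked position: contributes nothing to the average
        nll -= row[label]
        count += 1
    if count == 0:
        return float("nan")  # nothing valid to score
    return math.exp(nll / count)


# Uniform distribution over a 4-token vocabulary; IDs 7 and -3 are out of range.
uniform = [[math.log(0.25)] * 4 for _ in range(3)]
ppl = perplexity(uniform, [1, 7, -3], vocab_size=4)
```

With a uniform distribution over four tokens, the result is approximately 4.0 even though two of the three labels are invalid. On a GPU, indexing with those same IDs is the kind of operation that can raise a device-side assert, so masking first turns a hard failure into a skipped position.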

For the immutable release record, read the tagged CHANGELOG.md for v0.3.5.
