Welcome to InvarLock
A quick introduction to InvarLock: evaluate LLM weight edits with statistical guarantees and auditable evidence packs.
Post: What InvarLock is, what it checks, and how to try it.
Highlights
- Evaluate edited weights against a baseline with paired metrics and confidence intervals.
- GuardChain checks for “unsafe to compare” measurement mismatches and quality drift.
- Evidence packs capture the artifacts you need to verify and share results.
If you edit model weights (quantization, pruning, fine-tuning, merges), you eventually hit the same question: did this change silently break anything that matters? “It loads” isn’t enough, and single-number metrics often miss the failure modes you’ll regret later.
InvarLock is designed for that moment. It produces an evaluation report that is both human-readable and machine-verifiable, so you can make upgrade decisions with evidence, not vibes.
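To make "paired metrics with confidence intervals" concrete, here is a minimal sketch of one common way to compute such an interval: a percentile bootstrap over per-example differences. This is an illustration, not InvarLock's implementation. The function name `paired_bootstrap_ci`, its parameters, and the loss values are hypothetical, and the post does not say which statistical method InvarLock uses.

```python
import random
import statistics


def paired_bootstrap_ci(baseline, subject, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (subject - baseline).

    Hypothetical helper for illustration; not part of the InvarLock API.
    """
    if len(baseline) != len(subject) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")

    # Pairing: each example is scored by both models, so we resample differences.
    diffs = [s - b for b, s in zip(baseline, subject)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(diffs)

    # Resample the differences with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.fmean(diffs), lo, hi


# Made-up per-example losses on the same eight inputs (lower is better).
baseline_loss = [2.31, 1.98, 2.75, 2.10, 2.44, 1.87, 2.60, 2.22]
subject_loss = [2.35, 2.01, 2.74, 2.18, 2.49, 1.90, 2.66, 2.25]

mean_diff, lo, hi = paired_bootstrap_ci(baseline_loss, subject_loss)
print(f"mean delta = {mean_diff:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

The reason to pair is that both models see the same inputs, so per-example differences cancel out how hard each input is. An interval that excludes zero is a stronger signal of real drift than two aggregate scores that merely look different.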
Quickstart
Install InvarLock via pip:
pip install "invarlock[hf]"
Run your first evaluation:
INVARLOCK_ALLOW_NETWORK=1 invarlock evaluate \
  --baseline gpt2 \
  --subject gpt2 \
  --adapter auto \
  --profile dev
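Because the baseline and the subject are both gpt2, this first run compares a model against itself, which makes it a quick smoke test of your setup. It produces an evaluation report and (optionally) an evidence pack you can archive, verify, and share.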
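To give a feel for what "verify" can mean for an archived pack, here is a minimal sketch of checksum-based verification: record a SHA-256 digest for every file, then recompute later and report anything that changed. The evidence pack format is not described in this post, so everything here is an assumption: the file name `report.json`, its contents, and the helpers `build_manifest` and `verify_manifest` are hypothetical and are not InvarLock's API or on-disk layout.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(pack_dir):
    """Map each file's path (relative to pack_dir) to its SHA-256 digest."""
    pack_dir = Path(pack_dir)
    return {
        p.relative_to(pack_dir).as_posix(): sha256_file(p)
        for p in sorted(pack_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(pack_dir, manifest):
    """Return sorted names of files that were changed, removed, or added."""
    current = build_manifest(pack_dir)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))


# Demo on a throwaway directory with a hypothetical report file.
with tempfile.TemporaryDirectory() as tmp:
    pack = Path(tmp)
    (pack / "report.json").write_text(json.dumps({"status": "pass"}))
    manifest = build_manifest(pack)

    clean = verify_manifest(pack, manifest)  # nothing changed yet

    (pack / "report.json").write_text(json.dumps({"status": "fail"}))
    tampered = verify_manifest(pack, manifest)  # edited after the manifest

print("clean:", clean)
print("tampered:", tampered)
```

In this sketch the manifest lives outside the pack directory, so it is not hashed along with the files it describes. A real tool would also need to protect the manifest itself, for example by signing it.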
What’s next
- Design-partner refinement of the private deployment review workflow
- Broader adapter and framework coverage
- Better “what changed?” analytics over time
To go deeper, start with the docs. For questions and feedback, email [email protected].
If your team wants to help shape the private deployment path, start with design partners.