Welcome to InvarLock

A quick introduction to InvarLock: evaluate LLM weight edits with statistical guarantees and auditable evidence packs.

November 30, 2025

InvarLock Team

Post: What InvarLock is, what it checks, and how to try it.

Highlights

Evaluate edited weights against a baseline with paired metrics and confidence intervals.
GuardChain checks for “unsafe to compare” measurement mismatches and quality drift.
Evidence packs capture the artifacts you need to verify and share results.
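
To make "paired metrics and confidence intervals" concrete, here is a minimal sketch of one common way to compare two models on the same examples: a percentile bootstrap over per-example differences. This is an illustration, not InvarLock's actual implementation. The function name paired_bootstrap_ci, its parameters, and the sample scores are all hypothetical.

```python
import random
import statistics


def paired_bootstrap_ci(baseline_scores, subject_scores,
                        n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (subject - baseline).

    Hypothetical helper for illustration; not part of InvarLock.
    """
    if len(baseline_scores) != len(subject_scores):
        raise ValueError("paired evaluation needs the same examples for both models")
    if not baseline_scores:
        raise ValueError("need at least one paired example")

    # Pairing: each difference comes from the same example scored by both models.
    diffs = [s - b for b, s in zip(baseline_scores, subject_scores)]
    n = len(diffs)
    rng = random.Random(seed)

    # Resample the differences with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )

    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return statistics.fmean(diffs), means[lo_idx], means[hi_idx]


# Made-up per-example losses for a baseline and an edited model.
baseline = [2.0, 2.1, 1.9, 2.3, 2.2, 2.0]
subject = [2.1, 2.3, 1.9, 2.6, 2.2, 2.1]
mean_diff, ci_low, ci_high = paired_bootstrap_ci(baseline, subject)
print(f"mean difference: {mean_diff:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Working on per-example differences removes the variance both models share from seeing identical inputs, so a paired interval is usually tighter than one built from two independent aggregate scores.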

If you edit model weights (quantization, pruning, fine-tuning, merges), you eventually hit the same question: did this change silently break anything that matters? “It loads” isn’t enough, and single-number metrics often miss the failure modes you’ll regret later.

InvarLock is designed for that moment. It produces an evaluation report that is both human-readable and machine-verifiable, so you can make upgrade decisions with evidence—not vibes.

Quickstart

Install InvarLock via pip:

pip install "invarlock[hf]"

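Run your first evaluation. This example passes gpt2 as both the baseline and the subject, so it compares the model against itself; treat it as a smoke test of the pipeline, then swap in your edited weights as the subject: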

INVARLOCK_ALLOW_NETWORK=1 invarlock evaluate \
  --baseline gpt2 \
  --subject gpt2 \
  --adapter auto \
  --profile dev
  --profile dev

That produces an evaluation report and (optionally) an evidence pack you can archive, verify, and share.
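
If you would rather launch the same run from a script (for example, in CI), the command above can be wrapped in a few lines of Python. This is a sketch under assumptions: it uses only the flags and the environment variable shown in the quickstart, the helper names build_evaluate_command and run_evaluation are hypothetical, and it makes no claim about what InvarLock's exit codes mean.

```python
import os
import subprocess


def build_evaluate_command(baseline, subject, adapter="auto", profile="dev"):
    """Build the argument list for the quickstart `invarlock evaluate` call.

    Hypothetical helper; it uses only flags that appear in the quickstart.
    """
    return [
        "invarlock", "evaluate",
        "--baseline", baseline,
        "--subject", subject,
        "--adapter", adapter,
        "--profile", profile,
    ]


def run_evaluation(baseline, subject, allow_network=True):
    """Run the evaluation as a subprocess and return the CompletedProcess."""
    env = dict(os.environ)
    if allow_network:
        # Same environment variable the quickstart sets inline.
        env["INVARLOCK_ALLOW_NETWORK"] = "1"
    # check=False: the caller decides how to interpret the return code.
    return subprocess.run(
        build_evaluate_command(baseline, subject), env=env, check=False
    )


# Example (requires invarlock to be installed and on PATH):
# result = run_evaluation("gpt2", "gpt2")
# print(result.returncode)
```

Passing the command as a list avoids shell quoting problems when model names or paths contain spaces.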

What’s next

Design-partner refinement of the private deployment review workflow
Broader adapter and framework coverage
Better “what changed?” analytics over time

To go deeper, start with the docs. For questions and feedback, email [email protected].

If your team wants to help shape the private deployment path, start with design partners.