Catch hidden regressions before deployment
Catch regressions from quantization, pruning, fine-tuning, or merges before release. InvarLock compares an edited checkpoint against a fixed baseline and emits a verifiable report for CI and review.
Install
$ pip install "invarlock[hf]"
From evaluation to verification
INVARLOCK v<version> · EVALUATE
Baseline: <checkpoint>
Subject: <checkpoint>
Guards: invariants → spectral → rmt → variance
Output: reports/eval/evaluation.report.json
Verify: invarlock verify reports/eval/evaluation.report.json
- Pin paired windows from baseline to subject.
- Compute the primary metric with a confidence interval.
- Verify the report before a release decision is made.
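The paired-window arithmetic behind these steps can be sketched in a few lines. This is an illustrative sketch only, not InvarLock's actual implementation; the function name and the normal-approximation confidence interval are our assumptions:

```python
import math

def paired_ratio_ci(baseline_scores, subject_scores, z=1.96):
    """Sketch: primary-metric ratio over paired windows with a
    normal-approximation CI. Illustrative only -- not InvarLock's math."""
    # Windows are paired: index i in both lists refers to the same window
    ratios = [s / b for b, s in zip(baseline_scores, subject_scores)]
    n = len(ratios)
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean, (mean - half, mean + half)
```

Because each subject window is divided by its own pinned baseline window, run-to-run noise in window selection never enters the ratio.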
Verify the artifact, not just the claim
Verification re-checks schema integrity, paired-window math, and policy-linked evidence before you promote an edited checkpoint.
Targeted regression detection
Detects targeted regressions under pinned evaluation conditions.
Reproducible evidence
Produces reproducible, machine-readable evidence for review and CI gating.
Explicit boundary
Does not prove global model correctness or the absence of all failures.
Verification command
invarlock verify reports/eval/evaluation.report.json
Checks:
- schema compliance
- paired-window math
- policy + measurement-contract pairing
Inspect real artifacts
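The kind of validation those checks imply can be mirrored in a pre-flight script. The field names below are hypothetical, not InvarLock's actual report schema; `invarlock verify` remains the authoritative check:

```python
REQUIRED_FIELDS = ("schema_version", "baseline", "subject", "windows")

def basic_report_checks(report: dict) -> list:
    """Illustrative pre-flight checks on an evaluation report.
    Field names are assumptions, not InvarLock's real schema."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS
                if k not in report]
    # Paired-window sanity: every window must carry both sides of the pair
    for i, w in enumerate(report.get("windows", [])):
        if "baseline_score" not in w or "subject_score" not in w:
            problems.append(f"window {i} is not fully paired")
    return problems
```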
Run, verify, and ship with hard evidence
A deterministic three-step path from checkpoint edit to evaluation report and deployment decision.
Paired windows
Deterministic baseline-vs-subject comparisons stay reproducible across runs.
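One generic way to keep such comparisons reproducible is to derive the evaluation windows from a fixed seed, so baseline and subject always see identical data. A minimal sketch of that idea, not necessarily how InvarLock pins windows:

```python
import random

def pin_windows(dataset_size: int, n_windows: int, seed: int = 0) -> list:
    # Same seed -> identical window indices on every run, so baseline and
    # subject are always scored on exactly the same data
    rng = random.Random(seed)
    return sorted(rng.sample(range(dataset_size), n_windows))
```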
Guard pipeline
Invariants, spectral, RMT, and variance checks stay in the decision path.
Verification output
Release review gets a report plus an explicit verify step instead of screenshots.
Step 1
Provide your checkpoints
Point InvarLock at your baseline (original) model and subject (edited) checkpoint. Supports Hugging Face models, local files, and more.
invarlock evaluate \
--baseline gpt2 \
--subject gpt2-quantized \
--adapter auto
Step 2
Run the evaluation pipeline
InvarLock runs paired evaluation windows through the guard pipeline: invariants, spectral analysis, RMT checks, and variance enforcement.
Status: PASS
Gates: 12/12 passed
Primary metric ratio: 0.98
Confidence interval: [0.96, 1.00]
Step 3
Gate your CI pipeline
Use verification exit codes and evaluation reports to gate deployments. Ship with reviewable evidence or catch regressions before they reach production.
invarlock verify reports/eval/evaluation.report.json
if [ $? -eq 0 ]; then
echo "Safe to deploy!"
fi
Inspect one real evaluation cycle
The CLI runs a baseline-vs-subject comparison, applies guard checks, emits an evaluation report, and produces verification-ready output your release process can enforce.
- Deterministic evaluation: same inputs produce the same decision trail.
- Clear pass/fail status: exit codes, thresholds, and verification signals for CI policy gates.
- Evaluation reports: structured output for automation, audit, and proof-pack handoff.
Need the artifact too?
Inspect example reports or render HTML from evaluation.report.json when you need a reviewer-facing handoff artifact after the CLI run.
Three moments where verifiable evidence matters most
Use the same evaluation report and verification flow when checkpoint edits are fresh, release review is strict, or artifacts must travel beyond the team that created them.
After quantization, pruning, or LoRA
Catch regressions while the change is still reversible
Run paired evaluation immediately after weight edits so silent regressions show up before the checkpoint reaches a release branch or downstream benchmark.
- Deterministic baseline vs subject windows
- Reviewable evaluation report for every edit pass
- Explicit regression boundaries instead of intuition
Before release sign-off
Give reviewers an inspectable artifact trail
Use verification and evaluation reports in CI so release review is based on pinned evidence, not screenshots, dashboards, or informal notebook output.
- Verification step for schema, pairing, and gates
- CI-friendly exit codes for fail-closed promotion
- Shareable evidence for approvers and auditors
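The same fail-closed logic can live in a release script. Only the zero/non-zero exit-code contract of `invarlock verify` is taken from the CLI above; the helper names here are our own:

```python
import subprocess

def report_verified(path: str) -> bool:
    # `invarlock verify` exits 0 on success, non-zero otherwise
    return subprocess.run(["invarlock", "verify", path]).returncode == 0

def promotion_decision(verified: bool) -> str:
    # Fail closed: anything short of a clean verification blocks promotion
    return "deploy" if verified else "block"
```

Wiring `promotion_decision(report_verified(path))` into the deploy job means an unverifiable report can never promote a checkpoint by default.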
When evidence must travel
Bundle reproducibility into proof packs
Package evaluation outputs, verification artifacts, and manifests into proof packs when evidence has to move across teams, environments, or hardware topologies.
- Portable artifact trail for external review
- Proof-pack workflow for reproducible handoff
- Clear separation between guarantees and non-guarantees
Use the CLI now, or join the pilot waitlist
The open-source CLI, docs, and GitHub repo are available now. Use the waitlist below if you want pilot announcements or product updates beyond the current open-source path.
Prefer the open-source path right away?