Catch hidden regressions before deployment
Catch regressions from quantization, pruning, fine-tuning, or merges before release. InvarLock compares an edited checkpoint against a fixed baseline and emits a verifiable report for CI and review.
Install
$ pip install "invarlock[hf]"
From evaluation to verification
INVARLOCK v<version> · EVALUATE
Baseline: <checkpoint>
Subject: <checkpoint>
Guards: invariants → spectral → rmt → variance
Output: reports/eval/evaluation.report.json
Verify: invarlock verify reports/eval/evaluation.report.json
- Pin paired windows from baseline to subject.
- Compute the primary metric with a confidence interval.
- Verify the report before a release decision is made.
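The paired-window arithmetic behind these steps can be sketched in a few lines. This is an illustrative sketch only, not InvarLock's actual implementation; the function name and the normal-approximation confidence interval are our assumptions:

```python
import math

def paired_ratio_ci(baseline_scores, subject_scores, z=1.96):
    """Sketch: primary-metric ratio over paired windows with a
    normal-approximation CI. Illustrative only -- not InvarLock's math."""
    # Windows are paired: index i in both lists refers to the same window
    ratios = [s / b for b, s in zip(baseline_scores, subject_scores)]
    n = len(ratios)
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean, (mean - half, mean + half)
```

Because each subject window is divided by its own pinned baseline window, run-to-run noise in window selection never enters the ratio.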
Verify the artifact, not just the claim
Verification re-checks schema integrity, paired-window math, and policy-linked evidence before you promote an edited checkpoint.
Targeted regression detection
Detects targeted regressions under pinned evaluation conditions.
Reproducible evidence
Produces reproducible, machine-readable evidence for review and CI gating.
Explicit boundary
Does not prove global model correctness or the absence of all failures.
Verification command
invarlock verify reports/eval/evaluation.report.json
Checks:
- schema compliance
- paired-window math
- policy + measurement-contract pairing
Inspect real artifacts
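The kind of validation those checks imply can be mirrored in a pre-flight script. The field names below are hypothetical, not InvarLock's actual report schema; `invarlock verify` remains the authoritative check:

```python
REQUIRED_FIELDS = ("schema_version", "baseline", "subject", "windows")

def basic_report_checks(report: dict) -> list:
    """Illustrative pre-flight checks on an evaluation report.
    Field names are assumptions, not InvarLock's real schema."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS
                if k not in report]
    # Paired-window sanity: every window must carry both sides of the pair
    for i, w in enumerate(report.get("windows", [])):
        if "baseline_score" not in w or "subject_score" not in w:
            problems.append(f"window {i} is not fully paired")
    return problems
```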
Run, verify, and ship with hard evidence
A deterministic three-step path from checkpoint edit to evaluation report and deployment decision.
Paired windows
Deterministic baseline-vs-subject comparisons stay reproducible across runs.
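One generic way to keep such comparisons reproducible is to derive the evaluation windows from a fixed seed, so baseline and subject always see identical data. A minimal sketch of that idea, not necessarily how InvarLock pins windows:

```python
import random

def pin_windows(dataset_size: int, n_windows: int, seed: int = 0) -> list:
    # Same seed -> identical window indices on every run, so baseline and
    # subject are always scored on exactly the same data
    rng = random.Random(seed)
    return sorted(rng.sample(range(dataset_size), n_windows))
```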
Guard pipeline
Invariants, spectral, RMT, and variance checks stay in the decision path.
Verification output
Release review gets a report plus an explicit verify step instead of screenshots.
Step 1
Provide your checkpoints
Point InvarLock at your baseline (original) model and subject (edited) checkpoint. Supports Hugging Face models, local files, and more.
invarlock evaluate \
--baseline gpt2 \
--subject gpt2-quantized \
--adapter auto
Step 2
Run the evaluation pipeline
InvarLock runs paired evaluation windows through the guard pipeline: invariants, spectral analysis, RMT checks, and variance enforcement.
Status: PASS
Gates: 12/12 passed
Primary metric ratio: 0.98
Confidence interval: [0.96, 1.00]
Step 3
Gate your CI pipeline
Use verification exit codes and evaluation reports to gate deployments. Ship with reviewable evidence or catch regressions before they reach production.
invarlock verify reports/eval/evaluation.report.json
if [ $? -eq 0 ]; then
echo "Safe to deploy!"
fi
Inspect one real evaluation cycle
The CLI runs a baseline-vs-subject comparison, applies guard checks, emits an evaluation report, and produces verification-ready output your release process can enforce.
- Deterministic evaluation: same inputs produce the same decision trail.
- Clear pass/fail status: exit codes, thresholds, and verification signals for CI policy gates.
- Evaluation reports: structured output for automation, audit, and proof-pack handoff.
Need the artifact too?
Inspect example reports or render HTML from evaluation.report.json when you need a reviewer-facing handoff artifact after the CLI run.
Three moments where verifiable evidence matters most
Use the same evaluation report and verification flow when checkpoint edits are fresh, release review is strict, or artifacts must travel beyond the team that created them.
After quantization, pruning, or LoRA
Catch regressions while the change is still reversible
Run paired evaluation immediately after weight edits so silent regressions show up before the checkpoint reaches a release branch or downstream benchmark.
- Deterministic baseline vs subject windows
- Reviewable evaluation report for every edit pass
- Explicit regression boundaries instead of intuition
Before release sign-off
Give reviewers an inspectable artifact trail
Use verification and evaluation reports in CI so release review is based on pinned evidence, not screenshots, dashboards, or informal notebook output.
- Verification step for schema, pairing, and gates
- CI-friendly exit codes for fail-closed promotion
- Shareable evidence for approvers and auditors
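The same fail-closed logic can live in a release script. Only the zero/non-zero exit-code contract of `invarlock verify` is taken from the CLI above; the helper names here are our own:

```python
import subprocess

def report_verified(path: str) -> bool:
    # `invarlock verify` exits 0 on success, non-zero otherwise
    return subprocess.run(["invarlock", "verify", path]).returncode == 0

def promotion_decision(verified: bool) -> str:
    # Fail closed: anything short of a clean verification blocks promotion
    return "deploy" if verified else "block"
```

Wiring `promotion_decision(report_verified(path))` into the deploy job means an unverifiable report can never promote a checkpoint by default.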
When evidence must travel
Bundle reproducibility into proof packs
Package evaluation outputs, verification artifacts, and manifests into proof packs when evidence has to move across teams, environments, or hardware topologies.
- Portable artifact trail for external review
- Proof-pack workflow for reproducible handoff
- Clear separation between guarantees and non-guarantees
Use the CLI now, or join the pilot waitlist
The open-source CLI, docs, and GitHub repo are available now. Use the waitlist below if you want pilot announcements or product updates beyond the current open-source path.
Prefer the open-source path right away?