Everything you need for model change evidence

InvarLock provides baseline-vs-subject evaluation with paired windows, guard checks, evaluation reports, verification, and proof-pack-ready artifacts.

Why this is different from ordinary eval tooling

The difference is not just another score. InvarLock makes the evaluation output verifiable, portable, and usable in release review.

Verification step
  • Ordinary evals: scores and charts without a standard verification step.
  • InvarLock: an evaluation report plus a verify command for schema, paired-window math, and release checks.

Provenance
  • Ordinary evals: evidence is often scattered across CI logs, notebooks, and screenshots.
  • InvarLock: pairing data, policy digests, and report metadata travel with the artifact.

Review handoff
  • Ordinary evals: reviewers struggle to re-check pairing and release gates.
  • InvarLock: proof-pack-ready evidence moves between teams, environments, and release workflows.

Evaluate edited checkpoints

Establish a deterministic baseline-vs-subject comparison before you look at a release verdict.

Guard Pipeline

Multi-layered validation beyond perplexity

Every edited model runs through the guard pipeline before report output is finalized.

What this gives you

  • Structural invariant checks ensure model architecture integrity
  • Spectral analysis guards detect stability issues in weight matrices
  • Random Matrix Theory (RMT) identifies statistical outliers
  • Variance enforcement catches harmful distribution shifts

Guard output

# Guard pipeline layers
1. Invariants    -> Architecture checks
2. Spectral      -> Stability analysis
3. RMT           -> Outlier detection
4. Variance      -> Distribution guards
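
As a simplified illustration of the spectral layer, a guard can compare the largest singular value (the spectral norm) of each weight matrix between baseline and subject and flag large relative shifts. This is a minimal sketch under stated assumptions, not InvarLock's implementation: the `spectral_shift` helper and the flag threshold are illustrative.

```python
import numpy as np

def spectral_shift(w_base: np.ndarray, w_subj: np.ndarray) -> float:
    """Relative change in the largest singular value between two weight matrices."""
    s_base = np.linalg.norm(w_base, ord=2)  # ord=2 on a matrix = spectral norm
    s_subj = np.linalg.norm(w_subj, ord=2)
    return abs(s_subj - s_base) / s_base

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))          # baseline weight matrix
w_edited = w + 0.01 * rng.standard_normal((64, 64))  # small edit (e.g. quantization noise)

shift = spectral_shift(w, w_edited)
# A guard might flag the layer when shift exceeds some stability budget.
stable = shift < 0.05
```

A real guard would run this per layer and combine it with the other checks above; a large shift in one layer's spectrum is a cheap early warning that an edit changed more than intended.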

Paired Evaluation Windows

Deterministic, reproducible comparisons

InvarLock uses paired evaluation windows with deterministic pairing metadata to ensure statistically valid, reproducible results.

What this gives you

  • Token-weighted comparisons between baseline and subject
  • Deterministic pairing for reproducible runs
  • Calibrated evaluation windows for accuracy
  • Paired delta-log-loss metrics with confidence intervals

Pairing record

Baseline: gpt2 -> Subject: gpt2-q4
Windows: <paired windows>
Metric: delta-log-loss (paired)
CI: <confidence interval>
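
The paired statistic above can be sketched as follows: given per-window mean log-losses for baseline and subject over the same windows, the token-weighted delta-log-loss is the token-weighted mean of per-window differences. This is a sketch of the statistic itself, not InvarLock's code; the window losses and token counts are illustrative numbers.

```python
import numpy as np

# Per-window mean log-loss over the same paired windows (illustrative values).
base_loss = np.array([2.10, 2.35, 2.02, 2.48])
subj_loss = np.array([2.14, 2.37, 2.05, 2.51])
tokens    = np.array([512, 480, 512, 300])   # tokens per window, used as weights

delta = subj_loss - base_loss                       # per-window paired deltas
weighted_delta = np.average(delta, weights=tokens)  # token-weighted delta-log-loss
# weighted_delta ≈ 0.030
```

Because windows are paired, window-to-window difficulty cancels out of `delta`, which is what makes the comparison reproducible across runs that use the same pairing metadata.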

Understand the decision

Move from raw measurements to bounded claims, portable artifacts, and verifiable handoff evidence.

Statistical Boundaries

Quantify degradation without overstating the claim

Get confidence intervals on primary metrics plus explicit decision boundaries so reviewers know what the evidence supports and where it stops.

What this gives you

  • BCa bootstrap confidence intervals
  • Regression budget enforcement
  • Plain-language guarantees and non-guarantees
  • Primary metric ratio tracking
  • Policy digest included in evaluation report metadata

Decision summary

INVARLOCK v<version> · EVALUATE
Status: PASS · Gates: <passed>/<total> passed
Primary metric ratio: <ratio>
Output: reports/eval/evaluation.report.json
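
The decision boundary can be sketched as a bootstrap confidence interval on the paired deltas, compared against a regression budget. InvarLock reports BCa intervals; for brevity this sketch uses a plain percentile bootstrap, and the synthetic deltas and budget value are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic per-window paired delta-log-loss values (stand-in for real data).
deltas = rng.normal(loc=0.01, scale=0.02, size=200)

# Percentile bootstrap CI on the mean delta (BCa is the stronger variant).
boot = np.array([
    rng.choice(deltas, size=deltas.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])

budget = 0.05  # illustrative regression budget
status = "PASS" if hi <= budget else "FAIL"  # gate on the CI's upper bound
```

Gating on the interval's upper bound rather than the point estimate is what keeps the claim bounded: the evidence supports "degradation is at most the budget," not "degradation is exactly the mean."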

Evaluation Reports + Verification

Validate artifacts before release

Use evaluation.report.json as a standard handoff artifact, run invarlock verify, and carry the same evidence model into proof-pack workflows.

What this gives you

  • Machine-readable report schema
  • Pairing and policy metadata included in output
  • invarlock verify checks in CI and release profiles
  • Proof-pack workflows bundle portable evidence for handoff
  • HTML rendering available with invarlock report html

Artifact handoff

# Validate an evaluation report
invarlock verify reports/eval/evaluation.report.json

# Render report for sharing
invarlock report html \
  -i reports/eval/evaluation.report.json \
  -o reports/eval/evaluation.html

Ship inside your environment

Carry the same evidence model into CI and private infrastructure without changing how teams review release risk.

CI/CD Integration

Gate deployments on robustness checks

Seamlessly integrate InvarLock into your deployment pipelines with stable exit codes and structured JSON outputs.

What this gives you

  • Exit code 0 for passing evaluations (safe to deploy)
  • Machine-readable evaluation.report.json outputs
  • CI/CD pipeline examples for GitHub Actions and Jenkins
  • Automated quality gates for release review

Pipeline gate

# In your CI pipeline
invarlock evaluate \
  --baseline base.pt \
  --subject quantized.pt

# Exit code 0 means the evaluation passed its gates
if [ $? -eq 0 ]; then
  echo "Safe to deploy"
  ./deploy.sh
fi

Offline-First Design

Your models stay private

Network access is disabled by default, so model weights never leave your infrastructure; downloads must be enabled explicitly, per command.

What this gives you

  • Network disabled by default
  • Explicit opt-in for downloads with INVARLOCK_ALLOW_NETWORK=1
  • On-premise deployment ready
  • Air-gapped environment support

Network policy

# Network disabled (default)
invarlock evaluate --baseline ./local.pt

# Explicit network enable
INVARLOCK_ALLOW_NETWORK=1 \
  invarlock evaluate --baseline gpt2

Use the CLI now, or join the pilot waitlist

The open-source CLI, docs, and GitHub repo are available now. Use the waitlist below if you want pilot announcements or product updates beyond the current open-source path.

Prefer the open-source path right away?