Everything you need for model change evidence
InvarLock provides baseline-vs-subject evaluation with paired windows, guard checks, evaluation reports, verification, and proof-pack-ready artifacts.
Why this is different from ordinary eval tooling
The difference is not just another score. InvarLock makes the evaluation output verifiable, portable, and usable in release review.
Evaluate edited checkpoints
Establish a deterministic baseline-vs-subject comparison before you look at a release verdict.
Guard Pipeline
Multi-layered validation beyond perplexity
Every edited model runs through the guard pipeline before report output is finalized.
What this gives you
- Structural invariant checks ensure model architecture integrity
- Spectral analysis guards detect stability issues in weight matrices
- Random Matrix Theory (RMT) checks identify statistical outliers
- Variance enforcement catches harmful distribution shifts
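As a rough illustration of what a spectral check on a weight matrix can look like, the sketch below estimates the largest singular value by power iteration and compares the edited matrix against the baseline. This is not InvarLock's implementation: the function names, the ratio rule, and the `max_ratio` threshold are all assumptions made for the example.

```python
import math
import random


def _matvec(matrix, vec):
    """Return A @ vec for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def _rmatvec(matrix, vec):
    """Return A.T @ vec for a matrix given as a list of rows."""
    cols = len(matrix[0])
    return [sum(row[j] * u for row, u in zip(matrix, vec)) for j in range(cols)]


def spectral_norm(matrix, iterations=200, seed=0):
    """Estimate the largest singular value by power iteration on A.T @ A."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    v = [rng.gauss(0.0, 1.0) for _ in range(len(matrix[0]))]
    for _ in range(iterations):
        w = _rmatvec(matrix, _matvec(matrix, v))
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0  # A @ v vanished, which a random start only hits for A == 0
        v = [x / norm_w for x in w]
    # With v close to the top right-singular vector, ||A v|| approximates sigma_max.
    return math.sqrt(sum(x * x for x in _matvec(matrix, v)))


def spectral_ratio_ok(baseline, edited, max_ratio=1.5):
    """Hypothetical guard: pass if the edited norm grew by at most max_ratio."""
    base = spectral_norm(baseline)
    if base == 0.0:
        raise ValueError("baseline matrix has zero spectral norm")
    return spectral_norm(edited) / base <= max_ratio
```

A real guard would run a check like this per layer and on tensors rather than nested lists; the point here is only the shape of a baseline-relative spectral comparison.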
Guard output
# Guard pipeline layers
1. Invariants -> Architecture checks
2. Spectral -> Stability analysis
3. RMT -> Outlier detection
4. Variance -> Distribution guards
Paired Evaluation Windows
Deterministic, reproducible comparisons
InvarLock uses paired evaluation windows with deterministic pairing metadata to ensure statistically valid, reproducible results.
What this gives you
- Token-weighted comparisons between baseline and subject
- Deterministic pairing ensures reproducibility
- Calibrated evaluation windows for accuracy
- Paired delta-log-loss metrics with confidence intervals
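To make the token-weighted paired metric concrete, here is a minimal Python sketch. The `Window` record, its field names, and `paired_delta_log_loss` are hypothetical and do not reflect InvarLock's API or report schema. The sketch assumes each window stores a summed negative log-likelihood (in nats) and a token count, and that baseline and subject are scored on the same windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One evaluation window scored by one model (hypothetical structure)."""
    window_id: str
    total_nll: float  # summed negative log-likelihood over the window, in nats
    n_tokens: int


def paired_delta_log_loss(baseline, subject):
    """Token-weighted mean of (subject - baseline) per-token log-loss."""
    base_by_id = {w.window_id: w for w in baseline}
    subj_by_id = {w.window_id: w for w in subject}
    if base_by_id.keys() != subj_by_id.keys():
        raise ValueError("baseline and subject must cover the same windows")

    total_delta = 0.0
    total_tokens = 0
    for wid in sorted(base_by_id):  # fixed order keeps the result deterministic
        b, s = base_by_id[wid], subj_by_id[wid]
        if b.n_tokens != s.n_tokens:
            raise ValueError(f"token count mismatch in window {wid}")
        # Summing raw NLL differences and dividing by total tokens is the same
        # as weighting each window's per-token delta by its token count.
        total_delta += s.total_nll - b.total_nll
        total_tokens += b.n_tokens
    if total_tokens == 0:
        raise ValueError("no tokens to compare")
    return total_delta / total_tokens
```

A positive result means the subject assigns lower likelihood to the same tokens than the baseline does. If the primary metric is perplexity, `math.exp(delta)` gives the subject-to-baseline perplexity ratio.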
Pairing record
Baseline: gpt2 -> Subject: gpt2-q4
Windows: <paired windows>
Metric: delta-log-loss (paired)
CI: <confidence interval>
Understand the decision
Move from raw measurements to bounded claims, portable artifacts, and verifiable handoff evidence.
Statistical Boundaries
Quantify degradation without overstating the claim
Get confidence intervals on primary metrics plus explicit decision boundaries so reviewers know what the evidence supports and where it stops.
What this gives you
- BCa bootstrap confidence intervals
- Regression budget enforcement
- Plain-language guarantees and non-guarantees
- Primary metric ratio tracking
- Policy digest included in evaluation report metadata
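For readers unfamiliar with BCa (bias-corrected and accelerated) bootstrap intervals, the following standard-library sketch shows the general technique applied to a list of per-window deltas. It is an illustration under assumptions, not InvarLock's code: `bca_interval`, `within_budget`, the resample count, and the budget rule are all hypothetical.

```python
import random
from statistics import NormalDist, fmean


def bca_interval(data, stat=fmean, n_resamples=2000, confidence=0.95, seed=0):
    """Return a (lower, upper) BCa bootstrap interval for stat(data)."""
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    norm = NormalDist()
    theta_hat = stat(data)

    # Bootstrap replicates: resample with replacement, recompute the statistic.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))

    # Bias correction z0 from the share of replicates below the point estimate,
    # clamped away from 0 and 1 so the inverse CDF stays finite.
    share = sum(r < theta_hat for r in reps) / n_resamples
    share = min(max(share, 0.5 / n_resamples), 1.0 - 0.5 / n_resamples)
    z0 = norm.inv_cdf(share)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_quantile(q):
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))

    def pick(q):
        idx = min(max(int(q * n_resamples), 0), n_resamples - 1)
        return reps[idx]

    alpha = 1.0 - confidence
    return pick(adjusted_quantile(alpha / 2)), pick(adjusted_quantile(1 - alpha / 2))


def within_budget(ci_upper, budget):
    """Hypothetical gate: pass only if the interval's upper bound fits the budget."""
    return ci_upper <= budget
```

Gating on the upper bound rather than the point estimate is one conservative way to read "regression budget enforcement": the change passes only when the plausible worst case, not just the average, stays inside the budget.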
Decision summary
INVARLOCK v<version> · EVALUATE
Status: PASS · Gates: <passed>/<total> passed
Primary metric ratio: <ratio>
Output: reports/eval/evaluation.report.json
Evaluation Reports + Verification
Validate artifacts before release
Use evaluation.report.json as a standard handoff artifact, run invarlock verify, and carry the same evidence model into proof-pack workflows.
What this gives you
- Machine-readable report schema
- Pairing and policy metadata included in output
- invarlock verify checks in CI and release profiles
- Proof-pack workflows bundle portable evidence for handoff
- HTML rendering available with invarlock report html
Artifact handoff
# Validate an evaluation report
invarlock verify reports/eval/evaluation.report.json
# Render report for sharing
invarlock report html \
-i reports/eval/evaluation.report.json \
-o reports/eval/evaluation.html
Ship inside your environment
Carry the same evidence model into CI and private infrastructure without changing how teams review release risk.
CI/CD Integration
Gate deployments on robustness checks
Integrate InvarLock into your deployment pipelines using stable exit codes and structured JSON outputs.
What this gives you
- Exit code 0 for passing evaluations (safe to deploy)
- Machine-readable evaluation.report.json outputs
- CI/CD pipeline examples for GitHub Actions and Jenkins
- Automated quality gates for release review
Pipeline gate
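# In your CI pipeline: deploy only if the evaluation exits with code 0
if invarlock evaluate \
    --baseline base.pt \
    --subject quantized.pt; then
  echo "Safe to deploy"
  ./deploy.sh
else
  echo "Evaluation did not pass; skipping deploy" >&2
  exit 1
fi
Offline-First Design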
Your models stay private
Network access is disabled by default, so your model weights stay inside your infrastructure. Downloads must be enabled explicitly, per command.
What this gives you
- Network disabled by default
- Explicit opt-in for downloads with INVARLOCK_ALLOW_NETWORK=1
- On-premise deployment ready
- Air-gapped environment support
Network policy
# Network disabled (default)
invarlock evaluate --baseline ./local.pt
# Explicit network enable
INVARLOCK_ALLOW_NETWORK=1 \
invarlock evaluate --baseline gpt2
Use the CLI now, or join the pilot waitlist
The open-source CLI, docs, and GitHub repo are available now. Use the waitlist below if you want pilot announcements or product updates beyond the current open-source path.