Getting Started

Overview

  • Purpose: Install InvarLock and complete the core evaluate → verify → report flow.
  • Audience: New users setting up their first local or CI evaluation.
  • Python: 3.12+ recommended (CI uses 3.13).
  • Install: pip install invarlock for verification/reporting; add invarlock[hf] only for Hugging Face-backed evaluation.
  • Next step: Quickstart for copy-paste commands.

This guide covers installation, environment setup, and the smallest useful InvarLock workflow: compare a baseline against a subject, verify the container-backed report, and render HTML for review. The same top-level loop also underpins the included image-text path when you use the explicit multimodal preset and provider configuration. The minimal install is enough for doctor, verify, and report html; use invarlock[hf] only when you need evaluate to load Hugging Face models. Treat evaluate → verify → report html as the first path to get green before you reach for deeper report-analysis commands.

Install InvarLock

# Minimal core (no torch; CLI + schema/verification tools)
pip install invarlock

# Recommended for model-loading and evaluation workflows
pip install "invarlock[hf]"

# Full extras bundle
pip install "invarlock[all]"

Install via pipx

pipx install --python python3.12 "invarlock[hf]"

Initialize Environment

conda create -n invarlock python=3.12 -y
conda activate invarlock
pip install "invarlock[hf]"

Verify Installation

invarlock doctor

Network Access

InvarLock blocks outbound network access by default. When you need to download models or datasets, opt in per command with --allow-network:

invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject distilgpt2 \
  --adapter auto \
  --profile ci

For offline use, pre-download assets and enforce offline reads with HF_DATASETS_OFFLINE=1. You can also relocate your Hugging Face cache via HF_HOME and HF_DATASETS_CACHE.
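One way to wire this up is to pre-position the cache and enforce offline reads in the environment before any InvarLock commands run. A minimal sketch — the /data/hf-cache path is an illustrative placeholder, while the HF_* variables are the standard Hugging Face settings named above:

```shell
# Relocate the Hugging Face cache (illustrative path) and force offline reads.
export HF_HOME=/data/hf-cache
export HF_DATASETS_CACHE="$HF_HOME/datasets"
export HF_DATASETS_OFFLINE=1

# Confirm the offline flag is set before launching evaluate.
echo "offline=$HF_DATASETS_OFFLINE"  # → offline=1
```

Putting these exports in the shell profile of a CI runner keeps every subsequent invarlock invocation offline by default.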

First Evaluation

The default evaluate path runs model-loading steps inside the runtime container and emits runtime.manifest.json beside the evaluation report.

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject distilgpt2 \
  --adapter auto \
  --profile ci \
  --report-out reports/eval

Repo maintainers can still add --preset configs/... when they intentionally want a repo-owned preset, but the wheel-first onboarding path should start with direct flags and the built-in adapter defaults.

Verify And Render

invarlock verify reports/eval/evaluation.report.json
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html

These commands validate the paired math, schema, and runtime provenance, then render a shareable HTML artifact from the same report.

Artifact model:

  • evaluation.report.json: produced by invarlock evaluate or invarlock report generate --format report; consumed by invarlock verify, invarlock report html, invarlock report validate, invarlock report explain --evaluation-report, and invarlock advanced runtime-verify.
  • report.json: produced in baseline/subject run directories under runs/...; consumed by invarlock report generate and invarlock report explain --subject-report ... --baseline-report ...

Execution Modes

  • evaluate defaults to the runtime container (--execution-mode container).
  • Use --execution-mode host only for host-side workflows that intentionally bypass container execution.
  • verify expects runtime.manifest.json next to container-backed evaluation reports.
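The pairing described in the last bullet can be sanity-checked before calling verify. A small sketch, assuming the reports/eval output directory from the earlier evaluate example:

```shell
# Check that the runtime manifest sits beside the evaluation report
# (reports/eval matches the --report-out directory used by evaluate above).
report=reports/eval/evaluation.report.json
manifest="$(dirname "$report")/runtime.manifest.json"
if [ -f "$manifest" ]; then
  echo "manifest present"
else
  echo "manifest missing: run evaluate in container mode first"
fi
```

This guards against pointing verify at a report whose manifest was moved or deleted after the run.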

Learning Paths

  • First-time user: Getting Started → Quickstart → Compare & evaluate
  • Python developer: Getting Started → Primary Metric Smoke → API Guide
  • Custom data user: Getting Started → Bring Your Own Data → Config Gallery
  • Validation engineer: Getting Started → Evidence Packs → Evidence Packs Internals
  • Security auditor: Getting Started → Threat Model → Best Practices

Advanced Workflows

The simplified public CLI keeps the core path at the top level. Non-core surfaces live under invarlock advanced:

  • invarlock advanced evidence-pack ...
  • invarlock advanced policy ...
  • invarlock advanced plugins ...
  • invarlock advanced calibrate ...

Installed packages also include the evidence-pack verifier, so bundles can be inspected without cloning the repository:

invarlock advanced evidence-pack verify <pack> --strict

Optional adapter and backend installs use Python extras such as pip install "invarlock[awq,gptq]"; they are not managed through CLI install or uninstall commands. On Python 3.13+ stacks, gptq may still require a vendor wheel or a supported older interpreter because upstream auto-gptq packaging remains narrower than the core InvarLock support matrix.
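Before opting into the gptq extra on a newer stack, it can help to confirm the interpreter version first. A minimal check:

```shell
# Print the running interpreter's major.minor version; 3.13+ stacks may need
# a vendor wheel or an older interpreter for the gptq extra, per the note above.
python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])'
```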

Device Support

InvarLock defaults to --device auto, probing CUDA → MPS → CPU in that order. All guard calculations and reports are device-agnostic; CUDA is recommended for larger release-tier workloads, while CPU and MPS remain useful for local smoke and portability runs.

  • invarlock doctor reports detected accelerators.
  • Use --device cpu to force portability runs.
  • Use --profile ci_cpu for a reduced-window CPU preset when you need a fast validation lane.

Next Steps

Pick a starting point based on what you want to do:

  • evaluate my own edited checkpoint workflow: Compare & evaluate (BYOE)
  • understand the CLI commands: Quickstart
  • bring my own evaluation dataset: Bring Your Own Data
  • see example outputs: Example Reports
  • understand what's in a report: Reading a report
  • use InvarLock programmatically: API Guide
  • understand the assurance scope: Assurance Case
  • set up secure production deployment: Security Best Practices