CLI Reference

Overview

| Aspect | Details |
| --- | --- |
| Purpose | Command-line interface for evaluation, verification, reporting, and advanced maintenance flows. |
| Audience | Operators running InvarLock from a terminal or CI. |
| Primary commands | evaluate, verify, report, doctor, advanced, version. |
| Runtime verifier | invarlock advanced runtime-verify for direct runtime manifest checks. |
| Requires | invarlock[hf] for model-loading workflows; extra backends are installed via Python extras. |
| Network | Offline by default; use evaluate --allow-network when a run needs model or dataset downloads. |
| Source of truth | src/invarlock/cli/app.py, src/invarlock/cli/commands/*.py. |

Most users only need a narrow top-level surface:

  1. invarlock evaluate
  2. invarlock verify
  3. invarlock report html

Everything else is either diagnostics (doctor) or explicitly advanced (invarlock advanced ...).

First-Touch Surfaces

These entrypoints are the ones users hit first when orienting themselves in a fresh install or wheel-only environment:

| Surface | Why it matters |
| --- | --- |
| invarlock --help | Top-level discovery of the supported public command set |
| invarlock --version | Confirms the installed package and schema pairing |
| invarlock report --help | Shows the report subcommands without requiring run artifacts |
| invarlock advanced --help | Lists the advanced maintenance namespace before drilling into subcommands |
| invarlock advanced calibrate --help | Establishes that calibration lives under advanced rather than the core loop |
| invarlock advanced runtime-verify --help | Wheel-native runtime-manifest verification for existing report bundles |

Quick Start

# Install the Hugging Face-backed evaluation stack
pip install "invarlock[hf]"

# Compare a baseline against a subject
invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject distilgpt2 \
  --baseline-adapter auto --subject-adapter auto \
  --profile ci \
  --assurance strict

# Validate the container-backed evaluation bundle
invarlock verify --assurance strict reports/eval/evaluation.report.json

# Render shareable HTML
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html

# Explain gate decisions from the evaluation bundle
invarlock report explain --evaluation-report reports/eval/evaluation.report.json

# Export MLflow tags for CI and registry handoff
invarlock report export -i reports/eval/evaluation.report.json --format mlflow-tags

Security Defaults

  • evaluate defaults to --execution-mode container, which delegates model-loading work into the runtime container.
  • evaluate defaults to --assurance strict, which requires CI/release profile, balanced/conservative tier, canonical guard order, complete evidence, and verified runtime provenance.
  • Use --execution-mode host only for host-side workflows that intentionally bypass the container boundary. Host mode is non-assurance unless --assurance off is explicit.
  • verify expects runtime.manifest.json beside container-backed evaluation outputs and fails closed when required runtime provenance is missing.
  • verify --assurance report is the default: strict is enforced when the report claims strict. Use verify --assurance strict to require strict on every report input.
  • Network access remains opt-in through evaluate --allow-network.
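The host-mode pairing described above can be sketched as a small Python helper that builds the two command lines. This is a minimal sketch, not part of InvarLock: host_mode_commands is a hypothetical name, it uses only flags documented in this reference, and it assumes local checkpoints (no --allow-network) and that the report lands at evaluation.report.json inside the --report-out directory.

```python
from pathlib import Path


def host_mode_commands(baseline: str, subject: str, report_dir: str = "reports/eval"):
    """Build argv lists for a host-side evaluate and the matching verify.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    # Assumption: the report is written as evaluation.report.json under --report-out.
    report = (Path(report_dir) / "evaluation.report.json").as_posix()
    evaluate = [
        "invarlock", "evaluate",
        "--baseline", baseline,
        "--subject", subject,
        "--execution-mode", "host",  # intentionally bypasses the container boundary
        "--assurance", "off",        # host runs sit outside the assurance surface
        "--report-out", report_dir,
    ]
    verify = [
        "invarlock", "verify",
        "--runtime-provenance", "host",  # match the provenance of the host artifacts
        "--assurance", "off",
        report,
    ]
    return evaluate, verify


evaluate_argv, verify_argv = host_mode_commands("./ckpt/base", "./ckpt/edited")
print(" ".join(evaluate_argv))
print(" ".join(verify_argv))
```

Each list can be passed to subprocess.run(..., check=False) so the caller can inspect the exit code itself.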

Task To Command Map

| Task | Command | Output |
| --- | --- | --- |
| Compare baseline vs subject | invarlock evaluate | reports/eval/evaluation.report.json plus runtime.manifest.json for container-backed runs |
| Validate an evaluation report | invarlock verify | Exit code plus human or JSON verification output |
| Render HTML from an evaluation report | invarlock report html | HTML file |
| Explain gate decisions from an evaluation bundle or explicit run reports | invarlock report explain | Human-readable explanation |
| Export evidence to CI and registry handoff formats | invarlock report export | MLflow tag JSON, model-card Markdown, or release-review Markdown |
| Inspect environment health | invarlock doctor | Human or JSON diagnostics |
| Evidence-pack, policy, plugin, or calibration workflows | invarlock advanced ... | Advanced artifacts and diagnostics |

Artifact Outputs Matrix

| Command | Writes runs/ | Writes reports/ | Notes |
| --- | --- | --- | --- |
| invarlock evaluate | Yes (--out, default runs/) | Yes (--report-out, default reports/eval) | Produces the paired evaluation report bundle |
| invarlock verify | No | No | Reads existing evaluation report JSON |
| invarlock report html | No | Yes (--output) | Renders HTML from an existing report |
| invarlock report explain | No | No | Explains evaluation.report.json directly; also accepts explicit --subject-report and --baseline-report when you need to rebuild from raw run artifacts |
| invarlock report export | No | Optional (--output) | Exports MLflow tags, model-card Markdown, or release-review Markdown from an existing evaluation report |
| invarlock doctor | No | No | Diagnostics only |
| invarlock advanced evidence-pack | Depends on subcommand | Depends on subcommand | Advanced evidence packaging |
| invarlock advanced policy | Depends on subcommand | No | Advanced policy-pack tooling |
| invarlock advanced plugins | No | No | Read-only plugin discovery and explanation |
| invarlock advanced calibrate | Yes | Yes | Advanced tier-policy calibration workflows |

Top-Level Command Index

| Command | Purpose |
| --- | --- |
| invarlock evaluate | Compare baseline and subject checkpoints with deterministic pairing |
| invarlock verify | Verify evaluation reports against schema, pairing, and runtime provenance rules |
| invarlock report | Explain, render, and validate existing report artifacts |
| invarlock doctor | Diagnose environment and configuration issues |
| invarlock advanced | Advanced evidence-pack, policy, plugin, and calibration workflows |
| invarlock version | Show the installed version |
| invarlock advanced runtime-verify | Verify an evaluation report against its sibling runtime.manifest.json |

Exit codes: 0=success, 1=generic failure, 2=usage/schema/config failure, 3=hard abort for profile-aware fail-closed paths.
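For CI wrappers, the exit codes above can be turned into readable log messages. The following is a minimal sketch: ExitCode and describe_exit_code are hypothetical names chosen for illustration, not InvarLock APIs, and the wording of each meaning is copied from the line above.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in this reference (the names are illustrative)."""

    SUCCESS = 0
    GENERIC_FAILURE = 1
    USAGE_SCHEMA_CONFIG_FAILURE = 2
    HARD_ABORT = 3


DESCRIPTIONS = {
    ExitCode.SUCCESS: "success",
    ExitCode.GENERIC_FAILURE: "generic failure",
    ExitCode.USAGE_SCHEMA_CONFIG_FAILURE: "usage/schema/config failure",
    ExitCode.HARD_ABORT: "hard abort for profile-aware fail-closed paths",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning, or flag a code this reference does not list."""
    try:
        return DESCRIPTIONS[ExitCode(code)]
    except ValueError:
        # ExitCode(code) raises ValueError for values outside 0..3.
        return f"undocumented exit code {code}"


for code in (0, 2, 3, 7):
    print(code, "->", describe_exit_code(code))
```

A wrapper would typically call describe_exit_code on the returncode from subprocess.run and treat any nonzero value as a failed step.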

Stable vs Experimental Commands

| Stability class | Commands | Contract |
| --- | --- | --- |
| Stable core workflow | invarlock evaluate, invarlock verify, invarlock report html, invarlock report explain, invarlock report export, invarlock report validate, invarlock doctor, invarlock version | Documented command names, documented options, exit-code meaning, and artifact paths are stable within the current CLI policy. |
| Stable JSON automation | invarlock doctor --json, invarlock verify --json, invarlock advanced runtime-verify --json, invarlock advanced plugins list --json, invarlock advanced plugins adapters --json, invarlock advanced evidence-pack verify --json, invarlock advanced policy verify --json | Required envelope fields and format_version values are stable; optional fields are additive. |
| Stable advanced verifiers | invarlock advanced runtime-verify, invarlock advanced evidence-pack inspect, invarlock advanced evidence-pack verify, invarlock advanced policy build, invarlock advanced policy verify, invarlock advanced plugins list, invarlock advanced plugins adapters | Public operational commands outside the core user loop. Their documented behavior is maintained, while additional subcommands may evolve faster. |
| Experimental or maintainer-only | invarlock advanced calibrate, repo scripts under scripts/, package-internal config runners, undocumented flags, and local harness entrypoints | Useful for development, calibration, and release work; not covered by the public CLI stability contract until promoted here. |

invarlock evaluate

Purpose: compare a baseline against a subject and emit an evaluation report.

Common options:

  • --baseline: baseline checkpoint path or model ID
  • --subject: subject checkpoint path or model ID
  • --baseline-report: reuse a stored baseline report by passing the explicit report.json file path that captured the baseline windows. Reused reports must match the requested baseline model, profile, tier, adapter family, assurance mode, and dataset/window-plan fields.
  • --baseline-adapter: baseline-side adapter name or auto
  • --subject-adapter: subject-side adapter name or auto
  • --profile: ci, release, or another included profile
  • --tier: tier label for policy context
  • --preset: optional repo preset path
  • --out: run-artifact directory
  • --report-out: evaluation report directory
  • --execution-mode container|host: execution policy for evaluate. container keeps model loading inside the runtime container; host allows host-side execution and produces host artifacts that should be verified with verify --runtime-provenance host --assurance off.
  • --assurance strict|off: strict is the default assurance contract; off is for exploratory/dev reports outside the assurance-evidence surface.
  • --edit-config: optional demo/smoke edit overlay such as quant_rtn

Example:

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject distilgpt2 \
  --baseline-adapter auto --subject-adapter auto \
  --profile ci \
  --report-out reports/eval

invarlock verify

Purpose: verify existing evaluation report JSON files end to end. This command checks report schema, primary-metric recomputation, paired-window consistency, policy gates, strict-assurance claims when requested, and runtime provenance through the report's sibling runtime.manifest.json.

Arguments:

  • REPORTS...: one or more evaluation report JSON paths or directories containing canonical evaluation.report.json

Common options:

  • --baseline: optional baseline report for comparison flows
  • --tolerance: float tolerance for recompute checks
  • --profile: profile-aware validation mode
  • --assurance report|strict|off: report enforces strict only for reports claiming strict; strict requires every input to claim and pass strict; off skips strict assurance policy checks.
  • --warning-policy pass|fail: keep guard warnings advisory (pass, default) or fail verification when baseline-relative guard warnings are present (fail).
  • --fail-on-warnings: alias for --warning-policy fail.
  • --runtime-provenance container|host: runtime provenance policy for the supplied report artifacts
  • --json: emit a single JSON envelope
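The three --assurance modes can be read as a small decision table per report input. The sketch below models that description only; strict_policy_action and its return labels are hypothetical, and the real verifier may apply further checks that this reference does not spell out.

```python
def strict_policy_action(mode: str, report_claims_strict: bool) -> str:
    """Model what each --assurance mode implies for a single report input.

    Returns one of three illustrative labels:
      "enforce": run the strict assurance policy checks on this report
      "skip":    do not run strict assurance policy checks
      "reject":  fail the input because it does not claim strict
    """
    if mode == "off":
        return "skip"
    if mode == "report":
        # Strict is enforced only when the report itself claims strict.
        return "enforce" if report_claims_strict else "skip"
    if mode == "strict":
        # Every input must claim strict (and then pass the strict checks).
        return "enforce" if report_claims_strict else "reject"
    raise ValueError(f"unknown assurance mode: {mode!r}")


for mode in ("report", "strict", "off"):
    for claims in (True, False):
        print(f"{mode:>6} claims_strict={claims}: {strict_policy_action(mode, claims)}")
```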

Example:

invarlock verify --json reports/eval/evaluation.report.json

Use --fail-on-warnings (equivalent to --warning-policy fail) when you want to fail an otherwise policy-passing edit because a guard signal changed relative to the baseline:

invarlock verify --fail-on-warnings reports/eval/evaluation.report.json

invarlock report

Purpose: operate on existing report artifacts through explicit subcommands.

Core subcommands:

  • invarlock report generate
    • Generate human-readable report output from existing run reports
    • Options: --run, --compare-run-report, --baseline-run-report, --format, --output
  • invarlock report html
    • Render an evaluation report to HTML
    • Options: -i/--input, -o/--output, --embed-css, --force
  • invarlock report explain
    • Explain gates and primary-metric behavior from the preferred evaluation bundle input, or from explicit subject/baseline run reports when needed
    • Options: --evaluation-report, --subject-report, --baseline-report
  • invarlock report export
    • Export an existing evaluation report for CI and registry handoff surfaces
    • Formats: mlflow-tags, model-card-md, release-review-md
    • Options: -i/--evaluation-report, --format, -o/--output, --policy-profile, --report-url, --evidence-url, --verify-result, --force
    • --verify-result uses only the verifier result item whose id matches the resolved evaluation report path; stale verifier JSON is rejected.
  • invarlock report validate
    • Validate a report JSON against the v1 schema
  • Directory inputs are command-specific:
    • report generate and report explain accept directories containing canonical report.json
    • report html and report validate accept directories containing canonical evaluation.report.json
    • report explain --evaluation-report accepts directories containing canonical evaluation.report.json
    • verify accepts directories containing canonical evaluation.report.json and optional baselines containing canonical report.json or evaluation.report.json
      • A directory containing only report.json is a raw run directory, not a verifier bundle. Generate evaluation.report.json first with invarlock report generate --run <subject report.json> --baseline-run-report <baseline report.json> --format report -o <output-dir>.
      • If a directory contains both canonical filenames, it is ambiguous and rejected; pass the exact file path instead.
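The directory rules for verify inputs can be sketched as follows. This is an illustrative re-implementation of the behavior described above, not InvarLock's own resolver: resolve_verify_input is a hypothetical name, and the error types and messages are assumptions.

```python
import tempfile
from pathlib import Path

EVALUATION_NAME = "evaluation.report.json"  # canonical verifier bundle file
RUN_NAME = "report.json"                    # canonical raw run report


def resolve_verify_input(path: Path) -> Path:
    """Resolve a verify argument (file or directory) to one report file."""
    if path.is_file():
        return path  # explicit file paths are used as given
    has_evaluation = (path / EVALUATION_NAME).is_file()
    has_run = (path / RUN_NAME).is_file()
    if has_evaluation and has_run:
        raise ValueError(f"{path}: both canonical filenames present; pass the exact file path")
    if has_evaluation:
        return path / EVALUATION_NAME
    if has_run:
        raise ValueError(f"{path}: raw run directory; generate {EVALUATION_NAME} first")
    raise FileNotFoundError(f"{path}: no canonical report file found")


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / EVALUATION_NAME).write_text("{}")
    resolved_name = resolve_verify_input(bundle).name

    (bundle / RUN_NAME).write_text("{}")  # the directory is now ambiguous
    try:
        resolve_verify_input(bundle)
        ambiguous_rejected = False
    except ValueError:
        ambiguous_rejected = True

print(resolved_name, ambiguous_rejected)
```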

Example:

invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html
invarlock report explain --evaluation-report reports/eval/evaluation.report.json
invarlock report explain \
  --subject-report runs/subject/report.json \
  --baseline-report runs/baseline/report.json

invarlock doctor

Purpose: environment diagnostics that remain light-import safe.

Common options:

  • --json
  • --profile
  • --tier
  • --baseline-report
  • --subject-report
  • --strict
  • Report inputs accept an explicit JSON file path or a directory containing canonical report.json or evaluation.report.json; ambiguous directories with both canonical files are rejected and require an explicit file path.

Example:

invarlock doctor --json

invarlock advanced

Purpose: advanced and maintenance-oriented workflows that are intentionally outside the core user loop, except for the explicitly versioned JSON contracts listed below.

Subcommands:

  • invarlock advanced evidence-pack
    • Inspect, build, sign/keygen, and verify evidence packs
  • invarlock advanced policy
    • Build and verify policy-pack artifacts
  • invarlock advanced plugins
    • Read-only plugin discovery and explanation
  • invarlock advanced calibrate
    • Tier-policy calibration and sweep tooling
  • invarlock advanced runtime-verify
    • Low-level runtime-manifest verification for an existing report

Examples:

invarlock advanced evidence-pack verify <pack> --strict --report-assurance strict
invarlock advanced policy verify policy-pack.json --json
invarlock advanced plugins list --json
invarlock advanced calibrate --help
invarlock advanced runtime-verify --report reports/eval/evaluation.report.json --manifest reports/eval/runtime.manifest.json

Plugins & Entry Points

invarlock advanced plugins lists built-in and optional adapters, guards, edits, datasets, and related entry points without mutating the active Python environment.

Available read-only flows include:

  • invarlock advanced plugins list
  • invarlock advanced plugins adapters
  • invarlock advanced plugins guards
  • invarlock advanced plugins edits

Optional backends are installed through normal Python packaging, for example:

pip install "invarlock[hf]"
pip install "invarlock[awq,gptq]"

Plugin install and uninstall commands are not part of the CLI surface.

invarlock advanced runtime-verify

Purpose: low-level runtime provenance verification for an existing evaluation report and runtime manifest. This command validates the manifest contract, container execution fields, image digest presence, and the report SHA-256 binding. It is scoped to runtime provenance; invarlock verify owns primary-metric gates, paired-window math, and strict-assurance report policy.

Common options:

  • --report: path to evaluation.report.json
  • --manifest: path to runtime.manifest.json
  • --json: emit a machine-readable runtime-verify-v1 envelope

Example:

invarlock advanced runtime-verify \
  --report reports/eval/evaluation.report.json \
  --manifest reports/eval/runtime.manifest.json
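To illustrate what a report SHA-256 binding means in general, the sketch below hashes a report file and compares the result with a recorded digest. It is a generic example under stated assumptions: this reference does not say which manifest field holds the digest or whether the report bytes are canonicalized first, so the recorded digest is passed in as a plain argument, and sha256_of_file and report_matches_digest are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file's raw bytes in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def report_matches_digest(report_path: Path, recorded_hex: str) -> bool:
    """True when the report's SHA-256 equals the recorded hex digest."""
    return sha256_of_file(report_path) == recorded_hex.strip().lower()


# Demonstration: an edit to the report after the digest was recorded breaks the binding.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "evaluation.report.json"
    report.write_bytes(b"{}")
    recorded = hashlib.sha256(b"{}").hexdigest()
    binding_ok = report_matches_digest(report, recorded)

    report.write_bytes(b'{"edited": true}')
    binding_after_edit = report_matches_digest(report, recorded)

print(binding_ok, binding_after_edit)
```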

JSON Output

Stable machine-readable output is available on these surfaces:

| Command | Format version | Stability |
| --- | --- | --- |
| invarlock doctor --json | doctor-v1 | Required envelope fields are stable. |
| invarlock verify --json | verify-v1 | Required envelope fields and exit-code meaning are stable. |
| invarlock advanced runtime-verify --json | runtime-verify-v1 | Runtime-manifest verification envelope is stable. |
| invarlock advanced plugins list --json | plugins-v1 | Plugin catalog envelope and contract catalog keys are stable. |
| invarlock advanced plugins adapters --json | plugins-v1 | Adapter rows and contract catalog keys are stable. |
| invarlock advanced evidence-pack verify --json | evidence-pack-verify-v1 | Evidence-pack verification envelope is stable. |
| invarlock advanced policy verify --json | policy-pack-verify-v1 | Policy-pack verification envelope is stable. |

These commands emit a single JSON object suitable for CI parsing. Within a format version, new optional fields may be added and consumers should ignore unknown fields. Removing a required field, renaming a required field, changing a field type, or changing pass/fail exit-code meaning requires a new format version.
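A consumer that follows this contract might look like the sketch below. It assumes the format version is exposed under a key named format_version (the name used in the stability table); the sample payload and its other field are invented for illustration, and load_envelope is a hypothetical helper.

```python
import json


def load_envelope(raw: str, expected_format: str) -> dict:
    """Parse one JSON envelope and check its format version.

    Unknown fields are left untouched, which is how a consumer "ignores" them.
    """
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("expected a single JSON object")
    found = envelope.get("format_version")  # assumed key name
    if found != expected_format:
        raise ValueError(f"expected format_version {expected_format!r}, got {found!r}")
    return envelope


# Hypothetical payload: only format_version is taken from this reference.
sample = '{"format_version": "verify-v1", "some_future_field": 1}'

envelope = load_envelope(sample, "verify-v1")

try:
    load_envelope(sample, "doctor-v1")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(envelope["format_version"], mismatch_rejected)
```

Pinning the expected format version per command means a future version bump surfaces as an explicit error rather than a silent misread. Pass or fail should still come from the process exit code.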

Command Layout

  • The public top level is evaluate, verify, report, doctor, advanced, and version.
  • Evidence-pack, policy, plugin, and calibration workflows live under invarlock advanced ....
  • Host execution for the core evaluation path is expressed as --execution-mode host.
  • Internal delegated config execution uses a package-internal config-runner module, not a public CLI command.
  • Optional runtime backends are installed with Python extras instead of CLI install and uninstall commands.