Example Reports

Overview

| Aspect | Details |
| --- | --- |
| Purpose | Show how to generate and interpret InvarLock reports. |
| Audience | Users learning the evaluation workflow. |
| Outputs | evaluation.report.json, evaluation_report.md, report.json, and runtime.manifest.json for container-backed outputs. |
| Requires | invarlock[hf] for HF adapter workflows. |

InvarLock emits both machine-readable reports and human-friendly summaries. Use the steps below to reproduce representative artifacts from this repository version.

Read the Bundle First

For most reviewers, the primary artifact is evaluation.report.json, not the lower-level run reports. Use it as the front door:

invarlock verify reports/quant8_demo/evaluation.report.json
invarlock report html -i reports/quant8_demo/evaluation.report.json -o reports/quant8_demo/evaluation.html
invarlock report explain --evaluation-report reports/quant8_demo/evaluation.report.json

Artifact model:

| Artifact | What it contains | Typical next step |
| --- | --- | --- |
| evaluation.report.json | Paired evaluation outcome, validation block, policy/provenance summary | verify, report html, report explain --evaluation-report |
| report.json | One run's raw metrics, guard telemetry, and execution artifacts | report generate, or the explicit report explain --subject-report ... --baseline-report ... (sketch below) |
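
For the explicit two-report path in the second row, a minimal sketch using the same placeholder style as the rest of this page (substitute your own run reports):

invarlock report explain \
  --subject-report <subject_report.json> \
  --baseline-report <baseline_report.json>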

1. Generate a Report Bundle

The command below follows the default runtime-container path: it writes a container-backed runtime.manifest.json next to evaluation.report.json. Public host-side workflows use --execution-mode host and should verify the resulting report with invarlock verify --runtime-provenance host .... This reproduction uses repo-owned preset and overlay files, so it matches the example artifacts checked into this repository version. Wheel-only installs should start with Getting Started for the first evaluation run, then come back here once they have an evaluation bundle.
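
For the host-side verification just mentioned, a minimal sketch; the report path below is an illustrative assumption, so substitute the path to your own bundle:

# Assumption: the evaluation report path is the remaining argument.
invarlock verify --runtime-provenance host reports/<name>/evaluation.report.json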

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline sshleifer/tiny-gpt2 \
  --subject  sshleifer/tiny-gpt2 \
  --adapter auto \
  --profile release \
  --tier balanced \
  --preset configs/presets/causal_lm/wikitext2_512.yaml \
  --edit-config configs/overlays/edits/quant_rtn/8bit_full.yaml \
  --out runs/quant8_demo \
  --report-out reports/quant8_demo

The command writes evaluation.report.json, evaluation_report.md, and runtime.manifest.json under reports/quant8_demo/. Each report contains:

  • Model and edit metadata (model id, adapter, commit hash, edit plan)
  • Drift / perplexity / RMT verdicts with paired bootstrap confidence intervals
  • Guard diagnostics (spectral, variance, invariants) including predictive-gate notes
  • Policy digest capturing tier thresholds and calibration choices
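
Because the report is machine-readable, a quick way to orient yourself before writing tooling against specific fields is to list its top-level keys. A hedged sketch; the report itself is the source of truth for its structure:

# Print the top-level keys of the evaluation report:
python -c "import json; print(sorted(json.load(open('reports/quant8_demo/evaluation.report.json'))))"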

2. Create a Narrative Summary

# The report already includes a markdown summary:
cat reports/quant8_demo/evaluation_report.md

# To regenerate markdown from run reports, pass edited + baseline:
invarlock report generate \
  --run <edited_report.json> \
  --baseline-run-report <baseline_report.json> \
  --format markdown

The markdown summary mirrors the JSON report content but highlights:

  • Baseline vs edited perplexity series
  • Guard outcomes with links to supporting metrics
  • Checklist of gates (PASS/FAIL) suitable for change-control review, as scanned in the sketch below
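
For a quick change-control scan, a minimal sketch that surfaces those gate lines (assumption: verdicts appear literally as PASS or FAIL in the markdown):

# Show gate verdict lines, with line numbers, from the markdown summary:
grep -nE 'PASS|FAIL' reports/quant8_demo/evaluation_report.md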

3. Shareable Attachments

HTML report chrome:

[Figure: HTML report chrome anatomy, showing the header, summary chips, quick links rail, and canonical report body.]

That layout is intentional: reviewers should be able to confirm overall status, jump directly to the gate or provenance section they care about, and still read the unchanged canonical report content underneath.

For audits, collect the following files:

| File | Purpose |
| --- | --- |
| runs/<name>/**/report.json | Execution log, metrics, and guard telemetry |
| reports/<name>/evaluation.report.json | Machine-readable evaluation report |
| reports/<name>/runtime.manifest.json | Runtime provenance for container-backed outputs |
| reports/<name>/evaluation_report.md | Human-friendly summary for reviewers |
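
To hand reviewers the full audit set in one file, a minimal packaging sketch (assumes the quant8_demo names from this walkthrough):

# Archive the audit artifacts listed above:
tar czf quant8_demo_audit.tgz \
  runs/quant8_demo \
  reports/quant8_demo/evaluation.report.json \
  reports/quant8_demo/runtime.manifest.json \
  reports/quant8_demo/evaluation_report.md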

A report remains valid only for the same baseline reference, pairing assumptions, dataset/tokenizer context, and scoped claim surface. It also remains valid only while invarlock verify --json reports/<name>/evaluation.report.json continues to pass against the adjacent runtime.manifest.json.
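
In CI, that validity check can be enforced by gating on the verify exit code (assumption: invarlock verify exits nonzero when verification fails, as is conventional for CLI verifiers):

# Fail the pipeline if the evaluation report no longer verifies:
invarlock verify --json reports/quant8_demo/evaluation.report.json \
  || { echo "evaluation report failed verification" >&2; exit 1; }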