# Example Reports

## Overview

| Aspect | Details |
| --- | --- |
| Purpose | Show how to generate and interpret InvarLock reports. |
| Audience | Users learning the evaluation workflow. |
| Outputs | `evaluation.report.json`, `evaluation_report.md`, `report.json` |
| Requires | `invarlock[hf]` for HF adapter workflows |

InvarLock emits both machine-readable reports and human-friendly summaries. Use the steps below to reproduce representative artifacts from the current release.

## 1. Generate a Report Bundle

```bash
INVARLOCK_ALLOW_NETWORK=1 INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate \
  --baseline sshleifer/tiny-gpt2 \
  --subject  sshleifer/tiny-gpt2 \
  --adapter auto \
  --profile release \
  --tier balanced \
  --preset configs/presets/causal_lm/wikitext2_512.yaml \
  --edit-config configs/overlays/edits/quant_rtn/8bit_full.yaml \
  --out runs/quant8_demo \
  --report-out reports/quant8_demo
```

The command writes `evaluation.report.json` and `evaluation_report.md` under `reports/quant8_demo/`. Each report contains:

- Model and edit metadata (model id, adapter, commit hash, edit plan)
- Drift / perplexity / RMT verdicts with paired bootstrap confidence intervals
- Guard diagnostics (spectral, variance, invariants) including predictive-gate notes
- Policy digest capturing tier thresholds and calibration choices
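As a sketch of how the machine-readable report might be consumed downstream, the snippet below parses a report fragment and flags failing verdicts. Every field name here (`model`, `verdicts`, `status`, `ci`) is an illustrative assumption, not the actual InvarLock schema; check your own `evaluation.report.json` for the real keys.

```python
import json

# Hypothetical evaluation.report.json content -- the real schema may differ;
# every field name below is an assumption for illustration only.
report_text = """
{
  "model": {"id": "sshleifer/tiny-gpt2", "adapter": "auto"},
  "verdicts": {
    "drift": {"status": "PASS", "ci": [0.01, 0.04]},
    "perplexity": {"status": "PASS", "ci": [1.02, 1.08]},
    "rmt": {"status": "FAIL", "ci": [0.90, 1.30]}
  }
}
"""

report = json.loads(report_text)

# Flag any verdict that did not pass.
failures = [name for name, v in report["verdicts"].items() if v["status"] != "PASS"]
print(failures)  # -> ['rmt']
```

The same pattern (load once, iterate the verdict map) extends to CI pipelines that should block a merge on any non-PASS verdict.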

## 2. Create a Narrative Summary

```bash
# The report already includes a markdown summary:
cat reports/quant8_demo/evaluation_report.md

# To regenerate markdown from run reports, pass edited + baseline:
invarlock report --run <edited_report.json> --baseline <baseline_report.json> --format markdown
```

The markdown report mirrors the JSON report's content but highlights:

- Baseline vs edited perplexity series
- Guard outcomes with links to supporting metrics
- Checklist of gates (PASS/FAIL) suitable for change-control review
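A PASS/FAIL checklist like the one in the markdown report could be rendered from per-gate results along these lines. The `render_checklist` helper and the gate names are hypothetical, not part of the InvarLock API:

```python
def render_checklist(gates: dict) -> str:
    """Render gate outcomes as a markdown checklist (illustrative helper)."""
    lines = []
    for gate, passed in gates.items():
        mark = "x" if passed else " "
        status = "PASS" if passed else "FAIL"
        lines.append(f"- [{mark}] {gate}: {status}")
    return "\n".join(lines)

# Hypothetical gate outcomes for demonstration.
gates = {"spectral": True, "variance": True, "invariants": False}
print(render_checklist(gates))
# -> - [x] spectral: PASS
#    - [x] variance: PASS
#    - [ ] invariants: FAIL
```

Emitting standard markdown task-list syntax keeps the checklist usable both in rendered docs and in change-control tickets.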

## 3. Shareable Attachments

For audits, collect the following files:

| File | Purpose |
| --- | --- |
| `runs/<name>/**/report.json` | Execution log, metrics, and guard telemetry |
| `reports/<name>/evaluation.report.json` | Machine-readable evaluation report |
| `reports/<name>/evaluation_report.md` | Human-friendly summary for reviewers |
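One way to bundle these artifacts for an audit request, sketched in Python. The run name and directory layout below are illustrative stand-ins; in practice the files come from `invarlock evaluate`:

```python
import tarfile
from pathlib import Path

name = "quant8_demo"  # illustrative run name

# Create a stand-in layout so the sketch is self-contained; replace with
# the real output of `invarlock evaluate` when bundling an actual run.
for p in [
    Path(f"runs/{name}/step0/report.json"),
    Path(f"reports/{name}/evaluation.report.json"),
    Path(f"reports/{name}/evaluation_report.md"),
]:
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text("{}")

# Archive the run logs plus both report files into one attachment.
archive = Path(f"{name}_audit.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(f"runs/{name}")
    tar.add(f"reports/{name}")

with tarfile.open(archive) as tar:
    members = tar.getnames()
```

Shipping one archive keeps the execution log and both report formats together, so reviewers can re-verify the JSON against the markdown summary.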

Reports remain valid only for the same baseline reference, pairing assumptions, dataset/tokenizer context, and scoped claim surface, and only while `invarlock verify reports/<name>/evaluation.report.json` continues to pass.