Report Outline

This page defines the renderer-neutral structure for current InvarLock evaluation reports. It connects the canonical `evaluation.report.json` payload to human-readable renderers such as Markdown, HTML, evidence-pack summaries, and benchmark comparison pages through the same section model.

The outline is implemented by `invarlock.reporting.report_outline.build_evaluation_report_outline`.

Purpose

The outline keeps report renderers aligned around the same information architecture. Reports can include:

policy failures, warning-mode guard movement, and strict warning policies
causal, MLM, seq2seq, image-text, and MoE evidence lanes
primary-metric tail checks and measured accuracy floors
public assurance-basis reports with runtime manifests and model revisions
guard-value evidence and benchmark-style bare-vs-guarded comparisons

Renderers should use this shared outline for visible section order.

Canonical Section Order

Section	Purpose	Typical source blocks
Decision	Overall verdict, evidence mode, model/edit identity, warning count.	`validation`, `assurance`, `meta`, `primary_metric`, `guard_warnings`
Primary Metric	Task metric, final value, baseline-relative comparison, CI, tail gate.	`primary_metric`, `primary_metric_tail`, `validation`
Policy Gates	Hard verify gates and thresholds.	`validation`, `policy_digest`, `resolved_policy`
Guard Signals	Guard observations and warnings separate from hard failures.	`guard_warnings`, `invariants`, `spectral`, `rmt`, `variance`, `moe`
Benchmark Comparison	Optional bare-vs-guarded scenario deltas.	`benchmark_comparison`, `benchmark`, `guard_effect_benchmark`
Evidence And Provenance	Dataset, windows, runtime/policy/provider digests, device, seed.	`dataset`, `provenance`, `policy_digest`, `meta`, `artifacts`
Technical Appendix	Verbose raw measurements, resolved policy, plugins, artifacts.	`plugins`, `resolved_policy`, `policy_provenance`, `system_overhead`, `classification`, `structure`, `artifacts`

The Benchmark Comparison section is omitted when no benchmark block is present; the remaining sections keep their relative order.
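The ordering and omission behavior described above can be sketched in a few lines. This is an illustrative stand-in, not the actual `build_evaluation_report_outline` implementation: the function name `sketch_outline`, the returned dictionary shape, and the rule that the benchmark section counts as present when any one of its three source blocks exists are all assumptions made for the example. Section titles and block names are copied from the table.

```python
from typing import Any, Mapping

# Section titles and typical source blocks, copied from the table above.
CANONICAL_SECTIONS: list[tuple[str, tuple[str, ...]]] = [
    ("Decision", ("validation", "assurance", "meta", "primary_metric", "guard_warnings")),
    ("Primary Metric", ("primary_metric", "primary_metric_tail", "validation")),
    ("Policy Gates", ("validation", "policy_digest", "resolved_policy")),
    ("Guard Signals", ("guard_warnings", "invariants", "spectral", "rmt", "variance", "moe")),
    ("Benchmark Comparison", ("benchmark_comparison", "benchmark", "guard_effect_benchmark")),
    ("Evidence And Provenance", ("dataset", "provenance", "policy_digest", "meta", "artifacts")),
    ("Technical Appendix", ("plugins", "resolved_policy", "policy_provenance",
                            "system_overhead", "classification", "structure", "artifacts")),
]

# Only the benchmark section is documented as optional.
OPTIONAL_SECTIONS = {"Benchmark Comparison"}


def sketch_outline(report: Mapping[str, Any]) -> list[dict[str, Any]]:
    """Return visible sections in canonical order (hypothetical helper)."""
    outline: list[dict[str, Any]] = []
    for title, blocks in CANONICAL_SECTIONS:
        present = [name for name in blocks if report.get(name) is not None]
        # Assumption: an optional section is dropped when none of its blocks exist.
        if title in OPTIONAL_SECTIONS and not present:
            continue
        outline.append({"title": title, "source_blocks": present})
    return outline


# Usage with a placeholder payload that has no benchmark block.
for section in sketch_outline({"validation": {}, "primary_metric": {}}):
    print(section["title"], section["source_blocks"])
```

Keeping the order in a single list means every renderer that iterates over it gets the same visible section sequence, with Benchmark Comparison landing directly after Guard Signals whenever it appears.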

Renderer Rules

Keep policy failures, guard warnings, and guard-value evidence distinct.
Keep primary metric interpretation task-aware: perplexity-like (ppl) metrics are compared to the baseline as ratios; accuracy is compared as percentage-point deltas.
Put benchmark deltas after guard signals, not in provenance or appendix.
Keep verbose policy YAML, plugin provenance, and raw artifacts in the technical appendix unless they are needed to explain the verdict.
Treat the outline as the source for visible section order in future Markdown and HTML renderers.
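The task-aware interpretation rule can be illustrated with a small formatter. This is a sketch under stated assumptions rather than InvarLock's renderer code: the function name `describe_primary_metric`, the metric kind labels `"ppl"` and `"accuracy"`, and the convention that accuracy is stored as a fraction in [0, 1] are all hypothetical.

```python
def describe_primary_metric(kind: str, final: float, baseline: float) -> str:
    """Format a baseline-relative comparison for one metric (hypothetical helper)."""
    if kind == "ppl":
        # Perplexity-like metrics are read as a ratio against the baseline.
        if baseline <= 0:
            raise ValueError("baseline must be positive for a ratio")
        return f"{final / baseline:.3f}x baseline"
    if kind == "accuracy":
        # Assumption: accuracy values are fractions, so the difference is
        # scaled by 100 to report percentage points.
        delta_pp = (final - baseline) * 100.0
        return f"{delta_pp:+.2f} pp vs baseline"
    raise ValueError(f"unsupported metric kind: {kind!r}")


print(describe_primary_metric("ppl", 10.5, 10.0))
print(describe_primary_metric("accuracy", 0.82, 0.80))
```

Rejecting unknown kinds, instead of falling back to one of the two formats, keeps a renderer from silently presenting a ratio where a percentage-point delta was intended.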