
Report outlines, guard warnings, and wider public evidence


InvarLock 0.11.0 makes human reports more consistent, separates guard warnings from hard failures, and expands the public evidence surface across model families.

InvarLock Team

Release: InvarLock 0.11.0 - Shared report outlines, guard-warning semantics, and expanded published-basis evidence

Highlights

  • Human-readable Markdown and HTML reports now share a renderer-neutral outline, so the visible Decision, Primary Metric, Policy Gates, Guard Signals, Evidence And Provenance, and appendix sections line up across surfaces.
  • Guard movement is first-class report evidence: baseline-relative guard warnings are advisory by default, while strict warning policy can make those warnings fail verification.
  • The public evidence set is broader, with expanded published-basis lanes for dense, seq2seq, image-text, and MoE families, plus structured JSON-answer handling for VQA-style image-text runs.

0.11.0 is mostly about making the evidence easier to read without weakening the machine contract underneath it. JSON reports, runtime manifests, digests, and signed metadata remain the stable verification surface. The human-facing layer now has a clearer outline and a more consistent report vocabulary.

The new Report Outline reference is the best starting point. It explains the shared section model that Markdown reports, HTML reports, evidence-pack summaries, and benchmark comparison pages can render from. The companion Reading a report guide shows how reviewers should scan the HTML export: ledger row, section rail, verdict, metric, gates, guard signals, provenance, and appendix.
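As a rough illustration of what "renderer-neutral" means, here is a minimal sketch, not InvarLock's implementation. One ordered list of sections feeds two independent renderers, so Markdown and HTML cannot drift in section order. The `Section` type, the renderer functions, and the placeholder contents are hypothetical. Only the section titles come from this post.

```python
from dataclasses import dataclass, field
from html import escape


# Hypothetical section model for illustration; not InvarLock's real types.
@dataclass
class Section:
    title: str
    lines: list = field(default_factory=list)


# Section order follows the release notes; line contents are placeholders.
OUTLINE = [
    Section("Decision", ["placeholder: verdict"]),
    Section("Primary Metric", ["placeholder: metric summary"]),
    Section("Policy Gates", ["placeholder: gate rows"]),
    Section("Guard Signals", ["placeholder: guard rows"]),
    Section("Evidence And Provenance", ["placeholder: digests and manifest"]),
    Section("Appendix", ["placeholder: details"]),
]


def render_markdown(outline):
    """Render the shared outline as Markdown: one H2 heading per section."""
    parts = []
    for section in outline:
        parts.append(f"## {section.title}")
        parts.extend(f"- {line}" for line in section.lines)
        parts.append("")
    return "\n".join(parts)


def render_html(outline):
    """Render the same outline as HTML, escaping all visible text."""
    parts = []
    for section in outline:
        parts.append(f"<section><h2>{escape(section.title)}</h2><ul>")
        parts.extend(f"<li>{escape(line)}</li>" for line in section.lines)
        parts.append("</ul></section>")
    return "\n".join(parts)
```

Because both renderers only iterate over `OUTLINE`, adding or reordering a section changes every surface at once.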

Guard warnings are the other major reader-facing shift. The release separates hard policy failures from baseline-relative guard movement that stays inside the hard budget. That distinction matters for public claims: a warning is visible evidence, not a silent pass. Teams can still pass --fail-on-warnings when their workflow treats any guard movement as release-blocking. The Reports Reference and CLI Reference have been updated for this release and describe how these report and verification surfaces behave.
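The three-way distinction can be sketched as follows. This is a hypothetical model, not InvarLock's code. A guard that moves past a warning threshold but stays inside its hard budget is a warning. Only movement beyond the budget is a hard failure. The `fail_on_warnings` parameter mirrors the `--fail-on-warnings` flag named above. The `GuardReading` fields, function names, and thresholds are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    WARN = "warn"  # moved relative to baseline, still inside the hard budget
    FAIL = "fail"  # moved beyond the hard budget


@dataclass
class GuardReading:
    # Hypothetical fields for illustration; thresholds are supplied by the caller.
    name: str
    baseline: float
    observed: float
    warn_delta: float
    hard_budget: float


def classify(reading):
    """Hard failure takes precedence over a warning; a warning over a pass."""
    movement = abs(reading.observed - reading.baseline)
    if movement > reading.hard_budget:
        return Status.FAIL
    if movement > reading.warn_delta:
        return Status.WARN
    return Status.PASS


def verify(readings, fail_on_warnings=False):
    """Return (passed, warned_guard_names). Warnings are reported either way."""
    statuses = {r.name: classify(r) for r in readings}
    warned = [name for name, s in statuses.items() if s is Status.WARN]
    if any(s is Status.FAIL for s in statuses.values()):
        return False, warned
    if fail_on_warnings and warned:
        return False, warned
    return True, warned
```

In this sketch, the default mode returns `(True, ["guard_b"])` for a run with one warned guard: it passes, but the warning stays attached to the result. Strict mode turns the same run into a failure.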

0.11.0 also expands the public evidence story. The Public Evidence Walkthrough now sits alongside updated Model Family Catalog and Model Adapters references for dense, seq2seq, image-text, and MoE lanes. The changelog records new published-basis coverage for families including Mistral/Ministral, Qwen, Granite, DeepSeek R1 Qwen variants, Phi-4, SmolLM3, OLMo 2, Falcon lanes, FLAN-T5, OLMoE, Mixtral, and Gemma image-text variants.

For image-text evidence, the release adds structured JSON-answer extraction for vision_text evaluation. In practical terms, public VQA-style runs can ask for compact {"answer": "..."} outputs without losing exact-answer scoring. The release also adds an adequacy gate for public image-text published-basis evidence: enough measured accuracy, enough final examples, and concise answer-shaped generations when prediction records are embedded.
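Here is a minimal sketch of the two ideas, under stated assumptions; it is not InvarLock's implementation. Extraction pulls the `answer` field out of a JSON object in the generation and falls back to the raw text, so exact-answer scoring still works. The adequacy check reports which of the three conditions fail. The function names, the regex-plus-`json.loads` extraction, the whitespace/case normalization, the character-length proxy for "concise", and all thresholds are hypothetical.

```python
import json
import re


def extract_answer(generation):
    """Return the JSON "answer" field if present, otherwise the raw generation."""
    match = re.search(r"\{.*\}", generation, flags=re.DOTALL)
    if match:
        try:
            payload = json.loads(match.group(0))
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict) and isinstance(payload.get("answer"), str):
            return payload["answer"]
    return generation


def normalize(text):
    # Assumed normalization: trim, collapse whitespace, ignore case.
    return " ".join(text.split()).casefold()


def exact_match(generation, reference):
    return normalize(extract_answer(generation)) == normalize(reference)


def adequacy_gate(accuracy, n_final, generations=None, *,
                  min_accuracy, min_examples, max_generation_chars):
    """Return the names of failed checks; an empty list means adequate."""
    failures = []
    if accuracy < min_accuracy:
        failures.append("accuracy")
    if n_final < min_examples:
        failures.append("examples")
    # Only checked when prediction records are embedded in the evidence.
    if generations is not None and any(
        len(g.strip()) > max_generation_chars for g in generations
    ):
        failures.append("answer_shape")
    return failures
```

Thresholds are deliberately keyword-only with no defaults here, since this post does not state the values the release uses.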

For the immutable release record, read the tagged CHANGELOG.md for v0.11.0.
