
What Belongs in evaluation.report.json

Ink/charcoal doodle: a required report core surrounded by optional evidence blocks with a runtime manifest sidecar.

An evaluation report is strongest when it is treated as a stable evidence contract: a small required core, meaningful optional blocks, and a clear boundary around what still lives outside the JSON.

InvarLock Team

Research Note: a report file should behave like a contract, not a dump

Highlights

  • evaluation.report.json has a small required core and a larger reviewer-facing optional surface.
  • The required core is stable because downstream verification and tooling depend on it.
  • The report is a key evidence object, but it is not the whole archive by itself.

It is easy to talk about a report file as if it were just an export format. That framing is too weak for a system that expects other people to review, parse, and re-check the result later.

InvarLock's public reports reference goes further than that.

The docs define evaluation.report.json as a contract surface with a stable required core, a bounded optional surface, and a direct relationship to invarlock verify. That is a very different role from "some JSON we happened to save."

The Stable Required Core

The reports reference is explicit about the minimum contract. A valid v1 report must carry schema_version, run_id, meta, dataset, artifacts, plugins, and primary_metric.

That required core is not arbitrary. It captures the smallest stable surface needed to identify the run, describe the paired data surface, locate artifacts, snapshot the plugin environment, and state the canonical primary-metric result.
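
As a minimal sketch of what relying on that core could look like, the check below tests only for the presence of the seven top-level keys named above. The helper name `missing_core_fields` and the stub document are hypothetical, the stub is not a real report, and nothing here describes how invarlock verify itself is implemented.

```python
# The seven top-level keys the reports reference lists as the v1 required core.
REQUIRED_CORE = (
    "schema_version",
    "run_id",
    "meta",
    "dataset",
    "artifacts",
    "plugins",
    "primary_metric",
)


def missing_core_fields(report: dict) -> list[str]:
    """Return the required top-level keys that are absent, in contract order.

    Hypothetical helper: it checks key presence only and says nothing
    about the shape of the values behind each key.
    """
    return [key for key in REQUIRED_CORE if key not in report]


# Placeholder document, not a real report: values are deliberately empty.
stub = {key: None for key in REQUIRED_CORE}
del stub["plugins"]

print(missing_core_fields(stub))  # ['plugins']
```

A presence check like this is far weaker than schema validation, but it shows why a small core is useful: a consumer can reject a file cheaply before reading anything else.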

This is the part of the report that other tools can rely on with the highest confidence. It is the minimum shape that lets the file behave like a real evidence contract instead of an informal snapshot.

Optional Blocks That Still Matter

The optional surface is where readers often get lazy.

validation, policy_digest, resolved_policy, primary_metric_tail, confidence, provenance, and related blocks are not always required by the schema, but they are still reviewer-critical when present. The reading guide makes this obvious: policy configuration, measurement contracts, provenance digests, and confidence labels are the fields that often tell a reviewer why the result should or should not be trusted.

So "optional" here does not mean "decorative." It means the contract is keeping a stable core while allowing evidence-rich extensions to evolve.
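
One way a reviewer-facing tool might treat that surface is to report which optional blocks are present instead of silently skipping them. The sketch below assumes the blocks appear as top-level keys and uses only the six names listed above. The source says "and related blocks", so the tuple is not exhaustive, and the function name is hypothetical.

```python
# Optional block names taken from the text above; the list is not exhaustive.
KNOWN_OPTIONAL_BLOCKS = (
    "validation",
    "policy_digest",
    "resolved_policy",
    "primary_metric_tail",
    "confidence",
    "provenance",
)


def split_optional_blocks(report: dict) -> tuple[list[str], list[str]]:
    """Return (present, absent) optional block names.

    Assumption: optional blocks live at the top level of the report.
    Absence is not an error here; it is surfaced so a reviewer can see
    which evidence is missing.
    """
    present = [name for name in KNOWN_OPTIONAL_BLOCKS if name in report]
    absent = [name for name in KNOWN_OPTIONAL_BLOCKS if name not in report]
    return present, absent


# Placeholder document, not a real report.
stub = {"confidence": None, "provenance": None}
present, absent = split_optional_blocks(stub)
print(present)  # ['confidence', 'provenance']
```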

Why The Report Is Not The Whole Bundle

The artifact layout docs draw a boundary that is worth making explicit: container-backed outputs emit runtime.manifest.json next to evaluation.report.json, and archives should retain the baseline and subject report.json files as well.

That matters because the report is a derived evaluation object. It is central, but it is not self-sufficient. A reviewer who wants to re-check pairing and provenance, including runtime provenance, needs more than the final JSON alone.

This is the right design. The report should be the main summary contract without pretending to be the entire retained record.
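
An archive step could enforce that boundary before anything is stored. The sketch below assumes only what the text states: runtime.manifest.json sits next to evaluation.report.json for container-backed outputs. Where the baseline and subject report.json files live is not specified here, so the caller passes those paths in. The function name and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Sequence


def missing_sidecars(
    report_path: Path,
    source_reports: Sequence[Path] = (),
    container_backed: bool = True,
) -> list[Path]:
    """Return the expected companion files that are not on disk.

    Assumptions: the manifest is a sibling of the report file, and the
    caller knows where the baseline and subject report.json files are.
    """
    expected = list(source_reports)
    if container_backed:
        # Sibling file in the same directory as evaluation.report.json.
        expected.insert(0, report_path.with_name("runtime.manifest.json"))
    return [path for path in expected if not path.is_file()]
```

An empty return value means every expected companion file was found. A non-empty one names exactly what a later re-verification would be missing.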

Why A Stable Contract Helps Review And Tooling

The report-to-verify flow is the practical reason this matters.

invarlock verify is not reading the file as a pretty export. It is using it as the surface for schema checks, pairing checks, ratio math, and required runtime provenance. That gives the report contract real operational weight.

The same structure also helps downstream tooling. Parsers, HTML renderers, and review UIs can rely on the required blocks while gracefully taking advantage of richer optional evidence when it exists.

What The Report Still Does Not Contain

The claim should stay narrow.

evaluation.report.json does not contain the whole archive. It does not replace the baseline report. It does not replace runtime.manifest.json. And it does not remove the need to preserve the surrounding evidence layout when the goal is later re-verification.

The right way to value the file is not as a total record. It is as the stable center of a larger evidence bundle.

Claim Map

The practical reading is:

  • keep the required core stable
  • use optional blocks to expose richer policy and provenance evidence
  • verify the report as a contract surface
  • archive the report with its adjacent manifest and source reports

That is a much better model than "just save the JSON."

Limitations

  • This post explains the public report contract; it does not add a fresh report example.
  • Optional blocks remain important even though they are not all required by the schema.
