Runtime Manifests and Why Provenance Must Travel With the Result
A strong evaluation result should carry its runtime provenance with it. In InvarLock, that means the runtime manifest travels next to the report and is rechecked by invarlock verify.
Research Note: provenance is weakest when it stays behind in the runtime
Highlights
- runtime.manifest.json is a runtime provenance sidecar, not decoration.
- It binds a report to its runtime context and is enforced by invarlock verify on container-backed outputs.
- Keeping the manifest next to the report is part of the evidence contract, not an archival preference.
Many systems record runtime context somewhere, but leave it stranded in logs, hidden state, or ephemeral CI metadata. That makes later review harder than it should be.
InvarLock's public manifest surface goes further than that.
The public contracts reference names runtime.manifest.json as a versioned contract. It sits next to evaluation.report.json, carries specific fields about the report, config, execution mode, and runtime flags, and is part of what later verification expects to see.
What The Runtime Manifest Actually Binds
The contract is concrete about the job. The manifest records the report path and hash, config path and source, execution mode, and a small runtime block describing the image reference, image digest, and runtime permissions such as network and remote-code allowance.
That is enough to support runtime-provenance review.
It tells a reviewer which report the manifest is describing, what configuration surface produced it, whether the run happened in the container path or host-bypass path, and which runtime affordances were left on. That is a far better posture than asking a reviewer to "trust our CI settings."
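To make the shape of that binding concrete, here is a minimal Python sketch of a manifest built from the fields described above. The key names, the placeholder image values, and the helper functions are all assumptions made for illustration; the public contracts reference defines the actual versioned schema, and this sketch does not reproduce it.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(report_path: Path, config_path: Path) -> dict:
    """Assemble an illustrative manifest; every key name here is hypothetical."""
    return {
        "report": {
            "path": report_path.name,
            "sha256": sha256_of(report_path),
        },
        "config": {"path": str(config_path), "source": "file"},
        "execution_mode": "container",  # the other path would be host-bypass
        "runtime": {
            "image_ref": "example.invalid/eval-image:tag",  # placeholder value
            "image_digest": "sha256:<digest>",  # placeholder value
            "network": False,
            "allow_remote_code": False,
        },
    }


def write_manifest(report_path: Path, config_path: Path) -> Path:
    """Write the manifest as a sidecar in the same directory as the report."""
    manifest_path = report_path.with_name("runtime.manifest.json")
    manifest = build_manifest(report_path, config_path)
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path
```

The one design point worth noting is that the sidecar path is derived from the report path, so the two files cannot drift into different directories by accident.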
Why Verify Needs It
The public docs do not treat the manifest as optional decoration. Both the reports reference and the CLI reference say that container-backed evaluation outputs are expected to carry runtime.manifest.json next to evaluation.report.json, and invarlock verify checks that adjacency.
That matters because provenance only helps if it survives into the same place where the result is later interpreted.
Once the manifest travels with the report, verification can fail closed when the runtime provenance sidecar is missing. That is better than silently accepting a report whose runtime boundary can no longer be reconstructed.
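The fail-closed behavior can be sketched in a few lines of Python. This is not the implementation of invarlock verify; it is a hypothetical check that assumes the manifest stores the report hash under a made-up report.sha256 key. The adjacency requirement comes from the public docs, while comparing the hash is an assumption based on the manifest recording the report hash.

```python
import hashlib
import json
from pathlib import Path


class ProvenanceError(Exception):
    """Raised when the runtime provenance sidecar is missing or inconsistent."""


def check_sidecar(report_path: Path) -> dict:
    """Fail closed unless an adjacent manifest exists and matches the report."""
    manifest_path = report_path.with_name("runtime.manifest.json")
    if not manifest_path.is_file():
        # No sidecar means the runtime boundary cannot be reconstructed.
        raise ProvenanceError(f"missing sidecar: {manifest_path}")

    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    expected = manifest.get("report", {}).get("sha256")  # hypothetical key
    actual = hashlib.sha256(report_path.read_bytes()).hexdigest()
    if expected != actual:
        raise ProvenanceError("report hash does not match the manifest")
    return manifest
```

Raising an exception rather than returning a boolean keeps the default outcome a failure: a caller has to handle the error explicitly to proceed without provenance.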
Why A Sidecar Is Better Than Hidden Runtime State
The artifact layout docs make the design choice practical: archive the report and manifest together.
This is important because the manifest is not trying to replace the report. It is trying to travel with it. The report remains the evaluation summary. The manifest remains the runtime provenance surface. Keeping them adjacent preserves both interpretability and auditability.
That sidecar design is stronger than burying provenance in CI logs or a database that the reviewer may never see.
What The Manifest Still Does Not Prove
The runtime manifest makes a deliberately narrow promise.
The runtime manifest does not prove that the metric conclusion is correct. It does not prove that the dataset was the right one. It does not prove content safety, alignment, or deployment readiness. It proves something smaller and still valuable: the report is accompanied by a stated runtime context that can be rechecked later.
That is the right level of ambition for runtime provenance.
Claim Map
The practical path is:
- emit evaluation.report.json
- emit adjacent runtime.manifest.json
- keep them together in the archive
- run invarlock verify against that pair later
That is a much stronger provenance story than leaving execution details behind in the machine that ran the job.
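The "keep them together" step is easy to enforce mechanically. The sketch below is a hypothetical helper, not part of InvarLock: it refuses to create an archive unless both files are present in the run directory, and it assumes a gzip-compressed tarball is an acceptable archive format.

```python
import tarfile
from pathlib import Path

# The two files that must travel together.
PAIR = ("evaluation.report.json", "runtime.manifest.json")


def archive_pair(run_dir: Path, archive_path: Path) -> Path:
    """Bundle the report and its manifest, refusing if either is absent."""
    missing = [name for name in PAIR if not (run_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"refusing to archive without: {missing}")

    with tarfile.open(archive_path, "w:gz") as tar:
        for name in PAIR:
            # Store both files at the archive root so they stay adjacent.
            tar.add(run_dir / name, arcname=name)
    return archive_path
```

After extraction, the pair lands in one directory again, which is the layout a later verification run expects.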
Limitations
- This note is about the runtime-manifest contract, not a new benchmark result.
- Provenance is necessary for rechecking runtime context, but it does not by itself settle the quality claim.