Calibration Is the Product Surface, Not a Side Utility

Calibration is not just analysis around the product. It changes how thresholds are derived, when correction paths may turn on, and which policy values later govern reports.

May 25, 2026

InvarLock Team

Historical context: This dated note records the InvarLock product model available when it was published. InvarLock v0.13.0 replaced the operator workflow described here with a closed request, canonical signed evidence, and independent verifier replay; use the v0.13.0 release note and current documentation for present-day instructions.

Synthesis: what May implies about where calibration really lives

Highlights

Threshold derivation: null sweeps emit tiers_patch_spectral_null.yaml with calibrated family_caps.* keys. See the null-sweeps note.
Enablement discipline: variance equalization only turns on when paired Delta logNLL clears min_effect_lognll under the tier's sidedness rule. See the variance-enablement note.
Policy continuity: patched keys surface as resolved_policy.* in later reports, giving the sweep -> patch -> policy -> report path a traceable identity. See the tier-policy note.

The three linked notes all start from calibration, but none of them stops at calibration in the narrow sense. Null sweeps explain how thresholds are derived from null behavior. Variance enablement explains how a correction path has to earn enablement. Tier policy explains how sweep outputs become reviewable policy patches and later show up in reports.

Taken together, those posts imply a stronger and more practical conclusion: calibration is part of the product surface.

That framing has a specific boundary. It does not mean calibration is the whole product. It means calibration changes the live boundary that operators actually interact with and later audit.

Here, "product surface" means the operator-visible thresholds, gates, patch paths, and report fields that actually govern evaluation and review.

1. Calibration Sets The Derivation Story

The first May post matters because it changes how threshold selection should be interpreted. The calibration CLI reference describes null sweeps as a workflow that measures clean no-op behavior and emits a patch-shaped recommendation. Threshold setting is no longer just taste plus experience. It becomes part of the formal operational story.

That is already product-surface behavior. A system whose thresholds are empirically derived and patchable exposes a different review interface than one whose thresholds live mostly in inherited defaults.
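As a rough illustration of what an empirically derived, patch-shaped recommendation could look like, the sketch below takes per-family statistics from clean no-op runs and proposes one cap per family. Only the family_caps key comes from the notes above. The function name derive_family_caps, the nearest-rank quantile rule, the margin, the family names, and the sample numbers are all assumptions made for illustration; the actual calibration CLI may derive and serialize caps differently.

```python
import math
from typing import Dict, List


def derive_family_caps(
    null_scores: Dict[str, List[float]],
    quantile: float = 0.99,
    margin: float = 1.1,
) -> Dict[str, Dict[str, float]]:
    """Propose one cap per family from null (no-op) run statistics.

    Hypothetical rule: cap = margin * empirical quantile of the null scores.
    """
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    caps: Dict[str, float] = {}
    for family, scores in null_scores.items():
        if not scores:
            raise ValueError(f"no null scores for family {family!r}")
        ordered = sorted(scores)
        # Nearest-rank quantile: smallest score with at least
        # `quantile` of the observations at or below it.
        rank = max(1, math.ceil(quantile * len(ordered)))
        caps[family] = round(margin * ordered[rank - 1], 6)
    # Patch-shaped output: nested under the key a reviewer would merge.
    return {"family_caps": caps}


# Hypothetical families and null scores, for illustration only.
patch = derive_family_caps(
    {"attn": [0.8, 1.0, 1.2, 0.9], "ffn": [1.5, 1.7, 1.6]},
    quantile=1.0,
    margin=1.1,
)
```

The point of the shape is reviewability: the output is a small nested mapping that can be diffed against existing policy, rather than a number quoted in a summary.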

2. Calibration Sets The Enablement Story

The second May post adds a different kind of boundary. Variance equalization is not simply available because it exists in the implementation. On the public surface, it turns on only when predictive evidence clears tier-specific sidedness and minimum-effect rules.

That means calibration is not only about where thresholds come from. It is also about when a correction path may become active at all. The variance guard predictive gate is therefore a product-facing gate, not just an analysis note.

Again, that is not peripheral analysis. It is control over a live runtime behavior.
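A minimal sketch of such a gate follows. It assumes that paired Delta logNLL is computed as the value with the correction minus the value without it, so negative values mean the correction helped, and that the sidedness rule is one of two labels, "one_sided" or "two_sided". Those conventions, the function name, and the use of a plain mean are assumptions; the published gate may instead compare a confidence bound against min_effect_lognll.

```python
from statistics import mean
from typing import Sequence


def variance_equalization_enabled(
    delta_lognll: Sequence[float],
    min_effect_lognll: float,
    sidedness: str,
) -> bool:
    """Return True only when the paired effect clears the minimum-effect bar.

    Assumed convention: delta = logNLL(with correction) - logNLL(without),
    so negative values mean the correction helped.
    """
    if not delta_lognll:
        return False  # no paired evidence, so the path stays off
    effect = mean(delta_lognll)
    if sidedness == "one_sided":
        # Only an improvement of at least min_effect_lognll counts.
        return effect <= -min_effect_lognll
    if sidedness == "two_sided":
        # Any shift of at least min_effect_lognll in magnitude counts.
        return abs(effect) >= min_effect_lognll
    raise ValueError(f"unknown sidedness rule: {sidedness!r}")


# Hypothetical paired deltas: a clear improvement and a negligible one.
clear = variance_equalization_enabled([-0.02, -0.03, -0.01], 0.015, "one_sided")
weak = variance_equalization_enabled([-0.005, -0.004], 0.015, "one_sided")
```

The default outcome is "off": missing evidence, a small effect, or an unrecognized rule never enables the correction path.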

3. Calibration Sets The Policy Story

The third May post completes the path. Sweep outputs do not end in a human summary. They end in reviewable tiers_patch_*.yaml files that can be merged into runtime tier policy, then exposed later as resolved_policy.* in reports.

This is the clearest reason calibration belongs on the product surface. Its outputs survive. They do not disappear after an experiment review meeting. They become part of the policy that future evaluations inherit.

A system that exposes policy this way is telling you something important: calibration is part of how the product defines itself operationally.
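The sweep -> patch -> policy -> report path can be sketched as a nested merge followed by flattening into dotted report fields. The policy layout, the helper names merge_patch and flatten, and the sample values are hypothetical; only the family_caps key, the min_effect_lognll name, and the resolved_policy prefix come from the notes, and where those keys actually sit in the real tier policy is an assumption here.

```python
from typing import Any, Dict


def merge_patch(policy: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new policy with patch values overriding nested keys."""
    merged = dict(policy)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys the patch does not mention survive.
            merged[key] = merge_patch(merged[key], value)
        else:
            merged[key] = value
    return merged


def flatten(policy: Dict[str, Any], prefix: str = "resolved_policy") -> Dict[str, Any]:
    """Flatten nested policy keys into dotted report fields."""
    flat: Dict[str, Any] = {}
    for key, value in policy.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical base policy and sweep patch, for illustration only.
base_policy = {"family_caps": {"attn": 2.0, "ffn": 2.0}, "min_effect_lognll": 0.015}
sweep_patch = {"family_caps": {"attn": 1.32}}
report_fields = flatten(merge_patch(base_policy, sweep_patch))
```

In this sketch the patched value appears as resolved_policy.family_caps.attn while untouched keys keep their prior values, which is what lets a reviewer trace a report field back to the patch that set it.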

Why This Framing Matters

Calling calibration a side utility encourages the wrong habits. It suggests that calibration can be deferred, hidden, or treated as a private research detail while the "real" product lives elsewhere.

The public InvarLock docs point in the opposite direction. The tier-policy catalog distinguishes calibrated values from explicit policy choices, and the guards reference makes resolved policy visible in the report surface. Calibration determines part of the threshold surface, part of the enablement surface, and part of the policy surface. That means it belongs inside serious operator review, release review, and evidence interpretation.

This framing is also useful for readers. It tells them where to look when they want to understand why a threshold exists, why a correction path stayed off, or why a report resolved to a given policy value.

What Calibration Still Does Not Solve

This synthesis has a deliberately conservative boundary.

Calibration does not turn every decision into empiricism. The tier-policy catalog is explicit that some values are calibrated and others are policy choices. Calibration also does not eliminate the need for review, transfer checks, or recalibration when window budgets, hardware, or model families change. The tier v1 calibration note is still local to a specific published evidence surface.

So the useful conclusion is not "calibration solves the whole system." It is narrower: calibration is a real part of the system boundary that operators need to understand and govern.
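One way to keep the recalibration triggers named above visible is to record the context a calibration ran under and compare it with the context of a later evaluation. The sketch below is illustrative only: the field names window_budget, hardware, and model_family, the function name, and the sample values are assumptions, not part of any documented InvarLock schema.

```python
from typing import Dict, List


def recalibration_reasons(calibrated: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """List context fields whose values differ from the calibration run."""
    watched = ("window_budget", "hardware", "model_family")
    return [
        f"{field}: calibrated on {calibrated.get(field)!r}, now {current.get(field)!r}"
        for field in watched
        if calibrated.get(field) != current.get(field)
    ]


# Hypothetical contexts: only the hardware differs.
reasons = recalibration_reasons(
    {"window_budget": "small", "hardware": "gpu-a", "model_family": "fam-1"},
    {"window_budget": "small", "hardware": "gpu-b", "model_family": "fam-1"},
)
```

An empty list does not prove that calibrated values transfer; it only shows that none of the watched fields changed.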

The Calibration Checklist

For the current public surface, calibration belongs on the product boundary when it provides:

an empirical derivation story rather than threshold folklore
explicit enablement rules rather than hidden adaptive defaults
a reviewable policy patch rather than summary-only interpretation
downstream visibility through resolved_policy.*

If those pieces are missing, calibration may still exist. It is simply weaker as an operational surface.

Limitations

Measurement claims are limited to the linked calibration notes.
"Product surface" means the operator-visible threshold, gate, patch, and report set, narrower than "entire product" by design.
Window-budget, hardware, and family-transfer questions still need their own evidence; calibration belongs on the product surface without implying that any specific calibrated value generalizes.

