Tier Policy Tuning CLI (Calibration)

Scope note: this page covers Tier Policy Tuning via the invarlock advanced calibrate ... commands, which output tiers_patch_*.yaml recommendations for runtime/tiers.yaml. For evidence-pack run-scoped preset derivation (CALIBRATION_RUN -> GENERATE_PRESET), see Evidence Pack Internals.

Overview

Aspect | Details
Purpose | Run policy-tuning sweeps to empirically derive guard thresholds and tier policy recommendations.
Audience | Operators recalibrating tier policies for additional model families or revised guard contracts.
Primary commands | invarlock advanced calibrate null-sweep, invarlock advanced calibrate ve-sweep.
Requires | invarlock[hf] for HF workflows; base config YAML for each sweep type.
Network | Offline by default; use --allow-network on calibration commands when a sweep needs model or dataset downloads.
Source of truth | src/invarlock/cli/commands/calibrate.py, src/invarlock/calibration/.

Smoke-sized configs are also shipped for maintainers who want to exercise the calibration command surface without a full policy-tuning campaign: configs/calibration/null_sweep_smoke.yaml and configs/calibration/rmt_ve_sweep_smoke.yaml. These are intended for smoke coverage and operational validation, not for published calibration evidence.

Quick Start

The commands below use the runtime container by default. Add --allow-host-execution only for host-side calibration workflows that intentionally bypass that boundary.

# Run spectral null-sweep (noop edit) to calibrate κ/alpha
invarlock advanced calibrate null-sweep \
  --allow-network \
  --config configs/calibration/null_sweep_ci.yaml \
  --out reports/calibration/null_sweep \
  --tier balanced --tier conservative \
  --n-seeds 10

# Run VE sweep (quant_rtn edit) to calibrate min_effect_lognll
invarlock advanced calibrate ve-sweep \
  --allow-network \
  --config configs/calibration/rmt_ve_sweep_ci.yaml \
  --out reports/calibration/ve_sweep \
  --tier balanced --tier conservative \
  --n-seeds 10

For smoke-only runs, swap the configs above for the shipped smoke configs and keep the run small:

invarlock advanced calibrate null-sweep \
  --allow-network \
  --config configs/calibration/null_sweep_smoke.yaml \
  --out reports/calibration/null_sweep_smoke

invarlock advanced calibrate ve-sweep \
  --allow-network \
  --config configs/calibration/rmt_ve_sweep_smoke.yaml \
  --out reports/calibration/ve_sweep_smoke

Concepts

  • Policy-tuning sweeps: Run multiple seeds/tiers to build empirical distributions for threshold recommendations.
  • Null sweep: Uses a no-op edit to measure baseline spectral behavior and derive false-positive-controlled κ caps and α levels.
  • VE sweep: Uses a real edit (e.g., quant_rtn) to measure variance guard predictive gate behavior and recommend min_effect_lognll.
  • Artifacts: Each sweep emits JSON (machine), CSV (spreadsheet), Markdown (human), and a tiers_patch_*.yaml recommendation file.
  • Artifact contract: The file names above are treated as stable public outputs and may be consumed directly by verification, review, and policy-pack workflows.
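
Because the artifact names are treated as stable, a downstream workflow can check that a sweep produced all of them before consuming any. The sketch below uses only the file names listed on this page; the helper name missing_artifacts is hypothetical, and it assumes the artifacts sit at the top level of the --out directory, which this page does not state explicitly.

```python
from pathlib import Path

# File names as listed on this page for each sweep type.
EXPECTED_ARTIFACTS = {
    "null-sweep": [
        "null_sweep_report.json",
        "null_sweep_runs.csv",
        "null_sweep_summary.md",
        "tiers_patch_spectral_null.yaml",
    ],
    "ve-sweep": [
        "ve_sweep_report.json",
        "ve_sweep_runs.csv",
        "ve_power_curve.csv",
        "ve_sweep_summary.md",
        "tiers_patch_variance_ve.yaml",
    ],
}


def missing_artifacts(out_dir, sweep):
    """Return expected artifact names that are absent from out_dir.

    Assumption: artifacts are written directly into out_dir (not a subfolder).
    """
    root = Path(out_dir)
    return [name for name in EXPECTED_ARTIFACTS[sweep] if not (root / name).is_file()]
```

For example, missing_artifacts("reports/calibration/null_sweep", "null-sweep") returns an empty list when all four null-sweep files are present.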

Published Basis vs Included Configs

The published assurance basis covers GPT-2 and BERT profiles. The repo also includes pilot calibration configs for additional families such as Mistral 7B, Qwen2 7B, Qwen2.5 7B, and Qwen2.5 14B under configs/calibration/, but those configs are not part of the published assurance basis until supporting artifacts are attached.

Policy-Tuning Sweep → Tier Policy Flow

Figure: Calibration sweep flow from the base config through null-sweep and ve-sweep tuning into the tier policy output.

Reference

Policy-Tuning Commands

Command | Purpose | Key outputs
invarlock advanced calibrate null-sweep | Calibrate spectral κ/alpha from null (noop) runs. | null_sweep_report.json, tiers_patch_spectral_null.yaml
invarlock advanced calibrate ve-sweep | Calibrate VE min_effect_lognll from real edit runs. | ve_sweep_report.json, tiers_patch_variance_ve.yaml

null-sweep

Runs a null (no-op edit) sweep and calibrates spectral κ/alpha empirically.

Usage: invarlock advanced calibrate null-sweep --config <CONFIG> --out <OUT> [options]

Option | Default | Description
--config | configs/calibration/null_sweep_ci.yaml | Base null-sweep YAML (noop edit).
--out | reports/calibration/null_sweep | Output directory for calibration artifacts.
--tier | All tiers | Tier(s) to evaluate (repeatable).
--seed | --seed-start + range | Seed(s) to run (repeatable). Overrides --n-seeds/--seed-start.
--n-seeds | 10 | Number of seeds to run.
--seed-start | 42 | Starting seed.
--profile | ci | Run profile (ci, release, ci_cpu, dev).
--device | Auto | Device override.
--safety-margin | 0.05 | Safety margin applied to κ recommendations.
--target-any-warning-rate | 0.01 | Target run-level spectral warning rate under the null.
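
The seed options interact: explicit --seed values override the range implied by --seed-start and --n-seeds. The sketch below illustrates that precedence. The function name resolve_seeds is hypothetical, and it assumes the default range is consecutive integers starting at --seed-start, which is the most direct reading of the table but is not confirmed against the CLI source.

```python
def resolve_seeds(seeds=None, n_seeds=10, seed_start=42):
    """Return the seeds a sweep would run under the documented precedence.

    Assumption: the default range is consecutive integers from seed_start.
    """
    if seeds:
        # Explicit --seed values take priority over --n-seeds/--seed-start.
        return list(seeds)
    return list(range(seed_start, seed_start + n_seeds))


print(resolve_seeds())               # defaults: seeds 42 through 51
print(resolve_seeds(seeds=[7, 11]))  # explicit seeds win: [7, 11]
```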

Outputs:

  • null_sweep_report.json — Machine-readable sweep summary with per-tier recommendations.
  • null_sweep_runs.csv — Per-run metrics (max z-scores, candidate counts, etc.).
  • null_sweep_summary.md — Human-readable Markdown summary.
  • tiers_patch_spectral_null.yaml — Recommended spectral_guard settings for tiers.yaml.
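
To build intuition for how --target-any-warning-rate and --safety-margin could shape a κ recommendation, here is one plausible reading: take the empirical (1 - target rate) quantile of the per-run maximum z-scores and inflate it by the margin. This is an illustrative assumption, not the tool's confirmed logic (which lives under src/invarlock/calibration/ and emits per-family caps and α levels); the function name and the multiplicative margin are both hypothetical.

```python
import math


def recommend_kappa(max_z_scores, target_rate=0.01, safety_margin=0.05):
    """Illustrative κ cap: empirical (1 - target_rate) quantile plus a margin.

    Assumption: the margin is applied multiplicatively; the real tool may differ.
    """
    if not max_z_scores:
        raise ValueError("need at least one null run")
    ordered = sorted(max_z_scores)
    # Smallest order statistic with at least (1 - target_rate) of runs at or below it.
    rank = math.ceil((1.0 - target_rate) * len(ordered))
    rank = min(max(rank, 1), len(ordered))
    quantile = ordered[rank - 1]
    return quantile * (1.0 + safety_margin)
```

Under this reading, a 10-seed sweep with a 0.01 target rate resolves to the largest observed value, so the recommendation is driven by a single run; more seeds are needed before the quantile separates from the maximum.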

ve-sweep

Runs VE predictive-gate sweeps and recommends min_effect_lognll per tier.

Usage: invarlock advanced calibrate ve-sweep --config <CONFIG> --out <OUT> [options]

Option | Default | Description
--config | configs/calibration/rmt_ve_sweep_ci.yaml | Base VE sweep YAML (quant_rtn edit).
--out | reports/calibration/ve_sweep | Output directory for calibration artifacts.
--tier | All tiers | Tier(s) to evaluate (repeatable).
--seed | --seed-start + range | Seed(s) to run (repeatable). Overrides --n-seeds/--seed-start.
--n-seeds | 10 | Number of seeds to run.
--seed-start | 42 | Starting seed.
--window | 6, 8, 12, 16 | Variance calibration window counts (repeatable).
--target-enable-rate | 0.05 | Target expected VE enable rate (predictive-gate lower bound).
--profile | ci | Run profile (ci, release, ci_cpu, dev).
--device | Auto | Device override.
--safety-margin | 0.0 | Safety margin applied to min_effect recommendations.

Outputs:

  • ve_sweep_report.json — Machine-readable sweep summary with per-tier recommendations.
  • ve_sweep_runs.csv — Per-run metrics (predictive gate deltas, CI widths, etc.).
  • ve_power_curve.csv — Mean CI width per (tier, windows) for power analysis.
  • ve_sweep_summary.md — Human-readable Markdown summary.
  • tiers_patch_variance_ve.yaml — Recommended variance_guard settings for tiers.yaml.
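
The power-curve aggregation described above (mean CI width per tier and window count) can be reproduced from per-run rows for inspection. This is a minimal sketch: the column names tier, windows, and ci_width are hypothetical placeholders, so check the header of ve_sweep_runs.csv before reusing it.

```python
from collections import defaultdict
from statistics import mean


def power_curve(rows):
    """Average CI width for each (tier, windows) pair.

    Assumption: each row is a mapping with 'tier', 'windows', and 'ci_width'
    keys (hypothetical names), as csv.DictReader would yield.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["tier"], int(row["windows"]))].append(float(row["ci_width"]))
    return {key: mean(widths) for key, widths in sorted(grouped.items())}
```

Rows can be supplied from csv.DictReader opened on the runs file. A curve that flattens as the window count grows suggests additional windows are no longer narrowing the interval.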

Applying recommendations

After a sweep, merge the tiers_patch_*.yaml into your runtime/tiers.yaml:

# Review recommendations
cat reports/calibration/null_sweep/tiers_patch_spectral_null.yaml

# Merge into tiers.yaml (manual review recommended)
# The patch contains only the keys being updated:
#   balanced:
#     spectral_guard:
#       family_caps: { ... }
#       multiple_testing: { alpha: ... }

Troubleshooting

  • Missing config files: Ensure calibration configs exist under configs/calibration/.
  • Sweep failures: Check individual run reports under <out>/runs/<tier>/seed_*.
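  • Unexpected recommendations: Review the safety margin and target rate parameters (--safety-margin, plus --target-any-warning-rate for null-sweep or --target-enable-rate for ve-sweep).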
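
When a sweep fails partway, it helps to see which tier and seed combinations left a report behind. The sketch below walks the <out>/runs/<tier>/seed_* layout mentioned above. The function name list_run_paths is hypothetical, and it makes no assumption about whether each seed_* entry is a file or a directory.

```python
from pathlib import Path


def list_run_paths(out_dir):
    """Map each tier name to its sorted seed_* entries under <out>/runs/."""
    runs_root = Path(out_dir) / "runs"
    if not runs_root.is_dir():
        return {}
    return {
        tier_dir.name: sorted(tier_dir.glob("seed_*"))
        for tier_dir in sorted(runs_root.iterdir())
        if tier_dir.is_dir()
    }


if __name__ == "__main__":
    for tier, paths in list_run_paths("reports/calibration/null_sweep").items():
        print(tier, [p.name for p in paths])
```

A tier with fewer entries than the requested seed count points to the runs worth inspecting first.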

Observability

  • Sweep artifacts include full provenance (config, profile, tiers, run count).
  • Per-run reports are preserved under <out>/runs/ for debugging.
  • Power curves (VE sweep) help assess sample size requirements.