Proof Pack Internals

This guide explains how the proof pack suite is wired internally: entrypoints, task graph, scheduling, and artifact generation. It complements Proof Packs, which focuses on how to run a suite.

Scope note: in this guide, CALIBRATION_RUN -> GENERATE_PRESET is called Preset Derivation. It produces run-scoped calibrated_preset_<model>.yaml/json files and does not directly modify global runtime/tiers.yaml.

Overview

| Aspect | Details |
| --- | --- |
| Purpose | Hardware-agnostic Phase 0 validation harness for edit detection |
| Version | proof-packs-v1 |
| Hardware | NVIDIA GPUs where models fit VRAM; multi-GPU recommended for full |
| Models | subset (1 model), showcase/workshop3 (3 models each), or full (6 models); all ungated public |
| Edits | 4 types × 2 versions per model; clean variants use tuned presets |
| Preset Derivation | CALIBRATION_RUN + GENERATE_PRESET create run-scoped calibrated presets |
| Scheduling | Dynamic work-stealing, small_first priority strategy |
| Multi-GPU | Profile-based; required_gpus grows only when memory requires it |
| Output | Proof pack with manifest.json, checksums.sha256, and cert bundles (--layout v2 nests results + metadata) |

Quick Start (Context)

# Run the subset suite (offline by default)
./scripts/proof_packs/run_suite.sh --suite subset

# Run the full suite and build a proof pack
./scripts/proof_packs/run_pack.sh --suite full --net 1

# Verify an existing proof pack
./scripts/proof_packs/verify_pack.sh --pack ./proof_pack_runs/subset_20250101_000000/proof_pack

Hardware Target

  • Hardware-agnostic by design; run on any NVIDIA GPU topology where the models fit in VRAM.
  • Multi-GPU scheduling is enabled automatically when a task’s memory plan exceeds per-device capacity.
  • Set GPU_MEMORY_GB or GPU_MEMORY_PER_DEVICE to match your hardware when running on GPUs with unusual memory sizes.

Entrypoints and modules

Entrypoints

  • scripts/proof_packs/run_suite.sh runs a suite and sets PACK_* runtime flags before calling the main orchestrator.
  • scripts/proof_packs/run_pack.sh runs a suite, then packages artifacts into a portable proof pack (manifest + checksums + certs).
  • scripts/proof_packs/verify_pack.sh validates a proof pack: checksums, optional GPG signature, and invarlock verify.
  • scripts/proof_packs/suites.sh defines the model suites and allows MODEL_1–MODEL_8 overrides.
  • scripts/proof_packs/lib/validation_suite.sh orchestrates the run: preflight, queue creation, worker launch, and monitoring.

Library modules

  • lib/task_serialization.sh: task schema, JSON helpers, GPU planning.
  • lib/queue_manager.sh: queue states, dependency resolution, task generation.
  • lib/scheduler.sh: dynamic priority, memory gating, reservations.
  • lib/gpu_worker.sh: worker loop, heartbeats, task execution glue.
  • lib/task_functions.sh: implementations for each task type.
  • lib/model_creation.sh: edit and error-model creation helpers (create_model_variant dispatcher).
  • lib/config_generator.sh: InvarLock config generation and wrapper helpers.
  • lib/result_compiler.sh: analysis and verdict compilation.
  • lib/fault_tolerance.sh: error classification and retry/backoff logic.
  • scripts/proof_packs/python/manifest_writer.py: proof pack manifest.json writer.
  • scripts/proof_packs/python/preset_generator.py: preset derivation + edit-type variants.

Module dependency graph

┌───────────────────────────────────────────────────────────────────────┐
│                        MODULE DEPENDENCY GRAPH                        │
├───────────────────────────────────────────────────────────────────────┤
│ ENTRYPOINTS                                                           │
│   run_pack.sh | run_suite.sh | verify_pack.sh                         │
│   (pack+run) | (run only)  | (checksums+certs verify)                 │
│                                   │                                   │
│                                   ▼                                   │
│ ORCHESTRATION LAYER                                                   │
│   lib/validation_suite.sh (main_dynamic)                              │
│   Phase 0: setup + preflight                                          │
│   Phase 1: queue init -> Phase 2: worker launch -> Phase 3: monitor   │
│                                   │                                   │
│                   ┌───────────────┴───────────────┐                   │
│                   ▼                               ▼                   │
│ TASK EXECUTION                                  CORE SERVICES         │
│   lib/gpu_worker.sh                               queue_manager       │
│   task claim -> precheck -> execute -> cleanup    scheduler           │
│                                                  task_serialization   │
│                                                  fault_tolerance      │
│                   │                                                   │
│                   ▼                                                   │
│ TASK FUNCTIONS                                                        │
│   SETUP_BASELINE, CALIBRATION_RUN, GENERATE_PRESET                    │
│   CREATE_EDITS(_BATCH), CREATE_ERROR, evaluate_*                      │
└───────────────────────────────────────────────────────────────────────┘

Troubleshooting decision tree

Proof pack issues?
│
├─ Missing manifest.json/checksums.sha256?
│  └─ Used run_suite.sh instead of run_pack.sh
│     → Run: ./scripts/proof_packs/run_pack.sh --suite ... --net ...
│
├─ Spectral guard failing “clean” quantization edits?
│  ├─ Check: caps_exceeded in report spectral.summary
│  │  └─ Use edit-type presets (generated from preset derivation) or increase max_caps
│  └─ Check: high z-scores in attention layers
│     └─ Expected for quantization; tune thresholds if needed
│
├─ OOM errors?
│  ├─ Lower GPU_MEMORY_PER_DEVICE / GPU_MEMORY_GB
│  ├─ Disable batching: PACK_USE_BATCH_EDITS=false
│  └─ Reduce InvarLock batch/seq_len (INVARLOCK_EVAL_BATCH, INVARLOCK_SEQ_LEN)
│
└─ Disk pressure / ENOSPC?
   ├─ Check OUTPUT_DIR filesystem free space
   └─ Use a larger volume and rerun (suite writes caches under OUTPUT_DIR/.hf)

Model Suite

Model suites are defined in scripts/proof_packs/suites.sh and applied by run_suite.sh.

| Suite | Models | Notes |
| --- | --- | --- |
| subset | mistralai/Mistral-7B-v0.1 | Single-GPU friendly |
| showcase | mistralai/Mistral-7B-v0.1, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-32B | Multi-GPU recommended; guard-focused scenarios |
| workshop3 | mistralai/Mistral-7B-v0.1, mistralai/Mixtral-8x7B-v0.1, 01-ai/Yi-34B | Workshop-friendly 3-model suite (architecture diversity) |
| full | mistralai/Mistral-7B-v0.1, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-32B, 01-ai/Yi-34B, mistralai/Mixtral-8x7B-v0.1, Qwen/Qwen1.5-72B | Multi-GPU recommended |

Default full-suite model sizes (weights-only, approximate):

| Model | VRAM | Category | Notes |
| --- | --- | --- | --- |
| mistralai/Mistral-7B-v0.1 | ~14 GB | Small | Flash Attention 2 compatible |
| Qwen/Qwen2.5-14B | ~28 GB | Small | Flash Attention 2 compatible |
| Qwen/Qwen2.5-32B | ~64 GB | Medium | Flash Attention 2 compatible |
| 01-ai/Yi-34B | ~68 GB | Medium | Flash Attention 2 compatible |
| mistralai/Mixtral-8x7B-v0.1 | ~90 GB | MoE | MoE architecture |
| Qwen/Qwen1.5-72B | ~144 GB | Large | Flash Attention 2 compatible |

Notes:

  • Override models via MODEL_1–MODEL_8; set an empty string to disable a slot.
  • validation_suite.sh includes a fallback list of large causal models if it is run directly without suites.sh.

Edit Types

Each model runs 8 edit experiments (4 types × 2 versions) plus optional error injection tests.

Clean edits (tuned)

Clean edits use tuned parameters supplied via PACK_TUNED_EDIT_PARAMS_FILE. The suite uses :clean: as a sentinel in the edit spec and resolves concrete parameters at runtime.

| Edit Type | Parameters | Scope |
| --- | --- | --- |
| Quantization RTN | tuned (bits, group_size) from tuned params file | FFN only |
| FP8 Quantization | tuned (format) from tuned params file | FFN only |
| Magnitude Pruning | tuned (prune_level) from tuned params file | FFN only |
| Low-Rank SVD | tuned (rank) from tuned params file | FFN only |

Stress edits

Stress edits are split into required-fail (catastrophic) and informational scenarios. Required-fail scenarios gate the final verdict; informational scenarios are tracked as detection-quality signals and are validated against a minimum signal-fraction criterion.

Important nuance: some guards remediate without flipping a boolean validation gate. For example, Spectral can remain validation.spectral_stable=true while applying caps (spectral.caps_applied > 0). Informational stress scenarios treat both hard gate flips and remediation events (caps applied) as a “signal” so the suite measures guard activity without manufacturing clean false positives.
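The signal accounting above can be sketched as follows. This is an illustrative helper, not the suite's implementation; the field names (`validation.spectral_stable`, `spectral.caps_applied`) mirror those mentioned in this section.

```python
# Hypothetical sketch: count a "signal" when either a validation gate flips
# or a guard remediates (e.g. spectral caps applied), then compute the
# signal fraction used by the minimum signal-fraction criterion.

def scenario_signal(report: dict) -> bool:
    gate_flipped = not report.get("validation", {}).get("spectral_stable", True)
    caps_applied = report.get("spectral", {}).get("caps_applied", 0) > 0
    return gate_flipped or caps_applied

def signal_fraction(reports: list) -> float:
    signals = sum(1 for r in reports if scenario_signal(r))
    return signals / len(reports) if reports else 0.0

reports = [
    {"validation": {"spectral_stable": True}, "spectral": {"caps_applied": 3}},
    {"validation": {"spectral_stable": False}, "spectral": {"caps_applied": 0}},
    {"validation": {"spectral_stable": True}, "spectral": {"caps_applied": 0}},
]
print(signal_fraction(reports))  # 2 of 3 reports carry a signal
```

Note how the first report counts as a signal even though its validation gate stayed `true`: remediation alone (caps applied) is enough.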

| Edit Type | Parameters | Scope |
| --- | --- | --- |
| Quantization RTN | quant_rtn:4:32:all (4-bit, group size 32) | All layers |
| FP8 Quantization | fp8_quant:e5m2:all | All layers |
| Magnitude Pruning | magnitude_prune:0.5:all (50% sparsity) | All layers |
| Low-Rank SVD | lowrank_svd:32:all (rank 32) | All layers |

Error injection tests

Enabled when RUN_ERROR_INJECTION=true (default):

  • Required detection (must_detect): nan_injection, inf_injection, shape_mismatch, missing_tensors, extreme_quant, scale_explosion, rank_collapse, norm_collapse, weight_tying_break
  • Informational detection: rmt_norm_noise, spectral_moderate_scale, ve_mlp_scale_skew

rmt_norm_noise additionally emits an rmt_probe.json sidecar next to the error cert. This runs an explicit cross-model RMT probe on shared calibration windows (stored in the baseline report) so the proof pack can demonstrate RMT’s delta policy even when compare-mode evaluation keeps validation.rmt_stable=true.

ve_mlp_scale_skew additionally emits a ve_probe.json sidecar next to the error cert. Variance (DD-VE) is a remediation guard and compare-mode evaluation runs the subject model with a no-op edit, which can mute VE’s in-report evidence. The VE probe runs VE calibration directly on shared windows and records whether VE proposes scales and produces a meaningful primary-metric improvement.

Source of truth: scripts/proof_packs/scenarios.json strictness + intent + primary_guard metadata.

Scheduling

The suite uses dynamic work-stealing scheduling with a file-backed task queue. validation_suite.sh seeds the queue and launches one worker per GPU; workers claim tasks under a scheduler lock with GPU reservation files.

small_first priority strategy

Base task priorities (queue manager) are combined with dynamic boosts in scheduler.sh (model size, blocked dependents, age, and fairness penalties).

Priority (base)     Task type
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  90 ┤ SETUP_BASELINE
  85 ┤ CALIBRATION_RUN
  75 ┤ GENERATE_PRESET
  70 ┤ CREATE_EDITS_BATCH / CREATE_EDIT
  65 ┤ evaluate_EDIT
  60 ┤ CREATE_ERROR
  55 ┤ evaluate_ERROR

Dynamic boosts (scheduler):

  • Model size boosts: <30GB (+30), <70GB (+20), <100GB (+10).
  • Critical tasks: SETUP_BASELINE (+50), CALIBRATION_RUN (+20).
  • Unblock boost: +2 per dependent task (capped).
  • Age boost: +1 per 5 minutes in the queue (capped).
  • Fairness penalty: -3 per running task for the same model (capped).
  • Work-stealing boost: raises priority for lagging models.
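The base priorities and boosts above combine roughly as in the sketch below. The base values and boost magnitudes come from this guide; the cap values are placeholders, not scheduler.sh's actual constants.

```python
# Illustrative sketch of the small_first priority combination.
BASE_PRIORITY = {
    "SETUP_BASELINE": 90, "CALIBRATION_RUN": 85, "GENERATE_PRESET": 75,
    "CREATE_EDITS_BATCH": 70, "CREATE_EDIT": 70, "evaluate_EDIT": 65,
    "CREATE_ERROR": 60, "evaluate_ERROR": 55,
}

def size_boost(model_gb: float) -> int:
    # Smaller models get larger boosts: <30GB +30, <70GB +20, <100GB +10.
    if model_gb < 30: return 30
    if model_gb < 70: return 20
    if model_gb < 100: return 10
    return 0

def effective_priority(task_type, model_gb, dependents, age_min, running_same_model):
    p = BASE_PRIORITY[task_type] + size_boost(model_gb)
    if task_type == "SETUP_BASELINE": p += 50   # critical-task boost
    if task_type == "CALIBRATION_RUN": p += 20  # critical-task boost
    p += min(2 * dependents, 20)                # unblock boost (cap assumed)
    p += min(age_min // 5, 10)                  # age boost (cap assumed)
    p -= min(3 * running_same_model, 15)        # fairness penalty (cap assumed)
    return p

# A small model's calibration run outranks a large model's edit batch:
print(effective_priority("CALIBRATION_RUN", 14, 2, 0, 0))     # 85+30+20+4 = 139
print(effective_priority("CREATE_EDITS_BATCH", 90, 5, 0, 1))  # 70+10+10-3 = 87
```

This is why small models drain first and their GPUs then "help" with large models, as the timeline below illustrates.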

Dynamic scheduling diagram

run_pack.sh (optional)
  -> run_suite.sh
     -> validation_suite.sh (main_dynamic)
        -> init_queue + generate_all_tasks
        -> start gpu_worker per GPU
        -> monitor loop (resolve deps, progress, restarts)

Work-stealing timeline (illustrative)

Time→   T=0                     T=50%                  T=100%
GPU 0   ████ small ████ small ████ large (helping) ████░░░░░░
GPU 1   ████ small ████ medium ████ large (helping) ███░░░░░░
GPU 2   ████ small ████ medium ████ large ████░░░░░░░░░░░░░░░
GPU 3   ████ medium ████ medium ████ large ████░░░░░░░░░░░░░░
GPU 4   ████ medium ████ large ████████████████░░░░░░░░░░░░░░
GPU 5   ████ MoE ████████ large ████████████████░░░░░░░░░░░░░

Illustrative only; actual scheduling depends on queue state and memory.

Multi-GPU Model Distribution

After baseline setup, the suite writes model_profile.json and updates per-task memory estimates. task_serialization.sh calculates required_gpus based on GPU_MEMORY_PER_DEVICE and NUM_GPUS:

  • Tasks reserve multiple GPUs only when memory exceeds per-device capacity.
  • Adaptive under-allocation is disabled by default (get_minimum_gpus matches required_gpus) to avoid OOM.
  • Set GPU_MEMORY_PER_DEVICE explicitly for non-80/180GB hardware.
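The required_gpus rule above amounts to a ceiling division against per-device capacity; the sketch below is illustrative, not task_serialization.sh's code, and the rounding is an assumption.

```python
# Sketch: a task reserves extra GPUs only when its memory plan exceeds
# per-device capacity (cf. GPU_MEMORY_PER_DEVICE and NUM_GPUS).
import math

def required_gpus(task_mem_gb: float, per_device_gb: float, num_gpus: int) -> int:
    n = max(1, math.ceil(task_mem_gb / per_device_gb))
    if n > num_gpus:
        raise RuntimeError(f"task needs {n} GPUs but only {num_gpus} are available")
    return n

print(required_gpus(24, 80, 8))   # fits on one 80GB device
print(required_gpus(144, 80, 8))  # ~144 GB of weights -> 2 devices
```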

Memory-aware selection example

GPU 2: 80GB total, 28GB free

Ready queue scan (highest-priority fit):
  qwen-14b_CALIBRATION_RUN_002  req=24GB  pri=85  FITS ✓
  mixtral_CREATE_EDITS_BATCH_001 req=92GB pri=70  SKIP ✗
  yi-34b_evaluate_EDIT_001       req=72GB  pri=65  SKIP ✗

GPU reservation protection

Reservations are stored under OUTPUT_DIR/workers/gpu_reservations/ and guarded by a queue/scheduler.lock (mkdir-based). The scheduler also expires stale reservations by TTL (GPU_RESERVATION_TTL).
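The mkdir-based lock plus TTL expiry pattern can be sketched as below. The helpers are illustrative Python, not the suite's shell implementation; the TTL default mirrors GPU_RESERVATION_TTL.

```python
# mkdir is atomic: exactly one caller succeeds, which is why the suite can
# use a lock *directory* (queue/scheduler.lock) without a lock server.
import os
import tempfile
import time

def try_lock(path: str) -> bool:
    try:
        os.mkdir(path)  # atomic create-or-fail
        return True
    except FileExistsError:
        return False

def expire_stale(path: str, ttl: float) -> None:
    # A reservation older than the TTL is treated as stale and removed.
    if os.path.isdir(path) and time.time() - os.path.getmtime(path) > ttl:
        os.rmdir(path)

lock_dir = os.path.join(tempfile.mkdtemp(), "scheduler.lock")
print(try_lock(lock_dir))   # True: lock acquired
print(try_lock(lock_dir))   # False: already held
expire_stale(lock_dir, -1)  # force-expire for demonstration
print(try_lock(lock_dir))   # True: a fresh claim succeeds again
```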

Reservation state example

GPU 0   GPU 1   GPU 2   GPU 3
FREE    RSVD    FREE    RSVD
        ^              ^
        |              |
      task_a     task_b (multi-GPU: 1,3)
queue/scheduler.lock
workers/gpu_reservations/
├── gpu_1.lock
├── task_<task_id>.gpus
└── task_<task_id>.meta

Task lifecycle

┌─────────┐    ┌───────┐    ┌─────────┐    ┌───────────┐
│ PENDING │───▶│ READY │───▶│ RUNNING │───▶│ COMPLETED │
└─────────┘    └───────┘    └─────────┘    └───────────┘
                                 │
                                 ▼
                             ┌────────┐
                             │ FAILED │
                             └────────┘
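Because the queue is file-backed, the state machine above maps onto queue/<state>/ directories (see the run directory layout later in this guide): a task's state is whichever directory holds its file, and a transition is a rename. A minimal sketch, with illustrative file names:

```python
# Sketch of the file-backed task lifecycle: state = containing directory,
# transition = atomic rename between state directories.
import os
import tempfile

STATES = ["pending", "ready", "running", "completed", "failed"]
queue_root = tempfile.mkdtemp()
for s in STATES:
    os.makedirs(os.path.join(queue_root, s))

def transition(task_id: str, src: str, dst: str) -> None:
    os.rename(os.path.join(queue_root, src, task_id + ".json"),
              os.path.join(queue_root, dst, task_id + ".json"))

# Seed one task and walk it through the happy path.
open(os.path.join(queue_root, "pending", "t1.json"), "w").close()
transition("t1", "pending", "ready")
transition("t1", "ready", "running")
transition("t1", "running", "completed")
print(os.path.exists(os.path.join(queue_root, "completed", "t1.json")))  # True
```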

GPU worker loop

START gpu_worker
  │
  ├─ check shutdown? ── yes → exit
  │
  ├─ query GPU memory
  ├─ find_and_claim_task (scheduler lock + reservation)
  │     ├─ none → sleep → loop
  │     └─ task → execute_task → complete/fail → release_gpus
  └─ update heartbeat/status → loop

Batch optimizations

Small/medium models default to batch edit creation:

  • Batch edit creation: CREATE_EDITS_BATCH loads a model once and creates all 8 edits (cuts repeated model loads).

Large or MoE models disable batch edits automatically (or via PACK_USE_BATCH_EDITS=false) and fall back to per-edit tasks (CREATE_EDIT → evaluate_EDIT).

Task dependency graphs

Batch (default):

SETUP_BASELINE
  ├─ CALIBRATION_RUN × N ──> GENERATE_PRESET ──┐
  ├─ CREATE_EDITS_BATCH ------------------------┴─> evaluate_EDIT × runs
  └─ CREATE_ERROR × types ----------------------┴─> evaluate_ERROR × types

Notes:

  • Error injection tasks (CREATE_ERROR → evaluate_ERROR) branch off SETUP_BASELINE and require the preset for evaluation.

Per-edit path (large/MoE or PACK_USE_BATCH_EDITS=false):

SETUP_BASELINE
  ├─ CALIBRATION_RUN × N ──> GENERATE_PRESET ──┐
  ├─ CREATE_EDIT × edits -----------------------┴─> evaluate_EDIT × runs
  └─ CREATE_ERROR × types ----------------------┴─> evaluate_ERROR × types

Task breakdown per model (defaults)

Defaults: DRIFT_CALIBRATION_RUNS=5, CLEAN_EDIT_RUNS=3, STRESS_EDIT_RUNS=2, RUN_ERROR_INJECTION=true.

Batch path (default for small/medium):

  • Setup baseline: 1 task
  • Preset-derivation runs + preset generation: 6 tasks
  • Batch edits: 1 task
  • evaluate edits: 20 tasks
  • Error injection: 10 tasks

Total: ~38 tasks/model (varies with overrides).

Per-edit path (large/MoE or PACK_USE_BATCH_EDITS=false):

  • Setup baseline: 1 task
  • Preset-derivation runs + preset generation: 6 tasks
  • Create edits: 8 tasks
  • evaluate edits: 20 tasks
  • Error injection: 10 tasks

Total: ~45 tasks/model (varies with overrides).
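The two totals above follow directly from the defaults; the arithmetic can be checked as below (the helper is illustrative, not suite code):

```python
# Worked check of the per-model task totals, using the defaults
# DRIFT_CALIBRATION_RUNS=5, CLEAN_EDIT_RUNS=3, STRESS_EDIT_RUNS=2,
# and 10 error-injection tasks.

def tasks_per_model(batch, calib=5, clean_runs=3, stress_runs=2, error_tasks=10):
    setup = 1
    preset = calib + 1                            # CALIBRATION_RUN x N + GENERATE_PRESET
    create = 1 if batch else 8                    # one batch task vs. one per edit
    evaluate = 4 * clean_runs + 4 * stress_runs   # 4 edit types each
    return setup + preset + create + evaluate + error_tasks

print(tasks_per_model(batch=True))   # 1 + 6 + 1 + 20 + 10 = 38
print(tasks_per_model(batch=False))  # 1 + 6 + 8 + 20 + 10 = 45
```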

Execution phases

PHASE 0: Environment setup
  - Dependency checks, GPU pool configuration, disk preflight
PHASE 1: Task queue initialization
  - Generate tasks for all models, resolve initial dependencies
PHASE 2: GPU worker launch
  - Spawn one worker per GPU, dynamic scheduling in loop
PHASE 3: Reports + verdict
  - Compile per-model reports into the final verdict reports

Run directory layout

OUTPUT_DIR/
  analysis/
    determinism_repeats.json          # optional (when --repeats is used)
  reports/
    final_verdict.txt
    final_verdict.json
    category_summary.json
    guard_signal_summary.json
    guard_intervention_summary.json
    scenario_signal_summary.json
  presets/
  state/
    model_revisions.json              # pinned HF revisions (when --net 1)
    progress.json
    disk_pressure.json
    tuned_edit_params.json            # copy of PACK_TUNED_EDIT_PARAMS_FILE
  queue/
    pending/ ready/ running/ completed/ failed/
    queue.lock
    scheduler.lock
  logs/
    gpu_<id>.log
    tasks/<task_id>.log
  workers/
    gpu_<id>.pid
    gpu_<id>.heartbeat
    gpu_<id>.status
    gpu_reservations/
    SHUTDOWN
  <model_name>/
    models/
      baseline/
      <edit_name>/
      error_<type>/
    reports/
      calibration/
      <edit_name>/run_<n>/
      errors/<type>/

Some scenarios emit additional sidecar artifacts alongside evaluation.report.json (for example reports/errors/rmt_norm_noise/rmt_probe.json or reports/errors/ve_mlp_scale_skew/ve_probe.json). When present, run_pack.sh copies these sidecars into the packaged proof pack under certs/**/.

Run modes

  • --calibrate-only / PACK_SUITE_MODE=calibrate-only
    • Preset derivation only mode.
    • Only promotes SETUP_BASELINE, CALIBRATION_RUN, and GENERATE_PRESET tasks.
    • The monitor exits after all GENERATE_PRESET tasks complete.
  • --run-only
    • Continue a prior run after preset derivation. This is effectively --resume with PACK_SUITE_MODE=full.
  • --resume
    • Reuses an existing queue and continues from where the run stopped.

Determinism vs throughput

PACK_DETERMINISM controls harness-level determinism:

# Throughput (default)
PACK_DETERMINISM=throughput ./scripts/proof_packs/run_suite.sh --suite subset

# Strict
PACK_DETERMINISM=strict ./scripts/proof_packs/run_suite.sh --suite subset
  • Throughput: NVIDIA_TF32_OVERRIDE=1, CUDNN_BENCHMARK=1.
  • Strict: NVIDIA_TF32_OVERRIDE=0, CUDNN_BENCHMARK=0, CUBLAS_WORKSPACE_CONFIG=:4096:8.

Network mode and model revisions

Proof packs are offline by default:

  • PACK_NET=0 sets INVARLOCK_ALLOW_NETWORK=0 and enables HF offline modes.
  • PACK_NET=1 enables downloads and writes state/model_revisions.json (ungated models only).
  • Offline runs require model_revisions.json; missing revisions trigger a hard error during SETUP_BASELINE.

Use PACK_MODEL_REVISIONS_FILE to override the revisions path.

Disk and cache behavior

Large runs can be storage-heavy (baseline + edits + error models):

  • Disk preflight estimates required storage and aborts early when insufficient.
    • Override with PACK_SKIP_DISK_PREFLIGHT=1 (not recommended).
    • The minimum free space guard is MIN_FREE_DISK_GB (default 200).
  • PACK_BASELINE_STORAGE_MODE=snapshot_symlink stores baseline weights as symlinks to HF cache files to reduce duplication.
  • HF caches default to OUTPUT_DIR/.hf (override with HF_HOME, HF_HUB_CACHE, HF_DATASETS_CACHE).
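A minimal sketch of the free-space gate in the disk preflight, in the spirit of the suite's check: abort early when the output filesystem has less headroom than the MIN_FREE_DISK_GB threshold (default 200). The real preflight also estimates per-model storage; this only shows the gate.

```python
# Illustrative free-space gate; the function name is hypothetical.
import shutil

def preflight(output_dir: str, min_free_gb: float = 200.0) -> None:
    free_gb = shutil.disk_usage(output_dir).free / 1e9
    if free_gb < min_free_gb:
        raise RuntimeError(f"only {free_gb:.0f} GB free; need {min_free_gb:.0f} GB")

preflight(".", min_free_gb=0)  # trivially passes with a zero threshold
```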

Proof pack packaging and verification

run_pack.sh builds a portable pack:

  • Copies reports/final_verdict.{txt,json} plus verdict sidecars (category_summary, guard_signal_summary, scenario_signal_summary) and key analysis/* artifacts.
  • Collects all reports into proof_pack/certs/....
  • Generates manifest.json, checksums.sha256, and optional manifest.json.asc.
  • Optional HTML export can be disabled with PACK_SKIP_HTML=1.

Packaging flow

run_pack.sh
  ├─ run_suite.sh → OUTPUT_DIR
  ├─ collect certs + reports
  ├─ write manifest + checksums
  └─ optional HTML + GPG signature

verify_pack.sh checks the pack:

  • Verifies manifest.json binds checksums.sha256 via checksums_sha256_digest.
  • Verifies checksums.sha256 (and thus all hashed artifacts).
  • Verifies the GPG signature when present; --strict requires it.
  • Enforces “no extra files” semantics in --strict mode.
  • Runs invarlock verify across all certs (JSON output optional).
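The two-level binding checked in the first two steps (manifest pins a digest of checksums.sha256; checksums.sha256 pins every artifact) can be sketched end-to-end. File names match this guide; the JSON key usage is illustrative.

```python
# Build and then verify the manifest -> checksums -> artifact chain.
import hashlib
import os
import tempfile

pack = tempfile.mkdtemp()
art = os.path.join(pack, "cert.json")
open(art, "w").write("{}")

def sha256(path):
    return hashlib.sha256(open(path, "rb").read()).hexdigest()

# checksums.sha256 uses "<digest>  <relpath>" lines; the manifest binds it.
checksums = os.path.join(pack, "checksums.sha256")
open(checksums, "w").write(f"{sha256(art)}  cert.json\n")
manifest = {"checksums_sha256_digest": sha256(checksums)}

# Verification walks the chain top-down.
ok = manifest["checksums_sha256_digest"] == sha256(checksums)
for line in open(checksums):
    digest, rel = line.split(maxsplit=1)
    ok = ok and digest == sha256(os.path.join(pack, rel.strip()))
print(ok)  # True
```

Tampering with any hashed artifact breaks the chain at the checksums step; tampering with checksums.sha256 itself breaks it at the manifest digest.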

Remote setup helper

scripts/proof_packs/lib/setup_remote.sh is an optional bootstrap script for fresh GPU hosts. It clones the repo, creates a venv, installs PyTorch and InvarLock, and leaves the host ready to run run_pack.sh.

Common knobs for the setup script:

  • REPO_DIR, REPO_URL, BRANCH, PYTHON_BIN, VENV_DIR.
  • TORCH_INDEX_URL, TORCH_PACKAGES, PACK_SKIP_TORCH_CHECK.
  • HF_HOME, HF_HUB_CACHE, HF_DATASETS_CACHE.

Tuning reference

Core configuration

| Variable | Default | Description |
| --- | --- | --- |
| PACK_SUITE | subset | Suite name (subset, showcase, workshop3, or full) |
| PACK_NET | 0 | Enable network preflight/downloads |
| PACK_OUTPUT_DIR | unset | Sets OUTPUT_DIR when provided |
| OUTPUT_DIR | auto | ./proof_pack_runs/<suite>_<timestamp> via entrypoint |
| PACK_OUTPUT_DIR_ABSOLUTE | false | Normalize OUTPUT_DIR to an absolute path |
| PACK_SUITE_MODE | full | full, calibrate-only, or run-only |
| PACK_DETERMINISM | throughput | Harness determinism mode |
| PACK_REPEATS | 0 | Determinism repeat metadata |
| PACK_MODEL_REVISIONS_FILE | OUTPUT_DIR/state/model_revisions.json | Revisions path |
| PACK_USE_BATCH_EDITS | auto | Force/disable batch edit creation |
| RESUME_MODE | true | Skip completed steps when outputs exist |

Hardware selection

| Variable | Default | Description |
| --- | --- | --- |
| CUDA_VISIBLE_DEVICES | unset | Explicit GPU pool (comma-separated) |
| GPU_ID_LIST | unset | Alternate GPU pool list |
| NUM_GPUS | auto | Number of GPUs to use (clamped to pool) |
| GPU_MEMORY_GB | auto | Per-GPU memory hint for planning |
| GPU_MEMORY_PER_DEVICE | GPU_MEMORY_GB | Per-device memory for required_gpus |
| GPU_MIN_FREE_GB | 10 | Minimum free VRAM for eligibility |
| GPU_REQUIRE_IDLE | true | Require GPUs with no compute processes |
| GPU_CACHE_TTL | 5 | GPU cache TTL (seconds) |
| GPU_RESERVATION_TTL | 60 | Reservation TTL (seconds) |
| GPU_RESERVATION_LOCK_TIMEOUT | 5 | Reservation lock timeout (seconds) |

Model overrides

| Variable | Default | Description |
| --- | --- | --- |
| MODEL_1–MODEL_8 | suite-defined | Override model slots; an empty string disables a slot |

InvarLock settings

| Variable | Default | Description |
| --- | --- | --- |
| INVARLOCK_DATASET | wikitext2 | Dataset provider |
| INVARLOCK_DATASET_PROVIDER_YAML | unset | Raw YAML mapping for dataset.provider (advanced; overrides provider kind + args) |
| INVARLOCK_DATASET_PROVIDER_JSON | unset | Raw JSON object for dataset.provider (advanced; overrides provider kind + args) |
| INVARLOCK_HF_DATASET_NAME | allenai/c4 | HF dataset name when INVARLOCK_DATASET=hf_text (legacy c4 auto-migrated) |
| INVARLOCK_HF_CONFIG_NAME | en (for allenai/c4) | HF dataset config when INVARLOCK_DATASET=hf_text |
| INVARLOCK_HF_TEXT_FIELD | text | Text field when INVARLOCK_DATASET=hf_text |
| INVARLOCK_HF_MAX_SAMPLES | 2000 | Max rows consumed when INVARLOCK_DATASET=hf_text |
| INVARLOCK_HF_TRUST_REMOTE_CODE | unset | Pass trust_remote_code to HF load_dataset (not needed for allenai/c4 Parquet) |
| INVARLOCK_HF_CACHE_DIR | unset | datasets cache_dir override when INVARLOCK_DATASET=hf_text |
| INVARLOCK_LOCAL_JSONL_FILE | unset | JSONL file path when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_PATH | unset | JSONL file/dir path when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_DATA_FILES | unset | JSONL glob/list when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_TEXT_FIELD | text | Text field when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_MAX_SAMPLES | 2000 | Max rows consumed when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_TIER | balanced | Guard tier preset |
| INVARLOCK_PREVIEW_WINDOWS | 32 | Preview windows |
| INVARLOCK_FINAL_WINDOWS | 32 | Final windows |
| INVARLOCK_SEQ_LEN | 512 | Sequence length |
| INVARLOCK_STRIDE | 256 | Stride |
| INVARLOCK_EVAL_BATCH | 32 | InvarLock batch size |
| PACK_GUARDS_ORDER | invariants,spectral,rmt,variance,invariants | Guards included in preset derivation and generated presets |

Primary metric acceptance/drift gates should be configured via profile/config (primary_metric.acceptance_range, primary_metric.drift_band), not env vars.

Tuned edit presets

| Variable | Default | Description |
| --- | --- | --- |
| PACK_TUNED_EDIT_PARAMS_FILE | unset | JSON file with tuned clean edit params (required when CLEAN_EDIT_RUNS>0) |

Preset derivation reuse

| Variable | Default | Description |
| --- | --- | --- |
| PACK_CALIBRATION_PRESET_DIR | unset | Directory containing calibrated_preset_<model>.yaml/json to reuse; skips preset-derivation runs |
| PACK_CALIBRATION_PRESET_FILE | unset | Single preset file applied to all models (advanced) |

Experiment controls

| Variable | Default | Description |
| --- | --- | --- |
| DRIFT_CALIBRATION_RUNS | 5 | Preset-derivation run count |
| CLEAN_EDIT_RUNS | 3 | Clean edit evaluate runs |
| STRESS_EDIT_RUNS | 2 | Stress edit evaluate runs |
| RUN_ERROR_INJECTION | true | Enable error injection |

Storage and memory planning

| Variable | Default | Description |
| --- | --- | --- |
| PACK_BASELINE_STORAGE_MODE | snapshot_symlink | Baseline storage mode |
| MIN_FREE_DISK_GB | 200 | Disk pressure threshold |
| PACK_SKIP_DISK_PREFLIGHT | 0 | Skip storage preflight |
| CUDA_MEMORY_FRACTION | 0.92 | Target GPU memory fraction |
| MODEL_LOAD_OVERHEAD_GB | 4 | Load overhead for planning |
| EDIT_OVERHEAD_GB | 8 | Per-edit overhead for planning |
| BATCH_EDIT_OVERHEAD_GB | 8 | Batch edit overhead |
| INVARLOCK_OVERHEAD_GB | 6 | InvarLock overhead |

Worker + reliability controls

| Variable | Default | Description |
| --- | --- | --- |
| WORKER_HEARTBEAT_INTERVAL | 30 | Heartbeat interval (seconds) |
| WORKER_IDLE_SLEEP | 5 | Sleep when idle (seconds) |
| WORKER_MAX_FAILURES | 10 | Stop worker after N failures |
| WORKER_TIMEOUT | 2700 | Worker heartbeat timeout (seconds) |
| CANCEL_BLOCKED_TASKS_GRACE_SECONDS | 90 | Fail blocked tasks after grace |
| TASK_TIMEOUT_DEFAULT | 21600 | Default task timeout (seconds) |
| TASK_TIMEOUT_<TASKTYPE> | unset | Per-task timeout override |

Packaging and verification

| Variable | Default | Description |
| --- | --- | --- |
| PACK_DIR | OUTPUT_DIR/proof_pack | Proof pack output dir |
| PACK_GPG_SIGN | 1 | Sign manifest if gpg available |
| PACK_SKIP_HTML | 0 | Skip HTML rendering |
| PACK_VERIFY_PROFILE | dev | Profile for invarlock verify |

Troubleshooting

Missing model revisions (offline)

If offline runs fail with “requires model revisions”, run a preflight:

./scripts/proof_packs/run_suite.sh --suite subset --net 1

Or point to an existing revisions file with PACK_MODEL_REVISIONS_FILE.

OOM on large models

  • Lower GPU_MEMORY_PER_DEVICE so the planner requests more GPUs.
  • Disable batch edits: PACK_USE_BATCH_EDITS=false.
  • Reduce InvarLock batch/seq_len (e.g., INVARLOCK_EVAL_BATCH=16 INVARLOCK_SEQ_LEN=256).
  • Increase memory overhead knobs (MODEL_LOAD_OVERHEAD_GB, EDIT_OVERHEAD_GB).

Disk pressure / preflight failures

Check state/disk_pressure.json and ensure the output filesystem has headroom. Use MIN_FREE_DISK_GB=0 or PACK_SKIP_DISK_PREFLIGHT=1 only if you accept risk of partial artifacts.

Task timeouts

Increase the default or per-task timeout:

TASK_TIMEOUT_DEFAULT=28800 ./scripts/proof_packs/run_suite.sh --suite subset
TASK_TIMEOUT_CREATE_EDIT=28800 ./scripts/proof_packs/run_suite.sh --suite subset

Stuck queues or dead workers

  • Inspect state/progress.json and workers/gpu_<id>.status.
  • Check worker logs: logs/gpu_<id>.log and logs/tasks/<task_id>.log.
  • Re-run with --resume to recover from a crash.