Evidence Pack Internals

This guide explains how the evidence pack suite is wired internally: entrypoints, task graph, scheduling, and artifact generation. It complements Evidence Packs, which focuses on how to run a suite.

Scope note: in this guide, CALIBRATION_RUN -> GENERATE_PRESET is called Preset Derivation. It produces run-scoped calibrated_preset_<model>.yaml/json files and does not directly modify global runtime/tiers.yaml.

Overview

| Aspect | Details |
| --- | --- |
| Purpose | Hardware-agnostic Phase 0 validation harness for edit detection |
| Version | evidence-packs-v1 |
| Hardware | NVIDIA GPUs where models fit in VRAM; multi-GPU recommended for full |
| Models | subset (1 model), showcase/workshop3 (3 models), or full (6 models); all ungated public |
| Edits | Scenario-driven; default suites use 4 clean + 4 stress edit scenarios per model, and filtered manifests may select any subset |
| Preset Derivation | CALIBRATION_RUN + GENERATE_PRESET create run-scoped calibrated presets |
| Scheduling | Dynamic work-stealing, small_first priority strategy |
| Multi-GPU | Profile-based; required_gpus grows only when memory requires it |
| Output | Evidence pack with manifest.json, checksums.sha256, and report bundles (--layout v2 nests results + metadata) |
| Source of truth | scripts/evidence_packs/run_suite.sh, scripts/evidence_packs/run_pack.sh, src/invarlock/evidence_pack.py, src/invarlock/cli/commands/evidence_pack.py |

Quick Start (Context)

# Run the subset suite (offline by default)
./scripts/evidence_packs/run_suite.sh --suite subset

# Run the full suite and build an evidence pack
./scripts/evidence_packs/run_pack.sh --suite full --net 1

# Verify an existing evidence pack
invarlock advanced evidence-pack verify ./evidence_pack_runs/subset_20250101_000000/evidence_pack --strict

Hardware Target

  • Hardware-agnostic by design; run on any NVIDIA GPU topology where the models fit in VRAM.
  • Multi-GPU scheduling is enabled automatically when a task’s memory plan exceeds per-device capacity.
  • Set GPU_MEMORY_GB or GPU_MEMORY_PER_DEVICE to match your hardware when running on GPUs with unusual memory sizes.

Entrypoints and modules

Entrypoints

  • scripts/evidence_packs/run_suite.sh runs a suite and sets PACK_* runtime flags before calling the main orchestrator.
  • scripts/evidence_packs/run_pack.sh runs a suite, then packages artifacts into a portable evidence pack (manifest + checksums + reports).
  • scripts/evidence_packs/verify_pack.sh validates an evidence pack in repo workflows.
  • invarlock advanced evidence-pack verify provides the package-native verifier path for installed wheels.
  • scripts/evidence_packs/suites.sh defines the model suites and allows MODEL_1…MODEL_8 overrides.
  • scripts/evidence_packs/lib/validation_suite.sh orchestrates the run: preflight, queue creation, worker launch, and monitoring.

Library modules

  • lib/task_serialization.sh: task schema, JSON helpers, GPU planning.
  • lib/queue_manager.sh: queue states, dependency resolution, task generation.
  • lib/scheduler.sh: dynamic priority, memory gating, reservations.
  • lib/gpu_worker.sh: worker loop, heartbeats, task execution glue.
  • lib/task_functions.sh: implementations for each task type.
  • lib/model_creation.sh: edit and error-model creation helpers (create_model_variant dispatcher).
  • lib/config_generator.sh: InvarLock config generation and wrapper helpers.
  • lib/result_compiler.sh: analysis and verdict compilation.
  • lib/fault_tolerance.sh: error classification and retry/backoff logic.
  • scripts/evidence_packs/python/manifest_writer.py: evidence pack manifest.json writer.
  • scripts/evidence_packs/python/preset_generator.py: preset derivation + edit-type variants.

Module dependency graph

Evidence pack module dependency graph from entrypoints into queueing, execution, and packaging helpers.

Troubleshooting decision tree

Troubleshooting decision tree for missing evidence-pack outputs and common guard failures.

Model Suite

Model suites are defined in scripts/evidence_packs/suites.sh and applied by run_suite.sh.

| Suite | Models | Notes |
| --- | --- | --- |
| subset | mistralai/Mistral-7B-v0.1 | Single-GPU friendly |
| showcase | mistralai/Mistral-7B-v0.1, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-32B | Multi-GPU recommended; guard-focused scenarios |
| workshop3 | mistralai/Mistral-7B-v0.1, mistralai/Mixtral-8x7B-v0.1, 01-ai/Yi-34B | Workshop-friendly 3-model suite (architecture diversity) |
| full | mistralai/Mistral-7B-v0.1, Qwen/Qwen2.5-14B, Qwen/Qwen2.5-32B, 01-ai/Yi-34B, mistralai/Mixtral-8x7B-v0.1, Qwen/Qwen1.5-72B | Multi-GPU recommended |

Default full-suite model sizes (weights-only, approximate):

| Model | VRAM | Category | Notes |
| --- | --- | --- | --- |
| mistralai/Mistral-7B-v0.1 | ~14 GB | Small | Flash Attention 2 compatible |
| Qwen/Qwen2.5-14B | ~28 GB | Small | Flash Attention 2 compatible |
| Qwen/Qwen2.5-32B | ~64 GB | Medium | Flash Attention 2 compatible |
| 01-ai/Yi-34B | ~68 GB | Medium | Flash Attention 2 compatible |
| mistralai/Mixtral-8x7B-v0.1 | ~90 GB | MoE | MoE architecture |
| Qwen/Qwen1.5-72B | ~144 GB | Large | Flash Attention 2 compatible |

Notes:

  • Override models via MODEL_1…MODEL_8; set an empty string to disable a slot.
  • validation_suite.sh includes a fallback list of large causal models if it is run directly without suites.sh.
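As an illustrative sketch of the override mechanism described above (the model IDs come from the suite tables; the run_suite.sh invocation is shown commented out):

```shell
# Override the first two suite slots and disable the third.
# Slot variables you leave unset keep their suite-defined values.
export MODEL_1="mistralai/Mistral-7B-v0.1"
export MODEL_2="Qwen/Qwen2.5-14B"
export MODEL_3=""   # empty string disables this slot
# ./scripts/evidence_packs/run_suite.sh --suite full
```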

Edit Types

Each model runs 8 edit experiments (4 types × 2 versions) plus optional error injection tests.

Clean edits (tuned)

Clean edits use tuned parameters supplied via PACK_TUNED_EDIT_PARAMS_FILE. The suite uses :clean: as a sentinel in the edit spec and resolves concrete parameters at runtime.

| Edit Type | Parameters | Scope |
| --- | --- | --- |
| Quantization RTN | tuned (bitwidth, group_size) from tuned params file | FFN only |
| FP8 Quantization | tuned (format) from tuned params file | FFN only |
| Magnitude Pruning | tuned (prune_level) from tuned params file | FFN only |
| Low-Rank SVD | tuned (rank) from tuned params file | FFN only |
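The authoritative schema for the tuned params file lives in the suite scripts; the sketch below is a hypothetical shape only, reusing the edit-type identifiers from the stress table and the parameter names listed above (bitwidth, group_size, format, prune_level, rank). The nesting and the concrete values are assumptions.

```shell
# Hypothetical PACK_TUNED_EDIT_PARAMS_FILE contents -- shape and values
# are illustrative assumptions, not the authoritative schema.
cat > /tmp/tuned_edit_params.json <<'EOF'
{
  "quant_rtn": {"bitwidth": 4, "group_size": 128},
  "fp8_quant": {"format": "e4m3"},
  "magnitude_prune": {"prune_level": 0.2},
  "lowrank_svd": {"rank": 256}
}
EOF
export PACK_TUNED_EDIT_PARAMS_FILE=/tmp/tuned_edit_params.json
```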

Stress edits

Stress edits are split into required-fail (catastrophic) and informational scenarios. Required-fail scenarios are gating in the final verdict; informational scenarios are tracked as detection-quality signals and are validated by a minimum signal-fraction criterion.

Important nuance: some guards remediate without flipping a boolean validation gate. For example, Spectral can remain validation.spectral_stable=true while applying caps (spectral.caps_applied > 0). Informational stress scenarios treat both hard gate flips and remediation events (caps applied) as a “signal” so the suite measures guard activity without manufacturing clean false positives.

| Edit Type | Parameters | Scope |
| --- | --- | --- |
| Quantization RTN | quant_rtn:8:all (8-bit) | All layers |
| FP8 Quantization | fp8_quant:e5m2:all | All layers |
| Magnitude Pruning | magnitude_prune:0.5:all (50% sparsity) | All layers |
| Low-Rank SVD | lowrank_svd:32:all (rank 32) | All layers |

Error injection tests

Enabled when RUN_ERROR_INJECTION=true (default):

  • Required detection (must_detect): nan_injection, inf_injection, shape_mismatch, missing_tensors, extreme_quant, scale_explosion, rank_collapse, norm_collapse, weight_tying_break
  • Informational detection: rmt_norm_noise, spectral_moderate_scale, ve_mlp_scale_skew

rmt_norm_noise additionally emits an rmt_probe.json sidecar next to the error report. This runs an explicit cross-model RMT probe on shared calibration windows (stored in the baseline report) so the evidence pack can demonstrate RMT’s delta policy even when compare-mode evaluation keeps validation.rmt_stable=true.

ve_mlp_scale_skew additionally emits a ve_probe.json sidecar next to the error report. Variance (DD-VE) is a remediation guard and compare-mode evaluation runs the subject model with a no-op edit, which can mute VE’s in-report evidence. The VE probe runs VE calibration directly on shared windows and records whether VE proposes scales and produces a meaningful primary-metric improvement.

Source of truth: scripts/evidence_packs/scenarios.json strictness + intent + primary_guard metadata.

Scheduling

The suite uses dynamic work-stealing scheduling with a file-backed task queue. validation_suite.sh seeds the queue and launches one worker per GPU; workers claim tasks under a scheduler lock with GPU reservation files.

small_first priority strategy

Base task priorities (queue manager) are combined with dynamic boosts in scheduler.sh (model size, blocked dependents, age, and fairness penalties).

Priority bands mapping evidence-pack task types to base scheduler priority values.

Dynamic boosts (scheduler):

  • Model size boosts: <30GB (+30), <70GB (+20), <100GB (+10).
  • Critical tasks: SETUP_BASELINE (+50), CALIBRATION_RUN (+20).
  • Unblock boost: +2 per dependent task (capped).
  • Age boost: +1 per 5 minutes in the queue (capped).
  • Fairness penalty: -3 per running task for the same model (capped).
  • Work-stealing boost: raises priority for lagging models.
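A minimal sketch of how these boosts might combine into an effective priority. The function below is illustrative only; the real arithmetic, caps, and the work-stealing boost live in lib/scheduler.sh.

```shell
# Illustrative effective-priority calculation: base priority comes from the
# queue manager; size/unblock/age boosts and the fairness penalty mirror the
# bullets above (caps omitted here for brevity).
effective_priority() {
  local base=$1 model_gb=$2 dependents=$3 age_min=$4 running_same_model=$5
  local p=$base
  if   [ "$model_gb" -lt 30 ];  then p=$((p + 30))
  elif [ "$model_gb" -lt 70 ];  then p=$((p + 20))
  elif [ "$model_gb" -lt 100 ]; then p=$((p + 10))
  fi
  p=$((p + dependents * 2))          # unblock boost: +2 per dependent
  p=$((p + age_min / 5))             # age boost: +1 per 5 minutes queued
  p=$((p - running_same_model * 3))  # fairness penalty: -3 per running task
  echo "$p"
}

effective_priority 50 14 4 10 1   # small model, 4 dependents, 10 min old -> prints 87
```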

Dynamic scheduling diagram

Scheduler flow from run_pack through run_suite, queue initialization, worker launch, and monitor loop.

Work-stealing timeline (illustrative)

GPU work-stealing timeline showing smaller jobs finishing early and helping with larger jobs.

Illustrative only; actual scheduling depends on queue state and memory.

Multi-GPU Model Distribution

After baseline setup, the suite writes model_profile.json and updates per-task memory estimates. task_serialization.sh calculates required_gpus based on GPU_MEMORY_PER_DEVICE and NUM_GPUS:

  • Tasks reserve multiple GPUs only when memory exceeds per-device capacity.
  • Adaptive under-allocation is disabled by default (get_minimum_gpus matches required_gpus) to avoid OOM.
  • Set GPU_MEMORY_PER_DEVICE explicitly for non-80/180GB hardware.
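The required_gpus computation reduces to a ceiling division of a task's memory plan by per-device capacity, clamped to the pool size. A sketch under that reading (the real logic, including profile updates, is in lib/task_serialization.sh):

```shell
# Sketch: ceil(task_mem / per_device_mem), clamped to NUM_GPUS.
required_gpus() {
  local task_mem_gb=$1 per_device_gb=$2 num_gpus=$3
  local n=$(( (task_mem_gb + per_device_gb - 1) / per_device_gb ))  # integer ceil
  if [ "$n" -gt "$num_gpus" ]; then n=$num_gpus; fi
  echo "$n"
}

required_gpus 144 80 4   # a ~144 GB plan on 80 GB devices -> prints 2
```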

Memory-aware selection example

Memory-fit decision example showing ready-queue scanning against free GPU memory.

GPU reservation protection

Reservations are stored under OUTPUT_DIR/workers/gpu_reservations/ and guarded by a queue/scheduler.lock (mkdir-based). The scheduler also expires stale reservations by TTL (GPU_RESERVATION_TTL).
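mkdir-based locking works because mkdir is atomic: exactly one caller can create the directory, and everyone else fails until it is removed. A minimal sketch of the pattern (paths and timings here are illustrative, not the suite's exact implementation):

```shell
# Illustrative mkdir lock: atomic create-or-fail, release via rmdir.
LOCK_DIR="${TMPDIR:-/tmp}/scheduler.lock.demo"

acquire_lock() {
  local tries=0
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge 50 ]; then return 1; fi   # give up after ~5s
    sleep 0.1
  done
}

release_lock() { rmdir "$LOCK_DIR"; }

acquire_lock
echo "critical section: update reservations here"
release_lock
```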

Reservation state example

GPU reservation state showing free and reserved devices for multi-GPU task claims.
Reservation file layout for scheduler locks and GPU reservation metadata.

Task lifecycle

Task lifecycle state machine from pending through ready, running, completed, and failed.

GPU worker loop

GPU worker loop from shutdown checks through claim, execute, and release.

Batch optimizations

Small/medium models default to batch edit creation:

  • Batch edit creation: CREATE_EDITS_BATCH loads a model once and creates all 8 edits (cuts repeated model loads).

Large or MoE models disable batch edits automatically (or via PACK_USE_BATCH_EDITS=false) and fall back to per-edit tasks (CREATE_EDIT → evaluate_EDIT).

Task dependency graphs

Batch (default):

Batch dependency graph from baseline setup into calibration, preset generation, edit, and error evaluations.

Notes:

  • Error injection tasks (CREATE_ERROR → evaluate_ERROR) branch off SETUP_BASELINE and require the preset for evaluation.

Per-edit path (large/MoE or PACK_USE_BATCH_EDITS=false):

Per-edit dependency graph from baseline setup into edit and error evaluation tasks.

Task breakdown per model (defaults)

Defaults: DRIFT_CALIBRATION_RUNS=5, CLEAN_EDIT_RUNS=3, STRESS_EDIT_RUNS=2, RUN_ERROR_INJECTION=true.

Batch path (default for small/medium):

  • Setup baseline: 1 task
  • Preset-derivation runs + preset generation: 6 tasks
  • Batch edits: 1 task
  • evaluate edits: 20 tasks
  • Error injection: 10 tasks

Total: ~38 tasks/model (varies with overrides).

Per-edit path (large/MoE or PACK_USE_BATCH_EDITS=false):

  • Setup baseline: 1 task
  • Preset-derivation runs + preset generation: 6 tasks
  • Create edits: 8 tasks
  • evaluate edits: 20 tasks
  • Error injection: 10 tasks

Total: ~45 tasks/model (varies with overrides).
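With the defaults above, the 20 evaluate-edit tasks come from 4 clean scenarios × 3 runs plus 4 stress scenarios × 2 runs, and the 6 preset tasks from 5 calibration runs plus one GENERATE_PRESET. A quick arithmetic check of both totals:

```shell
# Task-count arithmetic for the defaults (CLEAN_EDIT_RUNS=3, STRESS_EDIT_RUNS=2,
# DRIFT_CALIBRATION_RUNS=5; 10 error-injection tasks per the breakdown above).
evaluate_edits=$(( 4 * 3 + 4 * 2 ))   # 12 clean + 8 stress = 20
preset_tasks=$(( 5 + 1 ))             # 5 calibration runs + GENERATE_PRESET
batch_total=$(( 1 + preset_tasks + 1 + evaluate_edits + 10 ))
per_edit_total=$(( 1 + preset_tasks + 8 + evaluate_edits + 10 ))
echo "batch=$batch_total per_edit=$per_edit_total"   # batch=38 per_edit=45
```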

Execution phases

Execution phases from environment setup through queue initialization, worker launch, and final reports.

Run directory layout

Output directory layout for evidence-pack analysis, reports, and final verdict artifacts.

Some scenarios emit additional sidecar artifacts alongside evaluation.report.json (for example reports/errors/rmt_norm_noise/rmt_probe.json or reports/errors/ve_mlp_scale_skew/ve_probe.json). When present, run_pack.sh copies these sidecars into the packaged evidence pack under reports/**/.

Run modes

  • --calibrate-only / PACK_SUITE_MODE=calibrate-only
    • Preset derivation only mode.
    • Only promotes SETUP_BASELINE, CALIBRATION_RUN, and GENERATE_PRESET tasks.
    • The monitor exits after all GENERATE_PRESET tasks complete.
  • --run-only
    • Continue a prior run after preset derivation. This is effectively --resume with PACK_SUITE_MODE=full.
  • --resume
    • Reuses an existing queue and continues from where the run stopped.

Determinism vs throughput

PACK_DETERMINISM controls harness-level determinism:

# Throughput (default)
PACK_DETERMINISM=throughput ./scripts/evidence_packs/run_suite.sh --suite subset

# Strict
PACK_DETERMINISM=strict ./scripts/evidence_packs/run_suite.sh --suite subset

  • Throughput: NVIDIA_TF32_OVERRIDE=1, CUDNN_BENCHMARK=1.
  • Strict: NVIDIA_TF32_OVERRIDE=0, CUDNN_BENCHMARK=0, CUBLAS_WORKSPACE_CONFIG=:4096:8.

Network mode and model revisions

Evidence packs are offline by default:

  • PACK_NET=0 sets INVARLOCK_ALLOW_NETWORK=0 and enables HF offline modes.
  • PACK_NET=1 enables downloads and writes state/model_revisions.json (ungated models only).
  • Offline runs require model_revisions.json; missing revisions trigger a hard error during SETUP_BASELINE.

Use PACK_MODEL_REVISIONS_FILE to override the revisions path.

Disk and cache behavior

Large runs can be storage-heavy (baseline + edits + error models):

  • Disk preflight estimates required storage and aborts early when insufficient.
    • Override with PACK_SKIP_DISK_PREFLIGHT=1 (not recommended).
    • The minimum free space guard is MIN_FREE_DISK_GB (default 200).
  • PACK_BASELINE_STORAGE_MODE=snapshot_symlink builds a local symlink tree that points into the Hugging Face cache snapshot. This avoids a second baseline copy under OUTPUT_DIR, but it requires one full model copy in HF_HUB_CACHE when that cache shares the output filesystem.
  • PACK_BASELINE_STORAGE_MODE=snapshot_copy materializes a full baseline copy under OUTPUT_DIR/models/<model>/baseline.
  • Baseline downloads prefer one weight format only. When both .safetensors and .bin weights are published, evidence packs download the safetensors set and ignore the .bin copy.
  • HF caches default to OUTPUT_DIR/.hf (override with HF_HOME, HF_HUB_CACHE, HF_DATASETS_CACHE).

For the default subset suite (mistralai/Mistral-7B-v0.1), the model-weight budget is roughly:

  • ~42 GB on the output filesystem with snapshot_symlink when HF_HUB_CACHE lives on the same filesystem as OUTPUT_DIR (one cached baseline + one clean edit peak + one error-model peak under cleanup mode).
  • ~28 GB on the output filesystem with snapshot_symlink when HF_HUB_CACHE is on a separate volume.
  • ~56 GB on the output filesystem with snapshot_copy on the same filesystem.

Those figures are for model weights only; the default preflight also requires MIN_FREE_DISK_GB=200 headroom.
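One way to read those figures, assuming roughly 14 GB for the Mistral-7B weight set (per the size table earlier): each budget is a multiple of one weight-set copy. The breakdown for the 42 GB case is stated above; the per-copy readings for the 28 GB and 56 GB cases are inferred, not authoritative.

```shell
weights_gb=14   # approximate Mistral-7B-v0.1 weight-set size
echo "symlink, shared fs:   $(( 3 * weights_gb )) GB"  # cached baseline + edit peak + error peak
echo "symlink, separate fs: $(( 2 * weights_gb )) GB"  # inferred: edit peak + error peak (cache elsewhere)
echo "copy, shared fs:      $(( 4 * weights_gb )) GB"  # inferred: adds a materialized baseline copy
```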

Evidence pack packaging and verification

run_pack.sh builds a portable pack:

  • Copies reports/final_verdict.{txt,json} plus verdict sidecars (category_summary, guard_signal_summary, scenario_signal_summary) and key analysis/* artifacts.
  • Collects all reports into evidence_pack/reports/....
  • Generates manifest.json, checksums.sha256, optional manifest.signature.json.
  • Writes pack-contained provenance metadata such as metadata/source_repo.json and metadata/environment.json before sealing the pack.
  • Stages the pack in a hidden sibling temporary directory and renames it into place only after sealing succeeds, so failed builds do not leave partial evidence_pack/ output behind.
  • Optional HTML export can be disabled with PACK_SKIP_HTML=1.

Packaging flow

Packaging flow from run_suite outputs into collected reports and sidecars, manifests, checksums, html, and package-native signatures.

invarlock advanced evidence-pack verify checks the pack:

  • Verifies manifest.json binds checksums.sha256 via checksums_sha256_digest.
  • Verifies digest-backed manifest references (subject, invocation.config_source, environment, and materials) against on-pack files.
  • Verifies checksums.sha256 (and thus all hashed artifacts).
  • Verifies the package-native Ed25519 signature bundle when present; --strict requires it.
  • Enforces “no extra files” semantics in --strict mode.
  • Runs invarlock verify across all bundled reports (JSON output optional) with runtime-manifest enforcement on; each packaged evaluation.report.json carries an adjacent runtime.manifest.json.
  • Returns structured exit codes so callers can distinguish usage, missing-file, manifest-format, signature, integrity, and report-verification failures.

The installed-wheel package-native CLI is self-contained:

  • invarlock advanced evidence-pack keygen generates Ed25519 signing keys.
  • invarlock advanced evidence-pack build --signing-key ... emits manifest.signature.json.
  • invarlock advanced evidence-pack verify validates the signature bundle in-process and does not depend on external signature binaries.

The repo shell harness remains a separate maintainer path, but it uses the same package-native Ed25519 manifest-signature format as the installed CLI.

Maintainer evidence-pack packaging also treats source provenance as fail-closed:

  • run_pack.sh writes metadata/source_repo.json from the active Git checkout.
  • If git is unavailable or the repository metadata cannot be collected, pack creation stops instead of silently emitting partial provenance.
  • If you need to package from a detached artifact tree, write a complete metadata/source_repo.json first rather than relying on fallback inference.

Remote setup helper

scripts/evidence_packs/lib/setup_remote.sh is an optional bootstrap script for fresh GPU hosts. It clones the repo, creates a venv, installs PyTorch and InvarLock, and leaves the host ready to run run_pack.sh.

Operational guidance for remote evidence-pack work:

  • Prefer a fresh clone or work tree per campaign instead of reusing an older editable-install checkout.
  • If you intentionally run from a work tree that is not the editable install behind .venv, either reinstall that work tree or export PYTHONPATH=src so invarlock resolves to the intended source tree.
  • run_suite.sh and run_pack.sh default to SKIP_FLASH_ATTN=true and PACK_BASELINE_STORAGE_MODE=snapshot_copy for bulk runs in the default runtime container.
  • Bulk evidence-pack runs fail fast unless INVARLOCK_ALLOW_REMOTE_CODE=1 is set.
  • Export non-default runtime roots before launching the suite when you expect them inside delegated container jobs: INVARLOCK_CONFIG_ROOT, HF_HOME, HF_HUB_CACHE, HF_DATASETS_CACHE, TRANSFORMERS_CACHE, TMPDIR, TMP.
  • If a staged preset or profile uses !include outside its config directory, set INVARLOCK_ALLOW_CONFIG_INCLUDE_OUTSIDE=1 on the remote host before the evidence-pack entrypoint; the default runtime-container launcher rejects that config graph before container start when the override is missing.
  • After Qwen2.5-14B campaigns, run scripts/evidence_packs/run_qwen14_sentinels.sh from the same fresh work tree to validate saved-model direct evaluate and the public quant smoke.

Recommended remote validation checklist after security-default changes:

  1. Run an evidence-pack subset lane with explicit external HF_HOME and INVARLOCK_CONFIG_ROOT overrides.
  2. Run one delegated invarlock evaluate with external --edit-config, TMPDIR, and INVARLOCK_EXPORT_DIR roots.
  3. Run one scripts/model_evidence_sweep.py --execution-mode container lane with an external output root and confirm the published report path is populated.

Common knobs for the setup script:

  • REPO_DIR, REPO_URL, BRANCH, PYTHON_BIN, VENV_DIR.
  • TORCH_INDEX_URL, TORCH_PACKAGES, PACK_SKIP_TORCH_CHECK.
  • HF_HOME, HF_HUB_CACHE, HF_DATASETS_CACHE.

Tuning reference

Core configuration

| Variable | Default | Description |
| --- | --- | --- |
| PACK_SUITE | subset | Suite name (subset or full) |
| PACK_NET | 0 | Enable network preflight/downloads |
| PACK_OUTPUT_DIR | unset | Sets OUTPUT_DIR when provided |
| OUTPUT_DIR | auto | ./evidence_pack_runs/<suite>_<timestamp> via entrypoint |
| PACK_OUTPUT_DIR_ABSOLUTE | false | Normalize OUTPUT_DIR to absolute path |
| PACK_SUITE_MODE | full | full, calibrate-only, or run-only |
| PACK_DETERMINISM | throughput | Harness determinism mode |
| PACK_REPEATS | 0 | Determinism repeat metadata |
| PACK_MODEL_REVISIONS_FILE | OUTPUT_DIR/state/model_revisions.json | Revisions path |
| PACK_USE_BATCH_EDITS | auto | Force/disable batch edit creation |
| RESUME_MODE | true | Skip completed steps when outputs exist |

Hardware selection

| Variable | Default | Description |
| --- | --- | --- |
| CUDA_VISIBLE_DEVICES | unset | Explicit GPU pool (comma-separated) |
| GPU_ID_LIST | unset | Alternate GPU pool list |
| NUM_GPUS | auto | Number of GPUs to use (clamped to pool) |
| GPU_MEMORY_GB | auto | Per-GPU memory hint for planning |
| GPU_MEMORY_PER_DEVICE | GPU_MEMORY_GB | Per-device memory for required_gpus |
| GPU_MIN_FREE_GB | 10 | Minimum free VRAM for eligibility |
| GPU_REQUIRE_IDLE | true | Require GPUs with no compute processes |
| GPU_CACHE_TTL | 5 | GPU cache TTL (seconds) |
| GPU_RESERVATION_TTL | 60 | Reservation TTL (seconds) |
| GPU_RESERVATION_LOCK_TIMEOUT | 5 | Reservation lock timeout (seconds) |

Model overrides

| Variable | Default | Description |
| --- | --- | --- |
| MODEL_1…MODEL_8 | suite-defined | Override model slots; empty disables |

InvarLock settings

| Variable | Default | Description |
| --- | --- | --- |
| INVARLOCK_DATASET | wikitext2 | Dataset provider |
| INVARLOCK_DATASET_PROVIDER_YAML | unset | Raw YAML mapping for dataset.provider (advanced; overrides provider kind + args) |
| INVARLOCK_DATASET_PROVIDER_JSON | unset | Raw JSON object for dataset.provider (advanced; overrides provider kind + args) |
| INVARLOCK_HF_DATASET_NAME | allenai/c4 | HF dataset name when INVARLOCK_DATASET=hf_text |
| INVARLOCK_HF_CONFIG_NAME | en (for allenai/c4) | HF dataset config when INVARLOCK_DATASET=hf_text |
| INVARLOCK_HF_TEXT_FIELD | text | Text field when INVARLOCK_DATASET=hf_text |
| INVARLOCK_HF_MAX_SAMPLES | 2000 | Max rows consumed when INVARLOCK_DATASET=hf_text |
| INVARLOCK_HF_TRUST_REMOTE_CODE | unset | Pass trust_remote_code to HF load_dataset (not needed for allenai/c4 Parquet) |
| INVARLOCK_HF_CACHE_DIR | unset | datasets cache_dir override when INVARLOCK_DATASET=hf_text |
| INVARLOCK_LOCAL_JSONL_FILE | unset | JSONL file path when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_PATH | unset | JSONL file/dir path when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_DATA_FILES | unset | JSONL glob/list when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_TEXT_FIELD | text | Text field when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_LOCAL_JSONL_MAX_SAMPLES | 2000 | Max rows consumed when INVARLOCK_DATASET=local_jsonl |
| INVARLOCK_TIER | balanced | Guard tier preset |
| INVARLOCK_PREVIEW_WINDOWS | 32 | Preview windows |
| INVARLOCK_FINAL_WINDOWS | 32 | Final windows |
| INVARLOCK_SEQ_LEN | 512 | Sequence length |
| INVARLOCK_STRIDE | 256 | Stride |
| INVARLOCK_EVAL_BATCH | 32 | InvarLock batch size |
| PACK_GUARDS_ORDER | invariants,spectral,rmt,variance,invariants | Guards included in preset derivation and generated presets |

Primary metric acceptance/drift gates should be configured via profile/config (primary_metric.acceptance_range, primary_metric.drift_band), not env vars.
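The only key paths named in this guide are primary_metric.acceptance_range and primary_metric.drift_band; the fragment below is a hypothetical shape for such a profile/config override, with illustrative nesting and values that are assumptions, not the documented schema:

```shell
# Hypothetical profile fragment; only the two key paths named above are
# documented here -- nesting and values are illustrative assumptions.
cat > /tmp/primary_metric_override.yaml <<'EOF'
primary_metric:
  acceptance_range: [0.0, 1.05]
  drift_band: 0.02
EOF
```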

Tuned edit presets

| Variable | Default | Description |
| --- | --- | --- |
| PACK_TUNED_EDIT_PARAMS_FILE | unset | JSON file with tuned clean edit params (required when CLEAN_EDIT_RUNS>0) |

Preset derivation reuse

| Variable | Default | Description |
| --- | --- | --- |
| PACK_CALIBRATION_PRESET_DIR | unset | Directory containing calibrated_preset_<model>.yaml/json to reuse; skips preset-derivation runs |
| PACK_CALIBRATION_PRESET_FILE | unset | Single preset file applied to all models (advanced) |

Experiment controls

| Variable | Default | Description |
| --- | --- | --- |
| DRIFT_CALIBRATION_RUNS | 5 | Preset-derivation run count |
| CLEAN_EDIT_RUNS | 3 | Clean edit evaluate runs |
| STRESS_EDIT_RUNS | 2 | Stress edit evaluate runs |
| RUN_ERROR_INJECTION | true | Enable error injection |

Storage and memory planning

| Variable | Default | Description |
| --- | --- | --- |
| PACK_BASELINE_STORAGE_MODE | snapshot_symlink | Baseline storage mode (snapshot_symlink, snapshot_copy, or save_pretrained) |
| MIN_FREE_DISK_GB | 200 | Disk pressure threshold |
| PACK_SKIP_DISK_PREFLIGHT | 0 | Skip storage preflight |
| CUDA_MEMORY_FRACTION | 0.92 | Target GPU memory fraction |
| MODEL_LOAD_OVERHEAD_GB | 4 | Load overhead for planning |
| EDIT_OVERHEAD_GB | 8 | Per-edit overhead for planning |
| BATCH_EDIT_OVERHEAD_GB | 8 | Batch edit overhead |
| INVARLOCK_OVERHEAD_GB | 6 | InvarLock overhead |

Worker + reliability controls

| Variable | Default | Description |
| --- | --- | --- |
| WORKER_HEARTBEAT_INTERVAL | 30 | Heartbeat interval (seconds) |
| WORKER_IDLE_SLEEP | 5 | Sleep when idle (seconds) |
| WORKER_MAX_FAILURES | 10 | Stop worker after N failures |
| WORKER_TIMEOUT | 2700 | Worker heartbeat timeout (seconds) |
| CANCEL_BLOCKED_TASKS_GRACE_SECONDS | 90 | Fail blocked tasks after grace |
| TASK_TIMEOUT_DEFAULT | 21600 | Default task timeout (seconds) |
| TASK_TIMEOUT_<TASKTYPE> | unset | Per-task timeout override |

Packaging and verification

| Variable | Default | Description |
| --- | --- | --- |
| PACK_DIR | OUTPUT_DIR/evidence_pack | Evidence pack output dir |
| PACK_SIGN_MANIFEST | 1 | Sign manifest.json with a package-native Ed25519 key (auto-generated if PACK_SIGNING_KEY is unset) |
| PACK_SIGNING_KEY | unset | Optional Ed25519 private key PEM for deterministic signer identity |
| PACK_SKIP_HTML | 0 | Skip HTML rendering |
| PACK_VERIFY_PROFILE | dev | Profile for invarlock verify |

Troubleshooting

Missing model revisions (offline)

If offline runs fail with “requires model revisions”, run a preflight:

./scripts/evidence_packs/run_suite.sh --suite subset --net 1

Or point to an existing revisions file with PACK_MODEL_REVISIONS_FILE.

OOM on large models

  • Lower GPU_MEMORY_PER_DEVICE so the planner requests more GPUs.
  • Disable batch edits: PACK_USE_BATCH_EDITS=false.
  • Reduce InvarLock batch/seq_len (e.g., INVARLOCK_EVAL_BATCH=16 INVARLOCK_SEQ_LEN=256).
  • Increase memory overhead knobs (MODEL_LOAD_OVERHEAD_GB, EDIT_OVERHEAD_GB).

Disk pressure / preflight failures

Check state/disk_pressure.json and ensure the output filesystem has headroom. Use MIN_FREE_DISK_GB=0 or PACK_SKIP_DISK_PREFLIGHT=1 only if you accept the risk of partial artifacts.

Task timeouts

Increase the default or per-task timeout:

TASK_TIMEOUT_DEFAULT=28800 ./scripts/evidence_packs/run_suite.sh --suite subset
TASK_TIMEOUT_CREATE_EDIT=28800 ./scripts/evidence_packs/run_suite.sh --suite subset

Stuck queues or dead workers

  • Inspect state/progress.json and workers/gpu_<id>.status.
  • Check worker logs: logs/gpu_<id>.log and logs/tasks/<task_id>.log.
  • Re-run with --resume to recover from a crash.