Glossary

Plain language: When reading a report or debugging a pipeline, use this glossary to understand what each field means and where the term originated.

Overview

Aspect	Details
Purpose	Define key assurance, report, guard, data, policy, and provenance terms in one reference.
Audience	Report readers, contributors, reviewers, and operators following cross-document terminology.
Contract scope	Terminology only; runtime behavior is governed by the linked assurance docs and reference pages.
Source of truth	This glossary, linked assurance notes, report schemas, and public contract data under `src/invarlock/_data/contracts/`.

TL;DR: This glossary defines key terms used across InvarLock documentation, reports, and code. The quick-reference tables group terms by domain (primary metric, guards, data, policy); the detailed definitions add context, report fields, and cross-references to the relevant assurance documents.

Quick Reference Tables

Primary Metric Terms

Term	Short Definition	Report Field
Primary Metric	Canonical task metric for gating (ppl or accuracy)	`primary_metric.*`
BCa Bootstrap	Bias-corrected accelerated bootstrap for CIs	`primary_metric.ci`, `primary_metric.reps`
Ratio vs Baseline	PPL-like subject final ÷ baseline final (`>1` is worse); accuracy gates use baseline delta	`primary_metric.ratio_vs_baseline`
Primary Metric Tail	Tail regression gate (ΔlogNLL at q95)	`primary_metric_tail.*`

Guard Terms

Term	Short Definition	Report Field
Canonical Guard Chain	invariants (pre) → spectral → RMT → variance → invariants (post)	`assurance.{guard_chain_observed,canonical_guard_chain}`, `validation.{invariants_pass,spectral_stable,rmt_stable}`
κ (kappa) Threshold	Per-family spectral cap for z-score outliers	`spectral.family_caps.*.kappa`
ε (epsilon) Band	RMT acceptance threshold for edge-risk	`rmt.epsilon_by_family.*`
Guard Overhead	Primary-metric impact of guarded evaluation vs a bare control run	`guard_overhead.*`
Measurement Contract	Estimator + sampling policy recorded in reports	`spectral.measurement_contract_hash`

Data Terms

Term	Short Definition	Report Field
Window Pairing	Aligning baseline and subject eval windows	`dataset.windows.stats.paired_windows`
Provider Digest	Hash of dataset identity (ids/tokenizer/masking)	`provenance.provider_digest`
Tokenizer Hash	Stable hash of tokenizer settings	`meta.tokenizer_hash`

Policy Terms

Term	Short Definition	Report Field
Tier Policy	Guard threshold preset (conservative/balanced/aggressive)	`auto.tier`
Policy Digest	Stable hash of resolved policy thresholds	`policy_digest.thresholds_hash`

Detailed Definitions

A–B

Baseline

The unedited reference model run used for comparison and gating.

Aspect	Details
Context	`baseline` report in Compare & evaluate workflow
Related terms	Subject Run, Window Pairing, Evaluation Report
Report fields	`provenance.baseline.*`, `baseline_ref.*`
See also	Compare & evaluate

Example: `invarlock evaluate --baseline gpt2 --subject gpt2-quant`. This command follows the default runtime-container path unless a host-side, non-assurance workflow uses `--execution-mode host`.

BCa Bootstrap

Bias-corrected and accelerated bootstrap method for estimating confidence intervals.

Aspect	Details
Context	Applied to paired log-loss deltas for primary metric gating
Related terms	Primary Metric, Window Pairing, Confidence Interval
Report fields	`primary_metric.ci`, `primary_metric.reps`, `dataset.windows.stats.bootstrap`
See also	BCa Bootstrap Derivation

Example: a BCa bootstrap with 2000 replicates produces a log-space `ci: [-0.005, 0.008]` on paired ΔlogNLL; exponentiating it gives `display_ci: [0.995, 1.008]` for the ratio view.
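
The log-to-ratio conversion in this example can be reproduced in a few lines. This is a minimal sketch; `log_ci_to_ratio_ci` is a hypothetical helper name, not an InvarLock API.

```python
import math

def log_ci_to_ratio_ci(log_ci):
    """Exponentiate a log-space CI on paired ΔlogNLL into a ratio-space CI."""
    lo, hi = log_ci
    # exp() is monotonic, so the endpoints keep their order.
    return (math.exp(lo), math.exp(hi))

ratio_ci = log_ci_to_ratio_ci((-0.005, 0.008))
rounded = tuple(round(x, 3) for x in ratio_ci)  # (0.995, 1.008)
```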

C–D

Evaluation Report

Structured evidence artifact summarizing an evaluation run and its validation status.

Aspect	Details
Context	Generated by `invarlock evaluate` or `invarlock report generate --format report`
Related terms	Run Report, Evidence Pack, Evaluation Bundle, Manifest
Report fields	`schema_version`, `run_id`, `validation.*`, `artifacts.*`
See also	Reports Reference

Example: `evaluation.report.json` with `schema_version: v1` and `validation.primary_metric_acceptable: true`

Compare & evaluate (BYOE)

Workflow that compares a subject model to a baseline, optionally with an external edit (Bring Your Own Edit).

Aspect	Details
Context	`invarlock evaluate --baseline ... --subject ...`
Related terms	Baseline, Subject Run, Evaluation Report
Report fields	`provenance.baseline.*`, `provenance.edited.*`
See also	Compare & evaluate Guide

Example: BYOE workflow evaluates an externally edited checkpoint against its unmodified baseline.

E–G

Evidence Pack / Evaluation Bundle

Set of files produced for audit. An evidence pack is the portable signed/checksummed package shape; an evaluation bundle is the local output directory containing reports, runtime-provenance sidecars, events, or derived renderings.

Aspect	Details
Context	Output directory from `invarlock evaluate` or `invarlock report generate --format report`
Related terms	Run Report, Evaluation Report, Runtime Manifest
Typical contents	`evaluation.report.json`, `evaluation_report.md`, `runtime.manifest.json`
See also	Artifact Layout

Canonical Guard Chain

The default guard chain is invariants (pre) → spectral → RMT → variance → invariants (post).

Aspect	Details
Context	Core guard checks in `evaluate` and internal config-runner flows
Canonical order	`invariants` (pre), `spectral`, `rmt`, `variance`, `invariants` (post)
Related terms	Guard Chain, Guard Overhead
See also	Guards Reference

Enforcement: Guards execute in canonical order for reproducibility; strict reports can record `assurance.guard_chain_observed`, `assurance.canonical_guard_chain`, and `assurance.canonical_guard_chain_enforced`. Guard outcomes are also recorded in `validation.invariants_pass`, `validation.spectral_stable`, and `validation.rmt_stable`.
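
A check that an observed chain matches the canonical order might look like the sketch below. It assumes the chain is stored as a flat list of guard names with `invariants` appearing twice (pre and post); the actual report encoding may differ, and `guard_chain_is_canonical` is a hypothetical helper.

```python
# Canonical order as stated above: invariants (pre), spectral, rmt,
# variance, invariants (post).
CANONICAL_GUARD_CHAIN = ["invariants", "spectral", "rmt", "variance", "invariants"]

def guard_chain_is_canonical(observed):
    """Return True when the observed guard order matches the canonical chain exactly."""
    return list(observed) == CANONICAL_GUARD_CHAIN

# A chain with spectral and rmt swapped does not match.
swapped = ["invariants", "rmt", "spectral", "variance", "invariants"]
is_ok = guard_chain_is_canonical(swapped)  # False
```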

Guard Chain (Canonical Order)

Fixed execution order for guard preparation and evaluation ensuring deterministic, auditable outcomes.

Aspect	Details
Context	Canonical default enforced by strict assurance; config may request order, but strict reports must match the canonical chain
Related terms	Canonical Guard Chain, Guard Overhead
Report fields	`assurance.{canonical_guard_chain,guard_chain_observed,canonical_guard_chain_enforced}`
See also	Guards Reference

Guard Overhead

Primary-metric impact of guard checks vs a bare control run (no guards). This is not a wall-clock latency benchmark.

Aspect	Details
Context	Measured in Release profile; gate requires ≤ +1.0% PM overhead
Related terms	Canonical Guard Chain, Timing Summary
Report fields	`guard_overhead.{bare_ppl,guarded_ppl,overhead_ratio,overhead_percent}`
See also	Guard Overhead Method

Example: `overhead_percent: +0.12%` means that guards add 0.12% to the primary metric, within the ≤ +1.0% gate.
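
The relationship between the `guard_overhead.*` fields can be sketched as follows. The formulas are assumptions inferred from the field names (ratio = guarded ÷ bare, percent = (ratio − 1) × 100); the helper names are hypothetical.

```python
def guard_overhead(bare_ppl, guarded_ppl):
    """Compute overhead of a guarded run relative to a bare control run."""
    if bare_ppl <= 0:
        raise ValueError("bare_ppl must be positive")
    ratio = guarded_ppl / bare_ppl
    return {
        "bare_ppl": bare_ppl,
        "guarded_ppl": guarded_ppl,
        "overhead_ratio": ratio,
        "overhead_percent": (ratio - 1.0) * 100.0,
    }

def overhead_gate_passes(overhead_percent, max_percent=1.0):
    """Gate from the table above: overhead must be at most +1.0%."""
    return overhead_percent <= max_percent

result = guard_overhead(bare_ppl=20.0, guarded_ppl=20.024)
passed = overhead_gate_passes(result["overhead_percent"])
```

With these inputs the overhead is about +0.12%, so the gate passes.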

K–M

κ (kappa) Threshold

Per-family spectral cap used to flag abnormally high z-scores.

Aspect	Details
Context	`spectral.family_caps.*.kappa` in tier policy
Typical values	ffn: 3.85, attn: 3.02, embed: 1.05 (Balanced tier)
Related terms	Spectral Cap, z-score, Spectral Guard
See also	Spectral FPR Derivation

Example: with `kappa=2.8` for the attention family, z-scores above 2.8 are flagged. (The Balanced-tier attn cap listed above is 3.02.)
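
Applying per-family caps to a set of z-scores can be sketched as below. The κ values are the Balanced-tier figures from the table above; the module names, the family mapping, and `flag_outliers` are hypothetical, and the one-sided `z > κ` comparison follows the wording of the example rather than a confirmed implementation.

```python
# Balanced-tier caps from the table above.
BALANCED_KAPPA = {"ffn": 3.85, "attn": 3.02, "embed": 1.05}

def flag_outliers(z_by_module, family_of, kappa_by_family):
    """Return modules whose z-score exceeds the cap for their family."""
    flagged = []
    for module, z in z_by_module.items():
        cap = kappa_by_family[family_of[module]]
        if z > cap:
            flagged.append(module)
    return flagged

# Hypothetical modules: the same z-score is flagged for attn but not for ffn.
z_scores = {"h.0.attn": 3.1, "h.0.mlp": 3.1, "wte": 0.9}
families = {"h.0.attn": "attn", "h.0.mlp": "ffn", "wte": "embed"}
flagged = flag_outliers(z_scores, families, BALANCED_KAPPA)
```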

Measurement Contract

Guard measurement procedure signature and digest recorded in reports.

Aspect	Details
Context	Spectral and RMT guards record estimator + sampling policy
Verified by	`invarlock verify --profile ci|release` (plus `runtime.manifest.json` runtime provenance for container-backed outputs)
Report fields	`spectral.{measurement_contract_hash,baseline_measurement_contract_hash,measurement_contract_match}`, `rmt.{measurement_contract_hash,baseline_measurement_contract_hash,measurement_contract_match}`
See also	Guard Contracts

Enforcement: CI/Release profiles require measurement contract match between baseline and subject.
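
A reader can cross-check the contract match directly from the report fields listed above. The sketch below assumes the report has been loaded into a nested dict; `mismatched_contracts` is a hypothetical helper and is not how `invarlock verify` is implemented.

```python
def mismatched_contracts(report):
    """Return the guards whose subject and baseline contract hashes differ."""
    mismatched = []
    for guard in ("spectral", "rmt"):
        section = report.get(guard, {})
        subject = section.get("measurement_contract_hash")
        baseline = section.get("baseline_measurement_contract_hash")
        # A missing hash is treated as a mismatch in this sketch.
        if subject is None or subject != baseline:
            mismatched.append(guard)
    return mismatched

example_report = {
    "spectral": {"measurement_contract_hash": "abc",
                 "baseline_measurement_contract_hash": "abc"},
    "rmt": {"measurement_contract_hash": "abc",
            "baseline_measurement_contract_hash": "xyz"},
}
bad = mismatched_contracts(example_report)  # ["rmt"]
```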

P–R

Policy Digest

Stable hash summarizing resolved policy thresholds for auditability.

Aspect	Details
Context	Stored in report for policy change detection
Related terms	Tier Policy, Policy Overrides, Policy Provenance
Report fields	`policy_digest.thresholds_hash`, `policy_provenance.*`, `auto.policy_digest`
See also	Policy Provenance
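
One common way to get a stable hash of a thresholds mapping is to serialize it canonically before hashing. The sketch below uses sorted-key JSON and SHA-256 as an assumption; the canonicalization and hash function InvarLock actually uses are not specified here, and `thresholds_digest` is a hypothetical name.

```python
import hashlib
import json

def thresholds_digest(thresholds):
    """Hash a thresholds mapping so that key order does not affect the result."""
    canonical = json.dumps(thresholds, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = thresholds_digest({"spectral": {"kappa": 3.02}, "rmt": {"epsilon": 0.1}})
b = thresholds_digest({"rmt": {"epsilon": 0.1}, "spectral": {"kappa": 3.02}})
c = thresholds_digest({"rmt": {"epsilon": 0.2}, "spectral": {"kappa": 3.02}})
# a == b (same content, different key order); a != c (a threshold changed).
```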

Primary Metric

The canonical task metric used for gating (perplexity for LMs, accuracy for classification).

Aspect	Details
Supported kinds	`accuracy`, `bleu`, `f1`, `ppl_causal`, `ppl_mlm`, `ppl_seq2seq`, `rouge`
Gating logic	PPL-like kinds gate on ratio vs baseline; accuracy gates on percentage-point delta vs baseline
Related terms	Primary Metric Tail, BCa Bootstrap, Window Pairing
Report fields	`primary_metric.{kind,preview,final,ratio_vs_baseline,ci}`
See also	Reports Reference

Example: `primary_metric.kind: ppl_causal` with `ratio_vs_baseline: 1.003`
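
The two gating modes in the table above can be sketched as follows. The thresholds `max_ratio` and `max_drop_pp` are hypothetical placeholders (real values come from the tier policy), accuracies are assumed to be fractions in [0, 1], and `primary_metric_gate` is not an InvarLock API.

```python
PPL_KINDS = {"ppl_causal", "ppl_mlm", "ppl_seq2seq"}

def primary_metric_gate(kind, subject_final, baseline_final,
                        max_ratio=1.10, max_drop_pp=1.0):
    """Gate PPL-like kinds on ratio vs baseline and accuracy on a pp delta."""
    if kind in PPL_KINDS:
        # Ratio > 1 means the subject is worse than the baseline.
        ratio = subject_final / baseline_final
        return {"ratio_vs_baseline": ratio, "passed": ratio <= max_ratio}
    if kind == "accuracy":
        # Convert the fractional difference into percentage points.
        delta_pp = (subject_final - baseline_final) * 100.0
        return {"delta_pp": delta_pp, "passed": delta_pp >= -max_drop_pp}
    raise ValueError(f"gating for kind {kind!r} is not covered by this sketch")

ppl_result = primary_metric_gate("ppl_causal", 20.06, 20.0)
acc_result = primary_metric_gate("accuracy", 0.90, 0.92)
```

Here the perplexity ratio is about 1.003 and passes, while the two-point accuracy drop fails the placeholder 1.0 pp limit.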

Primary Metric Tail

Optional tail regression gate checking high-loss windows (e.g., q95 ΔlogNLL).

Aspect	Details
Context	Catches regression in hard examples even when mean is acceptable
Mode	`warn` (default) or `fail`
Related terms	Primary Metric, BCa Bootstrap
Report fields	`primary_metric_tail.{evaluated,passed,warned,stats}`
See also	Reports Reference
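
A tail gate on q95 ΔlogNLL can be sketched as below. The linear-interpolation quantile, the `max_q95` threshold, and the way `passed` and `warned` are derived from the mode are all assumptions for illustration; only the field names and the `warn`/`fail` modes come from the table above.

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile over a non-empty list of numbers."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac

def tail_gate(delta_lognll, max_q95, mode="warn"):
    """Evaluate the q95 tail of per-window ΔlogNLL against a threshold."""
    q95 = quantile(delta_lognll, 0.95)
    exceeded = q95 > max_q95
    return {
        "evaluated": True,
        # In this sketch, only `fail` mode turns an exceedance into a failure.
        "passed": not (exceeded and mode == "fail"),
        "warned": exceeded and mode == "warn",
        "stats": {"q95": q95},
    }

# 95 unchanged windows and 5 regressed ones: the mean is small, the tail is not.
deltas = [0.0] * 95 + [0.5] * 5
warn_result = tail_gate(deltas, max_q95=0.01, mode="warn")
fail_result = tail_gate(deltas, max_q95=0.01, mode="fail")
```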

Provider Digest

Dataset identity hash covering token IDs, tokenizer config, and masking strategy.

Aspect	Details
Context	Ensures baseline and subject use identical data
Related terms	Window Pairing, Tokenizer Hash
Report fields	`provenance.provider_digest.ids_sha256`
See also	Coverage & Pairing

Run Report

Run-level artifact with metrics, guard results, and metadata.

Aspect	Details
Context	Generated by `invarlock evaluate`; input to report generation
Related terms	Evaluation Report, Evidence Pack, Evaluation Bundle
File format	`report.json` + `events.jsonl`
See also	Artifact Layout

RMT ε (epsilon) Rule

Random Matrix Theory epsilon band used for activation edge-risk stability checks.

Aspect	Details
Context	`rmt.epsilon_default` and `rmt.epsilon_by_family.*` thresholds
Calibration	Derived from null-sweep runs on target model families
Related terms	RMT Guard, κ Threshold
Report fields	`rmt.{epsilon_default,epsilon_by_family,epsilon_violations,stable,status,max_edge_ratio,max_edge_delta}`
See also	RMT ε Rule

RMT Guard

Guard that checks eigenvalue statistics against Random Matrix Theory bounds.

Aspect	Details
Focus	Activation edge-risk growth across model families
Validation	`validation.rmt_stable`
Related terms	Canonical Guard Chain, RMT ε Rule
Report fields	`rmt.{families,stable,status,max_edge_ratio,max_edge_delta}`
See also	Guards Reference

S–T

Spectral Cap

Limit on spectral z-scores per family to flag weight instability.

Aspect	Details
Context	Applied by spectral guard; counts violations per family
Related terms	κ Threshold, z-score, Spectral Guard
Report fields	`spectral.{caps_applied,caps_exceeded,top_z_scores}`
See also	Spectral FPR

Spectral Guard

Guard that monitors spectral norms and z-scores for weight matrices.

Aspect	Details
Focus	Baseline-relative weight matrix stability
Validation	`validation.spectral_stable`
Related terms	Canonical Guard Chain, Spectral Cap, κ Threshold
Report fields	`spectral.{caps_applied,family_caps,top_z_scores,summary}`
See also	Guards Reference

Subject Run

The edited or target model run under evaluation (compared against baseline).

Aspect	Details
Context	`subject` checkpoint in Compare & evaluate
Related terms	Baseline, Evaluation Report, Window Pairing
Report fields	`provenance.edited.*`
See also	Compare & evaluate

Telemetry

Performance and resource metrics emitted with reports.

Aspect	Details
Context	Optional fields for performance analysis
Related terms	Timing Summary, Guard Overhead
Report fields	`telemetry.*`, `metrics.memory_mb_peak`
See also	Observability

Tier Policy

Guard threshold preset selecting the tier profile for a run.

Aspect	Details
Options	`conservative` (strictest), `balanced` (default), `aggressive` (loosest)
Source	Packaged `runtime/tiers.yaml`; overrides use `INVARLOCK_CONFIG_ROOT/runtime/tiers.yaml`
Related terms	Policy Digest, Policy Overrides
Report fields	`auto.tier`, `resolved_policy.*`
See also	Tier Policy Catalog

Timing Summary

Consolidated timing breakdown for an evaluation run.

Aspect	Details
Context	CLI output via `print_timing_summary()`
Includes	Model load, dataset load, evaluation, report generation
Related terms	Guard Overhead, Telemetry
See also	Observability

Tokenizer Hash

Stable hash of tokenizer settings and vocabulary for reproducibility.

Aspect	Details
Context	Ensures baseline and subject use identical tokenization
Related terms	Provider Digest, Window Pairing
Report fields	`meta.tokenizer_hash`
See also	Determinism Contracts

V–Z

Variance Effect (VE)

Guard that tracks variance change and applies equalization when beneficial.

Aspect	Details
Context	A/B test compares bare vs VE-enabled evaluation
Enabling condition	Predictive CI upper bound ≤ -min_effect_lognll and mean Δ ≤ -min_effect_lognll; enabled VE also requires A/B provenance
Related terms	Canonical Guard Chain, Guard Overhead, Predictive Gate
Report fields	`variance.{enabled,gain,predictive_gate.delta_ci,predictive_gate.passed,ab_test}`
See also	VE Gate Power
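
The enabling condition in the table above translates into a short predicate. This is a sketch under stated assumptions: `delta_ci` is a `(lower, upper)` pair on ΔlogNLL where negative means improvement, and `ve_should_enable` is a hypothetical name.

```python
def ve_should_enable(delta_ci, mean_delta, min_effect_lognll, has_ab_provenance):
    """Apply the enabling condition: both the CI upper bound and the mean
    delta must be at or below -min_effect_lognll, and A/B provenance must exist."""
    ci_upper = delta_ci[1]
    gate_passed = (ci_upper <= -min_effect_lognll
                   and mean_delta <= -min_effect_lognll)
    return gate_passed and has_ab_provenance

# The whole CI lies below -0.001, so the predictive gate passes.
enabled = ve_should_enable((-0.004, -0.002), -0.003, 0.001, True)
# The CI upper bound crosses above -0.001, so VE stays off.
not_enabled = ve_should_enable((-0.004, 0.0005), -0.003, 0.001, True)
```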

Window Pairing

Alignment of baseline and subject evaluation windows for paired statistical testing.

Aspect	Details
Requirements	Same window IDs, zero overlap, 100% match fraction
Violation	`E001` pairing error in CI/Release profiles
Related terms	BCa Bootstrap, Primary Metric, Provider Digest
Report fields	`dataset.windows.stats.{paired_windows,window_match_fraction,window_overlap_fraction}`
See also	Coverage & Pairing

Example: `paired_windows: 200`, `window_match_fraction: 1.0`, `window_overlap_fraction: 0.0`
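
The pairing requirements can be checked against the stats fields with a small validator. This sketch assumes the stats have been loaded into a dict keyed by the field names above; `pairing_violations` is a hypothetical helper, and the real `E001` check may apply additional rules.

```python
def pairing_violations(stats):
    """Return human-readable reasons the pairing requirements are not met."""
    problems = []
    if stats.get("paired_windows", 0) <= 0:
        problems.append("no paired windows")
    if stats.get("window_match_fraction") != 1.0:
        problems.append("window_match_fraction is not 1.0")
    if stats.get("window_overlap_fraction") != 0.0:
        problems.append("window_overlap_fraction is not 0.0")
    return problems

good = pairing_violations({
    "paired_windows": 200,
    "window_match_fraction": 1.0,
    "window_overlap_fraction": 0.0,
})
bad = pairing_violations({
    "paired_windows": 196,
    "window_match_fraction": 0.98,
    "window_overlap_fraction": 0.0,
})
```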

z-score

Standardized deviation used in spectral guard scoring.

Aspect	Details
Formula	`z = (ŝ - μ_f) / σ_f` for the guard's measured spectral statistic
Thresholding	Compared against family-specific κ caps
Related terms	Spectral Cap, κ Threshold
Report fields	`spectral.top_z_scores`, `spectral.family_caps.*.kappa`
See also	Spectral FPR

Example: `max |z| = 2.1` means that the largest absolute z-score across all weight matrices is 2.1.
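
The formula above can be evaluated directly. In this sketch `s_hat` is the measured spectral statistic and `mu_f`, `sigma_f` are the family mean and standard deviation; how InvarLock estimates those reference values is not covered here, and the function names are hypothetical.

```python
def spectral_z(s_hat, mu_f, sigma_f):
    """z = (s_hat - mu_f) / sigma_f for one weight matrix."""
    if sigma_f <= 0:
        raise ValueError("sigma_f must be positive")
    return (s_hat - mu_f) / sigma_f

def max_abs_z(z_scores):
    """Largest absolute z-score across a collection of matrices."""
    return max(abs(z) for z in z_scores)

# Illustrative inputs: three matrices in a family with mean 10.0 and std 1.0.
zs = [spectral_z(s, 10.0, 1.0) for s in (12.1, 9.5, 10.4)]
worst = max_abs_z(zs)  # approximately 2.1
```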