Glossary

Plain language: When reading a report or debugging a pipeline, use this glossary to understand what each field means and where the term originated.

Overview

| Aspect | Details |
| --- | --- |
| Purpose | Define key assurance, report, guard, data, policy, and provenance terms in one reference. |
| Audience | Report readers, contributors, reviewers, and operators following cross-document terminology. |
| Contract scope | Terminology only; runtime behavior is governed by the linked assurance docs and reference pages. |
| Source of truth | This glossary, linked assurance notes, report schemas, and public contract data under src/invarlock/_data/contracts/. |

TL;DR: This glossary defines key terms used across InvarLock documentation, reports, and code. The quick reference tables group terms by domain (primary metric, guards, data, policy); detailed definitions follow under alphabetical headings. Each detailed entry includes a definition, context, and cross-references to relevant assurance documents.

Quick Reference Tables

Primary Metric Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Primary Metric | Canonical task metric for gating (ppl or accuracy) | primary_metric.* |
| BCa Bootstrap | Bias-corrected accelerated bootstrap for CIs | primary_metric.ci, primary_metric.reps |
| Ratio vs Baseline | PPL-like subject final ÷ baseline final (>1 is worse); accuracy gates use baseline delta | primary_metric.ratio_vs_baseline |
| Primary Metric Tail | Tail regression gate (ΔlogNLL at q95) | primary_metric_tail.* |

Guard Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Canonical Guard Chain | invariants (pre) → spectral → RMT → variance → invariants (post) | assurance.{guard_chain_observed,canonical_guard_chain}, validation.{invariants_pass,spectral_stable,rmt_stable} |
| κ (kappa) Threshold | Per-family spectral cap for z-score outliers | spectral.family_caps.*.kappa |
| ε (epsilon) Band | RMT acceptance threshold for edge-risk | rmt.epsilon_by_family.* |
| Guard Overhead | Primary-metric impact of guarded evaluation vs a bare control run | guard_overhead.* |
| Measurement Contract | Estimator + sampling policy recorded in reports | spectral.measurement_contract_hash |

Data Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Window Pairing | Aligning baseline and subject eval windows | dataset.windows.stats.paired_windows |
| Provider Digest | Hash of dataset identity (ids/tokenizer/masking) | provenance.provider_digest |
| Tokenizer Hash | Stable hash of tokenizer settings | meta.tokenizer_hash |

Policy Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Tier Policy | Guard threshold preset (conservative/balanced/aggressive) | auto.tier |
| Policy Digest | Stable hash of resolved policy thresholds | policy_digest.thresholds_hash |

Detailed Definitions

A–B

Baseline

The unedited reference model run used for comparison and gating.

| Aspect | Details |
| --- | --- |
| Context | baseline report in Compare & evaluate workflow |
| Related terms | Subject Run, Window Pairing, Evaluation Report |
| Report fields | provenance.baseline.*, baseline_ref.* |
| See also | Compare & evaluate |

Example: invarlock evaluate --baseline gpt2 --subject gpt2-quant

This command follows the default runtime-container path unless a host-side, non-assurance workflow uses --execution-mode host.


BCa Bootstrap

Bias-corrected and accelerated bootstrap method for estimating confidence intervals.

| Aspect | Details |
| --- | --- |
| Context | Applied to paired log-loss deltas for primary metric gating |
| Related terms | Primary Metric, Window Pairing, Confidence Interval |
| Report fields | primary_metric.ci, primary_metric.reps, dataset.windows.stats.bootstrap |
| See also | BCa Bootstrap Derivation |

Example: BCa bootstrap with 2000 replicates produces a log-space ci: [-0.005, 0.008] on paired ΔlogNLL, then exponentiates it to display_ci: [0.995, 1.008] for the ratio view.
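
The log-space to ratio conversion in the example above can be sketched as follows. This is a minimal illustration, not InvarLock's implementation; the function name log_ci_to_ratio_ci is hypothetical, and only the two interval endpoints come from the example.

```python
import math

def log_ci_to_ratio_ci(log_ci):
    """Map a (lower, upper) interval on paired ΔlogNLL to the ratio scale.

    exp() is monotonic, so the endpoints keep their order.
    Hypothetical helper for illustration only.
    """
    lower, upper = log_ci
    return (math.exp(lower), math.exp(upper))

# Endpoints taken from the glossary example.
ratio_ci = log_ci_to_ratio_ci((-0.005, 0.008))
print([round(x, 3) for x in ratio_ci])  # [0.995, 1.008]
```

Because the interval is built on log-loss deltas, a ratio interval that straddles 1.0 (as here) means the data do not distinguish the subject from the baseline in either direction.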


C–D

Evaluation Report

Structured evidence artifact summarizing an evaluation run and its validation status.

| Aspect | Details |
| --- | --- |
| Context | Generated by invarlock evaluate or invarlock report generate --format report |
| Related terms | Run Report, Evidence Pack, Evaluation Bundle, Manifest |
| Report fields | schema_version, run_id, validation.*, artifacts.* |
| See also | Reports Reference |

Example: evaluation.report.json with schema_version: v1 and validation.primary_metric_acceptable: true


Compare & evaluate (BYOE)

Workflow that compares a subject model to a baseline, optionally with an external edit (Bring Your Own Edit).

| Aspect | Details |
| --- | --- |
| Context | invarlock evaluate --baseline ... --subject ... |
| Related terms | Baseline, Subject Run, Evaluation Report |
| Report fields | provenance.baseline.*, provenance.edited.* |
| See also | Compare & evaluate Guide |

Example: BYOE workflow evaluates an externally edited checkpoint against its unmodified baseline.


E–G

Evidence Pack / Evaluation Bundle

Set of files produced for audit. An evidence pack is the portable signed/checksummed package shape; an evaluation bundle is the local output directory containing reports, runtime-provenance sidecars, events, or derived renderings.

| Aspect | Details |
| --- | --- |
| Context | Output directory from invarlock evaluate or invarlock report generate --format report |
| Related terms | Run Report, Evaluation Report, Runtime Manifest |
| Typical contents | evaluation.report.json, evaluation_report.md, runtime.manifest.json |
| See also | Artifact Layout |

Canonical Guard Chain

The default guard chain is invariants (pre) → spectral → RMT → variance → invariants (post).

| Aspect | Details |
| --- | --- |
| Context | Core guard checks in evaluate and internal config-runner flows |
| Canonical order | invariants (pre), spectral, rmt, variance, invariants (post) |
| Related terms | Guard Chain, Guard Overhead |
| See also | Guards Reference |

Enforcement: Guards execute in canonical order for reproducibility; strict reports can record assurance.guard_chain_observed, assurance.canonical_guard_chain, and assurance.canonical_guard_chain_enforced. Guard outcomes are also recorded in validation.invariants_pass, validation.spectral_stable, and validation.rmt_stable.


Guard Chain (Canonical Order)

Fixed execution order for guard preparation and evaluation ensuring deterministic, auditable outcomes.

| Aspect | Details |
| --- | --- |
| Context | Canonical default enforced by strict assurance; config may request an order, but strict reports must match the canonical chain |
| Related terms | Canonical Guard Chain, Guard Overhead |
| Report fields | assurance.{canonical_guard_chain,guard_chain_observed,canonical_guard_chain_enforced} |
| See also | Guards Reference |

Guard Overhead

Primary-metric impact of guard checks vs a bare control run (no guards). This is not a wall-clock latency benchmark.

| Aspect | Details |
| --- | --- |
| Context | Measured in Release profile; gate requires ≤ +1.0% PM overhead |
| Related terms | Canonical Guard Chain, Timing Summary |
| Report fields | guard_overhead.{bare_ppl,guarded_ppl,overhead_ratio,overhead_percent} |
| See also | Guard Overhead Method |

Example: overhead_percent: +0.12% indicates that the guarded run's primary metric is 0.12% higher than the bare control run's, well inside the +1.0% gate.
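
The arithmetic behind these fields can be sketched as below. The definitions used here (overhead_ratio as guarded ÷ bare, overhead_percent as the ratio minus one, times 100) are assumptions inferred from the field names, and the within_gate key and the perplexity values are hypothetical.

```python
def guard_overhead(bare_ppl: float, guarded_ppl: float, max_percent: float = 1.0) -> dict:
    """Compare a guarded run against a bare control run on the primary metric."""
    if bare_ppl <= 0:
        raise ValueError("bare_ppl must be positive")
    ratio = guarded_ppl / bare_ppl    # assumed definition of overhead_ratio
    percent = (ratio - 1.0) * 100.0   # assumed definition of overhead_percent
    return {
        "bare_ppl": bare_ppl,
        "guarded_ppl": guarded_ppl,
        "overhead_ratio": ratio,
        "overhead_percent": percent,
        # Hypothetical helper key (not a documented report field):
        # True when overhead stays within the +1.0% gate from the glossary.
        "within_gate": percent <= max_percent,
    }

ok = guard_overhead(bare_ppl=25.00, guarded_ppl=25.03)        # made-up values
too_high = guard_overhead(bare_ppl=25.00, guarded_ppl=25.50)  # made-up values
print(f"{ok['overhead_percent']:+.2f}%", ok["within_gate"])
print(f"{too_high['overhead_percent']:+.2f}%", too_high["within_gate"])
```

Note that the comparison is on metric values, not on timing, which matches the statement that guard overhead is not a wall-clock latency benchmark.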


K–M

κ (kappa) Threshold

Per-family spectral cap used to flag abnormally high z-scores.

| Aspect | Details |
| --- | --- |
| Context | spectral.family_caps.*.kappa in tier policy |
| Typical values | ffn: 3.85, attn: 3.02, embed: 1.05 (Balanced tier) |
| Related terms | Spectral Cap, z-score, Spectral Guard |
| See also | Spectral FPR Derivation |

Example: kappa=2.8 for attention family means z-scores > 2.8 are flagged.
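
A per-family cap check along these lines might look like the sketch below. The κ values are the Balanced-tier typical values listed above; the module names, the z-scores, and the use of the absolute value |z| in the comparison are assumptions made for illustration.

```python
# Balanced-tier typical values from this glossary entry.
FAMILY_KAPPA = {"ffn": 3.85, "attn": 3.02, "embed": 1.05}

def flag_outliers(z_scores: dict, caps: dict) -> dict:
    """Return {family: [module, ...]} for modules whose |z| exceeds the family cap.

    z_scores maps a module name to a (family, z) pair. Comparing |z| rather
    than signed z is an assumption of this sketch.
    """
    flagged: dict = {}
    for module, (family, z) in z_scores.items():
        if abs(z) > caps[family]:
            flagged.setdefault(family, []).append(module)
    return flagged

# Hypothetical module names and z-scores.
observed = {
    "h.0.mlp.c_fc": ("ffn", 2.1),
    "h.0.attn.c_attn": ("attn", 3.4),
    "wte": ("embed", 0.9),
}
flagged = flag_outliers(observed, FAMILY_KAPPA)
print(flagged)  # {'attn': ['h.0.attn.c_attn']}
```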


Measurement Contract

Guard measurement procedure signature and digest recorded in reports.

| Aspect | Details |
| --- | --- |
| Context | Spectral and RMT guards record estimator + sampling policy |
| Verified by | invarlock verify --profile ci\|release (plus runtime.manifest.json runtime provenance for container-backed outputs) |
| Report fields | spectral.{measurement_contract_hash,baseline_measurement_contract_hash,measurement_contract_match}, rmt.{measurement_contract_hash,baseline_measurement_contract_hash,measurement_contract_match} |
| See also | Guard Contracts |

Enforcement: CI/Release profiles require measurement contract match between baseline and subject.
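
The idea of a contract digest and a baseline/subject match can be illustrated with a generic sketch. Everything specific here is an assumption: the contract keys, the canonical-JSON serialization, and the choice of SHA-256 are illustrative and are not taken from InvarLock's actual hashing scheme.

```python
import hashlib
import json

def contract_hash(contract: dict) -> str:
    """Digest a measurement contract via canonical JSON (sorted keys, no whitespace).

    Illustrative scheme only; the real digest procedure may differ.
    """
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical contracts: same estimator and sampling policy, different key order.
baseline_contract = {"estimator": "power_iter", "iters": 4, "sampling": {"policy": "all"}}
subject_contract = {"sampling": {"policy": "all"}, "iters": 4, "estimator": "power_iter"}

measurement_contract_match = contract_hash(baseline_contract) == contract_hash(subject_contract)
print(measurement_contract_match)  # True

# Changing any setting changes the digest, so the match fails.
drifted = dict(subject_contract, iters=8)
print(contract_hash(drifted) == contract_hash(baseline_contract))  # False
```

Canonicalizing before hashing is what makes the digest depend on the settings themselves rather than on incidental key order.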


P–R

Policy Digest

Stable hash summarizing resolved policy thresholds for auditability.

| Aspect | Details |
| --- | --- |
| Context | Stored in report for policy change detection |
| Related terms | Tier Policy, Policy Overrides, Policy Provenance |
| Report fields | policy_digest.thresholds_hash, policy_provenance.*, auto.policy_digest |
| See also | Policy Provenance |

Primary Metric

The canonical task metric used for gating (perplexity for LMs, accuracy for classification).

| Aspect | Details |
| --- | --- |
| Supported kinds | accuracy, bleu, f1, ppl_causal, ppl_mlm, ppl_seq2seq, rouge |
| Gating logic | PPL-like kinds gate on ratio vs baseline; accuracy gates on percentage-point delta vs baseline |
| Related terms | Primary Metric Tail, BCa Bootstrap, Window Pairing |
| Report fields | primary_metric.{kind,preview,final,ratio_vs_baseline,ci} |
| See also | Reports Reference |

Example: primary_metric.kind: ppl_causal with ratio_vs_baseline: 1.003
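
The two gating quantities described above can be sketched as follows. This is a hedged illustration: the function name gate_quantity and the delta_pp label are hypothetical, accuracy is assumed to be stored as a fraction in [0, 1], and kinds whose gating this glossary does not describe (bleu, f1, rouge) are deliberately left unhandled.

```python
PPL_KINDS = {"ppl_causal", "ppl_mlm", "ppl_seq2seq"}

def gate_quantity(kind: str, baseline_final: float, subject_final: float):
    """Return (name, value) for the quantity a gate would inspect.

    PPL-like kinds: subject final ÷ baseline final (>1 is worse).
    accuracy: percentage-point delta vs baseline (negative is worse),
    assuming accuracy is a fraction in [0, 1].
    """
    if kind in PPL_KINDS:
        return ("ratio_vs_baseline", subject_final / baseline_final)
    if kind == "accuracy":
        return ("delta_pp", (subject_final - baseline_final) * 100.0)
    raise ValueError(f"gating for kind {kind!r} is not covered by this sketch")

# Made-up metric values.
ppl_gate = gate_quantity("ppl_causal", baseline_final=20.0, subject_final=20.06)
acc_gate = gate_quantity("accuracy", baseline_final=0.80, subject_final=0.79)
print(ppl_gate[0], round(ppl_gate[1], 3))  # ratio_vs_baseline 1.003
print(acc_gate[0], round(acc_gate[1], 2))  # delta_pp -1.0
```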


Primary Metric Tail

Optional tail regression gate checking high-loss windows (e.g., q95 ΔlogNLL).

| Aspect | Details |
| --- | --- |
| Context | Catches regression in hard examples even when mean is acceptable |
| Mode | warn (default) or fail |
| Related terms | Primary Metric, BCa Bootstrap |
| Report fields | primary_metric_tail.{evaluated,passed,warned,stats} |
| See also | Reports Reference |

Provider Digest

Dataset identity hash covering token IDs, tokenizer config, and masking strategy.

| Aspect | Details |
| --- | --- |
| Context | Ensures baseline and subject use identical data |
| Related terms | Window Pairing, Tokenizer Hash |
| Report fields | provenance.provider_digest.ids_sha256 |
| See also | Coverage & Pairing |

Run Report

Run-level artifact with metrics, guard results, and metadata.

| Aspect | Details |
| --- | --- |
| Context | Generated by invarlock evaluate; input to report generation |
| Related terms | Evaluation Report, Evidence Pack, Evaluation Bundle |
| File format | report.json + events.jsonl |
| See also | Artifact Layout |

RMT ε (epsilon) Rule

Random Matrix Theory epsilon band used for activation edge-risk stability checks.

| Aspect | Details |
| --- | --- |
| Context | rmt.epsilon_default and rmt.epsilon_by_family.* thresholds |
| Calibration | Derived from null-sweep runs on target model families |
| Related terms | RMT Guard, κ Threshold |
| Report fields | rmt.{epsilon_default,epsilon_by_family,epsilon_violations,stable,status,max_edge_ratio,max_edge_delta} |
| See also | RMT ε Rule |

RMT Guard

Guard that checks eigenvalue statistics against Random Matrix Theory bounds.

| Aspect | Details |
| --- | --- |
| Focus | Activation edge-risk growth across model families |
| Validation | validation.rmt_stable |
| Related terms | Canonical Guard Chain, RMT ε Rule |
| Report fields | rmt.{families,stable,status,max_edge_ratio,max_edge_delta} |
| See also | Guards Reference |

S–T

Spectral Cap

Limit on spectral z-scores per family to flag weight instability.

| Aspect | Details |
| --- | --- |
| Context | Applied by spectral guard; counts violations per family |
| Related terms | κ Threshold, z-score, Spectral Guard |
| Report fields | spectral.{caps_applied,caps_exceeded,top_z_scores} |
| See also | Spectral FPR |

Spectral Guard

Guard that monitors spectral norms and z-scores for weight matrices.

| Aspect | Details |
| --- | --- |
| Focus | Baseline-relative weight matrix stability |
| Validation | validation.spectral_stable |
| Related terms | Canonical Guard Chain, Spectral Cap, κ Threshold |
| Report fields | spectral.{caps_applied,family_caps,top_z_scores,summary} |
| See also | Guards Reference |

Subject Run

The edited or target model run under evaluation (compared against baseline).

| Aspect | Details |
| --- | --- |
| Context | subject checkpoint in Compare & evaluate |
| Related terms | Baseline, Evaluation Report, Window Pairing |
| Report fields | provenance.edited.* |
| See also | Compare & evaluate |

Telemetry

Performance and resource metrics emitted with reports.

| Aspect | Details |
| --- | --- |
| Context | Optional fields for performance analysis |
| Related terms | Timing Summary, Guard Overhead |
| Report fields | telemetry.*, metrics.memory_mb_peak |
| See also | Observability |

Tier Policy

Guard threshold preset selecting the tier profile for a run.

| Aspect | Details |
| --- | --- |
| Options | conservative (strictest), balanced (default), aggressive (loosest) |
| Source | Packaged runtime/tiers.yaml; overrides use INVARLOCK_CONFIG_ROOT/runtime/tiers.yaml |
| Related terms | Policy Digest, Policy Overrides |
| Report fields | auto.tier, resolved_policy.* |
| See also | Tier Policy Catalog |

Timing Summary

Consolidated timing breakdown for an evaluation run.

| Aspect | Details |
| --- | --- |
| Context | CLI output via print_timing_summary() |
| Includes | Model load, dataset load, evaluation, report generation |
| Related terms | Guard Overhead, Telemetry |
| See also | Observability |

Tokenizer Hash

Stable hash of tokenizer settings and vocabulary for reproducibility.

| Aspect | Details |
| --- | --- |
| Context | Ensures baseline and subject use identical tokenization |
| Related terms | Provider Digest, Window Pairing |
| Report fields | meta.tokenizer_hash |
| See also | Determinism Contracts |

V–Z

Variance Effect (VE)

Guard that tracks variance change and applies equalization when beneficial.

| Aspect | Details |
| --- | --- |
| Context | A/B test compares bare vs VE-enabled evaluation |
| Enabling condition | Predictive CI upper bound ≤ -min_effect_lognll and mean Δ ≤ -min_effect_lognll; enabled VE also requires A/B provenance |
| Related terms | Canonical Guard Chain, Guard Overhead, Predictive Gate |
| Report fields | variance.{enabled,gain,predictive_gate.delta_ci,predictive_gate.passed,ab_test} |
| See also | VE Gate Power |
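
The enabling condition can be read as a two-part test on the predicted ΔlogNLL plus a provenance requirement. The sketch below assumes that negative Δ means improvement (implied by the sign of the threshold); the function name, parameter names, and numeric values are hypothetical.

```python
def ve_should_enable(delta_ci, mean_delta, min_effect_lognll, has_ab_provenance):
    """Return True when variance equalization would be enabled under the stated rule.

    delta_ci: (lower, upper) predictive CI on ΔlogNLL.
    Both the CI upper bound and the mean Δ must be at or below
    -min_effect_lognll, and A/B provenance must be present.
    """
    _, upper = delta_ci
    gate_passed = upper <= -min_effect_lognll and mean_delta <= -min_effect_lognll
    return gate_passed and has_ab_provenance

# Made-up values: the whole interval sits below -0.001, so the gate passes.
print(ve_should_enable((-0.0040, -0.0012), -0.0025, 0.001, True))   # True
# Upper bound is too close to zero: the predicted gain is not reliable enough.
print(ve_should_enable((-0.0040, -0.0005), -0.0020, 0.001, True))   # False
# Gate passes but A/B provenance is missing.
print(ve_should_enable((-0.0040, -0.0012), -0.0025, 0.001, False))  # False
```

Requiring the CI upper bound, not just the mean, to clear the threshold means equalization is enabled only when even the pessimistic end of the interval shows a gain of at least min_effect_lognll.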

Window Pairing

Alignment of baseline and subject evaluation windows for paired statistical testing.

| Aspect | Details |
| --- | --- |
| Requirements | Same window IDs, zero overlap, 100% match fraction |
| Violation | E001 pairing error in CI/Release profiles |
| Related terms | BCa Bootstrap, Primary Metric, Provider Digest |
| Report fields | dataset.windows.stats.{paired_windows,window_match_fraction,window_overlap_fraction} |
| See also | Coverage & Pairing |

Example: paired_windows: 200, window_match_fraction: 1.0, window_overlap_fraction: 0.0
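
A reader checking a report by hand could apply the requirements above to the stats block roughly as follows. The function name check_pairing is hypothetical, and the stats dictionaries are illustrative; only the field names and the required values (match fraction 1.0, overlap fraction 0.0) come from this entry.

```python
def check_pairing(stats: dict) -> list:
    """Return a list of human-readable problems; an empty list means pairing looks valid."""
    problems = []
    if stats.get("paired_windows", 0) <= 0:
        problems.append("no paired windows recorded")
    if stats.get("window_match_fraction") != 1.0:
        problems.append("window_match_fraction must be 1.0")
    if stats.get("window_overlap_fraction") != 0.0:
        problems.append("window_overlap_fraction must be 0.0")
    return problems

# Values from the glossary example.
good = {"paired_windows": 200, "window_match_fraction": 1.0, "window_overlap_fraction": 0.0}
# Made-up failing case: 2% of windows did not match.
bad = {"paired_windows": 196, "window_match_fraction": 0.98, "window_overlap_fraction": 0.0}

print(check_pairing(good))  # []
print(check_pairing(bad))   # ['window_match_fraction must be 1.0']
```

In CI/Release profiles the glossary states that a violation surfaces as an E001 pairing error; this sketch only reports the problems and does not model that error path.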


z-score

Standardized deviation used in spectral guard scoring.

| Aspect | Details |
| --- | --- |
| Formula | z = (ŝ - μ_f) / σ_f for the guard's measured spectral statistic |
| Thresholding | Compared against family-specific κ caps |
| Related terms | Spectral Cap, κ Threshold |
| Report fields | spectral.top_z_scores, spectral.family_caps.*.kappa |
| See also | Spectral FPR |

Example: max |z| = 2.1 means the largest absolute z-score across all weight matrices is 2.1.
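
The formula can be worked through numerically as in the sketch below. The measured statistics, the family mean μ_f, and the family standard deviation σ_f are made-up values chosen so that the largest |z| comes out at 2.1; the function name is hypothetical.

```python
def spectral_z(s_hat: float, mu_f: float, sigma_f: float) -> float:
    """z = (ŝ - μ_f) / σ_f for one measured spectral statistic."""
    if sigma_f <= 0:
        raise ValueError("sigma_f must be positive")
    return (s_hat - mu_f) / sigma_f

# Hypothetical family reference statistics and per-matrix measurements.
mu_f, sigma_f = 1.00, 0.10
measurements = {"layer0": 1.05, "layer1": 0.93, "layer2": 1.21}

z_scores = {name: spectral_z(s, mu_f, sigma_f) for name, s in measurements.items()}
max_abs_z = max(abs(z) for z in z_scores.values())
print(round(max_abs_z, 2))  # 2.1
```

With the Balanced-tier caps listed under κ (kappa) Threshold, a max |z| of 2.1 would sit below the ffn and attn caps but above the embed cap, which is why the comparison is made per family.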


See Also