Threat Model
This document provides a high-level threat model for InvarLock deployments. It is intentionally aligned with the assurance case scope: InvarLock’s primary goal is to control regression risk from weight edits relative to a baseline under specified configurations, not to provide a complete solution to model security or alignment.
Assumptions
- Users operate in isolated virtual environments or containers on Linux/macOS hosts with supported HF/PyTorch versions.
- Models and datasets may be sourced from public repositories, but are treated as potentially untrusted artifacts.
- Default runtime posture disables outbound network connections unless
  `INVARLOCK_ALLOW_NETWORK=1` is explicitly set.
- Default runtime posture keeps model-loading commands inside the runtime
  container unless a public host-side workflow uses
  `invarlock evaluate --execution-mode host` or an advanced/internal workflow
  explicitly sets `INVARLOCK_ALLOW_HOST_EXECUTION=1`.
- Evaluation runs use the pairing, windowing, and bootstrap profiles described
  in the assurance docs and configs.
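InvarLock's actual network guard lives in `invarlock.security` and is more thorough; purely as an illustration of the deny-by-default posture gated on the documented `INVARLOCK_ALLOW_NETWORK` flag, a minimal socket-level sketch (the function names here are hypothetical, not the real API) could look like:

```python
import os
import socket

_real_connect = socket.socket.connect

def _guarded_connect(self, address):
    # Deny outbound connections unless the documented opt-in flag is set.
    if os.environ.get("INVARLOCK_ALLOW_NETWORK") != "1":
        raise PermissionError(
            f"outbound network disabled; set INVARLOCK_ALLOW_NETWORK=1 to allow {address!r}"
        )
    return _real_connect(self, address)

def install_network_guard() -> None:
    """Monkey-patch socket.connect with the deny-by-default wrapper (illustrative only)."""
    socket.socket.connect = _guarded_connect
```

A guard of this shape fails closed: any code path that tries to open a connection without the opt-in flag raises immediately rather than silently reaching the network.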
Security Flow Overview
Assets and adversaries (in scope)
Assets
- Baseline and subject model weights for supported task families.
- Evaluation datasets, pairing schedules, and seed bundles.
- Evaluation artifacts: reports, logs, and policy digests.
Adversaries / failure modes
- Malicious or malformed model artifacts (e.g., unsafe pickle payloads) used as baselines or subjects.
- Misconfigured edits or guard policies that silently degrade quality or break structural invariants while “appearing to run”.
- Dependency vulnerabilities in the Python stack and transitive extras that could affect evaluation or guard logic.
Mitigations (built-in + process)
- Network guard (`invarlock.security`) denies outbound sockets by default;
  network use must be opted into per command.
- Runtime security defaults keep model-loading commands containerized,
  third-party plugins disabled, and remote model code off unless explicitly
  allowed.
- Supply-chain checks in CI and PR validation (install-surface SBOM
  generation, `pip-audit` on the `base`/`hf`/`advanced` shipped surfaces,
  `gitleaks` history JSON/SARIF scanning), with scheduled/tag backstops for
  drift detection.
- CodeQL scans shipped Python code plus repository helper scripts, and the
  analysis workflow fails closed if upload/analysis cannot complete.
- Release automation only rebuilds and publishes from validated tags resolved to an immutable commit SHA.
- Strict configuration and report validation (`invarlock doctor`,
  `invarlock verify`) to detect misconfiguration, schema drift, and runtime
  provenance mismatches.
- Reports record seeds, windowing, dataset/tokenizer hashes, and guard
  telemetry so reviewers can audit the assurance evidence.
Attack Scenarios
Concrete attack scenarios that InvarLock is designed to address, or that it explicitly delegates to external processes:
1. Poisoned Baseline Model
Threat: Attacker provides a pre-backdoored baseline that passes all guards.
Mitigation: Baseline provenance is the caller's responsibility. InvarLock compares subject to baseline but does not validate baseline correctness.
Detection: None — baseline is trusted by design. Use external model provenance checks (e.g., model cards, hash verification) before evaluation.
2. Malformed Pickle in Subject Checkpoint
Threat: Unsafe deserialization executes arbitrary code during model load.
Mitigation: InvarLock does not use pickle-capable adapter snapshot restore in the default path, and adapters using `from_pretrained` inherit HF's safetensors preference.
Detection: The invariants guard checks for non-finite values post-load; it does not catch code execution during the load itself.
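The post-load non-finite check can be illustrated on plain Python lists; the real guard operates on framework tensors, and `non_finite_tensors` is a hypothetical name, not the actual API:

```python
import math

def non_finite_tensors(state_dict: dict) -> list:
    """Return names of weight entries containing NaN or Inf values.

    Illustrative sketch: weights are modeled as flat lists of floats,
    whereas the real guard inspects tensors after model load.
    """
    bad = []
    for name, values in state_dict.items():
        if any(not math.isfinite(v) for v in values):
            bad.append(name)
    return bad
```

A non-empty result would fail the invariants guard; an empty result says nothing about whether the load itself executed untrusted code, which is why safetensors-first loading is the primary mitigation.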
3. Edit That Evades Guards
Threat: Carefully crafted edit stays within spectral/RMT bounds but causes task-specific degradation not captured by primary metric.
Mitigation: Primary metric gate + guard ensemble provides layered defense. Tighten tier (conservative) for high-stakes releases.
Detection: `validation.primary_metric_acceptable = false` or guard warnings in the report. Manual review of `report.guards[]` evidence.
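The manual review step can be partly automated by sweeping a report for gate failures and guard warnings. The field names `validation.primary_metric_acceptable` and `guards[]` come from the detection notes above; the exact report layout and the `review_findings` helper are assumptions for illustration:

```python
def review_findings(report: dict) -> list:
    """Collect primary-metric gate failures and guard warnings from a report dict (illustrative)."""
    findings = []
    # The primary metric gate must be explicitly acceptable; missing counts as failed.
    if not report.get("validation", {}).get("primary_metric_acceptable", False):
        findings.append("primary metric gate failed")
    # Surface every guard warning so reviewers see layered-defense evidence in one place.
    for guard in report.get("guards", []):
        for warning in guard.get("warnings", []):
            findings.append(f"{guard.get('name', '?')}: {warning}")
    return findings
```

An empty list does not prove the edit is benign; it only means the layered gates raised nothing, which is why tightening the tier remains the recommended posture for high-stakes releases.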
4. Configuration Drift Attack
Threat: Attacker modifies config to weaken guards (larger ε, disabled checks) hoping reviewers don't notice.
Mitigation: Reports capture `resolved_policy.*` and `policy_digest` for audit. `invarlock verify` enforces schema compliance.
Detection: Policy changes appear as `policy_digest.changed = true`. Compare reports side-by-side for unexpected policy drift.
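The side-by-side comparison reduces to digesting each report's resolved policy and checking for a mismatch. The sketch below assumes a canonical-JSON SHA-256 digest; the field names `resolved_policy` come from the report schema above, while the digest construction and helper names are illustrative assumptions:

```python
import hashlib
import json

def policy_digest(resolved_policy: dict) -> str:
    """Deterministic digest: canonical JSON (sorted keys, no whitespace), then SHA-256."""
    canonical = json.dumps(resolved_policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def policy_changed(baseline_report: dict, subject_report: dict) -> bool:
    """True when the subject run resolved to a different policy than the baseline run."""
    return (policy_digest(baseline_report["resolved_policy"])
            != policy_digest(subject_report["resolved_policy"]))
```

Sorting keys before hashing makes the digest insensitive to dict ordering, so two reports with semantically identical policies always compare equal, and any weakened guard (larger ε, dropped check) flips the digest.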
5. Window Schedule Manipulation
Threat: Attacker provides crafted baseline windows that inflate subject performance (cherry-picked easy examples).
Mitigation: Pairing enforcement requires `window_match_fraction = 1.0` and `window_overlap_fraction = 0.0`. CI/Release profiles fail on pairing violations.
Detection: `[INVARLOCK:E001]` error on pairing schedule mismatch.
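The two pairing metrics named in the mitigation come from the report schema, but their exact computation is not specified here; one plausible sketch, assuming windows are position-aligned lists of example ids, treats match as exact equality per position and overlap as the share of ids appearing in more than one window:

```python
from collections import Counter

def window_match_fraction(baseline: list, subject: list) -> float:
    """Fraction of positions where the subject schedule exactly matches the baseline (assumed semantics)."""
    if not baseline or len(baseline) != len(subject):
        return 0.0
    return sum(b == s for b, s in zip(baseline, subject)) / len(baseline)

def window_overlap_fraction(windows: list) -> float:
    """Fraction of example ids shared between windows; 0.0 means fully disjoint (assumed semantics)."""
    counts = Counter(i for w in windows for i in w)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c > 1) / len(counts)
```

Under this reading, cherry-picked baseline windows cannot inflate the subject: any deviation from the agreed schedule drops the match fraction below 1.0 and fails the pairing gate.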
Out of scope (security non-goals)
These match the assurance non-goals:
- Multi-tenant GPU isolation, kernel-level sandboxing, and host hardening.
- Protection against prompt-level attacks, content harms (toxicity, bias, jailbreaks), or general alignment failures.
- Guarantees for environments outside the documented support matrix (e.g., native Windows, arbitrary CUDA stacks, unpinned dependency versions).