Threat Model
This document provides a high-level threat model for InvarLock deployments. It is intentionally aligned with the assurance case scope: InvarLock’s primary goal is to control regression risk from weight edits relative to a baseline under specified configurations, not to provide a complete solution to model security or alignment.
Assumptions
- Users operate in isolated virtual environments or containers on Linux/macOS hosts with supported HF/PyTorch versions.
- Models and datasets may be sourced from public repositories, but are treated as potentially untrusted artifacts.
- Default runtime posture disables outbound network connections unless INVARLOCK_ALLOW_NETWORK=1 is explicitly set.
- Evaluation runs use the pairing, windowing, and bootstrap profiles described in the assurance docs and configs.
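The default-deny network posture can be implemented by refusing outbound socket connections at the process level. The sketch below is illustrative only; the actual invarlock.security module may work differently, and the function and exception names here are hypothetical.

```python
import os
import socket

class NetworkDisabledError(RuntimeError):
    """Raised when an outbound connection is attempted while networking is off."""

def install_network_guard() -> None:
    """Deny outbound socket connections unless INVARLOCK_ALLOW_NETWORK=1 is set.

    Illustrative sketch: monkey-patches socket.socket.connect so any attempt
    to dial out raises instead of connecting.
    """
    if os.environ.get("INVARLOCK_ALLOW_NETWORK") == "1":
        return  # networking explicitly allowed; leave sockets untouched

    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        raise NetworkDisabledError(
            f"outbound connection to {address!r} blocked; "
            "set INVARLOCK_ALLOW_NETWORK=1 to allow"
        )

    socket.socket.connect = guarded_connect
    install_network_guard._original = original_connect  # kept for restoration
```

A process-level patch like this catches accidental downloads (e.g., a lazy dataset fetch) but is not a sandbox: native code that opens sockets directly bypasses it, which is why host-level isolation remains the caller's responsibility.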
Security Flow Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ SECURITY BOUNDARY LAYERS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ NETWORK LAYER │ │
│ │ INVARLOCK_ALLOW_NETWORK=0 by default; outbound blocked unless │ │
│ │ explicitly enabled. │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ARTIFACT LAYER │ │
│ │ model loading | dataset loading | config loading │ │
│ │ adapter checks | pairing checks | schema checks │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ VALIDATION LAYER │ │
│ │ invarlock doctor -> invarlock evaluate -> invarlock verify │ │
│ │ env/config checks | pairing math | schema + contracts │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ EVIDENCE LAYER │ │
│ │ evaluation.report.json with seeds, hashes, policy digest, and │ │
│ │ guard measurement contracts for audit. │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Assets and adversaries (in scope)
Assets
- Baseline and subject model weights for supported task families.
- Evaluation datasets, pairing schedules, and seed bundles.
- Evaluation artifacts: reports, logs, and policy digests.
Adversaries / failure modes
- Malicious or malformed model artifacts (e.g., unsafe pickle payloads) used as baselines or subjects.
- Misconfigured edits or guard policies that silently degrade quality or break structural invariants while still “appearing to run”.
- Dependency vulnerabilities in the Python stack and transitive extras that could affect evaluation or guard logic.
Mitigations (built-in + process)
- Network guard (invarlock.security) denies outbound sockets by default; network use must be opted into per command.
- Supply-chain checks in CI (SBOM generation, pip-audit, secret scanning).
- Strict configuration and report validation (invarlock doctor, invarlock verify) to detect misconfiguration and schema drift.
- Report fields for seeds, windowing, dataset/tokenizer hashes, and guard telemetry so reviewers can audit the assurance evidence.
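A policy digest of the kind the reports carry can be produced by hashing a canonical serialization of the resolved policy. This is a minimal sketch, assuming a flat JSON-serializable policy dict; the real digest format used in evaluation.report.json may differ.

```python
import hashlib
import json

def policy_digest(resolved_policy: dict) -> str:
    """Compute a stable digest over a resolved policy.

    Serializes the policy canonically (sorted keys, no whitespace) so that
    semantically identical policies hash identically, then applies SHA-256.
    Illustrative sketch only.
    """
    canonical = json.dumps(resolved_policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because key order is normalized before hashing, two reports produced from the same effective policy agree on the digest even if their configs were written in a different order, and any substantive change to a threshold or enabled check changes the digest.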
Attack Scenarios
Concrete attack scenarios that InvarLock is designed to address, or that it explicitly delegates to external processes:
1. Poisoned Baseline Model
Threat: Attacker provides a pre-backdoored baseline that passes all guards.
Mitigation: Baseline provenance is the caller's responsibility. InvarLock compares subject to baseline but does not validate baseline correctness.
Detection: None — baseline is trusted by design. Use external model provenance checks (e.g., model cards, hash verification) before evaluation.
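The external hash verification mentioned above can be as simple as comparing the checkpoint file's SHA-256 against a trusted reference (e.g., a hash published with the model card) before the file ever reaches evaluation tooling. A minimal sketch:

```python
import hashlib

def verify_checkpoint_hash(path: str, expected_sha256: str) -> bool:
    """Compare a checkpoint file's SHA-256 against a trusted reference.

    Streams the file in 1 MiB chunks so large weight files do not need to
    fit in memory. Run this before handing the baseline to any tooling.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

This only establishes that the file is the one the publisher intended; it says nothing about whether that baseline is itself backdoored, which matches the "trusted by design" stance above.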
2. Malformed Pickle in Subject Checkpoint
Threat: Unsafe deserialization executes arbitrary code during model load.
Mitigation: Use weights_only=True when available in PyTorch. Adapters
using from_pretrained inherit HF's safetensors preference.
Detection: Invariants guard checks for non-finite values post-load; does not catch code execution during load itself.
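The safetensors preference can be made explicit when selecting which checkpoint file to load: safetensors files carry no executable payload, while pickle-based formats can run arbitrary code on load (hence weights_only=True in torch.load as a fallback). The helper below is a hypothetical sketch of that preference, not an InvarLock API.

```python
from pathlib import Path

# Suffixes that imply pickle-based (code-executing) deserialization.
PICKLE_SUFFIXES = {".bin", ".pt", ".pth", ".ckpt"}

def pick_safest_checkpoint(candidates):
    """Prefer a safetensors checkpoint over pickle-based formats.

    Given candidate checkpoint paths, return a .safetensors file if one
    exists (safe to parse: pure tensor data), otherwise fall back to a
    pickle-based file, otherwise None. Sketch of the preference HF's
    from_pretrained applies when both formats are published.
    """
    paths = [Path(c) for c in candidates]
    safe = [p for p in paths if p.suffix == ".safetensors"]
    if safe:
        return safe[0]
    pickled = [p for p in paths if p.suffix in PICKLE_SUFFIXES]
    return pickled[0] if pickled else None
```

When only a pickle-based file is available, loading it with torch.load(..., weights_only=True) restricts deserialization to tensor data, closing the code-execution path this scenario describes.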
3. Edit That Evades Guards
Threat: Carefully crafted edit stays within spectral/RMT bounds but causes task-specific degradation not captured by primary metric.
Mitigation: Primary metric gate + guard ensemble provides layered defense. Tighten tier (conservative) for high-stakes releases.
Detection: validation.primary_metric_acceptable = false or guard warnings
in report. Manual review of report.guards[] evidence.
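The manual review step can be partially automated by pulling the two fields named above out of the report. Field names follow this document's description of the report layout; treat the exact shapes as assumptions.

```python
def review_report(report: dict):
    """Extract the fields a reviewer checks first.

    Returns the primary-metric gate result and any guard entries whose
    status is not a clean pass. Assumes the report layout described in
    the surrounding text (validation.primary_metric_acceptable and a
    guards[] list with per-guard status fields).
    """
    ok = report.get("validation", {}).get("primary_metric_acceptable", False)
    flagged = [
        g for g in report.get("guards", [])
        if g.get("status") in {"warn", "fail"}
    ]
    return ok, flagged
```

A release pipeline can then fail fast when the gate is false or any guard is flagged, while still surfacing the per-guard evidence for human review.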
4. Configuration Drift Attack
Threat: Attacker modifies config to weaken guards (larger ε, disabled checks) hoping reviewers don't notice.
Mitigation: reports capture resolved_policy.* and policy_digest
for audit. invarlock verify enforces schema compliance.
Detection: Policy changes appear in policy_digest.changed = true.
Compare reports side-by-side for unexpected policy drift.
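The side-by-side comparison can be scripted as a diff over resolved_policy.* across two reports. A flat-dict sketch (nested policies would need a recursive walk):

```python
def policy_drift(report_a: dict, report_b: dict) -> dict:
    """List resolved-policy keys whose values differ between two reports.

    Returns {key: (value_in_a, value_in_b)} for every drifted key, so a
    reviewer sees exactly which thresholds or toggles changed. Assumes a
    flat resolved_policy mapping in each report.
    """
    pa = report_a.get("resolved_policy", {})
    pb = report_b.get("resolved_policy", {})
    keys = set(pa) | set(pb)
    return {k: (pa.get(k), pb.get(k)) for k in keys if pa.get(k) != pb.get(k)}
```

An empty result means the two runs were governed by the same effective policy; any non-empty result (a widened ε, a disabled check) is exactly the drift this scenario describes and should be explained before release.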
5. Window Schedule Manipulation
Threat: Attacker provides crafted baseline windows that inflate subject performance (cherry-picked easy examples).
Mitigation: Pairing enforcement requires window_match_fraction = 1.0 and
window_overlap_fraction = 0.0. CI/Release profiles fail on pairing violations.
Detection: [INVARLOCK:E001] error on pairing schedule mismatch.
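The two pairing quantities can be reimplemented as a quick sanity check over window schedules. The semantics below are one plausible reading of the fields named above (exact-match against the baseline schedule, no overlap among subject windows); the function names are hypothetical, not InvarLock's.

```python
def interval_overlap(a, b):
    """Length of the overlap between two (start, end) index intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def pairing_check(baseline_windows, subject_windows):
    """Compute (window_match_fraction, window_overlap_fraction).

    match fraction: share of subject windows that exactly match a baseline
    window. overlap fraction: share of subject windows that overlap another
    subject window. The CI/Release profiles described above require the
    pair (1.0, 0.0). Illustrative sketch.
    """
    base = set(map(tuple, baseline_windows))
    subj = list(map(tuple, subject_windows))
    matched = sum(1 for w in subj if w in base)
    match_fraction = matched / len(subj) if subj else 1.0
    overlapping = sum(
        1 for i, w in enumerate(subj)
        if any(interval_overlap(w, v) > 0 for j, v in enumerate(subj) if i != j)
    )
    overlap_fraction = overlapping / len(subj) if subj else 0.0
    return match_fraction, overlap_fraction
```

Any cherry-picked subject schedule either fails the exact-match requirement (match fraction below 1.0) or reuses examples across windows (overlap fraction above 0.0), so either manipulation surfaces as a pairing violation.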
Out of scope (security non-goals)
These match the assurance non-goals:
- Multi-tenant GPU isolation, kernel-level sandboxing, and host hardening.
- Protection against prompt-level attacks, content harms (toxicity, bias, jailbreaks), or general alignment failures.
- Guarantees for environments outside the documented support matrix (e.g., native Windows, arbitrary CUDA stacks, unpinned dependency versions).