Emergent Interaction Security

Quira's existing security layers protect individual components — encryption, sandboxing, IFC labels, capability-based access. But a class of threats emerges only from the interaction between three systems: the user's cognitive patterns, the AI inference pipeline, and the Knowledge Graph's evolving structure.

This page documents eight novel structures that address the knowledge lifecycle — how knowledge enters the graph, what happens when AI draws inferences, how to measure and contain semantic leakage, and how to retroactively enforce policy changes on already-processed data.

Research status

These structures build on the Advanced Architecture foundations (KPC, CSEB, SKV, etc.) and extend the formal threat model's adversary classes with seven new attack vectors.

1. Knowledge Supply Chain Integrity (KSCI)

Software supply chain attacks such as SolarWinds inject malicious code through legitimate distribution channels, and vulnerabilities such as Log4Shell show how a flaw in one trusted dependency propagates to everything built on it. KSCI applies this concept to the knowledge domain: an attacker strategically places factually correct but directionally biased content on legitimate websites to poison a user's Knowledge Graph.

This is fundamentally different from phishing (fake sites) or prompt injection (hidden instructions). KSCI targets visible, plausible content whose purpose is graph structure manipulation.

How this differs

Context Gatekeeper and XProtect detect malicious code and known-bad signatures. KSCI detects strategically placed knowledge — content that is technically true but designed to shift a user's knowledge structure.

Formal model

A knowledge supply chain is modeled as KSC = (S, T, π, ι) where S is the set of information sources, T is a trust score function, π traces provenance, and ι verifies integrity.

Each knowledge node receives a composite trust score:

Component	Measures	Example
Source trust	Domain reputation, HTTPS, content stability	mayoclinic.org = 0.95, random-blog.xyz = 0.2
Corroboration	Independent sources confirming the same claim	3 independent sources = 0.85
Temporal stability	Content consistency over time	Unchanged for 2 years = 0.9; edited yesterday = 0.3

Poisoning detection: A source triggers an alert when all three conditions are met simultaneously: (1) recent content change (low temporal stability), (2) large semantic shift from previous version, and (3) suspiciously high affinity with the user's existing graph structure.
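
The scoring and alert logic above can be sketched as follows. This is a minimal illustration, not the Quira implementation: the source does not say how the three components combine, so the weighted average, the weights, the `semantic_shift` and `graph_affinity` fields, and all thresholds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SourceObservation:
    source_trust: float        # domain reputation, HTTPS, content stability
    corroboration: float       # agreement among independent sources
    temporal_stability: float  # consistency of the content over time
    semantic_shift: float      # hypothetical: 0..1 distance from the previous snapshot
    graph_affinity: float      # hypothetical: 0..1 fit with the user's existing graph


# Hypothetical weights; the document does not specify a combination rule.
WEIGHTS = (0.4, 0.4, 0.2)


def composite_trust(obs: SourceObservation) -> float:
    """Weighted average of the three trust components (assumed rule)."""
    w_source, w_corr, w_time = WEIGHTS
    return (w_source * obs.source_trust
            + w_corr * obs.corroboration
            + w_time * obs.temporal_stability)


def poisoning_alert(obs: SourceObservation,
                    stability_max: float = 0.4,
                    shift_min: float = 0.6,
                    affinity_min: float = 0.8) -> bool:
    """Alert only when all three conditions hold at the same time."""
    recently_changed = obs.temporal_stability < stability_max
    large_shift = obs.semantic_shift > shift_min
    suspicious_fit = obs.graph_affinity > affinity_min
    return recently_changed and large_shift and suspicious_fit
```

Requiring all three conditions keeps ordinary edits (a changed page that does not fit the user's graph unusually well) from raising alerts.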

Attack scenarios

Scenario	Method	KSCI defense
Gradual graph poisoning	Attacker builds reputation on Stack Overflow, then edits answers to introduce bias on user's research topic	Temporal stability detects content change; corroboration engine checks if claims match independent sources
Knowledge cluster targeting	Attacker places subtly biased content across multiple sites; user's natural search leads them to accumulate nudges toward a conclusion	Source independence verification detects coordinated placement; graph affinity anomaly flags unnaturally fitting content
Citation chain manipulation	Create a chain of pages that cite each other circularly, manufacturing false corroboration	Provenance tracker detects circular citation graphs; independence scoring penalizes co-owned sources
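
The circular-citation defense in the last row can be illustrated with a small reachability check. This is a sketch under stated assumptions: the `citations` mapping (source to the set of sources it cites) and both function names are hypothetical, and collapsing every cyclic source into a single corroborating voice is a deliberate simplification.

```python
def reachable_from(start, citations):
    """Return every source reachable from `start` by following citations."""
    seen = set()
    stack = list(citations.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(citations.get(node, ()))
    return seen


def circular_sources(citations):
    """Sources that can reach themselves, i.e. sit on a citation cycle."""
    return {s for s in citations if s in reachable_from(s, citations)}


def independent_count(claim_sources, citations):
    """Count corroborating sources, treating all cyclic ones as one voice."""
    cyclic = circular_sources(citations)
    independent = [s for s in claim_sources if s not in cyclic]
    has_cyclic = any(s in cyclic for s in claim_sources)
    return len(independent) + (1 if has_cyclic else 0)
```

For a ring of three pages plus one outside page, four apparent confirmations reduce to two.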

Defense architecture

Two core components:

Provenance Tracker: Records source metadata, including domain WHOIS history, content snapshots (via Wayback Machine comparison), TLS certificate age, and edit timestamps. Treats web content as supply chain artifacts.
Corroboration Engine: Verifies that the same claim (entity + relationship) is confirmed by independent sources. Single-source nodes receive low trust scores and are visually distinguished in the UI.

Threat model mapping: Adversary A2 (malicious web content) — extends beyond existing A2.1–A2.5 vectors. New vector A2.8 Knowledge Supply Chain Poisoning targets visible content rather than hidden instructions.

2. Inference Residue Defense (IRD)

When a user deletes a node from the Knowledge Graph, the data is removed from disk. But AI-generated inferences based on that node may survive in other nodes' attributes, entity relationships, embedding indices, and cluster statistics. This is inference residue.

The ghost problem

A user deletes all HIV treatment pages. But the AI previously inferred "interested in immunology" and added that tag to related nodes. The original data is gone, but its ghost — the inference — persists and can be exploited.

Residue classification

Type	Definition	Example	Detection difficulty
Direct residue	Explicit edges/attributes referencing deleted node	"Related to [deleted node]" tag	Low
Propagation residue	Inferences indirectly influenced by deleted node	Cluster centroid shift from deleted data	Medium
Embedding residue	Deleted node's embedding affected similarity calculations	Changed similarity rankings	High
Statistical residue	Deleted node's impact on topic distributions and cluster stats	Skewed topic distribution	Very high

Purge architecture

IRD introduces the Inference Dependency Graph (IDG) — a directed acyclic graph that records which input nodes contributed to which AI-generated outputs.

When a node is deleted:

Direct residue: Immediately deleted (explicit references)
Propagation residue: Queued for re-inference with the deleted node excluded from input
Embedding residue: Affected similarity indices rebuilt
Statistical residue: Topic statistics recomputed excluding deleted node

The IDG records dependencies at ~2% overhead during inference. Each entry is compact (node ID + operation ID).
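
A minimal sketch of how such a dependency record could drive the purge, assuming each inference operation logs its input node IDs and the items it wrote. The class and method names are hypothetical; only the idea of compact (node ID, operation ID) entries comes from the text above.

```python
from collections import defaultdict, deque


class InferenceDependencyGraph:
    """Records which inputs fed which AI-generated outputs (illustrative)."""

    def __init__(self):
        self._ops_by_input = defaultdict(set)    # node ID -> operation IDs that read it
        self._outputs_by_op = defaultdict(set)   # operation ID -> items it produced

    def record(self, op_id, inputs, outputs):
        """Log one inference operation."""
        for node_id in inputs:
            self._ops_by_input[node_id].add(op_id)
        self._outputs_by_op[op_id].update(outputs)

    def affected_by(self, deleted_id):
        """All outputs derived, directly or transitively, from the deleted node."""
        affected = set()
        queue = deque([deleted_id])
        while queue:
            node_id = queue.popleft()
            for op_id in self._ops_by_input.get(node_id, ()):
                for out in self._outputs_by_op[op_id]:
                    if out != deleted_id and out not in affected:
                        affected.add(out)
                        queue.append(out)
        return affected
```

The set returned by `affected_by` is what would be queued for re-inference, index rebuild, or statistics recomputation, depending on the residue type.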

Threat model mapping: Adversaries A3 (extensions) and A4 (OS-level). New vector A3.6 Inference Residue Exploitation.

3. Semantic Blast Radius Containment (SBRC)

Traditional breach metrics count leaked records, affected users, or exposed bytes. But in a Knowledge Graph, 100 leaked cooking nodes are not equivalent to 100 leaked psychiatric treatment nodes: the semantic damage differs by orders of magnitude.

SBRC measures breach impact in semantic units and defines containment boundaries in semantic space.

Containment zones

The graph is partitioned into semantic containment zones, each with a damage budget:

Zone	Base sensitivity	Damage budget	Example categories
Zone 1 (Medical)	1.0	50	Diagnoses, treatments, prescriptions
Zone 2 (Financial)	0.9	40	Banking, investments, credit
Zone 3 (Personal)	0.8	35	Relationships, private messages
Zone 4 (Work)	0.3	20	Code, projects, meetings
Zone 5 (General)	0.1	5	News, recipes, tutorials

Cross-zone edge abstraction: Edges between zones carry abstracted labels. "Researching diabetes for a medical paper" (zone 1 → zone 4) becomes "health-related work" at the zone boundary. This prevents a breach in the Work zone from revealing specific Medical zone contents.
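
As a rough illustration of the abstraction step, the sketch below replaces an edge label whenever its endpoints sit in different zones. The lookup table, the fallback label, and the function name are hypothetical; only the diabetes example comes from the text above.

```python
# Hypothetical abstraction table keyed by (source zone, target zone).
ABSTRACT_LABELS = {("medical", "work"): "health-related work"}
FALLBACK_LABEL = "cross-zone link"  # assumed default when no mapping exists


def boundary_label(src_zone: str, dst_zone: str, label: str) -> str:
    """Return the label that is stored on an edge between two zones."""
    if src_zone == dst_zone:
        return label  # same zone: the detailed label stays visible
    return ABSTRACT_LABELS.get((src_zone, dst_zone), FALLBACK_LABEL)
```

With this mapping, `boundary_label("medical", "work", "Researching diabetes for a medical paper")` yields "health-related work", so the Work zone never stores the specific medical detail.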

Semantic Damage Index

When a breach is detected, SBRC calculates the Semantic Damage Index (SDI) instead of a raw node count:

SDI(breach) = Σ_{node ∈ breach} sensitivity(node) × linkability(node) × uniqueness(node)

Sensitivity: Category-based (medical = 1.0, general = 0.1) multiplied by specificity (how detailed the node is)
Linkability: High-degree nodes that connect sensitive categories multiply the damage
Uniqueness: How identifying the node is. "Python syntax" (general knowledge) scores low; "Patient X's prescription" (uniquely identifying) scores high.
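
A short sketch of the SDI sum, to make the contrast with raw node counts concrete. The `LeakedNode` type and the factor values in the usage note are assumptions; the document defines the three factors but not their numeric scales.

```python
from dataclasses import dataclass


@dataclass
class LeakedNode:
    sensitivity: float   # category sensitivity times specificity
    linkability: float   # grows with degree across sensitive categories
    uniqueness: float    # how identifying the node is


def semantic_damage_index(nodes) -> float:
    """Sum of sensitivity x linkability x uniqueness over the leaked nodes."""
    return sum(n.sensitivity * n.linkability * n.uniqueness for n in nodes)
```

With illustrative values, 100 cooking nodes at (0.1, 1.0, 0.1) give an SDI of about 1, while 100 psychiatric nodes at (1.0, 2.0, 0.9) give about 180, even though the raw count is identical.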

Breach reports show: "3 high-sensitivity medical nodes and 12 general nodes leaked. Estimated privacy impact: HIGH" — not just "15 nodes leaked."

Relationship: SBRC and Space-Scoped Security

SBRC containment zones complement Space-Scoped Security and cut across its boundaries. Context Spaces define administrative boundaries (user-created project groups); SBRC defines security boundaries based on semantic sensitivity. A single Space may contain nodes across multiple SBRC zones (e.g., a "Health Research" Space spans Zone 1 Medical and Zone 5 General). SBRC uses IFC labels to track cross-zone data flow.

4. Adversarial Graph Topology Resistance (AGTR)

Existing attacks target individual nodes (prompt injection, entity poisoning, embedding manipulation). AGTR addresses a new threat class: attacks on the graph's topology (structure) itself.

The attacker manipulates connection patterns rather than node contents in order to identify, track, or manipulate users through their graph structure.

Topology attacks

Attack	Mechanism	Goal
Structural watermarking	Place k pages with specific inter-link structure. When user visits them, an identifiable subgraph (watermark) forms in the Knowledge Graph.	User identification. Later, partial graph access reveals the watermark.
Bridge node injection	Inject content that creates bridge nodes between separated clusters.	Force AI inference to propagate across sensitivity boundaries.
Motif fingerprinting	Observe graph structure (degree distribution, clustering coefficient, spectral properties) from metadata.	Identify the user among a population via unique structural patterns.

Topological Invariant Monitor

AGTR deploys a Topological Invariant Monitor (TIM) that continuously tracks structural properties:

Structural drift detection: Alert when the graph's topological feature vector (degree sequence, clustering, motif frequencies, spectral properties) changes beyond threshold between time steps.
Watermark pattern matching: Check if recently added nodes form identifiable subgraph patterns.
Structural noise on export: Add dummy edges, remove random edges, and inject decoy substructures before any graph data export — so the exported topology ≠ the real topology.
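
Structural drift detection can be sketched with a tiny feature vector. This is an illustration only: it uses mean degree, maximum degree, and average local clustering as stand-ins for the richer vector described above (motif frequencies and spectral properties are omitted), and the function names are hypothetical.

```python
import math


def feature_vector(adj):
    """adj maps each node to the set of its neighbours (undirected graph).

    Node labels must be mutually comparable (e.g. all strings).
    """
    if not adj:
        return (0.0, 0, 0.0)
    degrees = [len(nbrs) for nbrs in adj.values()]
    clustering_total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is taken as 0 for degree below 2
        # Count edges among this node's neighbours, each pair once.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        clustering_total += 2 * links / (k * (k - 1))
    n = len(adj)
    return (sum(degrees) / n, max(degrees), clustering_total / n)


def drift(before, after) -> float:
    """Euclidean distance between the two snapshots' feature vectors."""
    return math.dist(feature_vector(before), feature_vector(after))
```

A monitor would compare `drift(previous, current)` with a tuned threshold at each time step; features on very different scales would need normalization first.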

Threat model mapping: Adversary A2. New vectors A2.6 Structural Watermarking and A2.7 Bridge Node Injection.

5. Temporal Causality Verification (TCV)

Knowledge in the graph has a causal acquisition order: the user learned A, then B, then inferred C from A+B. This causal chain is essential for trust evaluation and fraud detection — but the existing Temporal Security paradigm only protects timestamps and time-based access, not causal order itself.

Hash chain construction

Each node addition produces a hash chain entry:

h_i = SHA-256(h_{i-1} || content(v_i) || timestamp(v_i) || causal_refs(v_i))

Guarantees:

Order immutability: Inserting a node mid-chain invalidates all subsequent hashes.
Causal reference verification: A node claiming dependency on another can be verified — the referenced node must appear earlier in the chain.
Deletion detection: Removing a node breaks the chain.
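
The construction and its three guarantees can be sketched with the standard library. Several details are assumptions rather than part of the specification: the all-zero genesis value, the length-prefixed field encoding (added so that concatenated fields cannot be confused with one another), the dictionary layout of a node, and the function names.

```python
import hashlib

GENESIS = bytes(32)  # assumed all-zero predecessor for the first entry


def entry_hash(prev_hash: bytes, node: dict) -> bytes:
    """SHA-256 over the previous hash, content, timestamp and causal refs."""
    h = hashlib.sha256()
    h.update(prev_hash)
    # Joining refs with commas assumes node IDs contain no commas.
    fields = (node["content"], node["timestamp"], ",".join(node["causal_refs"]))
    for field in fields:
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # length prefix per field
        h.update(data)
    return h.digest()


def chain_head(nodes) -> bytes:
    """Fold the whole sequence into the final chain hash."""
    head = GENESIS
    for node in nodes:
        head = entry_hash(head, node)
    return head


def verify_chain(nodes, expected_head: bytes) -> bool:
    """Check causal ordering, then compare against the stored head hash."""
    seen = set()
    for node in nodes:
        if not set(node["causal_refs"]) <= seen:
            return False  # a reference points at a node not yet in the chain
        seen.add(node["id"])
    return chain_head(nodes) == expected_head
```

Changing a timestamp, dropping a node, or reordering entries each alters the recomputed head or fails the causal check, so verification returns False.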

Scenario	Without TCV	With TCV
Causal forgery	Biased content v1 leads user to "objective" content v2 via link — v2 appears independent	Causal chain records v1 → v2 navigation, flagging non-independence
Knowledge backdating	Malicious extension alters node timestamps to fabricate "prior knowledge"	Hash chain detects timestamp inconsistency
Research precedence	No cryptographic proof of when knowledge was acquired	Merkle root + RFC 3161 timestamp authority enables provable precedence

6. Cross-Session Entropy Leakage Prevention (CSELP)

Even with perfect intra-session isolation, the Knowledge Graph is a session-spanning persistent structure. Each session's activity accumulates as graph growth patterns — and the sequence of growth patterns is a powerful fingerprint.

Graph growth fingerprinting

Session 1: +50 nodes (ML, Python). Session 2: +3 nodes (cooking). Session 3: +80 nodes (ML, PyTorch). This pattern identifies "ML engineer who cooks" — each session is harmless alone, but the series is uniquely identifying.

Decorrelation techniques

Technique	What it breaks	How
Graph growth batching	Session-specific growth patterns	Buffer node additions and commit at fixed intervals (hourly), not per-session
Topic distribution smoothing	Session topic signatures	Add Laplace noise to per-session topic distributions
Query timing jitter	Cognitive rhythm fingerprints	Add random delay to NL query issuance timestamps
Phantom session injection	Session sequence correlation	Insert dummy sessions (background graph operations) between real sessions

Defense goal: The mutual information between any two sessions' metadata vectors must stay below a threshold ε_session. An observer who sees session N's metadata then gains at most ε_session bits about any other session, including the adjacent sessions N-1 and N+1.
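
Two of the techniques in the table above, growth batching and topic distribution smoothing, can be sketched briefly. The function names, the clamping and renormalization step, and the noise scale are assumptions, and this sketch does not by itself establish the ε_session bound.

```python
import random


def next_commit_time(timestamp: int, interval: int = 3600) -> int:
    """Round a node-addition time up to the next fixed commit boundary."""
    return (timestamp // interval + 1) * interval


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def smooth_topic_counts(counts: dict, scale: float, rng: random.Random) -> dict:
    """Add Laplace noise to per-session topic counts, then renormalize."""
    noisy = {t: max(0.0, c + laplace_noise(scale, rng)) for t, c in counts.items()}
    total = sum(noisy.values())
    if total == 0:
        return {t: 1 / len(noisy) for t in noisy}  # fall back to uniform
    return {t: v / total for t, v in noisy.items()}
```

A larger `scale` blurs session topic signatures more strongly at the cost of less accurate per-session statistics.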

Threat model mapping: Adversaries A1 (network observer) and A3 (extensions — A3.5 metadata collection). New vector A3.7 Cross-Session Metadata Correlation.

7. Retroactive Sanitization with Inference Purging (RSIP)

When a user changes their security policy (e.g., raises medical data sensitivity to maximum), the new policy must apply not only to future data but also to all data already collected and processed. This is far harder than it sounds:

Data is already processed by the AI pipeline
Inferences have propagated to other nodes (IRD's problem)
Embeddings encode the information
Cluster statistics and topic models are affected

Legal alignment

RSIP provides a technical mechanism for GDPR's right to erasure (the "right to be forgotten", Article 17) and CCPA's consumer right to deletion, and extends it beyond deletion to retroactive reclassification and re-restriction.

Retroactive pipeline

When policy Π_old changes to Π_new:

Identify affected nodes: R = {nodes where Π_new is more restrictive than Π_old}
Apply new policy directly: Update labels, access restrictions, encryption levels on R
Walk inference dependencies: Use IRD's Inference Dependency Graph to find all inferences derived from R
Re-infer with R excluded: Recompute affected attributes and edges without R's data
Rebuild embeddings: Update or remove R's embeddings and rebuild similarity indices
Recompute statistics: Recalculate topic distributions and cluster aggregates excluding R
Verify completeness: Confirm that no non-R node has dependency > 0 on any R node
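
The first and last steps of the pipeline can be sketched as set operations. The representation is assumed: a policy is a mapping from category to a numeric restriction level (higher is stricter), and `dependencies` maps each derived item to the input nodes it was inferred from. Both function names are hypothetical.

```python
def affected_nodes(node_category: dict, old_policy: dict, new_policy: dict) -> set:
    """R: nodes whose category is more restricted under the new policy."""
    return {
        node for node, category in node_category.items()
        if new_policy.get(category, 0) > old_policy.get(category, 0)
    }


def residual_dependencies(dependencies: dict, restricted: set) -> dict:
    """Derived items outside R that still depend on some node in R."""
    return {
        item: inputs & restricted
        for item, inputs in dependencies.items()
        if item not in restricted and inputs & restricted
    }
```

The completeness check in the final step passes when `residual_dependencies` returns an empty mapping after re-inference.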

Optimization	Approach	Trade-off
Progressive purge (default)	Process highest-dependency items first, stop below threshold	Controllable compute; tiny residue may remain
Lazy re-inference	Flag affected attributes; recompute on next access	Old values persist until accessed
Batch processing	Run full recomputation overnight	Delay before policy takes full effect
Full purge (user-selected)	Walk entire dependency graph immediately	CPU-intensive but guarantees completeness

Threat model mapping: All adversary classes. New vector A-cross Retroactive Policy Evasion.

8. Contextual Amnesia Verification (CAV)

When a user says "forget this data," conventional systems confirm deletion was executed. CAV goes further: it provides cryptographic proof that recovery is impossible across all subsystems.

Proof of Forgetting

A Proof of Forgetting (PoF) is a triple: (commitment, deletion_proof, absence_proof).

Commitment: While data exists, record a Merkle commitment showing the data is in the graph.
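Deletion proof: After deletion, a non-membership proof against the new Merkle root shows that no valid Merkle path exists for the deleted data. A plain Merkle root proves inclusion only, so this requires a tree that supports non-membership proofs, such as a sorted or sparse Merkle tree.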
Absence proof: Verify that no node in the updated graph has embedding similarity above threshold with the deleted data — i.e., the information doesn't survive in encoded form.
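
The absence check can be sketched as a maximum-similarity test. Cosine similarity is an assumption here, since the text says only "embedding similarity"; the function names and the threshold value are likewise illustrative.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def absence_holds(remaining, deleted, tau: float) -> bool:
    """True when no remaining embedding is within tau of the deleted one."""
    closest = max((cosine(v, deleted) for v in remaining), default=-1.0)
    return closest < tau
```

Note that the check needs the deleted item's embedding, so the verifier must evaluate it before that embedding is itself discarded.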

Subsystem-wise forgetting

The same data exists across multiple subsystems. PoF must cover all of them:

Subsystem	Data form	Forgetting method	Verification
Graph DB (SQLite)	Nodes + edges	DELETE + VACUUM	Merkle proof of absence
Embedding index	Vectors	Index rebuild	max_v sim(Φ(v), Φ(d)) < τ
Inference cache	AI outputs	IDG-based purge (IRD)	IDG dependency edge absence
Audit log	Text records	Encrypt entries, destroy key	Structural integrity preserved, content irrecoverable
Export history	Export records	Record deletion	— (data already sent externally is unrecoverable)

The Composite PoF is the conjunction of all per-subsystem proofs. If any subsystem fails verification, the user is told: "Forgetting incomplete — data residue detected in [subsystem]."

The log dilemma

Audit logs prove data was accessed — but the log itself reveals data existed. Solution: log entries are encrypted at write time. For forgetting, the log encryption key is destroyed. The log entry remains (integrity preserved) but its content becomes irrecoverable.

Threat model mapping: Adversaries A4 (OS-level — post-deletion recovery) and A5 (vendor — log-based reconstruction). New vector A4.3 Post-Deletion Data Recovery.

Implementation priority

Phase	Structure	Rationale
Phase 2 (immediate)	IRD, TCV	IRD's Inference Dependency Graph is a prerequisite for RSIP and CAV. TCV's hash chain should start early — the longer the chain, the stronger the verification.
Phase 2	KSCI, SBRC	Provenance Tracker (KSCI basic) and Semantic Zone Analyzer (SBRC) can be built as IFC label extensions.
Phase 3 (mature)	RSIP, CAV	Require IRD's IDG to have accumulated sufficient data. CAV needs all subsystems to expose forgetting interfaces.
Phase 3	AGTR, CSELP	Topological Invariant Monitor (AGTR) needs user base for validation. Session Metadata Decorrelator (CSELP) has high compute cost.

Dependency graph

Structure	Depends on	Enables
IRD	—	RSIP, CAV
TCV	—	KSCI (strengthens)
KSCI	—	AGTR (supply chain includes topology)
SBRC	IFC labels	—
RSIP	IRD	CAV
CAV	IRD, RSIP	—
AGTR	—	—
CSELP	—	—
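
As a quick consistency check, the "Depends on" column can be turned into a build order with the standard-library topological sorter (Python 3.9 or later). The dictionary simply transcribes the table; the variable names are illustrative.

```python
from graphlib import TopologicalSorter

# Each structure mapped to the items it depends on, as listed in the table.
DEPENDS_ON = {
    "IRD": set(),
    "TCV": set(),
    "KSCI": set(),
    "SBRC": {"IFC labels"},
    "RSIP": {"IRD"},
    "CAV": {"IRD", "RSIP"},
    "AGTR": set(),
    "CSELP": set(),
}

# static_order() yields every item after all of its dependencies.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Any valid order places IRD before RSIP and RSIP before CAV, which matches the phase assignment in the implementation priority table.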

New threat model attack vectors

Vector	Adversary	Defended by
A2.6 Structural Watermarking	A2 (Web content)	AGTR
A2.7 Bridge Node Injection	A2 (Web content)	AGTR
A2.8 Knowledge Supply Chain Poisoning	A2 (Web content)	KSCI
A3.6 Inference Residue Exploitation	A3 (Extension)	IRD
A3.7 Cross-Session Metadata Correlation	A3 (Extension)	CSELP
A4.3 Post-Deletion Data Recovery	A4 (OS-level)	CAV
A-cross Retroactive Policy Evasion	All classes	RSIP
