
Emergent Interaction Security

Quira's existing security layers protect individual components — encryption, sandboxing, IFC labels, capability-based access. But a class of threats emerges only from the interaction between three systems: the user's cognitive patterns, the AI inference pipeline, and the Knowledge Graph's evolving structure.

This page documents eight novel structures that address the knowledge lifecycle — how knowledge enters the graph, what happens when AI draws inferences, how to measure and contain semantic leakage, and how to retroactively enforce policy changes on already-processed data.

Research status

These structures build on the Advanced Architecture foundations (KPC, CSEB, SKV, etc.) and extend the formal threat model's adversary classes with seven new attack vectors.

1. Knowledge Supply Chain Integrity (KSCI)

Software supply chain attacks (such as the SolarWinds compromise and the event-stream npm incident) inject malicious code through legitimate distribution channels. KSCI applies this concept to the knowledge domain: an attacker strategically places factually correct but directionally biased content on legitimate websites to poison a user's Knowledge Graph.

This is fundamentally different from phishing (fake sites) or prompt injection (hidden instructions). KSCI targets visible, plausible content whose purpose is graph structure manipulation.

How this differs

Context Gatekeeper and XProtect detect malicious code and known-bad signatures. KSCI detects strategically placed knowledge — content that is technically true but designed to shift a user's knowledge structure.

Formal model

A knowledge supply chain is modeled as KSC = (S, T, π, ι) where S is the set of information sources, T is a trust score function, π traces provenance, and ι verifies integrity.

Each knowledge node receives a composite trust score:

| Component | Measures | Example |
| --- | --- | --- |
| Source trust | Domain reputation, HTTPS, content stability | mayoclinic.org = 0.95, random-blog.xyz = 0.2 |
| Corroboration | Independent sources confirming the same claim | 3 independent sources = 0.85 |
| Temporal stability | Content consistency over time | Unchanged for 2 years = 0.9; edited yesterday = 0.3 |

Poisoning detection: A source triggers an alert when all three conditions are met simultaneously: (1) recent content change (low temporal stability), (2) large semantic shift from previous version, and (3) suspiciously high affinity with the user's existing graph structure.
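
The scoring and alert rule above can be sketched as follows. This is a minimal illustration, not Quira's implementation: the weights, the threshold values, and the names `TrustComponents`, `composite_trust`, and `poisoning_alert` are all assumptions introduced here, since the page does not say how the three components are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustComponents:
    source_trust: float        # domain reputation, HTTPS, content stability
    corroboration: float       # agreement among independent sources
    temporal_stability: float  # content consistency over time


def composite_trust(c: TrustComponents,
                    weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted mean of the three components (the weights are assumed)."""
    w_src, w_cor, w_tmp = weights
    total = w_src + w_cor + w_tmp
    return (w_src * c.source_trust
            + w_cor * c.corroboration
            + w_tmp * c.temporal_stability) / total


def poisoning_alert(temporal_stability: float,
                    semantic_shift: float,
                    graph_affinity: float,
                    stability_max: float = 0.4,
                    shift_min: float = 0.5,
                    affinity_min: float = 0.8) -> bool:
    """Alert only when all three conditions hold at the same time."""
    recently_changed = temporal_stability <= stability_max
    large_shift = semantic_shift >= shift_min
    suspicious_fit = graph_affinity >= affinity_min
    return recently_changed and large_shift and suspicious_fit
```

Requiring all three conditions keeps ordinary edits to a page from raising alerts: a recent change with a small semantic shift, or one with no unusual affinity to the user's graph, passes silently.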

Attack scenarios

| Scenario | Method | KSCI defense |
| --- | --- | --- |
| Gradual graph poisoning | Attacker builds reputation on Stack Overflow, then edits answers to introduce bias on user's research topic | Temporal stability detects content change; corroboration engine checks if claims match independent sources |
| Knowledge cluster targeting | Attacker places subtly biased content across multiple sites; user's natural search leads them to accumulate nudges toward a conclusion | Source independence verification detects coordinated placement; graph affinity anomaly flags unnaturally fitting content |
| Citation chain manipulation | Create a chain of pages that cite each other circularly, manufacturing false corroboration | Provenance tracker detects circular citation graphs; independence scoring penalizes co-owned sources |

Defense architecture

Two core components:

  • Provenance Tracker — Records source metadata: domain Whois history, content snapshots (via Wayback Machine comparison), TLS certificate age, edit timestamps. Treats web content as supply chain artifacts.
  • Corroboration Engine — Verifies that the same claim (entity + relationship) is confirmed by independent sources. Single-source nodes receive low trust scores and are visually distinguished in the UI.
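
Two of the checks mentioned above, independence scoring and circular citation detection, can be sketched in a few lines. The data shapes are assumptions made for illustration: `sources` maps a domain to an owner identifier (for example, derived from Whois history), and `cites` maps a page to the pages it cites.

```python
def independent_source_count(sources: dict[str, str]) -> int:
    """Count distinct owners; co-owned domains corroborate only once."""
    return len(set(sources.values()))


def has_citation_cycle(cites: dict[str, list[str]]) -> bool:
    """Depth-first search that reports a back edge in the citation graph."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: dict[str, int] = {}

    def visit(page: str) -> bool:
        colour[page] = GREY
        for cited in cites.get(page, []):
            state = colour.get(cited, WHITE)
            if state == GREY:  # cited page is already on the current path
                return True
            if state == WHITE and visit(cited):
                return True
        colour[page] = BLACK
        return False

    return any(colour.get(page, WHITE) == WHITE and visit(page)
               for page in list(cites))
```

A recursive search is adequate for the small citation neighbourhoods around a single claim; a production tracker would likely need an iterative traversal for large graphs.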

Threat model mapping: Adversary A2 (malicious web content) — extends beyond existing A2.1–A2.5 vectors. New vector A2.8 Knowledge Supply Chain Poisoning targets visible content rather than hidden instructions.

2. Inference Residue Defense (IRD)

When a user deletes a node from the Knowledge Graph, the data is removed from disk. But AI-generated inferences based on that node may survive in other nodes' attributes, entity relationships, embedding indices, and cluster statistics. This is inference residue.

The ghost problem

A user deletes all HIV treatment pages. But the AI previously inferred "interested in immunology" and added that tag to related nodes. The original data is gone, but its ghost — the inference — persists and can be exploited.

Residue classification

| Type | Definition | Example | Detection difficulty |
| --- | --- | --- | --- |
| Direct residue | Explicit edges/attributes referencing deleted node | "Related to [deleted node]" tag | Low |
| Propagation residue | Inferences indirectly influenced by deleted node | Cluster centroid shift from deleted data | Medium |
| Embedding residue | Deleted node's embedding affected similarity calculations | Changed similarity rankings | High |
| Statistical residue | Deleted node's impact on topic distributions and cluster stats | Skewed topic distribution | Very high |

Purge architecture

IRD introduces the Inference Dependency Graph (IDG) — a directed acyclic graph that records which input nodes contributed to which AI-generated outputs.

When a node is deleted:

  1. Direct residue: Immediately deleted (explicit references)
  2. Propagation residue: Queued for re-inference with the deleted node excluded from input
  3. Embedding residue: Affected similarity indices rebuilt
  4. Statistical residue: Topic statistics recomputed excluding deleted node

The IDG records dependencies at ~2% overhead during inference. Each entry is compact (node ID + operation ID).
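
As a rough sketch of the idea, the IDG can be held as a map from each input node to the outputs derived from it, with a breadth-first walk to collect everything downstream of a deleted node. The class and method names here are hypothetical, and storage, persistence, and the re-inference step itself are out of scope.

```python
from collections import defaultdict, deque


class InferenceDependencyGraph:
    """Records which inputs contributed to which AI-generated outputs."""

    def __init__(self) -> None:
        # input node ID -> set of (operation ID, output node ID)
        self._dependents: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def record(self, operation_id: str, inputs: list[str], output: str) -> None:
        """Store one compact entry per (input, operation, output) triple."""
        for node in inputs:
            self._dependents[node].add((operation_id, output))

    def affected_by(self, deleted: str) -> set[str]:
        """Return every output transitively derived from the deleted node."""
        seen: set[str] = set()
        queue = deque([deleted])
        while queue:
            node = queue.popleft()
            for _operation, output in self._dependents.get(node, ()):
                if output not in seen:
                    seen.add(output)
                    queue.append(output)
        return seen


idg = InferenceDependencyGraph()
idg.record("op1", ["hiv_page"], "tag:immunology")
idg.record("op2", ["tag:immunology", "bio_page"], "cluster:health")
idg.record("op3", ["recipe_page"], "tag:cooking")
to_reinfer = idg.affected_by("hiv_page")  # the tag and the cluster, not the cooking tag
```

The returned set is what the purge steps operate on: each item is either removed outright or queued for re-inference without the deleted node.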

Threat model mapping: Adversaries A3 (extensions) and A4 (OS-level). New vector A3.6 Inference Residue Exploitation.

3. Semantic Blast Radius Containment (SBRC)

Traditional breach metrics count leaked records, affected users, or exposed bytes. But in a Knowledge Graph, 100 leaked cooking nodes ≠ 100 leaked psychiatric treatment nodes: the semantic damage differs by orders of magnitude.

SBRC measures breach impact in semantic units and defines containment boundaries in semantic space.

Containment zones

The graph is partitioned into semantic containment zones, each with a damage budget:

| Zone | Base sensitivity | Damage budget | Example categories |
| --- | --- | --- | --- |
| Zone 1 (Medical) | 1.0 | 50 | Diagnoses, treatments, prescriptions |
| Zone 2 (Financial) | 0.9 | 40 | Banking, investments, credit |
| Zone 3 (Personal) | 0.8 | 35 | Relationships, private messages |
| Zone 4 (Work) | 0.3 | 20 | Code, projects, meetings |
| Zone 5 (General) | 0.1 | 5 | News, recipes, tutorials |

Cross-zone edge abstraction: Edges between zones carry abstracted labels. "Researching diabetes for a medical paper" (zone 1 → zone 4) becomes "health-related work" at the zone boundary. This prevents a breach in the Work zone from revealing specific Medical zone contents.
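
A minimal sketch of the boundary rule, assuming a lookup table of abstracted labels keyed by zone pair. The table contents, the fallback label, and the function name are illustrative assumptions; the page does not specify how abstractions are chosen.

```python
# Hypothetical abstraction table: (source zone, destination zone) -> label
ABSTRACT_LABELS: dict[tuple[str, str], str] = {
    ("medical", "work"): "health-related work",
}


def label_at_boundary(src_zone: str, dst_zone: str, label: str) -> str:
    """Return the edge label visible from the destination zone."""
    if src_zone == dst_zone:
        return label  # edges inside one zone keep their full label
    # Cross-zone edges never expose the original label.
    return ABSTRACT_LABELS.get((src_zone, dst_zone), "cross-zone link")
```

Falling back to a generic label when no abstraction is defined fails closed: an unmapped zone pair reveals less, not more.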

Semantic Damage Index

When a breach is detected, SBRC calculates the Semantic Damage Index (SDI) instead of a raw node count:

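$$\mathrm{SDI}(\text{breach}) = \sum_{\text{node} \in \text{breach}} \text{sensitivity}(\text{node}) \times \text{linkability}(\text{node}) \times \text{uniqueness}(\text{node})$$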

  • Sensitivity: Category-based (medical = 1.0, general = 0.1) multiplied by specificity (how detailed the node is)
  • Linkability: High-degree nodes that connect sensitive categories multiply the damage
  • Uniqueness: "Python syntax" (general knowledge) vs. "Patient X's prescription" (uniquely identifying)

Breach reports show: "3 high-sensitivity medical nodes and 12 general nodes leaked. Estimated privacy impact: HIGH" — not just "15 nodes leaked."
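
The SDI sum is straightforward to compute once each leaked node carries its three factors. The sketch below assumes all factors are normalized to the range 0 to 1 and uses made-up thresholds for the impact label; neither assumption comes from the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeakedNode:
    sensitivity: float  # category sensitivity multiplied by specificity
    linkability: float  # how strongly the node connects sensitive categories
    uniqueness: float   # how identifying the node is


def semantic_damage_index(nodes: list[LeakedNode]) -> float:
    """Sum of sensitivity x linkability x uniqueness over leaked nodes."""
    return sum(n.sensitivity * n.linkability * n.uniqueness for n in nodes)


def impact_label(sdi: float, high: float = 2.0, medium: float = 0.5) -> str:
    """Map an SDI value to a report label (thresholds are assumed)."""
    if sdi >= high:
        return "HIGH"
    if sdi >= medium:
        return "MEDIUM"
    return "LOW"


breach = [LeakedNode(1.0, 0.8, 0.9)] * 3 + [LeakedNode(0.1, 0.2, 0.1)] * 12
sdi = semantic_damage_index(breach)  # 3 * 0.72 + 12 * 0.002
label = impact_label(sdi)
```

In this example the three medical nodes contribute almost the entire score, which is the behaviour a raw count of 15 nodes would hide.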

Relationship: SBRC and Space-Scoped Security

SBRC containment zones layer a semantic boundary on top of Space-Scoped Security. Context Spaces define administrative boundaries (user-created project groups); SBRC defines security boundaries based on semantic sensitivity. The two do not have to align: a single Space may contain nodes across multiple SBRC zones (e.g., a "Health Research" Space spans Zone 1 Medical and Zone 5 General). SBRC uses IFC labels to track cross-zone data flow.

4. Adversarial Graph Topology Resistance (AGTR)

Existing attacks target individual nodes (prompt injection, entity poisoning, embedding manipulation). AGTR addresses a new threat class: attacks on the graph's topology (structure) itself.

The attacker manipulates not node contents but connection patterns — to identify, track, or manipulate users through their graph structure.

Topology attacks

| Attack | Mechanism | Goal |
| --- | --- | --- |
| Structural watermarking | Place k pages with specific inter-link structure. When user visits them, an identifiable subgraph (watermark) forms in the Knowledge Graph. | User identification. Later, partial graph access reveals the watermark. |
| Bridge node injection | Inject content that creates bridge nodes between separated clusters. | Force AI inference to propagate across sensitivity boundaries. |
| Motif fingerprinting | Observe graph structure (degree distribution, clustering coefficient, spectral properties) from metadata. | Identify the user among a population via unique structural patterns. |

Topological Invariant Monitor

AGTR deploys a Topological Invariant Monitor (TIM) that continuously tracks structural properties:

  • Structural drift detection: Alert when the graph's topological feature vector (degree sequence, clustering, motif frequencies, spectral properties) changes beyond threshold between time steps.
  • Watermark pattern matching: Check if recently added nodes form identifiable subgraph patterns.
  • Structural noise on export: Add dummy edges, remove random edges, and inject decoy substructures before any graph data export — so the exported topology ≠ the real topology.
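
Structural drift detection can be illustrated with a deliberately small feature vector: mean degree, maximum degree, and average clustering coefficient. This is a simplification, since motif frequencies and spectral properties are omitted, and the function names and the undirected adjacency-set representation are assumptions.

```python
import math
from itertools import combinations

Graph = dict[str, set[str]]  # undirected graph: node -> set of neighbours


def clustering(graph: Graph, node: str) -> float:
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def feature_vector(graph: Graph) -> tuple[float, float, float]:
    """(mean degree, max degree, average clustering coefficient)."""
    if not graph:
        return (0.0, 0.0, 0.0)
    degrees = [len(neighbours) for neighbours in graph.values()]
    n = len(graph)
    avg_clustering = sum(clustering(graph, v) for v in graph) / n
    return (sum(degrees) / n, float(max(degrees)), avg_clustering)


def drift(before: Graph, after: Graph) -> float:
    """Euclidean distance between the two snapshots' feature vectors."""
    return math.dist(feature_vector(before), feature_vector(after))


def drift_alert(before: Graph, after: Graph, threshold: float) -> bool:
    return drift(before, after) > threshold
```

The threshold would have to be calibrated against normal browsing, because a legitimate burst of research on a new topic also shifts these features.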

Threat model mapping: Adversary A2. New vectors A2.6 Structural Watermarking and A2.7 Bridge Node Injection.

5. Temporal Causality Verification (TCV)

Knowledge in the graph has a causal acquisition order: the user learned A, then B, then inferred C from A+B. This causal chain is essential for trust evaluation and fraud detection — but the existing Temporal Security paradigm only protects timestamps and time-based access, not causal order itself.

Hash chain construction

Each node addition produces a hash chain entry:

$$h_i = \text{SHA-256}\big(h_{i-1} \,\|\, \text{content}(v_i) \,\|\, \text{timestamp}(v_i) \,\|\, \text{causal\_refs}(v_i)\big)$$

Guarantees:

  • Order immutability: Inserting a node mid-chain invalidates all subsequent hashes.
  • Causal reference verification: A node claiming dependency on another can be verified — the referenced node must appear earlier in the chain.
  • Deletion detection: Removing a node breaks the chain.
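
The chain and its three guarantees can be sketched with the standard library. Several details are assumptions made for this example: the all-zero genesis value, causal references expressed as indices of earlier entries, and a length prefix on each field (plain concatenation would let different field splits produce the same hash input).

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value for h_0's predecessor

Entry = tuple[str, str, list[int]]  # (content, timestamp, causal_refs)


def entry_hash(prev: bytes, content: str, timestamp: str, refs: list[int]) -> bytes:
    """h_i = SHA-256(h_{i-1} || content || timestamp || causal_refs)."""
    h = hashlib.sha256()
    h.update(prev)
    for field in (content, timestamp, ",".join(str(r) for r in refs)):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # length prefix per field
        h.update(data)
    return h.digest()


def build_chain(entries: list[Entry]) -> list[bytes]:
    hashes, prev = [], GENESIS
    for content, timestamp, refs in entries:
        prev = entry_hash(prev, content, timestamp, refs)
        hashes.append(prev)
    return hashes


def verify_chain(entries: list[Entry], hashes: list[bytes]) -> bool:
    """Check length, causal ordering, and every hash in the chain."""
    if len(entries) != len(hashes):
        return False
    for i, (_content, _timestamp, refs) in enumerate(entries):
        if any(r < 0 or r >= i for r in refs):  # refs must point backwards
            return False
    return build_chain(entries) == hashes
```

Verification recomputes the whole chain, so an altered timestamp, an inserted entry, or a removed entry each cause a mismatch from that point onward.
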

| Scenario | Without TCV | With TCV |
| --- | --- | --- |
| Causal forgery | Biased content v1 leads user to "objective" content v2 via link; v2 appears independent | Causal chain records v1 → v2 navigation, flagging non-independence |
| Knowledge backdating | Malicious extension alters node timestamps to fabricate "prior knowledge" | Hash chain detects timestamp inconsistency |
| Research precedence | No cryptographic proof of when knowledge was acquired | Merkle root + RFC 3161 timestamp authority enables provable precedence |

6. Cross-Session Entropy Leakage Prevention (CSELP)

Even with perfect intra-session isolation, the Knowledge Graph is a session-spanning persistent structure. Each session's activity accumulates as graph growth patterns — and the sequence of growth patterns is a powerful fingerprint.

Graph growth fingerprinting

Session 1: +50 nodes (ML, Python). Session 2: +3 nodes (cooking). Session 3: +80 nodes (ML, PyTorch). This pattern identifies "ML engineer who cooks" — each session is harmless alone, but the series is uniquely identifying.

Decorrelation techniques

| Technique | What it breaks | How |
| --- | --- | --- |
| Graph growth batching | Session-specific growth patterns | Buffer node additions and commit at fixed intervals (hourly), not per-session |
| Topic distribution smoothing | Session topic signatures | Add Laplace noise to per-session topic distributions |
| Query timing jitter | Cognitive rhythm fingerprints | Add random delay to NL query issuance timestamps |
| Phantom session injection | Session sequence correlation | Insert dummy sessions (background graph operations) between real sessions |

Defense goal: The mutual information between any two sessions' metadata vectors must stay below a threshold $\varepsilon_{\text{session}}$. An observer who sees session N's metadata gains at most $\varepsilon_{\text{session}}$ bits about session N-1 or N+1.
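
Two of the techniques in the table, growth batching and topic smoothing, are sketched below. The function names, the use of Unix timestamps, and the noise scale are assumptions, and the sketch makes no claim about meeting a particular mutual-information bound: choosing a scale that achieves one is a separate analysis.

```python
import random


def batch_commit_time(event_ts: int, interval_s: int = 3600) -> int:
    """Round a node-addition time up to the next fixed commit boundary."""
    return -(-event_ts // interval_s) * interval_s  # ceiling division


def smooth_topics(counts: dict[str, float],
                  scale: float,
                  rng: random.Random) -> dict[str, float]:
    """Add Laplace(0, scale) noise to topic counts, then renormalize."""
    noisy: dict[str, float] = {}
    for topic, count in counts.items():
        # The difference of two exponentials with mean `scale`
        # is Laplace-distributed with location 0 and scale `scale`.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[topic] = max(0.0, count + noise)  # clamp negative counts
    total = sum(noisy.values())
    if total == 0.0:
        return {topic: 1.0 / len(noisy) for topic in noisy}
    return {topic: value / total for topic, value in noisy.items()}
```

With hourly batching, every node added during the same hour shares one commit time, so an observer of commit metadata cannot tell where one session ended and the next began within that window.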

Threat model mapping: Adversaries A1 (network observer) and A3 (extensions — A3.5 metadata collection). New vector A3.7 Cross-Session Metadata Correlation.

7. Retroactive Sanitization with Inference Purging (RSIP)

When a user changes their security policy (e.g., raises medical data sensitivity to maximum), the new policy must apply not only to future data but also to all data already collected and processed. This is far harder than it sounds:

  1. Data is already processed by the AI pipeline
  2. Inferences have propagated to other nodes (IRD's problem)
  3. Embeddings encode the information
  4. Cluster statistics and topic models are affected

Legal alignment

RSIP provides the technical mechanism for GDPR's "right to be forgotten" and CCPA's consumer data deletion right — extended beyond deletion to retroactive reclassification and re-restriction.

Retroactive pipeline

When policy $\Pi_{\text{old}}$ changes to $\Pi_{\text{new}}$:

  1. Identify affected nodes: $R = \{\text{nodes where } \Pi_{\text{new}} \text{ is more restrictive than } \Pi_{\text{old}}\}$
  2. Apply new policy directly: Update labels, access restrictions, encryption levels on R
  3. Walk inference dependencies: Use IRD's Inference Dependency Graph to find all inferences derived from R
  4. Re-infer with R excluded: Recompute affected attributes and edges without R's data
  5. Rebuild embeddings: Update or remove R's embeddings and rebuild similarity indices
  6. Recompute statistics: Recalculate topic distributions and cluster aggregates excluding R
  7. Verify completeness: Confirm that no non-R node has dependency > 0 on any R node
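
Steps 1, 3, and 7 of the pipeline reduce to set operations over the dependency data. In this sketch a policy is modeled as a map from node ID to an integer restriction level, and dependencies as plain dictionaries; both representations and all function names are assumptions. The re-inference, embedding, and statistics steps are not shown.

```python
def affected_nodes(old_policy: dict[str, int], new_policy: dict[str, int]) -> set[str]:
    """Step 1: R = nodes whose restriction level rises under the new policy."""
    return {node for node, level in new_policy.items()
            if level > old_policy.get(node, 0)}


def derived_from(r: set[str], dependents: dict[str, list[str]]) -> set[str]:
    """Step 3: every inference transitively derived from a node in R."""
    seen: set[str] = set()
    stack = list(r)
    while stack:
        node = stack.pop()
        for output in dependents.get(node, ()):
            if output not in seen:
                seen.add(output)
                stack.append(output)
    return seen - r


def is_complete(r: set[str], inputs_of: dict[str, set[str]]) -> bool:
    """Step 7: no node outside R still lists an R node among its inputs."""
    return all(not (inputs & r)
               for node, inputs in inputs_of.items()
               if node not in r)
```

The completeness check is only as good as the recorded dependencies: an inference made before dependency tracking was enabled would pass it while still carrying residue.
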

| Optimization | Approach | Trade-off |
| --- | --- | --- |
| Progressive purge (default) | Process highest-dependency items first, stop below threshold | Controllable compute; tiny residue may remain |
| Lazy re-inference | Flag affected attributes; recompute on next access | Old values persist until accessed |
| Batch processing | Run full recomputation overnight | Delay before policy takes full effect |
| Full purge (user-selected) | Walk entire dependency graph immediately | CPU-intensive but guarantees completeness |

Threat model mapping: All adversary classes. New vector A-cross Retroactive Policy Evasion.

8. Contextual Amnesia Verification (CAV)

When a user says "forget this data," conventional systems confirm deletion was executed. CAV goes further: it provides cryptographic proof that recovery is impossible across all subsystems.

Proof of Forgetting

A Proof of Forgetting (PoF) is a triple: (commitment, deletion_proof, absence_proof).

  1. Commitment: While data exists, record a Merkle commitment showing the data is in the graph.
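  2. Deletion proof: After deletion, publish the new Merkle root together with a non-membership proof showing that no valid Merkle path exists for the deleted data. This requires a tree that supports such proofs, for example a sorted or sparse Merkle tree; a plain Merkle root alone cannot demonstrate absence.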
  3. Absence proof: Verify that no node in the updated graph has embedding similarity above threshold with the deleted data — i.e., the information doesn't survive in encoded form.
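
The commitment step can be sketched with a simple binary Merkle tree. The leaf and interior domain-separation bytes and the duplicate-last-node padding are choices made for this example, not details from the page. The sketch shows only that a path committed before deletion no longer verifies against the post-deletion root; it is not a non-membership proof, which would need a sorted or sparse tree.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    level = [_h(b"\x00" + leaf) for leaf in leaves] or [_h(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by duplicating the last node
        level = _next_level(level)
    return level[0]


def inclusion_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling hash, sibling_is_left) pairs from leaf to root."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root
```

A commitment is then the pair of root and inclusion proof recorded while the data still exists; after deletion, the same proof checked against the new root returns `False`.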

Subsystem-wise forgetting

The same data exists across multiple subsystems. PoF must cover all of them:

| Subsystem | Data form | Forgetting method | Verification |
| --- | --- | --- | --- |
| Graph DB (SQLite) | Nodes + edges | DELETE + VACUUM | Merkle proof of absence |
| Embedding index | Vectors | Index rebuild | $\max_v \operatorname{sim}(\Phi(v), \Phi(d)) < \tau$ |
| Inference cache | AI outputs | IDG-based purge (IRD) | IDG dependency edge absence |
| Audit log | Text records | Encrypt entries, destroy key | Structural integrity preserved, content irrecoverable |
| Export history | Export records | Record deletion | N/A (data already sent externally is unrecoverable) |

The Composite PoF is the conjunction of all per-subsystem proofs. If any subsystem fails verification, the user is told: "Forgetting incomplete — data residue detected in [subsystem]."
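
A sketch of the embedding check and the conjunction, assuming cosine similarity as the similarity function and a dictionary of per-subsystem results. Both are assumptions: the page names neither the similarity measure nor the result format, and the other subsystem checks are represented here only by their boolean outcomes.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embedding_absent(remaining: list[list[float]],
                     deleted: list[float],
                     tau: float) -> bool:
    """True when every remaining vector is less similar than tau to the deleted one."""
    return all(cosine(v, deleted) < tau for v in remaining)


def composite_pof(results: dict[str, bool]) -> str:
    """Conjunction of per-subsystem checks, reported in user-facing form."""
    failed = [name for name, ok in results.items() if not ok]
    if failed:
        return f"Forgetting incomplete: data residue detected in {', '.join(failed)}."
    return "Forgetting verified."
```

The similarity test is statistical evidence rather than a cryptographic guarantee, so the choice of threshold determines how much encoded residue can pass unnoticed.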

The log dilemma

Audit logs prove data was accessed — but the log itself reveals data existed. Solution: log entries are encrypted at write time. For forgetting, the log encryption key is destroyed. The log entry remains (integrity preserved) but its content becomes irrecoverable.

Threat model mapping: Adversaries A4 (OS-level — post-deletion recovery) and A5 (vendor — log-based reconstruction). New vector A4.3 Post-Deletion Data Recovery.

Implementation priority

| Phase | Structure | Rationale |
| --- | --- | --- |
| Phase 2 (immediate) | IRD, TCV | IRD's Inference Dependency Graph is a prerequisite for RSIP and CAV. TCV's hash chain should start early: the longer the chain, the stronger the verification. |
| Phase 2 | KSCI, SBRC | Provenance Tracker (KSCI basic) and Semantic Zone Analyzer (SBRC) can be built as IFC label extensions. |
| Phase 3 (mature) | RSIP, CAV | Require IRD's IDG to have accumulated sufficient data. CAV needs all subsystems to expose forgetting interfaces. |
| Phase 3 | AGTR, CSELP | Topological Invariant Monitor (AGTR) needs user base for validation. Session Metadata Decorrelator (CSELP) has high compute cost. |

Dependency graph

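| Structure | Depends on | Enables |
| --- | --- | --- |
| IRD | | RSIP, CAV |
| TCV | | KSCI (strengthens) |
| KSCI | | AGTR (supply chain includes topology) |
| SBRC | IFC labels | |
| RSIP | IRD | CAV |
| CAV | IRD, RSIP | |
| AGTR | | |
| CSELP | | |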

New threat model attack vectors

| Vector | Adversary | Defended by |
| --- | --- | --- |
| A2.6 Structural Watermarking | A2 (Web content) | AGTR |
| A2.7 Bridge Node Injection | A2 (Web content) | AGTR |
| A2.8 Knowledge Supply Chain Poisoning | A2 (Web content) | KSCI |
| A3.6 Inference Residue Exploitation | A3 (Extension) | IRD |
| A3.7 Cross-Session Metadata Correlation | A3 (Extension) | CSELP |
| A4.3 Post-Deletion Data Recovery | A4 (OS-level) | CAV |
| A-cross Retroactive Policy Evasion | All classes | RSIP |