LLM Integration

The Local AI section covered what the AI does. This section explains how it works under the hood. Quira's AI capabilities are powered by llama.cpp (cross-platform) and MLX (Apple Silicon). The model is downloaded on first run and stored locally.

How NL Query works internally

When you submit a natural language query, Quira uses a RAG (Retrieval-Augmented Generation) pipeline over your Context Graph:

Query embedding: Your question is converted to a 384-dimensional embedding vector.
Retrieval: sqlite-vec finds the most semantically similar nodes. FTS5 supplements with keyword matches.
Context assembly: Retrieved node summaries, entities, and metadata are assembled into a context window for the LLM.
Generation with citations: The LLM generates a response grounded in the retrieved context, with mandatory source citations.
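
The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Quira's implementation: `toy_embed` is a hashed bag-of-words stand-in for the real embedding model, a linear in-memory scan stands in for sqlite-vec, plain token overlap stands in for FTS5, and the `Node` fields and prompt wording are hypothetical. The final LLM call is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from dataclasses import dataclass

DIM = 384  # embedding width stated in the pipeline description


@dataclass
class Node:
    """Hypothetical stand-in for a Context Graph node."""
    node_id: int
    title: str
    summary: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text):
    """Assumption: hashed bag-of-words in place of the real embedding model."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        bucket = sum(ord(c) * (i + 1) for i, c in enumerate(tok)) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, so dot product = cosine


def retrieve(question, nodes, k=1):
    """Return the top-k nodes by vector similarity, plus keyword matches."""
    q_vec = toy_embed(question)
    q_terms = set(tokenize(question))

    def score(node):
        return sum(a * b for a, b in zip(q_vec, toy_embed(node.summary)))

    # Vector step (sqlite-vec in Quira; a linear scan in this sketch).
    hits = sorted(nodes, key=score, reverse=True)[:k]
    # Keyword step (FTS5 in Quira; plain token overlap in this sketch).
    for node in nodes:
        text = node.title + " " + node.summary
        if node not in hits and q_terms & set(tokenize(text)):
            hits.append(node)
    return hits


def build_prompt(question, hits):
    """Assemble the context window; each entry carries its citation id."""
    context = "\n".join(
        "[%d] %s: %s" % (n.node_id, n.title, n.summary) for n in hits
    )
    return (
        "Answer using only the context below and cite sources as [id].\n\n"
        "Context:\n" + context + "\n\nQuestion: " + question + "\n"
    )


nodes = [
    Node(1, "SQLite FTS5", "full text search in sqlite"),
    Node(2, "Pasta recipe", "boil water add salt"),
    Node(3, "Vector search", "nearest neighbour search over embeddings"),
]
hits = retrieve("sqlite search", nodes, k=1)
prompt = build_prompt("sqlite search", hits)
print(prompt)
```

Tagging each context entry with its node id is one simple way to make citations checkable: any `[id]` in the model's answer can be validated against the ids that were actually retrieved.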

AI capability latencies

AI Capability	Trigger	Target Latency
Page summarization	Automatic on node creation	< 2s
Entity extraction	Automatic on node creation	< 1s
Embedding generation	Automatic on node creation	< 500ms
NL Query (100 nodes)	User-initiated via Command Palette	< 3s
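
One way to check a capability against these targets is to time the call and compare it with the budget from the table. The sketch below is a minimal illustration: the `BUDGETS_S` keys and the `timed` helper are hypothetical names, and the lambda is a placeholder for a real capability call.

```python
import time

# Target latencies from the table above, in seconds.
BUDGETS_S = {
    "page_summarization": 2.0,
    "entity_extraction": 1.0,
    "embedding_generation": 0.5,
    "nl_query_100_nodes": 3.0,
}


def timed(capability, fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < BUDGETS_S[capability]


# Placeholder workload standing in for a real embedding call.
result, elapsed, ok = timed("embedding_generation", lambda text: len(text), "hello")
print(result, ok)
```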
