LLM Integration

The Local AI section covered what the AI does. This section explains how it works under the hood. Quira's AI capabilities are powered by llama.cpp (cross-platform) and MLX (Apple Silicon). The model is downloaded on first run and stored locally.

How NL Query works internally

When you submit a natural language query, Quira uses a RAG (Retrieval-Augmented Generation) pipeline over your Context Graph:

  1. Query embedding: Your question is converted to a 384-dimensional embedding vector.
  2. Retrieval: sqlite-vec finds the most semantically similar nodes. FTS5 supplements with keyword matches.
  3. Context assembly: Retrieved node summaries, entities, and metadata are assembled into a context window for the LLM.
  4. Generation with citations: The LLM generates a response grounded in the retrieved context, with mandatory source citations.
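
The four steps above can be sketched in a few lines of Python. This is a toy illustration, not Quira's implementation: the `Node` type, the hashed bag-of-words `embed` function, the in-memory scoring loop (standing in for sqlite-vec and FTS5), the 0.1 keyword weight, and the prompt wording are all assumptions made for the example. The final generation step is left out; the sketch stops at the prompt that would be handed to the LLM.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 384  # embedding width stated in the docs


@dataclass
class Node:  # hypothetical stand-in for a Context Graph node
    id: str
    summary: str


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> list[float]:
    """Toy embedding (assumption): hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in tokens(text):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both inputs are unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def retrieve(question: str, nodes: list[Node], k: int = 2) -> list[Node]:
    """Steps 1-2: embed the question, rank by similarity plus keyword overlap."""
    q_vec = embed(question)
    q_words = set(tokens(question))
    scored = []
    for node in nodes:
        semantic = cosine(q_vec, embed(node.summary))
        keyword = len(q_words & set(tokens(node.summary)))
        # The 0.1 weight is arbitrary; the docs do not say how results are merged.
        scored.append((semantic + 0.1 * keyword, node))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in scored[:k]]


def build_prompt(question: str, retrieved: list[Node]) -> str:
    """Step 3: assemble a context window that makes citations possible."""
    context = "\n".join(f"[{n.id}] {n.summary}" for n in retrieved)
    return (
        "Answer using only the context below. Cite node ids in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\n"
    )


nodes = [
    Node("n1", "SQLite stores the context graph on disk"),
    Node("n2", "Bread recipe with sourdough starter"),
    Node("n3", "Vector search with sqlite-vec embeddings"),
]
question = "How does sqlite store the graph"
prompt = build_prompt(question, retrieve(question, nodes))
```

Tagging each context line with its node id is what lets the model's answer carry source citations, since every claim can point back to a bracketed id.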

AI capability latencies

AI Capability        | Trigger                            | Target Latency
Page summarization   | Automatic on node creation         | < 2 s
Entity extraction    | Automatic on node creation         | < 1 s
Embedding generation | Automatic on node creation         | < 500 ms
NL Query (100 nodes) | User-initiated via Command Palette | < 3 s
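
One way to check a capability against the targets listed above is to time the call and compare it with its budget. The sketch below is only an illustration: the `TARGETS_S` keys and the `measure` helper are hypothetical names, not part of Quira, and the timed function is a placeholder for a real AI call.

```python
import time

# Target latencies from the table above, in seconds (key names are assumptions).
TARGETS_S = {
    "page_summarization": 2.0,
    "entity_extraction": 1.0,
    "embedding_generation": 0.5,
    "nl_query_100_nodes": 3.0,
}


def measure(capability, fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds, within_target)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < TARGETS_S[capability]


# Placeholder workload standing in for a real embedding call.
result, elapsed, ok = measure("embedding_generation", lambda: sum(range(1000)))
print(f"embedding_generation took {elapsed * 1000:.2f} ms, within target: {ok}")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock, which makes it better suited to short intervals than wall-clock time.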