LLM Integration
The Local AI section covered what the AI does. This section explains how it works under the hood. Quira's AI capabilities are powered by llama.cpp (cross-platform) and MLX (Apple Silicon). The model is downloaded on first run and stored locally.
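As a rough illustration of how a runtime could be chosen between the two backends, the sketch below picks MLX on Apple Silicon and falls back to llama.cpp elsewhere. The function name `pick_backend` and the selection rule are assumptions made for this example; the text above does not describe how Quira actually decides.

```python
import platform


def pick_backend(system=None, machine=None):
    """Return the name of the inference backend to use (hypothetical helper).

    Assumption: MLX is preferred on Apple Silicon (macOS on arm64) and
    llama.cpp is used on every other platform.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "llama.cpp"


# With no arguments, the helper inspects the current machine.
backend = pick_backend()
```

Passing `system` and `machine` explicitly keeps the rule easy to test without access to each platform.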
How NL Query works internally
When you submit a natural language query, Quira uses a RAG (Retrieval-Augmented Generation) pipeline over your Context Graph:
- Query embedding: Your question is converted to a 384-dimensional embedding vector.
- Retrieval: sqlite-vec finds the most semantically similar nodes. FTS5 supplements these with keyword matches.
- Context assembly: Retrieved node summaries, entities, and metadata are assembled into a context window for the LLM.
- Generation with citations: The LLM generates a response grounded in the retrieved context, with mandatory source citations.
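The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Quira's implementation: a hashed bag-of-words vector stands in for the real embedding model, a brute-force cosine ranking stands in for sqlite-vec, a token match on entities stands in for FTS5, and the node fields (`id`, `title`, `summary`, `entities`) and all function names are hypothetical. The generation step is left out; the assembled prompt is what would be handed to the local LLM.

```python
import hashlib
import math
import re

DIM = 384  # embedding size stated in the text


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    # Step 1. Toy stand-in for the embedding model (assumption):
    # hashed bag of words, L2-normalised.
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[int(hashlib.sha256(tok.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query, nodes, k=2):
    # Step 2. Rank by cosine similarity (stand-in for sqlite-vec), then add
    # nodes whose entities share a token with the query (stand-in for FTS5).
    q_vec, q_tokens = embed(query), set(tokenize(query))

    def score(node):
        return sum(a * b for a, b in zip(q_vec, embed(node["summary"])))

    hits = sorted(nodes, key=score, reverse=True)[:k]
    for node in nodes:
        entity_tokens = set(tokenize(" ".join(node["entities"])))
        if node not in hits and q_tokens & entity_tokens:
            hits.append(node)
    return hits


def build_prompt(query, hits):
    # Step 3. Assemble summaries, entities, and ids into one context block.
    sources = "\n".join(
        f"[{n['id']}] {n['title']}: {n['summary']} "
        f"(entities: {', '.join(n['entities'])})"
        for n in hits
    )
    return (
        "Answer only from the sources below and cite them as [id].\n\n"
        f"{sources}\n\nQuestion: {query}"
    )


def citations_valid(answer, hits):
    # Step 4 check. Require at least one citation, all pointing at retrieved nodes.
    cited = set(re.findall(r"\[([^\]]+)\]", answer))
    return bool(cited) and cited <= {n["id"] for n in hits}


# Hypothetical sample graph for illustration.
NODES = [
    {"id": "n1", "title": "Index tuning",
     "summary": "Notes on sqlite index performance tuning", "entities": ["B-tree"]},
    {"id": "n2", "title": "Sourdough",
     "summary": "Sourdough bread baking recipe", "entities": ["bread"]},
    {"id": "n3", "title": "Query planner",
     "summary": "Overview of the database query planner", "entities": ["SQLite"]},
]

hits = retrieve("sqlite index performance", NODES, k=1)
prompt = build_prompt("sqlite index performance", hits)
```

Rejecting answers that fail `citations_valid` is one simple way to make citations mandatory: an answer with no citation, or with a citation to a node that was never retrieved, is treated as ungrounded.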
AI capability latencies
| AI Capability | Trigger | Target Latency |
|---|---|---|
| Page summarization | Automatic on node creation | < 2s |
| Entity extraction | Automatic on node creation | < 1s |
| Embedding generation | Automatic on node creation | < 500ms |
| NL Query (100 nodes) | User-initiated via Command Palette | < 3s |
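The targets in the table can be expressed as a small budget check. The sketch below is only an illustration: the dictionary keys and the `measure` helper are hypothetical names, and the timed function is a placeholder rather than a real Quira call.

```python
import time

# Target latencies from the table above, in seconds.
TARGETS_S = {
    "page_summarization": 2.0,
    "entity_extraction": 1.0,
    "embedding_generation": 0.5,
    "nl_query_100_nodes": 3.0,
}


def measure(capability, fn, *args, **kwargs):
    """Run fn once; return (result, elapsed seconds, whether the target was met)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < TARGETS_S[capability]


# Placeholder workload standing in for an embedding call.
result, elapsed, within_target = measure(
    "embedding_generation", lambda: sum(range(1000))
)
```

A single run is noisy, so a real check would likely repeat the measurement and compare a percentile against the target.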