Positioning — what engrava is (and isn’t)
engrava is a standalone embedded database for AI-agent memory. It is built on
SQLite and runs in-process: one pip install, no server, no LLM, no external
services. It gives an agent a durable thought-graph with hybrid retrieval
(full-text + vector + recency + priority + graph) and an optional tamper-evident
audit trail.
This page explains when engrava is the right tool, when it isn’t, and how it relates to the other memory options you might be choosing between.
When engrava is a good fit
- You want memory you own and can inspect. The whole store is one SQLite file. You can open it with any SQLite tool, back it up with a file copy (with care around WAL), and query it with SQL when the high-level API isn’t enough.
- You want retrieval, not just a vector index. engrava fuses FTS5/BM25, vector similarity, recency, priority, and a 1-hop graph signal into one ranked result. See Search.
- You want a graph, not a flat list. Thoughts are connected by typed, weighted edges, and the graph itself contributes to ranking.
- You want it embedded. No network hop, no service to operate, no separate process. It runs anywhere Python and SQLite run.
- You want embeddings to be optional and pluggable. Bring a local model, an OpenAI-compatible endpoint, Ollama, HuggingFace, or your own callback — or run with FTS-only and no embeddings at all. See Configuration → embeddings.
- You have a small-to-medium corpus. The default backend brute-forces vector search in Python and works well up to roughly 100k embeddings; beyond that, switch to the sqlite-vec backend. See Known Limitations.
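The "with care around WAL" caveat above can be made concrete. In WAL mode, recently committed pages may still live in the `-wal` sidecar file, so copying only the main database file can miss them. The following is a minimal sketch that assumes nothing about engrava's own API: it uses only Python's standard `sqlite3` online backup call on a throwaway database. The `notes` table, the file names, and the helper names `snapshot_store` and `demo_snapshot` are hypothetical stand-ins, not engrava's schema or functions.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot_store(src_path: str, dest_path: str) -> None:
    """Write a consistent snapshot of a live SQLite file to dest_path.

    SQLite's online backup API reads through the WAL, so committed data
    that has not been checkpointed into the main file is still included.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()


def demo_snapshot() -> int:
    """Build a tiny WAL-mode database, snapshot it, and count rows in the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = str(Path(tmp) / "store.db")
        dest_path = str(Path(tmp) / "store-backup.db")

        # Hypothetical table standing in for a real store; not engrava's schema.
        writer = sqlite3.connect(src_path)
        try:
            writer.execute("PRAGMA journal_mode=WAL")
            writer.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
            writer.executemany(
                "INSERT INTO notes (body) VALUES (?)",
                [("first",), ("second",)],
            )
            writer.commit()

            # The writer connection stays open, as it would in a running agent.
            snapshot_store(src_path, dest_path)

            copy = sqlite3.connect(dest_path)
            try:
                return copy.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
            finally:
                copy.close()
        finally:
            writer.close()
```

If you prefer a plain file copy, the usual precaution is to copy while no process has the store open, or to copy the `-wal` and `-shm` files alongside the main file.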
When engrava is not a good fit
- You need a managed, horizontally-scaled vector service. engrava is a local embedded library, not a clustered database. One store is one SQLite file written by one process. If you need sharding, replication, or a multi-writer service across many machines, use a dedicated vector database.
- You need many processes writing the same store concurrently. SQLite is single-writer. WAL mode lets readers and a single writer coexist, and a single process can drive many async tasks safely, but heavy multi-process write fan-out is out of scope. See Known Limitations → Concurrent Write Safety.
- You want the library to call an LLM for you. engrava does no LLM-side fact extraction, summarisation, or entity resolution (see Non-goals). It stores and retrieves what you give it; your agent decides what to write.
- You need per-tenant retrieval isolation on the ranked path out of the box. The search_* methods take no scope/metadata filter today, so retrieval is unscoped by default. There are good workarounds (over-fetch + post-filter, one store per tenant, raw-SQL pre-filter); see the migration guide’s scoping section.
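The over-fetch + post-filter workaround mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not engrava's API: `search_fn` is any callable that takes a query and a limit and returns hits in ranked order, each hit is assumed to be a dict, and the `tenant` field, `CORPUS`, `fake_search`, and `scoped_search` are all hypothetical names. In practice you would wrap whichever search_* method you use and filter on whatever metadata your application stores.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]

# Hypothetical ranked corpus standing in for a real store, best hit first.
CORPUS: List[Hit] = [
    {"id": 1, "tenant": "a"},
    {"id": 2, "tenant": "b"},
    {"id": 3, "tenant": "a"},
    {"id": 4, "tenant": "a"},
    {"id": 5, "tenant": "b"},
    {"id": 6, "tenant": "a"},
]


def fake_search(query: str, limit: int) -> List[Hit]:
    """Stand-in for an unscoped ranked search: returns the top `limit` hits."""
    return CORPUS[:limit]


def scoped_search(
    search_fn: Callable[[str, int], List[Hit]],
    query: str,
    predicate: Callable[[Hit], bool],
    k: int = 5,
    overfetch: int = 4,
    max_limit: int = 200,
) -> List[Hit]:
    """Return up to k ranked hits that satisfy predicate.

    Fetches more than k results, filters them in application code, and
    widens the fetch until enough hits survive, the store is exhausted,
    or max_limit is reached.
    """
    limit = k * overfetch
    while True:
        hits = list(search_fn(query, limit))
        kept = [hit for hit in hits if predicate(hit)]
        exhausted = len(hits) < limit  # the store had nothing more to return
        if len(kept) >= k or exhausted or limit >= max_limit:
            return kept[:k]
        limit = min(limit * 2, max_limit)


def only_tenant_b(hit: Hit) -> bool:
    return hit["tenant"] == "b"


results = scoped_search(fake_search, "anything", only_tenant_b, k=2, overfetch=1)
```

Post-filtering preserves the original ranking order among the hits that survive, but it costs extra retrieval work when the target tenant is a small share of the store. That is the case where one store per tenant or a raw-SQL pre-filter tends to be the better trade.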
Non-goals
These are deliberate boundaries, not missing features:
- No LLM-side intelligence. engrava never calls a language model. It does no fact extraction, no summarisation, no entity resolution, no automatic “memory writing” from raw text. Those belong in your agent (or a downstream extension), above the storage layer. The one consolidation feature that does synthesise — dreaming — is purely structural (clustering + centroids + keyword counts), with no LLM involved.
- Retrieval is unscoped by default. search_hybrid / search_similar / search_fts rank across the whole store; they accept no per-user or per-session filter argument. Scoping is an application-level concern today.
- Not a distributed system. No clustering, replication, or cross-machine consistency. One file, one writer.
- Not an application framework. engrava is the memory layer. It does not provide an agent runtime, tool-calling, or prompt orchestration.
How it compares
A rough orientation, not a feature scorecard. Evaluate the specifics against your own workload.
| | engrava | Hosted agent-memory services (e.g. mem0, Zep) | Framework memory (e.g. LangChain memory) | Standalone vector DBs (e.g. Chroma, Qdrant, pgvector) |
|---|---|---|---|---|
| Deployment | Embedded library, one SQLite file, in-process | Typically a hosted/managed service or self-hosted server | In-process, tied to the framework | Separate database/service (some have embedded modes) |
| Retrieval model | Hybrid: FTS + vector + recency + priority + graph, fused | Varies; often vector + recency with managed pipelines | Usually buffer/window or a vector-store wrapper | Primarily vector similarity (some add keyword/hybrid) |
| Graph | First-class typed/weighted edges that feed ranking | Some offer entity/graph memory | Generally no | Generally no |
| LLM-side extraction | None — you decide what to store | Often built in (auto fact-extraction/summarisation) | Sometimes, via chains | None |
| External services | None required | Usually yes | Depends on the chosen store | Usually a running service |
| Audit trail | Optional tamper-evident hash-chain journal | Varies | No | Generally no |
| Best for | Owning a local, inspectable, hybrid memory graph for an agent | Offloading memory ops to a managed pipeline | Quick memory inside an existing framework app | Large-scale pure vector retrieval |
If you are currently using one of these and want concept mappings and porting snippets, see the OSS Migrating from another memory system guide.
See also
- Search — how the hybrid ranking actually works
- FAQ — common pre-adoption questions
- Known Limitations — the hard constraints in one place