Positioning — what engrava is (and isn’t)

engrava is a standalone embedded database for AI-agent memory. It is built on SQLite and runs in-process: one pip install, no server, no LLM, no external services. It gives an agent a durable thought-graph with hybrid retrieval (full-text + vector + recency + priority + graph) and an optional tamper-evident audit trail.

This page explains when engrava is the right tool, when it isn’t, and how it relates to the other memory options you might be choosing between.

When engrava is a good fit

  • You want memory you own and can inspect. The whole store is one SQLite file. You can open it with any SQLite tool, back it up with a file copy (in WAL mode, recent writes may still sit in the -wal file, so checkpoint first or use SQLite’s backup API), and query it with SQL when the high-level API isn’t enough.
  • You want retrieval, not just a vector index. engrava fuses FTS5/BM25, vector similarity, recency, priority, and a 1-hop graph signal into one ranked result. See Search.
  • You want a graph, not a flat list. Thoughts are connected by typed, weighted edges, and the graph itself contributes to ranking.
  • You want it embedded. No network hop, no service to operate, no separate process. It runs anywhere Python and SQLite run.
  • You want embeddings to be optional and pluggable. Bring a local model, an OpenAI-compatible endpoint, Ollama, HuggingFace, or your own callback. You can also run FTS-only, with no embeddings at all. See Configuration → embeddings.
  • You have a small-to-medium corpus. The default backend brute-forces vector search in Python and works well up to roughly 100k embeddings; beyond that, switch to the sqlite-vec backend. See Known Limitations.
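
Because the store is an ordinary SQLite file, Python's standard sqlite3 module is enough to snapshot and inspect it. The sketch below is a minimal illustration under stated assumptions: it builds a throwaway WAL-mode database with a hypothetical notes table as a stand-in (engrava's real schema and file name are not shown on this page and will differ), takes a consistent copy through SQLite's online backup API, and then lists the tables in the copy.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "memory.db")          # hypothetical file name
dst_path = os.path.join(workdir, "memory-backup.db")   # hypothetical file name

# Stand-in store: a WAL-mode database with one hypothetical table.
src = sqlite3.connect(src_path)
src.execute("PRAGMA journal_mode=WAL")
src.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
src.execute("INSERT INTO notes (body) VALUES (?)", ("remember the milk",))
src.commit()

# The online backup API copies a consistent snapshot, including committed
# pages that still live only in the -wal file.
dst = sqlite3.connect(dst_path)
src.backup(dst)
dst.close()
src.close()

# Inspect the copy with ordinary SQL.
check = sqlite3.connect(dst_path)
tables = [row[0] for row in check.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
rows = check.execute("SELECT body FROM notes").fetchall()
check.close()
```

The same pattern should apply to any SQLite file: point src_path at the real store and query sqlite_master first to discover the actual table names.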

When engrava is not a good fit

  • You need a managed, horizontally scaled vector service. engrava is a local embedded library, not a clustered database. One store is one SQLite file written by one process. If you need sharding, replication, or a multi-writer service across many machines, use a dedicated vector database.
  • You need many processes writing the same store concurrently. SQLite is single-writer. WAL mode lets readers and a single writer coexist, and a single process can drive many async tasks safely, but heavy multi-process write fan-out is out of scope. See Known Limitations → Concurrent Write Safety.
  • You want the library to call an LLM for you. engrava does no LLM-side fact extraction, summarisation, or entity resolution (see Non-goals). It stores and retrieves what you give it; your agent decides what to write.
  • You need per-tenant retrieval isolation on the ranked path out of the box. The search_* methods take no scope/metadata filter today — retrieval is unscoped by default. There are good workarounds (over-fetch + post-filter, one store per tenant, raw-SQL pre-filter); see the migration guide’s scoping section.
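
The over-fetch + post-filter workaround mentioned above can be sketched roughly as follows. This is an illustration, not engrava's API: the search callable, the (id, score, metadata) result shape, and the tenant metadata key are all assumptions, and fake_search exists only to make the example runnable. Adapt the accessor to whatever your search_* calls actually return.

```python
from typing import Callable, Iterable

# Hypothetical result shape: (thought_id, score, metadata).
Result = tuple[str, float, dict]


def scoped_search(
    search: Callable[[str, int], Iterable[Result]],
    query: str,
    tenant: str,
    k: int = 5,
    overfetch: int = 4,
) -> list[Result]:
    """Ask for k * overfetch unscoped results, then keep the first k for `tenant`."""
    hits = search(query, k * overfetch)
    kept = [hit for hit in hits if hit[2].get("tenant") == tenant]
    return kept[:k]


# Stand-in for an unscoped ranked search; it ignores the query and returns
# a fixed, already-ranked list.
def fake_search(query: str, limit: int) -> list[Result]:
    corpus = [
        ("a", 0.9, {"tenant": "acme"}),
        ("b", 0.8, {"tenant": "globex"}),
        ("c", 0.7, {"tenant": "acme"}),
        ("d", 0.6, {"tenant": "acme"}),
    ]
    return corpus[:limit]


results = scoped_search(fake_search, "billing", tenant="acme", k=2)
```

Post-filtering can return fewer than k results when a tenant's thoughts are rare in the unscoped ranking, so the over-fetch factor is a tuning knob. One store per tenant avoids the problem entirely, at the cost of managing more files.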

Non-goals

These are deliberate boundaries, not missing features:

  • No LLM-side intelligence. engrava never calls a language model. It does no fact extraction, no summarisation, no entity resolution, no automatic “memory writing” from raw text. Those belong in your agent (or a downstream extension), above the storage layer. The one consolidation feature that does synthesise — dreaming — is purely structural (clustering + centroids + keyword counts), with no LLM involved.
  • Retrieval is unscoped by default. search_hybrid / search_similar / search_fts rank across the whole store; they accept no per-user or per-session filter argument. Scoping is an application-level concern today.
  • Not a distributed system. No clustering, replication, or cross-machine consistency. One file, one writer.
  • Not an application framework. engrava is the memory layer. It does not provide an agent runtime, tool-calling, or prompt orchestration.
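
To give a feel for what "purely structural" consolidation can mean, here is a small sketch that summarises an already-formed cluster with a centroid and keyword counts, using no language model. It is not engrava's dreaming implementation: the cluster data, the function names, and the whitespace tokeniser are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical cluster: (text, embedding) pairs already grouped by some
# earlier clustering step.
cluster = [
    ("deploy failed on staging", [1.0, 0.0]),
    ("staging deploy retried", [0.8, 0.2]),
    ("deploy succeeded after retry", [0.6, 0.4]),
]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def top_keywords(texts: list[str], n: int = 2) -> list[str]:
    """Most frequent whitespace-separated tokens across the texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return [word for word, _ in counts.most_common(n)]


summary_vector = centroid([vector for _, vector in cluster])
summary_keywords = top_keywords([text for text, _ in cluster])
```

Everything here is arithmetic and counting over data the agent already wrote, which is the sense in which such a step stays below the LLM boundary described above.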

How it compares

A rough orientation, not a feature scorecard. Evaluate the specifics against your own workload.

| | engrava | Hosted agent-memory services (e.g. mem0, Zep) | Framework memory (e.g. LangChain memory) | Standalone vector DBs (e.g. Chroma, Qdrant, pgvector) |
|---|---|---|---|---|
| Deployment | Embedded library, one SQLite file, in-process | Typically a hosted/managed service or self-hosted server | In-process, tied to the framework | Separate database/service (some have embedded modes) |
| Retrieval model | Hybrid: FTS + vector + recency + priority + graph, fused | Varies; often vector + recency with managed pipelines | Usually buffer/window or a vector-store wrapper | Primarily vector similarity (some add keyword/hybrid) |
| Graph | First-class typed/weighted edges that feed ranking | Some offer entity/graph memory | Generally no | Generally no |
| LLM-side extraction | None; you decide what to store | Often built in (auto fact-extraction/summarisation) | Sometimes, via chains | None |
| External services | None required | Usually yes | Depends on the chosen store | Usually a running service |
| Audit trail | Optional tamper-evident hash-chain journal | Varies | No | Generally no |
| Best for | Owning a local, inspectable, hybrid memory graph for an agent | Offloading memory ops to a managed pipeline | Quick memory inside an existing framework app | Large-scale pure vector retrieval |

If you are currently using one of these and want concept mappings and porting snippets, see the OSS Migrating from another memory system guide.
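
For readers unfamiliar with the "tamper-evident hash-chain journal" named in the audit-trail comparison, the general idea can be sketched in a few lines. This is a generic illustration of hash chaining with SHA-256, not engrava's journal format: the entry layout, the GENESIS constant, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(journal: list[dict], payload: dict) -> None:
    """Append an entry whose hash commits to everything before it."""
    prev_hash = journal[-1]["hash"] if journal else GENESIS
    journal.append(
        {"payload": payload, "prev": prev_hash, "hash": entry_hash(prev_hash, payload)}
    )


def verify(journal: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in journal:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


journal: list[dict] = []
append(journal, {"op": "add", "id": "t1"})
append(journal, {"op": "edit", "id": "t1"})
ok_before = verify(journal)

journal[0]["payload"]["op"] = "delete"  # simulate tampering with history
ok_after = verify(journal)
```

Because each hash covers the previous one, changing an old entry invalidates every later entry unless the whole tail is rewritten, which is what makes such a journal tamper-evident rather than tamper-proof.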

See also

  • Search — how the hybrid ranking actually works
  • FAQ — common pre-adoption questions
  • Known Limitations — the hard constraints in one place