LlamaIndex gives you state-of-the-art retrieval — but ChatMemoryBuffer still dies at process end, and pure vector stores max out around 67% recall. REM pairs LlamaIndex's pipelines with a persistent vector + FTS + entity-graph substrate that hits 94.6% on LongMemEval.
Free tier · No credit card · Works with llama-index (Python) and llamaindex (TS)
What a persistent memory substrate adds to a retrieval framework.
Your ChatEngine resumes where it left off — weeks later, from a different process, on a different machine. Nothing to rehydrate.
Summarize, link, dedupe, score salience, detect contradictions — run nightly against your LlamaIndex corpus so retrieval accuracy improves without a re-embed job.
The embeddings LlamaIndex produces, the chat history it buffers, the entities it extracts — all land in the same namespace that LangChain, CrewAI, AutoGen, and your Cursor MCP share.
LlamaIndex already knows how to call HTTP-backed retrievers. REM is one.
The Python and TS SDKs are in private beta. The requests/fetch pattern below is production-ready today. Request SDK access in Discord.
Sign up at remlabs.ai/console. Copy the sk-rem-... key and export it.
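A quick sanity check once the key is exported. The env var name `REM_API_KEY` is our convention for these examples, not something REM mandates:

```python
import os

def load_rem_key(env_var: str = "REM_API_KEY") -> str:
    """Read the REM API key from the environment and check its prefix."""
    key = os.environ.get(env_var, "")
    if not key.startswith("sk-rem-"):
        raise RuntimeError(
            f"Set {env_var} to the sk-rem-... key from remlabs.ai/console"
        )
    return key
```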
LlamaIndex chunks documents; REM stores them as first-class memories that survive rebuilds.
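A sketch of the ingestion side: pipe a LlamaIndex node's text and metadata into REM over plain HTTP. The base URL and the `/v1/memories` write endpoint are assumptions here (only `/v1/memory-search-semantic` is documented on this page); confirm the real paths in your console:

```python
import json
import urllib.request

REM_API = "https://api.remlabs.ai"  # assumed base URL -- confirm in your console

def node_to_memory(text: str, metadata: dict) -> dict:
    """Shape a LlamaIndex node's text and metadata as a REM memory payload."""
    return {"content": text, "metadata": metadata}

def store_memory(payload: dict, api_key: str) -> dict:
    """POST one memory to REM. /v1/memories is a hypothetical write endpoint."""
    req = urllib.request.Request(
        f"{REM_API}/v1/memories",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because memories live server-side, rebuilding your local VectorStoreIndex never touches them.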
Subclass BaseRetriever, hit /v1/memory-search-semantic, return NodeWithScore. Plug into any QueryEngine, ChatEngine, or ReActAgent.
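A minimal sketch of the search call, standard library only. The request fields `query` and `top_k` and the response shape (`results` with `content`/`score`) are assumptions; verify against the endpoint docs. A `transport` hook is injected so you can stub the network in tests:

```python
import json
import urllib.request

def rem_search(query: str, api_key: str, top_k: int = 5, transport=None):
    """Call /v1/memory-search-semantic and return (text, score) pairs."""
    payload = {"query": query, "top_k": top_k}  # assumed field names
    if transport is None:
        transport = lambda body: _http_post(body, api_key)
    data = transport(payload)
    return [(hit["content"], hit["score"]) for hit in data.get("results", [])]

def _http_post(payload: dict, api_key: str) -> dict:
    req = urllib.request.Request(
        "https://api.remlabs.ai/v1/memory-search-semantic",  # assumed base URL
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Inside your `BaseRetriever` subclass, `_retrieve` just calls `rem_search(...)` and wraps each pair as `NodeWithScore(node=TextNode(text=text), score=score)`.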
Three shapes: custom retriever, chat memory, and agent memory tool.
Replaces VectorStoreIndex.as_retriever(). Use with any LlamaIndex engine.
A BaseMemory that persists every turn, so your ChatEngine never forgets between deploys.
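A sketch of what that BaseMemory subclass delegates to: a small client that POSTs each turn and GETs the transcript back by session id. The `/v1/sessions/{id}/messages` path is hypothetical; the `transport` hook stands in for the HTTP call so the shape is testable offline:

```python
import json
import urllib.request

class REMChatStore:
    """Persist chat turns in REM so a ChatEngine can resume across processes.

    The /v1/sessions/{id}/messages path is hypothetical -- check the REM docs.
    """

    def __init__(self, session_id: str, api_key: str, transport=None):
        self.session_id = session_id
        self.api_key = api_key
        # transport(method, path, body) -> dict; defaults to real HTTP
        self.transport = transport or self._http

    def put(self, role: str, content: str) -> None:
        """Append one chat turn to the session."""
        self.transport("POST", f"/v1/sessions/{self.session_id}/messages",
                       {"role": role, "content": content})

    def get_all(self) -> list:
        """Fetch the full transcript for this session."""
        data = self.transport("GET", f"/v1/sessions/{self.session_id}/messages", None)
        return data.get("messages", [])

    def _http(self, method: str, path: str, body):
        req = urllib.request.Request(
            "https://api.remlabs.ai" + path,  # assumed base URL
            data=None if body is None else json.dumps(body).encode("utf-8"),
            headers={"Authorization": f"Bearer {self.api_key}",
                     "Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
```

In the BaseMemory subclass, `put` forwards each ChatMessage's role and content here, and `get` rehydrates `get_all()` back into ChatMessage objects.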
Let the agent decide when to recall: REM slots in cleanly as a FunctionTool.
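A sketch of the recall function you would hand to the agent. The REM call is abstracted behind a `search` parameter; in real use it would hit /v1/memory-search-semantic as in the retriever example. The docstring matters: it is what the LLM reads when deciding whether to call the tool.

```python
def recall(query: str, search=None) -> str:
    """Search long-term memory and return the most relevant stored facts.

    Use this when the user refers to earlier conversations or saved knowledge.
    """
    # `search` returns a list of memory strings; in production it calls
    # REM's /v1/memory-search-semantic endpoint.
    results = search(query) if search else []
    if not results:
        return "No relevant memories found."
    return "\n".join(f"- {text}" for text in results)
```

Wrap it with `FunctionTool.from_defaults(fn=recall)` and pass the tool to your ReActAgent.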
Everything a pure-vector LlamaIndex stack leaves on the table.
Every ChatEngine turn lands in the same store your next process reads from — no pickle files, no Redis wrangling.
Vector + FTS5 fusion + neural rerank. Beats vector-only retrieval on proper nouns, acronyms, and temporal queries.
Dream Engine summarizes, de-duplicates, and builds an entity graph on top of your LlamaIndex corpus — no extra pipeline.
The same namespace feeds your LlamaIndex retriever, your LangChain chain, and your Cursor MCP server. One corpus, every consumer.
Free tier, no credit card. Ship a ReActAgent that remembers across deploys in under a minute.