REM Labs · LlamaIndex

Memory that persists across every LlamaIndex session

LlamaIndex gives you state-of-the-art retrieval — but ChatMemoryBuffer still dies at process end, and pure vector stores max out around 67% recall. REM pairs LlamaIndex's pipelines with a persistent vector + FTS + entity-graph substrate that hits 94.6% on LongMemEval.

Free tier · No credit card · Works with llama-index (Python) and llamaindex (TS)

Why LlamaIndex + REM

What a persistent memory substrate adds to a retrieval framework.

Continuity across sessions

Your ChatEngine resumes where it left off — weeks later, from a different process, on a different machine. Nothing to rehydrate.

9 Dream Engine strategies

Summarize, link, dedupe, score salience, detect contradictions — run nightly against your LlamaIndex corpus so retrieval accuracy improves without a re-embed job.

Cross-tool portability

The embeddings LlamaIndex produces, the chat history it buffers, the entities it extracts — all land in the same namespace that LangChain, CrewAI, AutoGen, and your Cursor MCP share.

Install in 60 seconds

LlamaIndex already speaks HTTP retrievers. REM is a retriever.

The Python and TS SDKs are in private beta. The requests/fetch pattern below is production-ready today. Request SDK access in Discord.

1

Get your API key

Sign up at remlabs.ai/console. Copy the sk-rem-... key and export it.

pip install llama-index requests
export REM_API_KEY="sk-rem-..."
2

Seed memory from any Document

LlamaIndex chunks documents; REM stores them as first-class memories that survive rebuilds.

import os, requests
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("./notes").load_data()
H = {"Authorization": f"Bearer {os.environ['REM_API_KEY']}",
     "Content-Type": "application/json"}

for i, d in enumerate(docs):
    requests.post("https://remlabs.ai/v1/memory-set", headers=H, json={
        "namespace": "research-assistant",
        "key": f"doc_{i}",
        "value": d.text[:8000],
        "metadata": {"source": d.metadata.get("file_name")},
    })
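Note that the snippet above truncates each document to 8,000 characters. For longer files you may prefer to split rather than truncate; a minimal splitter is sketched below. The chunk size and per-chunk key scheme are our own convention for the example, not a REM requirement.

```python
# Split long documents instead of truncating them. Each chunk would be
# stored with its own key (e.g. f"doc_{i}_part_{j}") via the same
# /v1/memory-set call shown above.
def chunk(text: str, size: int = 8000) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

print(len(chunk("x" * 20000)))  # → 3
```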
3

Wrap REM as a LlamaIndex retriever

Subclass BaseRetriever, hit /v1/memory-search-semantic, return NodeWithScore. Plug into any QueryEngine, ChatEngine, or ReActAgent.

Common patterns

Three shapes: custom retriever, chat memory, and agent memory tool.

1. REM-backed BaseRetriever for QueryEngine

Python

Replaces VectorStoreIndex.as_retriever(). Use with any LlamaIndex engine.

import os, requests
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, TextNode
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI

REM = "https://remlabs.ai/v1"
H = {"Authorization": f"Bearer {os.environ['REM_API_KEY']}",
     "Content-Type": "application/json"}

class RemRetriever(BaseRetriever):
    def __init__(self, namespace: str, top_k: int = 8):
        self.namespace = namespace
        self.top_k = top_k
        super().__init__()

    def _retrieve(self, query_bundle):
        r = requests.post(f"{REM}/memory-search-semantic", headers=H, json={
            "namespace": self.namespace,
            "query": query_bundle.query_str,
            "limit": self.top_k,
        }).json()
        return [
            NodeWithScore(node=TextNode(text=m["value"], id_=m.get("key", "")),
                          score=m.get("score", 0.0))
            for m in r.get("results", [])
        ]

engine = RetrieverQueryEngine.from_args(
    retriever=RemRetriever(namespace="research-assistant"),
    llm=OpenAI(model="gpt-4o"),
)
print(engine.query("What did we decide about pricing in the Q3 review?").response)

2. Persistent chat memory for ChatEngine

Python

A BaseMemory that persists every turn, so your ChatEngine never forgets between deploys.

from llama_index.core.memory import BaseMemory
from llama_index.core.llms import ChatMessage

# Reuses REM, H, and requests from pattern 1.
class RemChatMemory(BaseMemory):
    namespace: str

    def get(self, input=None):
        r = requests.post(f"{REM}/memory-search-semantic", headers=H, json={
            "namespace": self.namespace,
            "query": input or "recent conversation",
            "limit": 12,
        }).json()
        return [
            ChatMessage(role="user" if i % 2 == 0 else "assistant", content=m["value"])
            for i, m in enumerate(r.get("results", []))
        ]

    def put(self, msg: ChatMessage):
        requests.post(f"{REM}/memory-set", headers=H, json={
            "namespace": self.namespace,
            "key": f"turn_{msg.role}_{abs(hash(msg.content)) % 10**8}",
            "value": f"{msg.role}: {msg.content}",
        })

    def reset(self):
        pass

    def get_all(self):
        return self.get()

    def set(self, messages):
        for m in messages:
            self.put(m)

3. Memory tool for a ReActAgent

Python

Let the agent decide when to recall — REM's endpoints map cleanly onto FunctionTool.

from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Reuses REM and H from pattern 1.
def recall(query: str) -> str:
    r = requests.post(f"{REM}/memory-search-semantic", headers=H, json={
        "namespace": "research-assistant", "query": query, "limit": 6,
    }).json()
    return "\n".join(f"- {m['value'][:200]}" for m in r.get("results", [])) or "(nothing)"

def remember(fact: str) -> str:
    requests.post(f"{REM}/memory-set", headers=H, json={
        "namespace": "research-assistant", "key": fact[:40], "value": fact,
    })
    return "stored"

agent = ReActAgent.from_tools(
    [FunctionTool.from_defaults(fn=recall), FunctionTool.from_defaults(fn=remember)],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,
)
agent.chat("Remember that our Q3 pricing is $29/mo. Now what's our price?")

What you get

Everything a pure-vector LlamaIndex stack leaves on the table.

Cross-session memory

Every ChatEngine turn lands in the same store your next process reads from — no pickle files, no Redis wrangling.

Semantic search under 180ms

Vector + FTS5 fusion + neural rerank. Beats vector-only retrieval on proper nouns, acronyms, and temporal queries.
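To see why fusing a vector ranking with a full-text ranking helps on proper nouns and acronyms, here is a generic reciprocal-rank-fusion (RRF) sketch. REM's actual fusion weights and neural reranker are server-side; this only illustrates the general technique, and the document names are made up.

```python
# Generic reciprocal-rank fusion: documents ranked highly in either
# list rise to the top of the merged ranking.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids using RRF scoring."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_pricing", "doc_roadmap", "doc_standup"]
fts_hits = ["doc_q3_review", "doc_pricing"]  # FTS wins on exact tokens
print(rrf([vector_hits, fts_hits])[0])  # → doc_pricing (present in both lists)
```

Keyword hits that a pure embedding search would miss still surface in the merged list, which is where the gain on acronyms and exact names comes from.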

Overnight consolidation

Dream Engine summarizes, de-duplicates, and builds an entity graph on top of your LlamaIndex corpus — no extra pipeline.

Shared across all your agents

The same namespace feeds your LlamaIndex retriever, your LangChain chain, and your Cursor MCP server. One corpus, every consumer.

Give your LlamaIndex pipelines a real memory

Free tier, no credit card. Ship a ReActAgent that remembers across deploys in under a minute.

Get API key · Read full docs