REM Labs · LlamaIndex

Memory that persists across every LlamaIndex session

LlamaIndex gives you state-of-the-art retrieval — but ChatMemoryBuffer still dies at process end, and pure vector stores max out around 67% recall. REM pairs LlamaIndex's pipelines with a persistent vector + FTS + entity-graph substrate that hits 94.6% on LongMemEval.

Free tier · No credit card · Works with llama-index (Python) and llamaindex (TS)

Why LlamaIndex + REM

What a persistent memory substrate adds to a retrieval framework.

Continuity across sessions

Your ChatEngine resumes where it left off — weeks later, from a different process, on a different machine. Nothing to rehydrate.

9 Dream Engine strategies

Summarize, link, dedupe, score salience, detect contradictions — run nightly against your LlamaIndex corpus so retrieval accuracy improves without a re-embed job.

Cross-tool portability

The embeddings LlamaIndex produces, the chat history it buffers, the entities it extracts — all land in the same namespace that LangChain, CrewAI, AutoGen, and your Cursor MCP share.

Install in 60 seconds

LlamaIndex already speaks HTTP retrievers. REM is a retriever.

The Python and TS SDKs are in private beta. The requests/fetch pattern below is production-ready today. Request SDK access in Discord.

1

Get your API key

Sign up at remlabs.ai/console. Copy the sk-rem-... key and export it.

pip install llama-index requests
export REM_API_KEY="sk-rem-..."
2

Seed memory from any Document

LlamaIndex chunks documents; REM stores them as first-class memories that survive rebuilds.

import os, requests
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("./notes").load_data()
H = {"Authorization": f"Bearer {os.environ['REM_API_KEY']}",
     "Content-Type": "application/json"}

for i, d in enumerate(docs):
    requests.post("https://remlabs.ai/v1/memory-set", headers=H, json={
        "namespace": "research-assistant",
        "key": f"doc_{i}",
        "value": d.text[:8000],
        "metadata": {"source": d.metadata.get("file_name")},
    })
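Note that the snippet above truncates each document to 8,000 characters. For longer files you may prefer to split rather than truncate; a minimal splitter is sketched below. The chunk size and per-chunk key scheme are our own convention for the example, not a REM requirement.

```python
# Split long documents instead of truncating them. Each chunk would be
# stored with its own key (e.g. f"doc_{i}_part_{j}") via the same
# /v1/memory-set call shown above.
def chunk(text: str, size: int = 8000) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

print(len(chunk("x" * 20000)))  # → 3
```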
3

Wrap REM as a LlamaIndex retriever

Subclass BaseRetriever, hit /v1/memory-search-semantic, return NodeWithScore. Plug into any QueryEngine, ChatEngine, or ReActAgent.

Common patterns

Three shapes: custom retriever, chat memory, and agent memory tool.

1. REM-backed BaseRetriever for QueryEngine

Python

Replaces VectorStoreIndex.as_retriever(). Use with any LlamaIndex engine.

import os, requests
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, TextNode
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI

REM = "https://remlabs.ai/v1"
H = {"Authorization": f"Bearer {os.environ['REM_API_KEY']}",
     "Content-Type": "application/json"}

class RemRetriever(BaseRetriever):
    def __init__(self, namespace: str, top_k: int = 8):
        self.namespace = namespace
        self.top_k = top_k
        super().__init__()

    def _retrieve(self, query_bundle):
        r = requests.post(f"{REM}/memory-search-semantic", headers=H, json={
            "namespace": self.namespace,
            "query": query_bundle.query_str,
            "limit": self.top_k,
        }).json()
        return [
            NodeWithScore(node=TextNode(text=m["value"], id_=m.get("key", "")),
                          score=m.get("score", 0.0))
            for m in r.get("results", [])
        ]

engine = RetrieverQueryEngine.from_args(
    retriever=RemRetriever(namespace="research-assistant"),
    llm=OpenAI(model="gpt-4o"),
)
print(engine.query("What did we decide about pricing in the Q3 review?").response)

2. Persistent chat memory for ChatEngine

Python

A BaseMemory that persists every turn, so your ChatEngine never forgets between deploys.

from llama_index.core.memory import BaseMemory
from llama_index.core.llms import ChatMessage

# Reuses REM, H, and requests from pattern 1.
class RemChatMemory(BaseMemory):
    namespace: str

    def get(self, input=None):
        r = requests.post(f"{REM}/memory-search-semantic", headers=H, json={
            "namespace": self.namespace,
            "query": input or "recent conversation",
            "limit": 12,
        }).json()
        return [
            ChatMessage(role="user" if i % 2 == 0 else "assistant", content=m["value"])
            for i, m in enumerate(r.get("results", []))
        ]

    def put(self, msg: ChatMessage):
        requests.post(f"{REM}/memory-set", headers=H, json={
            "namespace": self.namespace,
            "key": f"turn_{msg.role}_{abs(hash(msg.content)) % 10**8}",
            "value": f"{msg.role}: {msg.content}",
        })

    def reset(self):
        pass

    def get_all(self):
        return self.get()

    def set(self, messages):
        for m in messages:
            self.put(m)

3. Memory tool for a ReActAgent

Python

Let the agent decide when to recall — REM's endpoints map cleanly onto FunctionTool.

from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Reuses REM and H from pattern 1.
def recall(query: str) -> str:
    r = requests.post(f"{REM}/memory-search-semantic", headers=H, json={
        "namespace": "research-assistant", "query": query, "limit": 6,
    }).json()
    return "\n".join(f"- {m['value'][:200]}" for m in r.get("results", [])) or "(nothing)"

def remember(fact: str) -> str:
    requests.post(f"{REM}/memory-set", headers=H, json={
        "namespace": "research-assistant", "key": fact[:40], "value": fact,
    })
    return "stored"

agent = ReActAgent.from_tools(
    [FunctionTool.from_defaults(fn=recall), FunctionTool.from_defaults(fn=remember)],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,
)
agent.chat("Remember that our Q3 pricing is $29/mo. Now what's our price?")

What you get

Everything a pure-vector LlamaIndex stack leaves on the table.

Cross-session memory

Every ChatEngine turn lands in the same store your next process reads from — no pickle files, no Redis wrangling.

Semantic search under 180ms

Vector + FTS5 fusion + neural rerank. Beats vector-only retrieval on proper nouns, acronyms, and temporal queries.
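To see why fusing a vector ranking with a full-text ranking helps on proper nouns and acronyms, here is a generic reciprocal-rank-fusion (RRF) sketch. REM's actual fusion weights and neural reranker are server-side; this only illustrates the general technique, and the document names are made up.

```python
# Generic reciprocal-rank fusion: documents ranked highly in either
# list rise to the top of the merged ranking.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids using RRF scoring."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_pricing", "doc_roadmap", "doc_standup"]
fts_hits = ["doc_q3_review", "doc_pricing"]  # FTS wins on exact tokens
print(rrf([vector_hits, fts_hits])[0])  # → doc_pricing (present in both lists)
```

Keyword hits that a pure embedding search would miss still surface in the merged list, which is where the gain on acronyms and exact names comes from.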

Overnight consolidation

Dream Engine summarizes, de-duplicates, and builds an entity graph on top of your LlamaIndex corpus — no extra pipeline.

Shared across all your agents

The same namespace feeds your LlamaIndex retriever, your LangChain chain, and your Cursor MCP server. One corpus, every consumer.

Give your LlamaIndex pipelines a real memory

Free tier, no credit card. Ship a ReActAgent that remembers across deploys in under a minute.

Get API key · Read full docs