How to Add Persistent Memory to a LangChain Agent

LangChain's built-in memory classes reset when your process dies. This guide shows how to replace them with REM Labs -- a persistent memory backend that survives restarts, works across sessions, and scores 90% on LongMemEval. Three new lines of code. Nothing else changes.

The Problem with LangChain's Default Memory

LangChain ships with several memory classes -- ConversationBufferMemory, ConversationSummaryMemory, ConversationEntityMemory. They all work the same way: conversation history lives in a Python object in RAM. When your server restarts, when your Lambda cold-starts, when your notebook kernel dies -- the memory is gone.

For demos, that is fine. For anything in production, you need memory that persists. That means an external store with semantic search, entity extraction, and multi-signal retrieval. That is what REM Labs provides.

Step 1: Install

pip install remlabs-memory

The Python SDK wraps the REM Labs API. You will also need an API key -- get one free at remlabs.ai/console or by running npx @remlabs/memory.

Step 2: Configure REM as the Memory Backend

from remlabs.integrations.langchain import RemLabsMemory

memory = RemLabsMemory(
    api_key="sk-slop-...",
    namespace="langchain-agent",
    search_type="rrf",  # multi-signal fusion for best recall
)

The RemLabsMemory class implements LangChain's memory interface, so it drops in anywhere you would use ConversationBufferMemory. The namespace parameter isolates this agent's memories from other agents or users. The search_type="rrf" option enables multi-signal fusion search -- combining vector similarity, full-text, and entity graph lookups for 90% recall accuracy.
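For intuition, LangChain's memory contract boils down to two methods: load_memory_variables, called before the prompt is built, and save_context, called after each turn. Here is a toy file-backed sketch of that contract -- purely illustrative, not how REM Labs stores data, and the class name is made up:

```python
import json
from pathlib import Path

class FileBackedMemory:
    """Toy memory that survives restarts by writing turns to a JSON file.
    Illustrates the two-method contract LangChain memory classes follow."""

    def __init__(self, path="memory.json", memory_key="history"):
        self.path = Path(path)
        self.memory_key = memory_key

    def _turns(self):
        # Reload whatever earlier processes persisted
        if self.path.exists():
            return json.loads(self.path.read_text())
        return []

    def load_memory_variables(self, inputs):
        # Return stored history under the key the prompt template expects
        history = "\n".join(f"{t['role']}: {t['text']}" for t in self._turns())
        return {self.memory_key: history}

    def save_context(self, inputs, outputs):
        # Append the latest human/AI exchange and persist it
        turns = self._turns()
        turns.append({"role": "human", "text": inputs["input"]})
        turns.append({"role": "ai", "text": outputs["output"]})
        self.path.write_text(json.dumps(turns))
```

Because RemLabsMemory fulfills the same two-method contract, any chain that accepts ConversationBufferMemory accepts it unchanged.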

Step 3: Use It in a Chain

from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
chain = ConversationChain(llm=llm, memory=memory)

# Session 1: tell the agent something
chain.predict(input="My favorite language is Python and I work at Acme Corp.")

# Session 2 (even after a restart): it remembers
chain.predict(input="What company do I work at?")

The second call returns an answer referencing Acme Corp -- even if the process was restarted between calls. REM stores each conversation turn as a separate memory unit with its own embedding, full-text index entry, and entity extraction pass. When the chain calls memory.load_memory_variables(), REM runs a multi-path retrieval query and returns the most relevant context.
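A deliberately naive sketch of one retrieval path makes the idea concrete. The helper names below are illustrative, not part of the SDK, and real retrieval layers embeddings, entity lookups, and reranking on top of this kind of scoring:

```python
def score(turn: str, query: str) -> int:
    """Count query words appearing in a stored turn (toy full-text signal)."""
    turn_words = set(turn.lower().split())
    return sum(1 for w in query.lower().split() if w in turn_words)

def most_relevant(turns: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k stored turns with the highest keyword overlap."""
    return sorted(turns, key=lambda t: score(t, query), reverse=True)[:k]

turns = [
    "My favorite language is Python and I work at Acme Corp.",
    "I usually deploy to AWS.",
    "The weather was nice yesterday.",
]
print(most_relevant(turns, "What company do I work at?", k=1))
# The Acme Corp turn scores highest, so it is what gets injected as context
```

Because each turn is a separate unit, only the turns that matter for the current question make it into the prompt, instead of the entire transcript.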

Step 4: Search Memories Directly

You can also query the memory store outside the chain, which is useful for building agent tools or debugging.

from remlabs import RemMemory

mem = RemMemory(api_key="sk-slop-...")

# Semantic search across all stored memories
results = mem.search("user preferences", namespace="langchain-agent", limit=5)
for r in results:
    print(r["value"], r["score"])
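One common use is folding the top hits into a context block for an agent tool's prompt. A small helper along these lines -- the function name is illustrative, and it assumes results shaped like the dicts above:

```python
def format_context(results: list[dict], max_items: int = 3) -> str:
    """Turn search hits into a prompt context block, best-scored first."""
    top = sorted(results, key=lambda r: r["score"], reverse=True)[:max_items]
    lines = [f"- {r['value']} (relevance {r['score']:.2f})" for r in top]
    return "Relevant memories:\n" + "\n".join(lines)

hits = [
    {"value": "User works at Acme Corp", "score": 0.91},
    {"value": "User prefers Python", "score": 0.84},
]
print(format_context(hits))
```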

What Gets Stored

Every memory stored through the REM Labs integration is automatically indexed three ways:

- a vector embedding, for semantic similarity search
- a full-text index entry, for exact terms and proper nouns
- an entity extraction pass, which links people, companies, and other entities into a graph

You do not need to configure any of this. It happens at write time, automatically.

Why Not Just Use a Vector Database?

You could wire Pinecone or Weaviate into LangChain's memory interface yourself. But vector-only retrieval tops out around 50-67% accuracy on real memory benchmarks. The hard queries -- proper nouns, temporal reasoning, knowledge updates -- require full-text search, entity graphs, and neural reranking working together. REM Labs runs all of these in parallel and fuses the results. That is how it reaches 90% on LongMemEval.
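Assuming search_type="rrf" refers to reciprocal rank fusion, the fusion step itself is easy to sketch: each retriever contributes 1/(k + rank) per memory, and the summed scores decide the final order. The memory IDs and ranked lists below are made up for illustration:

```python
def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]    # semantic similarity order
fulltext_hits = ["m1", "m9", "m3"]  # exact-term match order
entity_hits = ["m1", "m3"]          # entity-graph order
print(rrf([vector_hits, fulltext_hits, entity_hits]))
# → ['m1', 'm3', 'm9', 'm7']
```

Note how m1 wins overall despite never ranking first everywhere: agreement across signals outweighs a single retriever's top pick, which is exactly what rescues the hard queries a lone vector index misses.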

Full API docs: The complete LangChain integration reference, including namespace management, tag filtering, and metadata queries, is in the integration docs.

Give your LangChain agent a memory

Free tier. No credit card. pip install and go.

Get started free →