Add Memory to Haystack RAG Pipelines

Haystack pipelines excel at document retrieval and generation, but they have no built-in concept of conversational memory. This guide shows how to add REM Labs as a custom Haystack component so your RAG pipelines remember previous interactions and retrieve context with 90% accuracy.

Why RAG Pipelines Need Memory

A standard Haystack RAG pipeline retrieves documents from a document store, then feeds them to an LLM. But it has no awareness of what the user asked yesterday, what the system answered last week, or what facts have been established across sessions. Every query starts from scratch.

Adding a persistent memory layer means your pipeline can blend document retrieval with conversational context, giving answers that are both factually grounded and personally relevant.

Step 1: Install

```shell
pip install remlabs-memory haystack-ai
```

Step 2: Create a Haystack Memory Component

```python
from haystack import component

from remlabs import RemMemory


@component
class RemMemoryRetriever:
    """Searches REM for memories relevant to the query."""

    def __init__(self, api_key: str, namespace: str, limit: int = 5):
        self.mem = RemMemory(api_key=api_key)
        self.namespace = namespace
        self.limit = limit

    @component.output_types(context=str)
    def run(self, query: str):
        results = self.mem.search(query, namespace=self.namespace, limit=self.limit)
        context = "\n".join(r["value"] for r in results)
        return {"context": context}


@component
class RemMemoryWriter:
    """Persists new information to REM after an interaction."""

    def __init__(self, api_key: str, namespace: str):
        self.mem = RemMemory(api_key=api_key)
        self.namespace = namespace

    @component.output_types(stored=bool)
    def run(self, value: str, tags: list[str] | None = None):
        self.mem.store(value=value, namespace=self.namespace, tags=tags or [])
        return {"stored": True}
```

The RemMemoryRetriever component searches REM for relevant memories and returns them as a context string. The RemMemoryWriter persists new information after each interaction.
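To see the retriever's contract in isolation, here is a minimal sketch of how search results become the context string. The results list below is hypothetical stand-in data, not a real REM response; it only mirrors the shape the component reads (each hit carrying its text under "value"):

```python
# Hypothetical search hits in the shape RemMemoryRetriever expects.
results = [
    {"value": "User prefers blue-green deployments", "score": 0.91},
    {"value": "Prod cluster runs on GKE", "score": 0.88},
]

# The same join the component performs before returning {"context": ...}
context = "\n".join(r["value"] for r in results)
print(context)
```

The retriever deliberately returns one flat string rather than a list, so the prompt template can drop it in with a single `{{context}}` slot.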

Step 3: Wire into a RAG Pipeline

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

template = """
Previous context:
{{context}}

Documents:
{{documents}}

Question: {{query}}

Answer the question using the documents and any relevant previous context.
"""

pipeline = Pipeline()
pipeline.add_component("memory", RemMemoryRetriever(
    api_key="sk-slop-...",
    namespace="haystack-rag",
))
pipeline.add_component("prompt", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))

pipeline.connect("memory.context", "prompt.context")
pipeline.connect("prompt.prompt", "llm.prompt")

# docs: documents from your existing retriever / document store
result = pipeline.run({
    "memory": {"query": "deployment architecture"},
    "prompt": {"query": "deployment architecture", "documents": docs},
})
```

The memory component runs in parallel with your document retriever. Both feed into the prompt builder. The LLM sees document results and conversational memory side by side.
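Conceptually, the prompt builder just fills the template with both sources. A plain-Python sketch (the memory context and document snippet below are hypothetical values, and Python's `str.format` stands in for the Jinja rendering PromptBuilder performs) shows what the LLM ends up seeing:

```python
# Simplified template using str.format slots instead of Jinja's {{ }}.
template = (
    "Previous context:\n{context}\n\n"
    "Documents:\n{documents}\n\n"
    "Question: {query}\n\n"
    "Answer the question using the documents and any relevant previous context."
)

# Hypothetical inputs: context from RemMemoryRetriever,
# documents from your existing document store retriever.
prompt = template.format(
    context="User asked about blue-green deployments last week.",
    documents="deploy.md: The service ships via a canary rollout...",
    query="deployment architecture",
)

print(prompt)
```

Because both sources land in one prompt, the model can ground its answer in the documents while tailoring it to what this user has already discussed.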

Step 4: Store Interactions

```python
writer = RemMemoryWriter(api_key="sk-slop-...", namespace="haystack-rag")

# After the pipeline produces a response
writer.run(
    value=f"Q: {query}\nA: {response}",
    tags=["rag-session", "deployment"],
)
```

What Gets Indexed

REM builds all three of its indexes at write time. At query time, multi-signal fusion combines results from all three, which is how REM reaches 90% recall on LongMemEval.

Works with any document store: REM handles conversational memory. Your existing Haystack document store (Elasticsearch, Weaviate, Qdrant) handles document retrieval. They complement each other in the same pipeline.

Give your Haystack pipeline a memory

Free tier. No credit card. pip install and go.

Get started free →