How to Add Persistent Memory to OpenAI Agents

OpenAI's chat completions API is stateless -- every request starts from scratch. If your GPT-4 agent needs to remember a user's name, preferences, or past conversations, you have to manage that yourself. This guide shows you how to wire REM Labs into any OpenAI-based agent so it stores and recalls context across sessions automatically.

Why OpenAI Agents Need External Memory

The OpenAI API does not persist conversation history between requests. Each call to /v1/chat/completions receives only the messages you send in that request body. ChatGPT's built-in memory is limited to the ChatGPT product and is not available through the API. If you are building a custom agent, assistant, or chatbot on top of GPT-4 or GPT-4o, you need your own memory layer.

Most developers start by stuffing previous messages into the system prompt. That works until you hit the context window limit, at which point older context gets silently dropped. REM Labs gives you a persistent memory backend with semantic search, entity extraction, and multi-signal retrieval -- so your agent can remember thousands of interactions and recall the right ones when they matter.
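To see why prompt-stuffing breaks down, here is a minimal sketch of that naive approach (the helper name, the ~4-characters-per-token heuristic, and the budget are illustrative, not OpenAI's tokenizer or limits):

```python
# Naive memory: append every turn to the prompt, then silently drop
# the oldest turns once the history exceeds a rough token budget.
# This is the approach that stops working as conversations grow.

def trim_history(messages, max_tokens=8000):
    """Keep the system prompt plus the most recent turns that fit."""
    system, turns = messages[0], messages[1:]

    def rough_tokens(msg):
        # Crude heuristic: roughly 4 characters per token.
        return len(msg["content"]) // 4

    budget = max_tokens - rough_tokens(system)
    kept = []
    for msg in reversed(turns):  # walk newest-first
        cost = rough_tokens(msg)
        if budget - cost < 0:
            break                # everything older is dropped here
        budget -= cost
        kept.append(msg)
    return [system] + list(reversed(kept))

history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": f"Message {i}: " + "x" * 400} for i in range(200)]

trimmed = trim_history(history)
print(len(history), "->", len(trimmed))  # most of the conversation has been dropped
```

The user's earliest messages -- often the ones with names, preferences, and key facts -- are exactly what gets discarded first.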

Step 1: Get Your API Key

Sign up at remlabs.ai/console or run npx @remlabs/memory from your terminal. The free tier includes 1,000 memory operations per month -- enough to build and test a full agent.

Step 2: Store Memories After Each Interaction

After your OpenAI agent generates a response, store the exchange as a memory using the REM Labs API.

```python
import openai
import requests

OPENAI_KEY = "sk-..."
REM_KEY = "sk-slop-..."
REM_BASE = "https://api.api.remlabs.ai"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I'm moving to Austin next month for a job at Stripe."}
]

# Get the OpenAI response
client = openai.OpenAI(api_key=OPENAI_KEY)
resp = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = resp.choices[0].message.content

# Store the interaction as a memory
requests.post(f"{REM_BASE}/v1/memory-set", json={
    "key": "openai-agent",
    "value": f"User: {messages[-1]['content']}\nAssistant: {reply}",
    "namespace": "user-123",
    "tags": ["conversation"]
}, headers={"Authorization": f"Bearer {REM_KEY}"})
```

Each call to POST /v1/memory-set stores the memory with a vector embedding, full-text index, and entity extraction -- all automatically. The namespace parameter keeps each user's memories isolated.
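In a real agent you would call this after every turn, so it is worth factoring out. A small sketch -- `build_memory_payload` is our own helper name, not part of any REM Labs SDK; it just assembles the same request body fields shown above:

```python
def build_memory_payload(user_id, user_msg, reply, tags=None):
    """Package one user/assistant exchange as a /v1/memory-set body.

    The per-user namespace keeps each user's memories isolated,
    matching the fields used in the example above.
    """
    return {
        "key": "openai-agent",
        "value": f"User: {user_msg}\nAssistant: {reply}",
        "namespace": user_id,
        "tags": tags or ["conversation"],
    }

payload = build_memory_payload(
    "user-123",
    "I'm moving to Austin next month for a job at Stripe.",
    "Congratulations on the new role!",
)
# Pass `payload` as the json= argument to the requests.post call above.
```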

Step 3: Recall Relevant Context Before Each Request

Before sending a new message to OpenAI, search the user's memory store for relevant context and inject it into the system prompt.

```python
# User sends a new message
user_msg = "What city am I moving to?"

# Search for relevant memories
search = requests.post(f"{REM_BASE}/v1/memory/search", json={
    "query": user_msg,
    "namespace": "user-123",
    "limit": 5
}, headers={"Authorization": f"Bearer {REM_KEY}"})

memories = search.json().get("results", [])
context = "\n".join([m["value"] for m in memories])

# Build the prompt with recalled memories
messages = [
    {"role": "system", "content": f"You are a helpful assistant. Relevant context from previous conversations:\n{context}"},
    {"role": "user", "content": user_msg}
]

resp = client.chat.completions.create(model="gpt-4o", messages=messages)
print(resp.choices[0].message.content)
# Output: "You mentioned you're moving to Austin next month for a job at Stripe."
```

The search endpoint uses multi-signal fusion -- combining vector similarity, full-text matching, and entity graph lookups -- to find the most relevant memories. This is how REM Labs achieves 90% recall accuracy on the LongMemEval benchmark.

Step 4: Node.js Example

The same pattern works in JavaScript with the OpenAI Node SDK.

```javascript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: "sk-..." });
const REM_BASE = "https://api.api.remlabs.ai";
const REM_KEY = "sk-slop-...";

// Store a memory
await fetch(`${REM_BASE}/v1/memory-set`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${REM_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    key: "openai-agent",
    value: "User prefers concise responses. Works at Stripe in Austin.",
    namespace: "user-123"
  })
});

// Recall before next request
const search = await fetch(`${REM_BASE}/v1/memory/search`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${REM_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    query: "user preferences",
    namespace: "user-123",
    limit: 5
  })
});

const { results } = await search.json();
const context = results.map(r => r.value).join("\n");
```

What Gets Stored and Indexed

Every memory written through the REM Labs API is automatically indexed three ways:

- A vector embedding, for semantic similarity search
- A full-text index, for keyword matching
- Entity extraction, for entity graph lookups

You do not configure any of this. It happens at write time. When you search, all three retrieval paths run in parallel and results are fused using reciprocal rank fusion for maximum recall.
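REM Labs' fusion code itself is not shown here, but the standard reciprocal rank fusion formula it names is simple: each item scores the sum of 1 / (k + rank) across every ranked list it appears in. A sketch with invented memory ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one combined ranking.

    An item's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so items ranked well by multiple retrieval
    signals rise to the top. k=60 is the commonly used constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three retrieval paths return different orderings of memory ids:
vector_hits = ["m1", "m3", "m2"]    # semantic similarity
fulltext_hits = ["m3", "m1", "m4"]  # keyword match
entity_hits = ["m3", "m5"]          # entity graph lookup

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, entity_hits])
print(fused[0])  # m3 -- ranked highly by all three signals
```

Because the score only depends on rank positions, the three signals never need their raw scores put on a common scale, which is why this fusion step works across such different retrieval paths.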

API reference: Full documentation for /v1/memory-set, /v1/memory/search, namespace management, and tag filtering is available in the developer docs.

Give your OpenAI agent a memory

Free tier. No credit card. Works with GPT-4, GPT-4o, and any OpenAI model.

Get started free →