How to Add Persistent Memory to OpenAI Agents

OpenAI's chat completions API is stateless -- every request starts from scratch. If your GPT-4 agent needs to remember a user's name, preferences, or past conversations, you have to manage that yourself. This guide shows you how to wire REM Labs into any OpenAI-based agent so it stores and recalls context across sessions automatically.

Why OpenAI Agents Need External Memory

The OpenAI API does not persist conversation history between requests. Each call to /v1/chat/completions receives only the messages you send in that request body. ChatGPT's built-in memory is limited to the ChatGPT product and is not available through the API. If you are building a custom agent, assistant, or chatbot on top of GPT-4 or GPT-4o, you need your own memory layer.

Most developers start by stuffing previous messages into the system prompt. That works until you hit the context window limit, at which point older context gets silently dropped. REM Labs gives you a persistent memory backend with semantic search, entity extraction, and multi-signal retrieval -- so your agent can remember thousands of interactions and recall the right ones when they matter.
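To see why prompt-stuffing breaks down, here is a minimal sketch of that naive approach (the helper name, the ~4-characters-per-token heuristic, and the budget are illustrative, not OpenAI's tokenizer or limits):

```python
# Naive memory: append every turn to the prompt, then silently drop
# the oldest turns once the history exceeds a rough token budget.
# This is the approach that stops working as conversations grow.

def trim_history(messages, max_tokens=8000):
    """Keep the system prompt plus the most recent turns that fit."""
    system, turns = messages[0], messages[1:]

    def rough_tokens(msg):
        # Crude heuristic: roughly 4 characters per token.
        return len(msg["content"]) // 4

    budget = max_tokens - rough_tokens(system)
    kept = []
    for msg in reversed(turns):  # walk newest-first
        cost = rough_tokens(msg)
        if budget - cost < 0:
            break                # everything older is dropped here
        budget -= cost
        kept.append(msg)
    return [system] + list(reversed(kept))

history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": f"Message {i}: " + "x" * 400} for i in range(200)]

trimmed = trim_history(history)
print(len(history), "->", len(trimmed))  # most of the conversation has been dropped
```

The user's earliest messages -- often the ones with names, preferences, and key facts -- are exactly what gets discarded first.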

Step 1: Get Your API Key

Sign up at remlabs.ai/console or run npx @remlabs/memory from your terminal. The free tier includes 1,000 memory operations per month -- enough to build and test a full agent.

Step 2: Store Memories After Each Interaction

After your OpenAI agent generates a response, store the exchange as a memory using the REM Labs API.

```python
import openai
import requests

OPENAI_KEY = "sk-..."
REM_KEY = "sk-slop-..."
REM_BASE = "https://api.api.remlabs.ai"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I'm moving to Austin next month for a job at Stripe."}
]

# Get the OpenAI response
client = openai.OpenAI(api_key=OPENAI_KEY)
resp = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = resp.choices[0].message.content

# Store the interaction as a memory
requests.post(f"{REM_BASE}/v1/memory-set", json={
    "key": "openai-agent",
    "value": f"User: {messages[-1]['content']}\nAssistant: {reply}",
    "namespace": "user-123",
    "tags": ["conversation"]
}, headers={"Authorization": f"Bearer {REM_KEY}"})
```

Each call to POST /v1/memory-set stores the memory with a vector embedding, full-text index, and entity extraction -- all automatically. The namespace parameter keeps each user's memories isolated.
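In a real agent you would call this after every turn, so it is worth factoring out. A small sketch -- `build_memory_payload` is our own helper name, not part of any REM Labs SDK; it just assembles the same request body fields shown above:

```python
def build_memory_payload(user_id, user_msg, reply, tags=None):
    """Package one user/assistant exchange as a /v1/memory-set body.

    The per-user namespace keeps each user's memories isolated,
    matching the fields used in the example above.
    """
    return {
        "key": "openai-agent",
        "value": f"User: {user_msg}\nAssistant: {reply}",
        "namespace": user_id,
        "tags": tags or ["conversation"],
    }

payload = build_memory_payload(
    "user-123",
    "I'm moving to Austin next month for a job at Stripe.",
    "Congratulations on the new role!",
)
# Pass `payload` as the json= argument to the requests.post call above.
```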

Step 3: Recall Relevant Context Before Each Request

Before sending a new message to OpenAI, search the user's memory store for relevant context and inject it into the system prompt.

```python
# User sends a new message
user_msg = "What city am I moving to?"

# Search for relevant memories
search = requests.post(f"{REM_BASE}/v1/memory/search", json={
    "query": user_msg,
    "namespace": "user-123",
    "limit": 5
}, headers={"Authorization": f"Bearer {REM_KEY}"})

memories = search.json().get("results", [])
context = "\n".join([m["value"] for m in memories])

# Build the prompt with recalled memories
messages = [
    {"role": "system", "content": f"You are a helpful assistant. Relevant context from previous conversations:\n{context}"},
    {"role": "user", "content": user_msg}
]

resp = client.chat.completions.create(model="gpt-4o", messages=messages)
print(resp.choices[0].message.content)
# Output: "You mentioned you're moving to Austin next month for a job at Stripe."
```

The search endpoint uses multi-signal fusion -- combining vector similarity, full-text matching, and entity graph lookups -- to find the most relevant memories. This is how REM Labs achieves 90% recall accuracy on the LongMemEval benchmark.

Step 4: Node.js Example

The same pattern works in JavaScript with the OpenAI Node SDK.

```javascript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: "sk-..." });
const REM_BASE = "https://api.api.remlabs.ai";
const REM_KEY = "sk-slop-...";

// Store a memory
await fetch(`${REM_BASE}/v1/memory-set`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${REM_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    key: "openai-agent",
    value: "User prefers concise responses. Works at Stripe in Austin.",
    namespace: "user-123"
  })
});

// Recall before next request
const search = await fetch(`${REM_BASE}/v1/memory/search`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${REM_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    query: "user preferences",
    namespace: "user-123",
    limit: 5
  })
});

const { results } = await search.json();
const context = results.map(r => r.value).join("\n");
```

What Gets Stored and Indexed

Every memory written through the REM Labs API is automatically indexed three ways:

- A vector embedding, for semantic similarity search
- A full-text index, for keyword matching
- Entity extraction, for entity graph lookups

You do not configure any of this. It happens at write time. When you search, all three retrieval paths run in parallel and results are fused using reciprocal rank fusion for maximum recall.
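REM Labs' fusion code itself is not shown here, but the standard reciprocal rank fusion formula it names is simple: each item scores the sum of 1 / (k + rank) across every ranked list it appears in. A sketch with invented memory ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one combined ranking.

    An item's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so items ranked well by multiple retrieval
    signals rise to the top. k=60 is the commonly used constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three retrieval paths return different orderings of memory ids:
vector_hits = ["m1", "m3", "m2"]    # semantic similarity
fulltext_hits = ["m3", "m1", "m4"]  # keyword match
entity_hits = ["m3", "m5"]          # entity graph lookup

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, entity_hits])
print(fused[0])  # m3 -- ranked highly by all three signals
```

Because the score only depends on rank positions, the three signals never need their raw scores put on a common scale, which is why this fusion step works across such different retrieval paths.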

API reference: Full documentation for /v1/memory-set, /v1/memory/search, namespace management, and tag filtering is available in the developer docs.

Give your OpenAI agent a memory

Free tier. No credit card. Works with GPT-4, GPT-4o, and any OpenAI model.

Get started free →