How to Add Persistent Memory to OpenAI Agents
OpenAI's Chat Completions API is stateless -- every request starts from scratch. If your GPT-4 agent needs to remember a user's name, preferences, or past conversations, you have to manage that yourself. This guide shows you how to wire REM Labs into any OpenAI-based agent so it stores and recalls context across sessions automatically.
Why OpenAI Agents Need External Memory
The OpenAI API does not persist conversation history between requests. Each call to /v1/chat/completions receives only the messages you send in that request body. ChatGPT's built-in memory is limited to the ChatGPT product and is not available through the API. If you are building a custom agent, assistant, or chatbot on top of GPT-4 or GPT-4o, you need your own memory layer.
Most developers start by replaying the previous conversation in every request. That works until you hit the context window limit, at which point you are forced to truncate and older context gets silently dropped. REM Labs gives you a persistent memory backend with semantic search, entity extraction, and multi-signal retrieval -- so your agent can remember thousands of interactions and recall the right ones when they matter.
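The naive version looks something like this minimal sketch (assuming the official openai Python SDK; the in-process history list is illustrative -- it vanishes on restart and grows until it overflows the context window):

```python
history = []  # in-process only: lost on restart, grows without bound

def build_messages(user_message: str) -> list[dict]:
    """Replay the entire conversation on every request -- the naive approach."""
    return ([{"role": "system", "content": "You are a helpful assistant."}]
            + history
            + [{"role": "user", "content": user_message}])

def chat(user_message: str) -> str:
    from openai import OpenAI  # official OpenAI Python SDK
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=build_messages(user_message),
    )
    reply = response.choices[0].message.content
    history.append({"role": "user", "content": user_message})
    history.append({"role": "assistant", "content": reply})
    return reply
```

Every token of history is resent on every call, so cost grows with conversation length and nothing survives a process restart.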
Step 1: Get Your API Key
Sign up at remlabs.ai/console or run npx @remlabs/memory from your terminal. The free tier includes 1,000 memory operations per month -- enough to build and test a full agent.
Step 2: Store Memories After Each Interaction
After your OpenAI agent generates a response, store the exchange as a memory using the REM Labs API.
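In Python, that can look like the following sketch. The endpoint path comes from this guide, but the base URL, the REMLABS_API_KEY environment variable, the Bearer-token auth header, and the request field names (content, tags) are assumptions -- check the developer docs for the authoritative schema. Only the namespace parameter is documented above.

```python
import os
import requests  # third-party HTTP client: pip install requests

REM_API = "https://api.remlabs.ai"              # assumed base URL
REM_KEY = os.environ.get("REMLABS_API_KEY", "")  # assumed env var name

def build_memory_payload(user_id: str, user_message: str, assistant_reply: str) -> dict:
    """Package one exchange as a memory-set request body (field names assumed)."""
    return {
        "namespace": f"user:{user_id}",  # keeps each user's memories isolated
        "content": f"User said: {user_message}\nAssistant replied: {assistant_reply}",
        "tags": ["conversation"],
    }

def store_exchange(user_id: str, user_message: str, assistant_reply: str) -> None:
    """Call POST /v1/memory-set after the agent responds."""
    resp = requests.post(
        f"{REM_API}/v1/memory-set",
        headers={"Authorization": f"Bearer {REM_KEY}"},
        json=build_memory_payload(user_id, user_message, assistant_reply),
        timeout=10,
    )
    resp.raise_for_status()
```

Call store_exchange once per turn, after you have the model's reply in hand.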
Each call to POST /v1/memory-set stores the memory with a vector embedding, full-text index, and entity extraction -- all automatically. The namespace parameter keeps each user's memories isolated.
Step 3: Recall Relevant Context Before Each Request
Before sending a new message to OpenAI, search the user's memory store for relevant context and inject it into the system prompt.
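A sketch of the recall side, under the same assumptions as above (the /v1/memory/search path is from this guide; the query, limit, and results field names are assumptions):

```python
import os
import requests  # third-party HTTP client: pip install requests

REM_API = "https://api.remlabs.ai"              # assumed base URL
REM_KEY = os.environ.get("REMLABS_API_KEY", "")  # assumed env var name

def recall(user_id: str, query: str, limit: int = 5) -> list[str]:
    """Search the user's namespace for memories relevant to the new message."""
    resp = requests.post(
        f"{REM_API}/v1/memory/search",
        headers={"Authorization": f"Bearer {REM_KEY}"},
        json={"namespace": f"user:{user_id}", "query": query, "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    return [hit["content"] for hit in resp.json().get("results", [])]

def build_system_prompt(memories: list[str]) -> str:
    """Inject recalled context ahead of the base instructions."""
    if not memories:
        return "You are a helpful assistant."
    recalled = "\n".join(f"- {m}" for m in memories)
    return ("You are a helpful assistant.\n"
            "Relevant things you remember about this user:\n" + recalled)

def chat_with_memory(user_id: str, user_message: str) -> str:
    from openai import OpenAI  # official OpenAI Python SDK
    client = OpenAI()
    memories = recall(user_id, user_message)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": build_system_prompt(memories)},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```

Only the handful of recalled memories travel with each request, so token cost stays flat no matter how long the relationship with the user gets.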
The search endpoint uses multi-signal fusion -- combining vector similarity, full-text matching, and entity graph lookups -- to find the most relevant memories. This is how REM Labs achieves 90% recall accuracy on the LongMemEval benchmark.
Step 4: Node.js Example
The same pattern works in JavaScript with the OpenAI Node SDK.
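Here is a sketch combining both steps -- recall before the request, store after the response -- using Node 18+'s built-in fetch and the official OpenAI Node SDK. As in the Python examples, the base URL, env var name, auth header, and request/response field names are assumptions; only the endpoint paths come from this guide.

```javascript
const REM_API = "https://api.remlabs.ai";          // assumed base URL
const REM_KEY = process.env.REMLABS_API_KEY ?? ""; // assumed env var name

function buildSystemPrompt(memories) {
  // Inject recalled context ahead of the base instructions.
  if (memories.length === 0) return "You are a helpful assistant.";
  const recalled = memories.map((m) => `- ${m}`).join("\n");
  return `You are a helpful assistant.\nRelevant things you remember about this user:\n${recalled}`;
}

async function remPost(path, body) {
  const res = await fetch(`${REM_API}${path}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`REM Labs ${path} failed: ${res.status}`);
  return res.json();
}

async function chatWithMemory(userId, userMessage) {
  // 1. Recall relevant memories before calling OpenAI.
  const { results = [] } = await remPost("/v1/memory/search", {
    namespace: `user:${userId}`,
    query: userMessage,
    limit: 5,
  });

  const { default: OpenAI } = await import("openai"); // official OpenAI Node SDK
  const openai = new OpenAI();
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: buildSystemPrompt(results.map((r) => r.content)) },
      { role: "user", content: userMessage },
    ],
  });
  const reply = completion.choices[0].message.content;

  // 2. Store the new exchange so future sessions can recall it.
  await remPost("/v1/memory-set", {
    namespace: `user:${userId}`,
    content: `User said: ${userMessage}\nAssistant replied: ${reply}`,
  });
  return reply;
}
```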
What Gets Stored and Indexed
Every memory written through the REM Labs API is automatically indexed three ways:
- Vector embedding -- for semantic similarity search ("things like this")
- Full-text index -- for exact keyword, proper noun, and acronym matching
- Entity graph -- extracted entities and relationships for structured queries
You do not configure any of this. It happens at write time. When you search, all three retrieval paths run in parallel and results are fused using reciprocal rank fusion for maximum recall.
API reference: Full documentation for /v1/memory-set, /v1/memory/search, namespace management, and tag filtering is available in the developer docs.
Give your OpenAI agent a memory
Free tier. No credit card. Works with GPT-4, GPT-4o, and any OpenAI model.
Get started free →