April 13, 2026
Add Knowledge Persistence to Replicate Models
Replicate makes it easy to run machine learning models in the cloud -- from text generation to image creation to audio processing. But each prediction run is isolated. This guide shows how to add a persistent memory layer to your Replicate-powered applications using REM Labs, so your models can build on previous interactions.
Why Replicate Models Need Persistent Memory
Replicate runs models as stateless prediction endpoints. You send input, get output, and the model forgets. For one-off predictions, that is fine. But if you are building an application where users interact with a model repeatedly -- a personal assistant, a creative tool, a coding companion -- you need state that persists between runs.
Replicate does not provide a built-in memory solution. That is where REM Labs comes in: a memory API that stores context with vector embeddings, full-text indexing, and entity extraction, and retrieves the most relevant context when your model needs it.
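To make the recall-augment-store loop concrete before wiring up any services, here is a minimal in-process sketch. The `MemoryStore` class is illustrative only -- a naive keyword overlap stands in for REM Labs' vector and full-text search:

```python
import re

def tokens(text):
    """Lowercase and extract word characters so punctuation never blocks a match."""
    return set(re.findall(r"\w+", text.lower()))

class MemoryStore:
    """Toy in-process stand-in for a memory API (illustrative only)."""
    def __init__(self):
        self.entries = []

    def set(self, value, namespace="default"):
        self.entries.append({"value": value, "namespace": namespace})

    def search(self, query, namespace="default", limit=5):
        # Naive keyword overlap stands in for vector similarity
        q = tokens(query)
        scored = [(len(q & tokens(e["value"])), e)
                  for e in self.entries if e["namespace"] == namespace]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for score, e in scored[:limit] if score > 0]

store = MemoryStore()
store.set("User: I work primarily in Figma.")
store.set("User: My favorite color is teal.")
hits = store.search("What design tool do I use?")
context = "\n".join(h["value"] for h in hits)
# context is what gets prepended to the model's system prompt before each run
```

The real examples below follow this exact shape, with REM Labs replacing `MemoryStore` and Replicate replacing the model call.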
Step 1: Get Your API Keys
Get a Replicate API token from replicate.com, and a REM Labs key from remlabs.ai/console or by running npx @remlabs/memory.
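Both official Replicate clients read the token from the environment automatically; how you hand the REM Labs key to your code is up to you (the `REM_LABS_KEY` variable name below is this guide's choice, not a requirement):

```shell
# The Replicate Python and Node.js clients read this variable automatically
export REPLICATE_API_TOKEN="r8_..."

# Paste your REM Labs key into the snippets below, or read it from an env var
export REM_LABS_KEY="sk-slop-..."
```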
Step 2: Store and Recall with Replicate
```python
import replicate  # reads REPLICATE_API_TOKEN from the environment
import requests

REM_KEY = "sk-slop-..."
REM_BASE = "https://api.api.remlabs.ai"

def chat_with_memory(user_msg, namespace="replicate-agent"):
    # Recall relevant memories
    search = requests.post(f"{REM_BASE}/v1/memory/search", json={
        "query": user_msg,
        "namespace": namespace,
        "limit": 5
    }, headers={"Authorization": f"Bearer {REM_KEY}"})
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    # Build the system prompt with recalled context
    system_prompt = "You are a helpful assistant."
    if context:
        system_prompt += f"\n\nRelevant context from prior conversations:\n{context}"

    # Run the model on Replicate
    output = replicate.run(
        "meta/meta-llama-3.1-70b-instruct",
        input={
            "prompt": user_msg,
            "system_prompt": system_prompt,
            "max_tokens": 1024
        }
    )
    reply = "".join(output)

    # Store the interaction for future recall
    requests.post(f"{REM_BASE}/v1/memory-set", json={
        "key": "replicate-chat",
        "value": f"User: {user_msg}\nAssistant: {reply}",
        "namespace": namespace,
        "tags": ["conversation"]
    }, headers={"Authorization": f"Bearer {REM_KEY}"})
    return reply

# Session 1
print(chat_with_memory("I'm a product designer. I work primarily in Figma."))

# Session 2
print(chat_with_memory("What design tool do I use?"))
# "You mentioned you work primarily in Figma."
```
Step 3: Node.js Example
```javascript
import Replicate from "replicate";

const replicate = new Replicate(); // reads REPLICATE_API_TOKEN from the environment
const REM_BASE = "https://api.api.remlabs.ai";
const REM_KEY = "sk-slop-...";

async function chatWithMemory(userMsg, namespace = "replicate-agent") {
  // Recall relevant memories
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 })
  });
  const { results } = await search.json();
  const context = (results ?? []).map(r => r.value).join("\n");

  // Build the system prompt with recalled context
  let systemPrompt = "You are a helpful assistant.";
  if (context) systemPrompt += `\n\nRelevant context from prior conversations:\n${context}`;

  // Run the model on Replicate
  const output = await replicate.run("meta/meta-llama-3.1-70b-instruct", {
    input: {
      prompt: userMsg,
      system_prompt: systemPrompt,
      max_tokens: 1024
    }
  });
  const reply = output.join("");

  // Store the interaction for future recall
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ key: "replicate-chat", value: `User: ${userMsg}\nAssistant: ${reply}`, namespace })
  });
  return reply;
}
```
Beyond Text: Multimodal Memory
Replicate hosts models for images, audio, video, and more. You can store the results or descriptions of any prediction as a memory. For example, if a user generates images with Stable Diffusion on Replicate, you can store the prompts and descriptions as memories. Next time they ask "make something like what I created last week," your app can recall the relevant context.
- Image generation -- store prompts, styles, and user preferences
- Audio transcription -- store transcripts and extracted entities
- Code generation -- store project context and coding patterns
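As a sketch of that flow, the helper below packages an image-generation run into a /v1/memory-set payload. The field names mirror the chat examples above; `build_image_memory`, its parameters, and the example model slug are this guide's invention, not part of either API:

```python
def build_image_memory(prompt, model, namespace="replicate-agent"):
    """Package an image-generation run as a /v1/memory-set payload."""
    return {
        "key": "image-generation",
        "value": f"Model: {model}\nPrompt: {prompt}",
        "namespace": namespace,
        "tags": ["image-generation"],
    }

payload = build_image_memory(
    "a watercolor skyline at dusk, soft palette",
    "stability-ai/stable-diffusion-3",
)
# POST payload to f"{REM_BASE}/v1/memory-set" exactly as in the Step 2 store call
```

Because the stored value includes the prompt text, a later search for "like what I created last week" can surface it the same way chat memories are recalled.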
API reference: Full documentation for /v1/memory-set, /v1/memory/search, namespaces, tags, and metadata is available in the developer docs.
Give your Replicate models a memory
Free tier. No credit card. Works with any model on Replicate.
Get started free →