Add Knowledge Persistence to Replicate Models

Replicate makes it easy to run machine learning models in the cloud -- from text generation to image creation to audio processing. But each prediction run is isolated. This guide shows how to add a persistent memory layer to your Replicate-powered applications using REM Labs, so your models can build on previous interactions.

Why Replicate Models Need Persistent Memory

Replicate runs models as stateless prediction endpoints. You send input, get output, and the model forgets. For one-off predictions, that is fine. But if you are building an application where users interact with a model repeatedly -- a personal assistant, a creative tool, a coding companion -- you need state that persists between runs.

Replicate does not provide a built-in memory solution. That is where REM Labs comes in: a memory API that stores context with vector embeddings, full-text indexing, and entity extraction, and retrieves the most relevant context when your model needs it.

Step 1: Get Your API Keys

Get a Replicate API token from replicate.com and a REM Labs API key from remlabs.ai/console (or by running npx @remlabs/memory). The Replicate client libraries read the token from the REPLICATE_API_TOKEN environment variable, so export it there rather than hardcoding it.
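As a minimal sketch, you can fail fast when keys are missing, assuming you export them as REPLICATE_API_TOKEN (the variable the Replicate client libraries read) and REM_API_KEY (a name chosen for this guide, not required by either service):

```python
import os

def require_keys():
    """Return (replicate_token, rem_key), raising if either variable is unset."""
    missing = [n for n in ("REPLICATE_API_TOKEN", "REM_API_KEY")
               if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return os.environ["REPLICATE_API_TOKEN"], os.environ["REM_API_KEY"]
```

Calling require_keys() at startup surfaces a misconfigured environment immediately instead of failing mid-request.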

Step 2: Store and Recall with Replicate

```python
import replicate
import requests

REM_KEY = "sk-slop-..."
REM_BASE = "https://api.api.remlabs.ai"

def chat_with_memory(user_msg, namespace="replicate-agent"):
    # Recall relevant memories
    search = requests.post(
        f"{REM_BASE}/v1/memory/search",
        json={"query": user_msg, "namespace": namespace, "limit": 5},
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    # Build the prompt with recalled context
    system_prompt = "You are a helpful assistant."
    if context:
        system_prompt += f"\n\nRelevant context from prior conversations:\n{context}"

    # Run the model on Replicate
    output = replicate.run(
        "meta/meta-llama-3.1-70b-instruct",
        input={
            "prompt": user_msg,
            "system_prompt": system_prompt,
            "max_tokens": 1024,
        },
    )
    reply = "".join(output)

    # Store the interaction as a new memory
    requests.post(
        f"{REM_BASE}/v1/memory-set",
        json={
            "key": "replicate-chat",
            "value": f"User: {user_msg}\nAssistant: {reply}",
            "namespace": namespace,
            "tags": ["conversation"],
        },
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    return reply

# Session 1
print(chat_with_memory("I'm a product designer. I work primarily in Figma."))

# Session 2
print(chat_with_memory("What design tool do I use?"))
# "You mentioned you work primarily in Figma."
```

Step 3: Node.js Example

```javascript
import Replicate from "replicate";

const replicate = new Replicate();
const REM_BASE = "https://api.api.remlabs.ai";
const REM_KEY = "sk-slop-...";

async function chatWithMemory(userMsg, namespace = "replicate-agent") {
  // Recall relevant memories
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 }),
  });
  const { results } = await search.json();
  const context = results.map((r) => r.value).join("\n");

  // Run the model on Replicate
  const output = await replicate.run("meta/meta-llama-3.1-70b-instruct", {
    input: {
      prompt: userMsg,
      system_prompt: context
        ? `Relevant memories:\n${context}`
        : "You are a helpful assistant.",
      max_tokens: 1024,
    },
  });
  const reply = output.join("");

  // Store the interaction as a new memory
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      key: "replicate-chat",
      value: `User: ${userMsg}\nAssistant: ${reply}`,
      namespace,
    }),
  });
  return reply;
}

Beyond Text: Multimodal Memory

Replicate hosts models for images, audio, video, and more. You can store the results or descriptions of any prediction as a memory. For example, if a user generates images with Stable Diffusion on Replicate, you can store the prompts and descriptions as memories. Next time they ask "make something like what I created last week," your app can recall the relevant context.
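As one hedged sketch of that pattern, the helper below stores an image-generation prompt as a memory using the /v1/memory-set endpoint from Step 2. The build_image_memory helper, its field choices, and the "replicate-images" namespace are this guide's inventions, not part of either API:

```python
import requests

REM_BASE = "https://api.api.remlabs.ai"

def build_image_memory(prompt, model, namespace="replicate-images"):
    """Build a memory payload describing an image-generation run."""
    return {
        "key": f"image:{model}",
        "value": f"Generated an image with {model} using prompt: {prompt}",
        "namespace": namespace,
        "tags": ["image", "generation"],
    }

def store_image_memory(rem_key, prompt, model):
    """Persist the payload via the /v1/memory-set endpoint shown in Step 2."""
    payload = build_image_memory(prompt, model)
    resp = requests.post(
        f"{REM_BASE}/v1/memory-set",
        json=payload,
        headers={"Authorization": f"Bearer {rem_key}"},
    )
    resp.raise_for_status()
    return resp.json()
```

Keeping the payload construction separate from the network call makes the memory format easy to test and to extend with metadata such as image URLs or model parameters.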

API reference: Full documentation for /v1/memory-set, /v1/memory/search, namespaces, tags, and metadata is available in the developer docs.

Give your Replicate models a memory

Free tier. No credit card. Works with any model on Replicate.

Get started free →