Add Knowledge Persistence to Replicate Models

Replicate makes it easy to run machine learning models in the cloud -- from text generation to image creation to audio processing. But each prediction run is isolated. This guide shows how to add a persistent memory layer to your Replicate-powered applications using REM Labs, so your models can build on previous interactions.

Why Replicate Models Need Persistent Memory

Replicate runs models as stateless prediction endpoints. You send input, get output, and the model forgets. For one-off predictions, that is fine. But if you are building an application where users interact with a model repeatedly -- a personal assistant, a creative tool, a coding companion -- you need state that persists between runs.

Replicate does not provide a built-in memory solution. That is where REM Labs comes in: a memory API that stores context with vector embeddings, full-text indexing, and entity extraction, and retrieves the most relevant context when your model needs it.

Step 1: Get Your API Keys

Get a Replicate API token from replicate.com and a REM Labs API key from remlabs.ai/console (or by running npx @remlabs/memory). The Replicate client libraries read the token from the REPLICATE_API_TOKEN environment variable, so export it there rather than hardcoding it.
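As a minimal sketch, you can fail fast when keys are missing, assuming you export them as REPLICATE_API_TOKEN (the variable the Replicate client libraries read) and REM_API_KEY (a name chosen for this guide, not required by either service):

```python
import os

def require_keys():
    """Return (replicate_token, rem_key), raising if either variable is unset."""
    missing = [n for n in ("REPLICATE_API_TOKEN", "REM_API_KEY")
               if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return os.environ["REPLICATE_API_TOKEN"], os.environ["REM_API_KEY"]
```

Calling require_keys() at startup surfaces a misconfigured environment immediately instead of failing mid-request.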

Step 2: Store and Recall with Replicate

```python
import replicate
import requests

REM_KEY = "sk-slop-..."
REM_BASE = "https://api.api.remlabs.ai"

def chat_with_memory(user_msg, namespace="replicate-agent"):
    # Recall relevant memories
    search = requests.post(
        f"{REM_BASE}/v1/memory/search",
        json={"query": user_msg, "namespace": namespace, "limit": 5},
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    # Build the prompt with recalled context
    system_prompt = "You are a helpful assistant."
    if context:
        system_prompt += f"\n\nRelevant context from prior conversations:\n{context}"

    # Run the model on Replicate
    output = replicate.run(
        "meta/meta-llama-3.1-70b-instruct",
        input={
            "prompt": user_msg,
            "system_prompt": system_prompt,
            "max_tokens": 1024,
        },
    )
    reply = "".join(output)

    # Store the interaction as a new memory
    requests.post(
        f"{REM_BASE}/v1/memory-set",
        json={
            "key": "replicate-chat",
            "value": f"User: {user_msg}\nAssistant: {reply}",
            "namespace": namespace,
            "tags": ["conversation"],
        },
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    return reply

# Session 1
print(chat_with_memory("I'm a product designer. I work primarily in Figma."))

# Session 2
print(chat_with_memory("What design tool do I use?"))
# "You mentioned you work primarily in Figma."
```

Step 3: Node.js Example

```javascript
import Replicate from "replicate";

const replicate = new Replicate();
const REM_BASE = "https://api.api.remlabs.ai";
const REM_KEY = "sk-slop-...";

async function chatWithMemory(userMsg, namespace = "replicate-agent") {
  // Recall relevant memories
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 }),
  });
  const { results } = await search.json();
  const context = results.map((r) => r.value).join("\n");

  // Run the model on Replicate
  const output = await replicate.run("meta/meta-llama-3.1-70b-instruct", {
    input: {
      prompt: userMsg,
      system_prompt: context
        ? `Relevant memories:\n${context}`
        : "You are a helpful assistant.",
      max_tokens: 1024,
    },
  });
  const reply = output.join("");

  // Store the interaction as a new memory
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      key: "replicate-chat",
      value: `User: ${userMsg}\nAssistant: ${reply}`,
      namespace,
    }),
  });
  return reply;
}

Beyond Text: Multimodal Memory

Replicate hosts models for images, audio, video, and more. You can store the results or descriptions of any prediction as a memory. For example, if a user generates images with Stable Diffusion on Replicate, you can store the prompts and descriptions as memories. Next time they ask "make something like what I created last week," your app can recall the relevant context.
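As one hedged sketch of that pattern, the helper below stores an image-generation prompt as a memory using the /v1/memory-set endpoint from Step 2. The build_image_memory helper, its field choices, and the "replicate-images" namespace are this guide's inventions, not part of either API:

```python
import requests

REM_BASE = "https://api.api.remlabs.ai"

def build_image_memory(prompt, model, namespace="replicate-images"):
    """Build a memory payload describing an image-generation run."""
    return {
        "key": f"image:{model}",
        "value": f"Generated an image with {model} using prompt: {prompt}",
        "namespace": namespace,
        "tags": ["image", "generation"],
    }

def store_image_memory(rem_key, prompt, model):
    """Persist the payload via the /v1/memory-set endpoint shown in Step 2."""
    payload = build_image_memory(prompt, model)
    resp = requests.post(
        f"{REM_BASE}/v1/memory-set",
        json=payload,
        headers={"Authorization": f"Bearer {rem_key}"},
    )
    resp.raise_for_status()
    return resp.json()
```

Keeping the payload construction separate from the network call makes the memory format easy to test and to extend with metadata such as image URLs or model parameters.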

API reference: Full documentation for /v1/memory-set, /v1/memory/search, namespaces, tags, and metadata is available in the developer docs.

Give your Replicate models a memory

Free tier. No credit card. Works with any model on Replicate.

Get started free →