Local AI Memory with Ollama + REM Labs

Ollama lets you run LLMs locally -- Llama 3, Mistral, Phi, Gemma, and dozens more. But local models have the same memory problem as cloud APIs: they forget everything between sessions. This guide shows how to pair your self-hosted Ollama setup with REM Labs for persistent, searchable memory that survives restarts.

Why Local LLMs Need Persistent Memory

Running models locally with Ollama gives you privacy, zero API costs, and complete control. But Ollama's chat endpoint is stateless -- each request starts from scratch. If you close your terminal, restart Ollama, or start a new conversation, all prior context is lost.
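The statelessness is visible in the request shape itself. A minimal sketch (the helper function is illustrative, not part of any Ollama SDK): each `/api/chat` request carries only the messages you include, so a fresh session starts with an empty history.

```python
# Sketch of why context is lost: an Ollama /api/chat request body contains
# only the messages you explicitly send with it.
def build_chat_request(model, messages):
    # Shape of the JSON body Ollama's /api/chat endpoint expects
    return {"model": model, "messages": messages, "stream": False}

session_1 = build_chat_request(
    "llama3.1", [{"role": "user", "content": "My name is Alex."}]
)

# After a restart, session 2 begins with no trace of session 1:
session_2 = build_chat_request(
    "llama3.1", [{"role": "user", "content": "What is my name?"}]
)

assert "Alex" not in str(session_2)  # nothing carried over
```

Unless you re-send the history (or inject recalled memories, as below), the model has no way to answer the second question.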

For personal assistants, local RAG pipelines, or development tools backed by Ollama, you need a memory layer that persists. REM Labs provides exactly this: a memory API with semantic search, full-text indexing, and entity extraction that works with any model, local or cloud.

Step 1: Set Up Ollama

If you haven't already, install Ollama and pull a model:

```shell
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1

# Verify it's running
ollama list
```

Ollama runs a local API server on http://localhost:11434 by default.
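You can confirm the server is reachable from code by hitting Ollama's `/api/tags` endpoint, which lists installed models. A small sketch that degrades gracefully when Ollama isn't running:

```python
import json
import urllib.request
from urllib.error import URLError

def ollama_models(base="http://localhost:11434"):
    """Return installed model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=2) as resp:
            return [m["name"] for m in json.load(resp)["models"]]
    except (URLError, OSError):
        return None

models = ollama_models()
if models is None:
    print("Ollama is not running -- start it with `ollama serve`.")
else:
    print("Installed models:", models)
```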

Step 2: Get a REM Labs API Key

Sign up at remlabs.ai/console or run npx @remlabs/memory. The free tier includes 1,000 memory operations per month.
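Rather than hard-coding the key into scripts, read it from an environment variable. The variable name `REMLABS_API_KEY` here is our convention, not something mandated by REM Labs:

```python
import os

# Read the key from the environment instead of committing it to source control,
# e.g. `export REMLABS_API_KEY=sk-...` in your shell profile.
REM_KEY = os.environ.get("REMLABS_API_KEY", "")
if not REM_KEY:
    print("Warning: REMLABS_API_KEY is not set; API requests will be rejected.")
```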

Step 3: Store and Recall Memories

```python
import requests

OLLAMA_BASE = "http://localhost:11434"
REM_BASE = "https://api.remlabs.ai"
REM_KEY = "sk-slop-..."

def chat_with_memory(user_msg, namespace="local-assistant"):
    # 1. Recall relevant memories
    search = requests.post(
        f"{REM_BASE}/v1/memory/search",
        json={"query": user_msg, "namespace": namespace, "limit": 5},
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    # 2. Build the prompt with recalled context
    system = (
        f"You are a helpful assistant. Relevant memories:\n{context}"
        if context
        else "You are a helpful assistant."
    )

    # 3. Call Ollama
    resp = requests.post(
        f"{OLLAMA_BASE}/api/chat",
        json={
            "model": "llama3.1",
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user_msg},
            ],
            "stream": False,
        },
    )
    reply = resp.json()["message"]["content"]

    # 4. Store the interaction
    requests.post(
        f"{REM_BASE}/v1/memory-set",
        json={
            "key": "ollama-chat",
            "value": f"User: {user_msg}\nAssistant: {reply}",
            "namespace": namespace,
            "tags": ["conversation"],
        },
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    return reply

# Session 1
print(chat_with_memory("My name is Alex and I'm learning Rust."))

# Session 2 (even after an Ollama restart)
print(chat_with_memory("What language am I learning?"))
# "You mentioned you're learning Rust."
```
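Local models often run with small context windows, so it's worth capping how much recalled text goes into the system prompt. A minimal sketch with an arbitrary character budget (a token-based budget would be more precise, but needs a tokenizer):

```python
def trim_memories(memories, max_chars=2000):
    """Keep the highest-ranked memories that fit within a character budget."""
    kept, used = [], 0
    for m in memories:  # assumed ordered most-relevant first
        value = m["value"]
        if used + len(value) > max_chars:
            break
        kept.append(value)
        used += len(value)
    return "\n".join(kept)

context = trim_memories(
    [{"value": "Alex is learning Rust."}, {"value": "x" * 5000}],
    max_chars=100,
)
print(context)  # only the first memory fits the budget
```

Dropping this in place of the plain `"\n".join(...)` in `chat_with_memory` keeps prompts bounded no matter how many memories accumulate.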

Step 4: Node.js Version

```javascript
const REM_BASE = "https://api.remlabs.ai";
const REM_KEY = "sk-slop-...";
const OLLAMA = "http://localhost:11434";

async function chatWithMemory(userMsg, namespace = "local-assistant") {
  // Recall
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 })
  });
  const { results } = await search.json();
  const context = results.map(r => r.value).join("\n");

  // Chat with Ollama
  const resp = await fetch(`${OLLAMA}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1",
      messages: [
        { role: "system", content: `Relevant memories:\n${context}` },
        { role: "user", content: userMsg }
      ],
      stream: false
    })
  });
  const { message } = await resp.json();

  // Store
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      key: "ollama-chat",
      value: `User: ${userMsg}\nAssistant: ${message.content}`,
      namespace
    })
  });
  return message.content;
}
```

Privacy Considerations

Your LLM inference stays entirely local -- Ollama never sends data to the cloud. The only data that leaves your machine is what you explicitly store via the REM Labs API. You control exactly what gets stored: full conversations, summaries, extracted facts, or nothing at all. If you need fully local memory too, the REM Labs API can be self-hosted.
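For example, you might store only a short fact derived from the exchange instead of the raw transcript. This sketch is one possible client-side policy, not a REM Labs feature:

```python
def to_memory_value(user_msg, reply, mode="fact"):
    """Decide what (if anything) leaves the machine."""
    if mode == "raw":
        return f"User: {user_msg}\nAssistant: {reply}"
    if mode == "fact":
        # Keep only what the user said; the model's output stays local.
        return f"User said: {user_msg}"
    return None  # mode == "none": store nothing at all

value = to_memory_value("I'm learning Rust.", "Great choice!", mode="fact")
print(value)  # -> "User said: I'm learning Rust."
```

Swap this in before the `/v1/memory-set` call in either example to control exactly what is persisted.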

Works with any Ollama model: Llama 3.1, Mistral, Phi-3, Gemma 2, CodeLlama, or any model in the Ollama library. The memory layer is model-agnostic.

Give your local LLM a memory

Free tier. No credit card. Works with every model Ollama supports.

Get started free →