Local AI Memory with Ollama + REM Labs

Ollama lets you run LLMs locally -- Llama 3, Mistral, Phi, Gemma, and dozens more. But local models have the same memory problem as cloud APIs: they forget everything between sessions. This guide shows how to pair your self-hosted Ollama setup with REM Labs for persistent, searchable memory that survives restarts.

Why Local LLMs Need Persistent Memory

Running models locally with Ollama gives you privacy, zero API costs, and complete control. But Ollama's chat endpoint is stateless -- each request starts from scratch. If you close your terminal, restart Ollama, or start a new conversation, all prior context is lost.
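The statelessness is visible in the request shape itself. A minimal sketch (the helper function is illustrative, not part of any Ollama SDK): each `/api/chat` request carries only the messages you include, so a fresh session starts with an empty history.

```python
# Sketch of why context is lost: an Ollama /api/chat request body contains
# only the messages you explicitly send with it.
def build_chat_request(model, messages):
    # Shape of the JSON body Ollama's /api/chat endpoint expects
    return {"model": model, "messages": messages, "stream": False}

session_1 = build_chat_request(
    "llama3.1", [{"role": "user", "content": "My name is Alex."}]
)

# After a restart, session 2 begins with no trace of session 1:
session_2 = build_chat_request(
    "llama3.1", [{"role": "user", "content": "What is my name?"}]
)

assert "Alex" not in str(session_2)  # nothing carried over
```

Unless you re-send the history (or inject recalled memories, as below), the model has no way to answer the second question.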

For personal assistants, local RAG pipelines, or development tools backed by Ollama, you need a memory layer that persists. REM Labs provides exactly this: a memory API with semantic search, full-text indexing, and entity extraction that works with any model, local or cloud.

Step 1: Set Up Ollama

If you haven't already, install Ollama and pull a model:

```shell
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1

# Verify it's running
ollama list
```

Ollama runs a local API server on http://localhost:11434 by default.
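You can confirm the server is reachable from code by hitting Ollama's `/api/tags` endpoint, which lists installed models. A small sketch that degrades gracefully when Ollama isn't running:

```python
import json
import urllib.request
from urllib.error import URLError

def ollama_models(base="http://localhost:11434"):
    """Return installed model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=2) as resp:
            return [m["name"] for m in json.load(resp)["models"]]
    except (URLError, OSError):
        return None

models = ollama_models()
if models is None:
    print("Ollama is not running -- start it with `ollama serve`.")
else:
    print("Installed models:", models)
```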

Step 2: Get a REM Labs API Key

Sign up at remlabs.ai/console or run npx @remlabs/memory. The free tier includes 1,000 memory operations per month.
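Rather than hard-coding the key into scripts, read it from an environment variable. The variable name `REMLABS_API_KEY` here is our convention, not something mandated by REM Labs:

```python
import os

# Read the key from the environment instead of committing it to source control,
# e.g. `export REMLABS_API_KEY=sk-...` in your shell profile.
REM_KEY = os.environ.get("REMLABS_API_KEY", "")
if not REM_KEY:
    print("Warning: REMLABS_API_KEY is not set; API requests will be rejected.")
```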

Step 3: Store and Recall Memories

```python
import requests

OLLAMA_BASE = "http://localhost:11434"
REM_BASE = "https://api.remlabs.ai"
REM_KEY = "sk-slop-..."

def chat_with_memory(user_msg, namespace="local-assistant"):
    # 1. Recall relevant memories
    search = requests.post(
        f"{REM_BASE}/v1/memory/search",
        json={"query": user_msg, "namespace": namespace, "limit": 5},
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    # 2. Build the prompt with recalled context
    system = (
        f"You are a helpful assistant. Relevant memories:\n{context}"
        if context
        else "You are a helpful assistant."
    )

    # 3. Call Ollama
    resp = requests.post(
        f"{OLLAMA_BASE}/api/chat",
        json={
            "model": "llama3.1",
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user_msg},
            ],
            "stream": False,
        },
    )
    reply = resp.json()["message"]["content"]

    # 4. Store the interaction
    requests.post(
        f"{REM_BASE}/v1/memory-set",
        json={
            "key": "ollama-chat",
            "value": f"User: {user_msg}\nAssistant: {reply}",
            "namespace": namespace,
            "tags": ["conversation"],
        },
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    return reply

# Session 1
print(chat_with_memory("My name is Alex and I'm learning Rust."))

# Session 2 (even after an Ollama restart)
print(chat_with_memory("What language am I learning?"))
# "You mentioned you're learning Rust."
```
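Local models often run with small context windows, so it's worth capping how much recalled text goes into the system prompt. A minimal sketch with an arbitrary character budget (a token-based budget would be more precise, but needs a tokenizer):

```python
def trim_memories(memories, max_chars=2000):
    """Keep the highest-ranked memories that fit within a character budget."""
    kept, used = [], 0
    for m in memories:  # assumed ordered most-relevant first
        value = m["value"]
        if used + len(value) > max_chars:
            break
        kept.append(value)
        used += len(value)
    return "\n".join(kept)

context = trim_memories(
    [{"value": "Alex is learning Rust."}, {"value": "x" * 5000}],
    max_chars=100,
)
print(context)  # only the first memory fits the budget
```

Dropping this in place of the plain `"\n".join(...)` in `chat_with_memory` keeps prompts bounded no matter how many memories accumulate.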

Step 4: Node.js Version

```javascript
const REM_BASE = "https://api.remlabs.ai";
const REM_KEY = "sk-slop-...";
const OLLAMA = "http://localhost:11434";

async function chatWithMemory(userMsg, namespace = "local-assistant") {
  // Recall
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 })
  });
  const { results } = await search.json();
  const context = results.map(r => r.value).join("\n");

  // Chat with Ollama
  const resp = await fetch(`${OLLAMA}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1",
      messages: [
        { role: "system", content: `Relevant memories:\n${context}` },
        { role: "user", content: userMsg }
      ],
      stream: false
    })
  });
  const { message } = await resp.json();

  // Store
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      key: "ollama-chat",
      value: `User: ${userMsg}\nAssistant: ${message.content}`,
      namespace
    })
  });
  return message.content;
}
```

Privacy Considerations

Your LLM inference stays entirely local -- Ollama never sends data to the cloud. The only data that leaves your machine is what you explicitly store via the REM Labs API. You control exactly what gets stored: full conversations, summaries, extracted facts, or nothing at all. If you need fully local memory too, the REM Labs API can be self-hosted.
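For example, you might store only a short fact derived from the exchange instead of the raw transcript. This sketch is one possible client-side policy, not a REM Labs feature:

```python
def to_memory_value(user_msg, reply, mode="fact"):
    """Decide what (if anything) leaves the machine."""
    if mode == "raw":
        return f"User: {user_msg}\nAssistant: {reply}"
    if mode == "fact":
        # Keep only what the user said; the model's output stays local.
        return f"User said: {user_msg}"
    return None  # mode == "none": store nothing at all

value = to_memory_value("I'm learning Rust.", "Great choice!", mode="fact")
print(value)  # -> "User said: I'm learning Rust."
```

Swap this in before the `/v1/memory-set` call in either example to control exactly what is persisted.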

Works with any Ollama model: Llama 3.1, Mistral, Phi-3, Gemma 2, CodeLlama, or any model in the Ollama library. The memory layer is model-agnostic.

Give your local LLM a memory

Free tier. No credit card. Works with every model Ollama supports.

Get started free →