Integrations
Tutorial
April 13, 2026
Local AI Memory with Ollama + REM Labs
Ollama lets you run LLMs locally -- Llama 3, Mistral, Phi, Gemma, and dozens more. But local models have the same memory problem as cloud APIs: they forget everything between sessions. This guide shows how to pair your self-hosted Ollama setup with REM Labs for persistent, searchable memory that survives restarts.
Why Local LLMs Need Persistent Memory
Running models locally with Ollama gives you privacy, zero API costs, and complete control. But Ollama's chat endpoint is stateless -- each request starts from scratch. If you close your terminal, restart Ollama, or start a new conversation, all prior context is lost.
For personal assistants, local RAG pipelines, or development tools backed by Ollama, you need a memory layer that persists. REM Labs provides exactly this: a memory API with semantic search, full-text indexing, and entity extraction that works with any model, local or cloud.
Step 1: Set Up Ollama
If you have not already, install Ollama and pull a model.
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.1
# Verify the model was pulled
ollama list
Ollama runs a local API server on http://localhost:11434 by default.
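Before wiring in the memory layer, it helps to confirm the server is actually reachable. A minimal sketch using `requests` (the helper name is our own; Ollama's root endpoint simply responds when the server is up):

```python
import requests

def ollama_is_up(base="http://localhost:11434"):
    """Return True if the local Ollama server answers at its base URL."""
    try:
        return requests.get(base, timeout=2).ok
    except requests.RequestException:
        # Connection refused or timed out: the server isn't running.
        return False
```

If this returns False, start the server with `ollama serve` and retry.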
Step 2: Get a REM Labs API Key
Sign up at remlabs.ai/console or run npx @remlabs/memory. The free tier includes 1,000 memory operations per month.
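Rather than hardcoding the key in source (as the examples below do for brevity), you can read it from the environment. A small sketch; the `REM_LABS_API_KEY` variable name is our own convention, not something the API requires:

```python
import os

def load_rem_key(var="REM_LABS_API_KEY"):
    """Read the REM Labs API key from an environment variable."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running the examples.")
    return key
```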
Step 3: Store and Recall Memories
import requests
OLLAMA_BASE = "http://localhost:11434"
REM_BASE = "https://api.remlabs.ai"
REM_KEY = "sk-slop-..."
def chat_with_memory(user_msg, namespace="local-assistant"):
    # 1. Recall relevant memories
    search = requests.post(f"{REM_BASE}/v1/memory/search", json={
        "query": user_msg,
        "namespace": namespace,
        "limit": 5
    }, headers={"Authorization": f"Bearer {REM_KEY}"})
    memories = search.json().get("results", [])
    context = "\n".join([m["value"] for m in memories])

    # 2. Build prompt with context
    system = f"You are a helpful assistant. Relevant memories:\n{context}" if context else "You are a helpful assistant."

    # 3. Call Ollama
    resp = requests.post(f"{OLLAMA_BASE}/api/chat", json={
        "model": "llama3.1",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg}
        ],
        "stream": False
    })
    reply = resp.json()["message"]["content"]

    # 4. Store the interaction
    requests.post(f"{REM_BASE}/v1/memory-set", json={
        "key": "ollama-chat",
        "value": f"User: {user_msg}\nAssistant: {reply}",
        "namespace": namespace,
        "tags": ["conversation"]
    }, headers={"Authorization": f"Bearer {REM_KEY}"})
    return reply

# Session 1
print(chat_with_memory("My name is Alex and I'm learning Rust."))

# Session 2 (even after an Ollama restart)
print(chat_with_memory("What language am I learning?"))
# "You mentioned you're learning Rust."
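One practical caveat: local models often have smaller context windows than cloud models, so it is worth capping how much recalled memory you inject into the system prompt. One way to sketch that, with a helper of our own (not part of either API):

```python
def build_context(memories, max_chars=2000):
    """Join recalled memory values, stopping before the joined text
    exceeds max_chars so the prompt fits a small model's context window."""
    parts, total = [], 0
    for m in memories:
        value = m["value"]
        if total + len(value) > max_chars:
            break
        parts.append(value)
        total += len(value) + 1  # +1 for the joining newline
    return "\n".join(parts)
```

Since search results come back ranked by relevance, truncating from the end drops the least relevant memories first.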
Step 4: Node.js Version
const REM_BASE = "https://api.remlabs.ai";
const REM_KEY = "sk-slop-...";
const OLLAMA = "http://localhost:11434";
async function chatWithMemory(userMsg, namespace = "local-assistant") {
  // Recall
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 })
  });
  const { results } = await search.json();
  const context = results.map(r => r.value).join("\n");

  // Chat with Ollama
  const resp = await fetch(`${OLLAMA}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1",
      messages: [
        { role: "system", content: `Relevant memories:\n${context}` },
        { role: "user", content: userMsg }
      ],
      stream: false
    })
  });
  const { message } = await resp.json();

  // Store
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ key: "ollama-chat", value: `User: ${userMsg}\nAssistant: ${message.content}`, namespace })
  });
  return message.content;
}
Privacy Considerations
Your LLM inference stays entirely local -- Ollama never sends data to the cloud. The only data that leaves your machine is what you explicitly store via the REM Labs API. You control exactly what gets stored: full conversations, summaries, extracted facts, or nothing at all. If you need fully local memory too, the REM Labs API can be self-hosted.
Works with any Ollama model: Llama 3.1, Mistral, Phi-3, Gemma 2, CodeLlama, or any model in the Ollama library. The memory layer is model-agnostic.
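Model-agnosticism falls out of Ollama's API shape: every model is served through the same /api/chat endpoint, so swapping models is just a different "model" string in the request body. A sketch of the payload (the helper name is our own):

```python
def build_chat_payload(model, system, user_msg):
    """Build an Ollama /api/chat request body; only the model name
    changes when you switch models."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
        ],
        "stream": False,
    }
```

Passing `"mistral"` or `"gemma2"` instead of `"llama3.1"` is the only change needed to reuse the same memory flow with another model.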
Give your local LLM a memory
Free tier. No credit card. Works with every model Ollama supports.
Get started free →