Integration
April 13, 2026
Redis + REM Labs: Fast AI Memory Caching
REM Labs handles persistent AI memory. Redis handles speed. This guide shows how to use Redis as a cache layer in front of REM Labs, cutting recall latency for repeated queries and enabling real-time AI features that need sub-10ms memory access.
The Cache-Through Pattern
The idea is simple: before calling rem.recall(), check Redis. If the query was recently asked, return the cached result. If not, call REM, cache the result, and return it. This eliminates redundant API calls and drops latency for hot queries to single-digit milliseconds.
Step 1: Install Dependencies
npm install @remlabs/sdk ioredis
Step 2: Build the Cached Memory Client
import { RemClient } from "@remlabs/sdk";
import Redis from "ioredis";
import crypto from "crypto";

const rem = new RemClient({ apiKey: process.env.REMLABS_API_KEY });
const redis = new Redis(process.env.REDIS_URL);

// Hash the namespace + query into a short, stable cache key
function cacheKey(query: string, namespace?: string): string {
  const hash = crypto.createHash("sha256")
    .update(`${namespace || ""}:${query}`)
    .digest("hex")
    .slice(0, 16);
  return `rem:recall:${hash}`;
}
export async function cachedRecall(opts: {
  query: string;
  namespace?: string;
  limit?: number;
  ttl?: number;
}) {
  const key = cacheKey(opts.query, opts.namespace);

  // Fast path: serve from Redis if this query was asked recently
  const cached = await redis.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  // Slow path: recall from REM Labs, then cache the result
  const results = await rem.recall({
    query: opts.query,
    namespace: opts.namespace,
    limit: opts.limit || 5
  });

  // Cache for 5 minutes by default
  await redis.setex(key, opts.ttl || 300, JSON.stringify(results));
  return results;
}
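The key scheme is worth sanity-checking in isolation. This standalone sketch reproduces the same hashing logic (the query strings and namespace names are illustrative) to show that identical query + namespace pairs map to the same key while different namespaces stay isolated:

```typescript
import crypto from "crypto";

// Same derivation as cacheKey above, reproduced standalone
function cacheKey(query: string, namespace?: string): string {
  const hash = crypto.createHash("sha256")
    .update(`${namespace || ""}:${query}`)
    .digest("hex")
    .slice(0, 16);
  return `rem:recall:${hash}`;
}

// Identical inputs produce identical keys, so repeat queries hit the cache
console.log(cacheKey("deploy steps", "proj-a") === cacheKey("deploy steps", "proj-a")); // true
// Different namespaces produce different keys, so caches stay separate
console.log(cacheKey("deploy steps", "proj-a") === cacheKey("deploy steps", "proj-b")); // false
```

Truncating the digest to 16 hex characters keeps keys short while leaving collisions vanishingly unlikely at cache-sized keyspaces.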
Step 3: Write-Through on Remember
When you store a new memory, invalidate cached recall results so stale data is not served. Because the keys are hashed, entries cannot be targeted per namespace, so the whole recall cache is cleared:
export async function cachedRemember(opts: {
  content: string;
  namespace?: string;
  tags?: string[];
  metadata?: Record<string, any>;
}) {
  // Store in REM Labs
  const result = await rem.remember(opts);

  // Clear all cached recall results. SCAN iterates incrementally;
  // KEYS would block Redis on large keyspaces.
  const stream = redis.scanStream({ match: "rem:recall:*", count: 100 });
  for await (const keys of stream) {
    if (keys.length > 0) {
      await redis.del(...keys);
    }
  }
  return result;
}
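Deleting keys on every write gets expensive as the cache grows. A common alternative is namespace versioning: embed a per-namespace version number in each key and bump it on write, so old entries become unreachable and simply age out via TTL. This sketch uses a plain Map as a stand-in for Redis purely for illustration; the helper names are not part of either SDK:

```typescript
// In-memory stand-ins for Redis GET/SET, for illustration only
const store = new Map<string, string>();
const versions = new Map<string, number>();

// The current namespace version is baked into every key
function versionedKey(namespace: string, query: string): string {
  const v = versions.get(namespace) ?? 0;
  return `rem:recall:${namespace}:v${v}:${query}`;
}

function cacheSet(namespace: string, query: string, value: string) {
  store.set(versionedKey(namespace, query), value);
}

function cacheGet(namespace: string, query: string): string | undefined {
  return store.get(versionedKey(namespace, query));
}

// "Invalidation" is a single counter bump: old keys become unreachable
// and expire naturally instead of being deleted one by one.
function invalidate(namespace: string) {
  versions.set(namespace, (versions.get(namespace) ?? 0) + 1);
}

cacheSet("proj-a", "deploy steps", "[cached results]");
console.log(cacheGet("proj-a", "deploy steps")); // "[cached results]"
invalidate("proj-a");
console.log(cacheGet("proj-a", "deploy steps")); // undefined
```

In Redis you would store the version counter under its own key (e.g. with INCR) and read it before building each cache key; the trade-off is one extra round trip per lookup in exchange for O(1) invalidation.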
Step 4: Session Memory with Redis Pub/Sub
For real-time AI chat, use Redis to share session context across instances while REM handles long-term persistence:
const pub = new Redis(process.env.REDIS_URL);
const sub = new Redis(process.env.REDIS_URL);

// When a user sends a message, publish to the session channel
async function onUserMessage(sessionId: string, message: string) {
  // Short-term: broadcast to all instances via Redis
  await pub.publish(`session:${sessionId}`, JSON.stringify({
    role: "user", content: message, ts: Date.now()
  }));

  // Long-term: persist important context to REM
  await rem.remember({
    content: message,
    namespace: `session:${sessionId}`,
    tags: ["conversation"],
    metadata: { session_id: sessionId }
  });
}

// Subscribe to session updates. Channel globs require PSUBSCRIBE,
// which emits "pmessage" events rather than "message".
sub.psubscribe("session:*");
sub.on("pmessage", (pattern, channel, message) => {
  const data = JSON.parse(message);
  // Update local session state in real-time
});
Performance Numbers
- Redis cache hit: 1-3ms (local) / 5-8ms (remote)
- REM Labs recall: 40-80ms (multi-signal fusion)
- Cache hit rate: Typically 60-80% for conversational AI workloads
TTL strategy: Use shorter TTLs (60s) for active conversations where context changes fast, and longer TTLs (300-600s) for reference queries like project docs or architecture decisions that change rarely.
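One way to encode that policy is a small TTL picker keyed on the namespace; the thresholds and the `session:` prefix convention are illustrative assumptions, not part of either SDK:

```typescript
// Pick a cache TTL in seconds based on how volatile the namespace is.
// Session namespaces change with every message; reference material rarely does.
function ttlFor(namespace?: string): number {
  if (namespace?.startsWith("session:")) return 60; // active conversation
  return 600; // project docs, architecture decisions
}

console.log(ttlFor("session:abc123")); // 60
console.log(ttlFor("docs"));           // 600
```

The result plugs straight into the `ttl` option of cachedRecall, so callers get volatility-aware caching without thinking about it.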
Add single-digit-millisecond memory to your AI
Free tier. Redis cache layer. Multi-signal retrieval when you need depth.
Get Started