Ultra-Fast AI Memory with Groq + REM Labs

Groq's LPU inference engine delivers tokens faster than any other provider. But speed without memory means your agent forgets everything between requests. This guide shows how to pair Groq's blazing-fast inference with REM Labs' persistent memory -- so your agent responds in milliseconds and remembers across sessions.

Why Groq + Persistent Memory

Groq is built for speed. Their custom LPU hardware delivers hundreds of tokens per second on models like Llama 3 and Mixtral. Developers choose Groq when latency matters -- real-time chat, voice interfaces, live coding assistants. But the Groq API is stateless, just like every other LLM API. Fast responses without context are just fast guesses.

Adding REM Labs to a Groq agent means you get sub-100ms memory retrieval on top of Groq's sub-second inference. The total round trip -- recall memories, generate response, store new context -- typically stays under 1.5 seconds. That is fast enough for real-time conversational AI that actually remembers.

Step 1: Get Your API Keys

Get a Groq API key from console.groq.com and a REM Labs key from remlabs.ai/console (or run npx @remlabs/memory). Both have free tiers.
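Rather than hardcoding keys in source, it is safer to load them from the environment. A minimal sketch -- the `GROQ_API_KEY` and `REM_API_KEY` variable names are our convention here, not something either SDK requires:

```python
import os

def load_keys():
    """Read both API keys from the environment instead of hardcoding them.

    GROQ_API_KEY and REM_API_KEY are assumed variable names.
    """
    groq_key = os.environ.get("GROQ_API_KEY")
    rem_key = os.environ.get("REM_API_KEY")
    missing = [name for name, value in
               [("GROQ_API_KEY", groq_key), ("REM_API_KEY", rem_key)]
               if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return groq_key, rem_key
```

Failing fast at startup beats a cryptic 401 halfway through a conversation.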

Step 2: Build a Groq Agent with Memory

```python
from groq import Groq
import requests

groq = Groq(api_key="gsk_...")
REM_KEY = "sk-slop-..."
REM_BASE = "https://api.remlabs.ai"

def chat(user_msg, namespace="groq-agent"):
    # Recall relevant memories
    search = requests.post(
        f"{REM_BASE}/v1/memory/search",
        json={"query": user_msg, "namespace": namespace, "limit": 5},
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    # Build messages with memory context
    messages = []
    if context:
        messages.append({
            "role": "system",
            "content": f"Relevant context from prior conversations:\n{context}",
        })
    messages.append({"role": "user", "content": user_msg})

    # Fast inference via Groq
    resp = groq.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=messages,
    )
    reply = resp.choices[0].message.content

    # Store the new interaction
    requests.post(
        f"{REM_BASE}/v1/memory-set",
        json={
            "key": "groq-chat",
            "value": f"User: {user_msg}\nAssistant: {reply}",
            "namespace": namespace,
            "tags": ["conversation"],
        },
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    return reply

# Session 1
print(chat("I'm building a voice assistant for my smart home. I use Home Assistant."))

# Session 2
print(chat("What platform am I using for my smart home?"))
# "You mentioned you use Home Assistant for your smart home setup."
```

Step 3: Node.js with the Groq SDK

```javascript
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: "gsk_..." });
const REM_BASE = "https://api.remlabs.ai";
const REM_KEY = "sk-slop-...";

async function chat(userMsg, namespace = "groq-agent") {
  // Recall relevant memories
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 }),
  });
  const { results } = await search.json();
  const context = results.map((r) => r.value).join("\n");

  // Groq inference with memory context prepended
  const resp = await groq.chat.completions.create({
    model: "llama-3.1-70b-versatile",
    messages: [
      ...(context ? [{ role: "system", content: `Memories:\n${context}` }] : []),
      { role: "user", content: userMsg },
    ],
  });
  const reply = resp.choices[0].message.content;

  // Store the new interaction
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      key: "groq-chat",
      value: `User: ${userMsg}\nAssistant: ${reply}`,
      namespace,
    }),
  });
  return reply;
}
```

Latency Breakdown

Here is what the typical round trip looks like with Groq + REM Labs:

- Memory recall (REM Labs search): sub-100ms
- Groq inference: sub-second, at hundreds of tokens per second on Llama 3-class models
- Memory store (REM Labs set): runs after the reply is generated
- Total round trip: typically under 1.5 seconds

The memory store can be fired asynchronously so it does not block the response. Total perceived latency is Groq inference plus memory recall -- typically well under one second for most responses.
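One way to take the store off the critical path in the Python example is a fire-and-forget background thread. A minimal sketch -- the `fire_and_forget` helper is our own, not part of either SDK:

```python
import threading

def fire_and_forget(fn, *args, **kwargs):
    """Run fn on a daemon thread so the caller returns immediately.

    Used to push the memory-store call off the critical path: the user
    sees the reply as soon as Groq finishes, while the write to REM Labs
    completes in the background.
    """
    t = threading.Thread(target=fn, args=args, kwargs=kwargs, daemon=True)
    t.start()
    return t

# In chat() above, replace the blocking store with something like:
#
# fire_and_forget(
#     requests.post,
#     f"{REM_BASE}/v1/memory-set",
#     json=payload,
#     headers={"Authorization": f"Bearer {REM_KEY}"},
# )
```

Daemon threads will not keep the process alive, so in a short-lived script you may want to `join()` pending stores before exit to avoid dropping the last write.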

Groq uses OpenAI-compatible endpoints: If you are already using the OpenAI SDK pointed at Groq, the same memory pattern from our OpenAI integration guide works with zero changes.
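Concretely, Groq serves OpenAI-style routes under `https://api.groq.com/openai/v1`, so an OpenAI client only needs its base URL swapped. The raw request below makes the endpoint shape explicit without pulling in an SDK; `build_chat_request` is a hypothetical helper for illustration, and it builds the request without sending it:

```python
import json
import urllib.request

# Groq's OpenAI-compatible base URL.
GROQ_OPENAI_BASE = "https://api.groq.com/openai/v1"

def build_chat_request(api_key, messages, model="llama-3.1-70b-versatile"):
    """Build (but do not send) an OpenAI-style chat completion request
    aimed at Groq's compatible endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{GROQ_OPENAI_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because the request format is identical, the recall-before and store-after pattern from the examples above wraps around this call unchanged.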

Give your Groq agent a memory

Free tier. No credit card. Fast memory for fast inference.

Get started free →