Integrations
Tutorial
April 13, 2026
Ultra-Fast AI Memory with Groq + REM Labs
Groq's LPU inference engine delivers tokens faster than any other provider. But speed without memory means your agent forgets everything between requests. This guide shows how to pair Groq's blazing-fast inference with REM Labs' persistent memory -- so your agent responds in milliseconds and remembers across sessions.
Why Groq + Persistent Memory
Groq is built for speed. Their custom LPU hardware delivers hundreds of tokens per second on models like Llama 3 and Mixtral. Developers choose Groq when latency matters -- real-time chat, voice interfaces, live coding assistants. But the Groq API is stateless, just like every other LLM API. Fast responses without context are just fast guesses.
Adding REM Labs to a Groq agent means you get sub-100ms memory retrieval on top of Groq's sub-second inference. The total round trip -- recall memories, generate response, store new context -- typically stays under 1.5 seconds. That is fast enough for real-time conversational AI that actually remembers.
Step 1: Get Your API Keys
Get a Groq API key from console.groq.com and a REM Labs key from remlabs.ai/console (or run npx @remlabs/memory). Both have free tiers.
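The snippets below hardcode keys for brevity; in practice, read them from the environment instead. A minimal sketch, assuming the variable names `GROQ_API_KEY` and `REM_API_KEY` are this guide's convention (the official Groq SDKs also pick up `GROQ_API_KEY` on their own when no key is passed):

```python
import os

def require_key(name: str) -> str:
    """Read an API key from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set {name} before running the examples below.")
    return value

# Usage in the snippets that follow:
# groq_key = require_key("GROQ_API_KEY")
# rem_key = require_key("REM_API_KEY")
```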
Step 2: Build a Groq Agent with Memory
from groq import Groq
import requests

groq = Groq(api_key="gsk_...")
REM_KEY = "sk-slop-..."
REM_BASE = "https://api.remlabs.ai"

def chat(user_msg, namespace="groq-agent"):
    # Recall relevant memories
    search = requests.post(
        f"{REM_BASE}/v1/memory/search",
        json={"query": user_msg, "namespace": namespace, "limit": 5},
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    # Build messages with memory context
    messages = []
    if context:
        messages.append({
            "role": "system",
            "content": f"Relevant context from prior conversations:\n{context}",
        })
    messages.append({"role": "user", "content": user_msg})

    # Fast inference via Groq
    resp = groq.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=messages,
    )
    reply = resp.choices[0].message.content

    # Store the new interaction
    requests.post(
        f"{REM_BASE}/v1/memory-set",
        json={
            "key": "groq-chat",
            "value": f"User: {user_msg}\nAssistant: {reply}",
            "namespace": namespace,
            "tags": ["conversation"],
        },
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    return reply
# Session 1
print(chat("I'm building a voice assistant for my smart home. I use Home Assistant."))
# Session 2
print(chat("What platform am I using for my smart home?"))
# "You mentioned you use Home Assistant for your smart home setup."
Step 3: Node.js with the Groq SDK
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: "gsk_..." });
const REM_BASE = "https://api.remlabs.ai";
const REM_KEY = "sk-slop-...";

async function chat(userMsg, namespace = "groq-agent") {
  // Recall relevant memories
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 })
  });
  const { results = [] } = await search.json();
  const context = results.map(r => r.value).join("\n");

  // Fast inference via Groq
  const resp = await groq.chat.completions.create({
    model: "llama-3.1-70b-versatile",
    messages: [
      ...(context ? [{ role: "system", content: `Memories:\n${context}` }] : []),
      { role: "user", content: userMsg }
    ]
  });
  const reply = resp.choices[0].message.content;

  // Store the new interaction
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ key: "groq-chat", value: `User: ${userMsg}\nAssistant: ${reply}`, namespace })
  });
  return reply;
}
Latency Breakdown
Here is what the typical round trip looks like with Groq + REM Labs:
- Memory recall (REM Labs search): ~30-50ms
- LLM inference (Groq LPU): ~200-800ms depending on output length
- Memory store (REM Labs write, async): ~20-40ms
The memory store can be fired asynchronously so it does not block the response. Total perceived latency is Groq inference plus memory recall -- typically well under one second for most responses.
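One way to make the store step non-blocking in the Python example is a small thread pool. `store_async` below is a hypothetical helper, not part of any REM Labs SDK; it uses the stdlib `urllib` in place of `requests`, and the injectable `post` parameter exists only so the network call is easy to stub out:

```python
from concurrent.futures import ThreadPoolExecutor
import json
import urllib.request

REM_BASE = "https://api.remlabs.ai"  # base URL assumed from the examples above
REM_KEY = "sk-slop-..."

_executor = ThreadPoolExecutor(max_workers=2)

def _post_memory(payload: dict) -> None:
    """Stdlib POST equivalent to the requests.post store call in Step 2."""
    req = urllib.request.Request(
        f"{REM_BASE}/v1/memory-set",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {REM_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    urllib.request.urlopen(req)

def store_async(key, value, namespace, post=_post_memory):
    """Fire-and-forget memory write: returns a Future immediately,
    so the caller never waits on the ~20-40ms store."""
    return _executor.submit(post, {"key": key, "value": value, "namespace": namespace})
```

In the `chat` function from Step 2, replacing the final `requests.post(...)` with `store_async(...)` returns the reply to the user without waiting on the write; errors can be inspected later via the returned Future if needed.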
Groq uses OpenAI-compatible endpoints: If you are already using the OpenAI SDK pointed at Groq, the same memory pattern from our OpenAI integration guide works with zero changes.
Give your Groq agent a memory
Free tier. No credit card. Fast memory for fast inference.
Get started free →