Integrations
Tutorial
April 13, 2026
Memory Layer for Fireworks AI
Fireworks AI provides fast, cost-effective inference for open-source models with features like function calling and JSON mode. But like all LLM APIs, it is stateless. This guide shows how to add a persistent memory layer to any Fireworks AI application using REM Labs -- so your agent remembers users, facts, and context across sessions.
Why Fireworks AI Needs a Memory Layer
Fireworks AI is popular for production workloads because of its speed, reliability, and support for features like structured output and function calling. Developers use it to run Llama 3, Mixtral, and other open-source models at scale. But Fireworks does not store any context between API calls -- that is by design, and it is your job to manage state.
For applications that need continuity -- chatbots that remember users, assistants that learn preferences, agents that accumulate knowledge -- you need a memory backend. REM Labs provides one that is purpose-built for AI: semantic search, full-text indexing, entity extraction, and multi-signal fusion in a single API.
Step 1: Get Your API Keys
Get a Fireworks API key from fireworks.ai and a REM Labs key from remlabs.ai/console or run npx @remlabs/memory.
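One common setup is to export both keys as environment variables so they stay out of source control. The variable names below are a convention for this guide, not something either SDK requires:

```shell
# Replace the placeholders with your actual keys.
# Variable names are illustrative; read them in code via os.environ / process.env.
export FIREWORKS_API_KEY="fw_..."
export REM_API_KEY="sk-slop-..."
```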
Step 2: Build a Fireworks Agent with Memory
Fireworks AI exposes an OpenAI-compatible API, so you can use the OpenAI SDK and simply point it at a different base URL.
import openai
import requests

# Fireworks exposes OpenAI-compatible endpoints
client = openai.OpenAI(
    api_key="fw_...",
    base_url="https://api.fireworks.ai/inference/v1"
)

REM_KEY = "sk-slop-..."
REM_BASE = "https://api.remlabs.ai"

def chat_with_memory(user_msg, namespace="fireworks-agent"):
    # Recall: fetch the most relevant stored memories
    search = requests.post(f"{REM_BASE}/v1/memory/search", json={
        "query": user_msg,
        "namespace": namespace,
        "limit": 5
    }, headers={"Authorization": f"Bearer {REM_KEY}"})
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    messages = []
    if context:
        messages.append({"role": "system", "content": f"Relevant context:\n{context}"})
    messages.append({"role": "user", "content": user_msg})

    # Fireworks inference
    resp = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages=messages
    )
    reply = resp.choices[0].message.content

    # Store the exchange so future sessions can recall it
    requests.post(f"{REM_BASE}/v1/memory-set", json={
        "key": "fireworks-chat",
        "value": f"User: {user_msg}\nAssistant: {reply}",
        "namespace": namespace,
        "tags": ["conversation"]
    }, headers={"Authorization": f"Bearer {REM_KEY}"})
    return reply

# Session 1
print(chat_with_memory("I'm a DevOps engineer. We use Kubernetes and ArgoCD."))

# Session 2
print(chat_with_memory("What deployment tool do we use?"))
# "You mentioned your team uses ArgoCD for deployments."
Step 3: Node.js Example
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "fw_...",
  baseURL: "https://api.fireworks.ai/inference/v1"
});

const REM_BASE = "https://api.remlabs.ai";
const REM_KEY = "sk-slop-...";

async function chatWithMemory(userMsg, namespace = "fireworks-agent") {
  // Recall
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 })
  });
  const results = (await search.json()).results ?? [];
  const context = results.map(r => r.value).join("\n");

  // Generate
  const resp = await client.chat.completions.create({
    model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages: [
      ...(context ? [{ role: "system", content: `Memories:\n${context}` }] : []),
      { role: "user", content: userMsg }
    ]
  });
  const reply = resp.choices[0].message.content;

  // Store
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${REM_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ key: "fireworks-chat", value: `User: ${userMsg}\nAssistant: ${reply}`, namespace })
  });
  return reply;
}
Function Calling with Memory
Fireworks AI supports function calling, which pairs well with REM Labs. You can define a remember function that the model calls when it detects information worth saving, and a recall function that retrieves context on demand. This lets the model decide what to store rather than blindly saving every exchange.
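A minimal sketch of that pattern is below. The tool schemas use the standard OpenAI/Fireworks function-calling format; the `remember` and `recall` tool names, their parameters, and the `run_tool` dispatcher are illustrative choices for this guide, not part of either API. The REM Labs endpoints mirror the ones used in the examples above.

```python
import requests

REM_KEY = "sk-slop-..."
REM_BASE = "https://api.remlabs.ai"

# Tool schemas in the OpenAI/Fireworks function-calling format.
# Names and parameters here are illustrative, not a fixed API.
tools = [
    {
        "type": "function",
        "function": {
            "name": "remember",
            "description": "Save a fact worth keeping for future sessions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "key": {"type": "string", "description": "Short identifier for the fact"},
                    "value": {"type": "string", "description": "The fact to store"}
                },
                "required": ["key", "value"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "recall",
            "description": "Search stored memories for relevant context.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "What to look up"}
                },
                "required": ["query"]
            }
        }
    }
]

def run_tool(name, args):
    """Dispatch a model-issued tool call to the REM Labs endpoints used above."""
    headers = {"Authorization": f"Bearer {REM_KEY}"}
    if name == "remember":
        requests.post(f"{REM_BASE}/v1/memory-set", json={
            "key": args["key"], "value": args["value"],
            "namespace": "fireworks-agent", "tags": ["tool-call"]
        }, headers=headers)
        return "saved"
    if name == "recall":
        resp = requests.post(f"{REM_BASE}/v1/memory/search", json={
            "query": args["query"], "namespace": "fireworks-agent", "limit": 5
        }, headers=headers)
        return "\n".join(m["value"] for m in resp.json().get("results", []))
    return f"unknown tool: {name}"
```

To wire this up, pass `tools=tools` to `client.chat.completions.create(...)`; when the response's `message.tool_calls` is set, run each call through `run_tool(tc.function.name, json.loads(tc.function.arguments))` and append the result as a `"tool"` role message before asking the model to continue.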
OpenAI-compatible: Because Fireworks uses the OpenAI API format, the full code from our OpenAI integration guide works with Fireworks -- just change the base_url and API key.
Give your Fireworks agent a memory
Free tier. No credit card. Works with every model on Fireworks AI.
Get started free →