Memory Layer for Fireworks AI

Fireworks AI provides fast, cost-effective inference for open-source models with features like function calling and JSON mode. But like all LLM APIs, it is stateless. This guide shows how to add a persistent memory layer to any Fireworks AI application using REM Labs -- so your agent remembers users, facts, and context across sessions.

Why Fireworks AI Needs a Memory Layer

Fireworks AI is popular for production workloads because of its speed, reliability, and support for features like structured output and function calling. Developers use it to run Llama 3, Mixtral, and other open-source models at scale. But Fireworks does not store any context between API calls -- that is by design, and it is your job to manage state.

For applications that need continuity -- chatbots that remember users, assistants that learn preferences, agents that accumulate knowledge -- you need a memory backend. REM Labs provides one that is purpose-built for AI: semantic search, full-text indexing, entity extraction, and multi-signal fusion in a single API.

Step 1: Get Your API Keys

Get a Fireworks API key from fireworks.ai and a REM Labs key from remlabs.ai/console or run npx @remlabs/memory.

Step 2: Build a Fireworks Agent with Memory

Fireworks AI uses the OpenAI-compatible format, so you can use the OpenAI SDK with a different base URL.

```python
import openai
import requests

# Fireworks uses OpenAI-compatible endpoints
client = openai.OpenAI(
    api_key="fw_...",
    base_url="https://api.fireworks.ai/inference/v1"
)

REM_KEY = "sk-slop-..."
REM_BASE = "https://api.api.remlabs.ai"

def chat_with_memory(user_msg, namespace="fireworks-agent"):
    # Recall: search stored memories for context relevant to this message
    search = requests.post(
        f"{REM_BASE}/v1/memory/search",
        json={"query": user_msg, "namespace": namespace, "limit": 5},
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    memories = search.json().get("results", [])
    context = "\n".join(m["value"] for m in memories)

    messages = []
    if context:
        messages.append({"role": "system", "content": f"Relevant context:\n{context}"})
    messages.append({"role": "user", "content": user_msg})

    # Fireworks inference
    resp = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages=messages,
    )
    reply = resp.choices[0].message.content

    # Store: save the exchange so future sessions can recall it
    requests.post(
        f"{REM_BASE}/v1/memory-set",
        json={
            "key": "fireworks-chat",
            "value": f"User: {user_msg}\nAssistant: {reply}",
            "namespace": namespace,
            "tags": ["conversation"],
        },
        headers={"Authorization": f"Bearer {REM_KEY}"},
    )
    return reply

# Session 1
print(chat_with_memory("I'm a DevOps engineer. We use Kubernetes and ArgoCD."))

# Session 2
print(chat_with_memory("What deployment tool do we use?"))
# "You mentioned your team uses ArgoCD for deployments."
```

Step 3: Node.js Example

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "fw_...",
  baseURL: "https://api.fireworks.ai/inference/v1"
});

const REM_BASE = "https://api.api.remlabs.ai";
const REM_KEY = "sk-slop-...";

async function chatWithMemory(userMsg, namespace = "fireworks-agent") {
  // Recall
  const search = await fetch(`${REM_BASE}/v1/memory/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ query: userMsg, namespace, limit: 5 })
  });
  const { results } = await search.json();
  const context = results.map(r => r.value).join("\n");

  // Generate
  const resp = await client.chat.completions.create({
    model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages: [
      ...(context ? [{ role: "system", content: `Memories:\n${context}` }] : []),
      { role: "user", content: userMsg }
    ]
  });
  const reply = resp.choices[0].message.content;

  // Store
  await fetch(`${REM_BASE}/v1/memory-set`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${REM_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      key: "fireworks-chat",
      value: `User: ${userMsg}\nAssistant: ${reply}`,
      namespace
    })
  });

  return reply;
}
```

Function Calling with Memory

Fireworks AI supports function calling, which pairs well with REM Labs. You can define a remember function that the model calls when it detects information worth saving, and a recall function that retrieves context on demand. This lets the model decide what to store rather than blindly saving every exchange.
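A minimal sketch of that pattern, using only the standard library against Fireworks' OpenAI-compatible REST endpoint. The tool names (`remember`, `recall`), their schemas, the `agent-fact` key, and the dispatch loop are our own illustration; the REM Labs endpoints are the same ones used in the examples above.

```python
import json
import urllib.request

FW_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
REM_BASE = "https://api.api.remlabs.ai"
FW_KEY = "fw_..."        # your Fireworks key
REM_KEY = "sk-slop-..."  # your REM Labs key

# Tool schemas the model can choose to call; it decides when a fact
# is worth saving or when stored context is worth retrieving.
TOOLS = [
    {"type": "function", "function": {
        "name": "remember",
        "description": "Save a durable fact about the user or task.",
        "parameters": {"type": "object",
                       "properties": {"fact": {"type": "string"}},
                       "required": ["fact"]}}},
    {"type": "function", "function": {
        "name": "recall",
        "description": "Search stored memories for relevant context.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
]

def _post(url, key, payload):
    # Small helper: authenticated JSON POST, parsed JSON response.
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)

def run_tool(name, args, namespace="fireworks-agent"):
    # Dispatch a model-requested tool call to the REM Labs API.
    if name == "remember":
        _post(f"{REM_BASE}/v1/memory-set", REM_KEY,
              {"key": "agent-fact", "value": args["fact"],
               "namespace": namespace})
        return "saved"
    if name == "recall":
        res = _post(f"{REM_BASE}/v1/memory/search", REM_KEY,
                    {"query": args["query"], "namespace": namespace,
                     "limit": 5})
        return "\n".join(m["value"] for m in res.get("results", []))
    return f"unknown tool: {name}"

def chat(user_msg):
    messages = [{"role": "user", "content": user_msg}]
    while True:
        resp = _post(FW_URL, FW_KEY, {
            "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
            "messages": messages,
            "tools": TOOLS})
        msg = resp["choices"][0]["message"]
        if not msg.get("tool_calls"):
            return msg["content"]
        # Feed each tool result back so the model can continue.
        messages.append(msg)
        for tc in msg["tool_calls"]:
            result = run_tool(tc["function"]["name"],
                              json.loads(tc["function"]["arguments"]))
            messages.append({"role": "tool", "tool_call_id": tc["id"],
                             "content": result})
```

The loop keeps calling the model until it answers without requesting a tool, which is the standard shape for OpenAI-style function calling.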

OpenAI-compatible: Because Fireworks uses the OpenAI API format, the full code from our OpenAI integration guide works with Fireworks -- just change the base_url and API key.
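As a configuration sketch, the swap is just two constructor arguments (assuming the `openai` Python SDK is installed):

```python
from openai import OpenAI

# Point the OpenAI SDK at Fireworks: swap the base URL and the key.
# Everything else -- messages, tools, streaming -- works unchanged.
client = OpenAI(
    api_key="fw_...",  # Fireworks key instead of an OpenAI key
    base_url="https://api.fireworks.ai/inference/v1",
)
```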

Give your Fireworks agent a memory

Free tier. No credit card. Works with every model on Fireworks AI.

Get started free →