Persistent Memory for Together AI Models

Together AI gives you access to hundreds of open-source models through a single API -- Llama 3, Mixtral, Qwen, DeepSeek, and more. But every model on the platform is stateless. This guide shows how to add persistent memory to any Together AI model using REM Labs, so your agent remembers across sessions regardless of which model you choose.

Why Together AI Needs External Memory

Together AI is one of the best platforms for running open-source models in production. You get competitive pricing, fast inference, and the freedom to switch between models without changing your code. But Together AI's chat completions endpoint follows the same stateless pattern as OpenAI -- each request is independent, no context is saved server-side.

If you are building an application that needs to remember user preferences, past interactions, or accumulated knowledge, you need an external memory layer. REM Labs provides exactly this: a persistent, searchable memory backend that works with any model on any platform.

Step 1: Get Your API Keys

Get a Together AI key from api.together.xyz and a REM Labs key from remlabs.ai/console, or run npx @remlabs/memory.
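Keep both keys out of your source code. A minimal setup sketch, assuming you export them as environment variables (the variable names here are our convention, not required by either SDK):

```shell
# Export once per shell session (or put in your .env / secrets manager).
# Placeholder values -- substitute your real keys.
export TOGETHER_API_KEY="..."   # from api.together.xyz
export REM_API_KEY="sk-slop-..."  # from remlabs.ai/console
```

The examples below hardcode keys for brevity; in production, read them from the environment instead.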

Step 2: Store and Recall with Together AI

Together AI uses the OpenAI-compatible chat completions format, so the integration pattern is straightforward.

```python
import openai
import requests

# Together AI uses OpenAI-compatible endpoints
client = openai.OpenAI(
    api_key="...",
    base_url="https://api.together.xyz/v1"
)

REM_KEY = "sk-slop-..."
REM_BASE = "https://api.api.remlabs.ai"

def chat_with_memory(user_msg, namespace="together-agent"):
    # Recall relevant memories
    search = requests.post(f"{REM_BASE}/v1/memory/search", json={
        "query": user_msg,
        "namespace": namespace,
        "limit": 5
    }, headers={"Authorization": f"Bearer {REM_KEY}"})
    memories = search.json().get("results", [])
    context = "\n".join([m["value"] for m in memories])

    messages = []
    if context:
        messages.append({
            "role": "system",
            "content": f"Context from prior conversations:\n{context}"
        })
    messages.append({"role": "user", "content": user_msg})

    # Call Together AI (any model)
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct-Turbo",
        messages=messages
    )
    reply = resp.choices[0].message.content

    # Store the interaction
    requests.post(f"{REM_BASE}/v1/memory-set", json={
        "key": "together-chat",
        "value": f"User: {user_msg}\nAssistant: {reply}",
        "namespace": namespace,
        "tags": ["conversation"]
    }, headers={"Authorization": f"Bearer {REM_KEY}"})

    return reply

# Session 1
print(chat_with_memory("I'm a data scientist at Netflix. I mostly use PyTorch."))

# Session 2 (new process, different day)
print(chat_with_memory("What ML framework do I use?"))
# "You mentioned you mostly use PyTorch."
```
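The recall step above joins every returned memory into the system prompt, which can grow unbounded as memories accumulate. One way to keep the prompt small is to cap the context by size before building the messages list. A minimal sketch, assuming the search response shape shown above; the 2000-character budget is an arbitrary example value, not a REM Labs or Together AI limit:

```python
def build_messages(user_msg, memories, max_context_chars=2000):
    """Fold recalled memories into a system message, in search-result
    order, stopping once a rough character budget is exhausted.
    `memories` is a list of {"value": str} dicts, matching the
    search response above."""
    context_lines = []
    used = 0
    for m in memories:
        value = m["value"]
        if used + len(value) > max_context_chars:
            break  # budget exhausted; drop lower-ranked results
        context_lines.append(value)
        used += len(value)

    messages = []
    if context_lines:
        messages.append({
            "role": "system",
            "content": "Context from prior conversations:\n" + "\n".join(context_lines)
        })
    messages.append({"role": "user", "content": user_msg})
    return messages
```

You could swap this into chat_with_memory in place of the inline context-building, then pass the result straight to client.chat.completions.create.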

Step 3: Switch Models, Keep Memories

One of Together AI's strengths is easy model switching. With REM Labs, your memory persists across model changes. Start a conversation with Llama 3, switch to Mixtral, then try Qwen -- the memory layer stays the same.

```python
# Same memory, different models
chat_with_memory("I prefer concise explanations.")  # handled by Llama 3 (Step 2)

# Switch model -- memory carries over. The client and endpoint are the
# same; only the model string changes per request.
resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Summarize what you know about me."}]
)
# Keep using the same namespace when searching and storing --
# the memories persist across models.
```
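Because the namespace, not the model, keys the memory, you can even route each request to a different model by task. A minimal sketch of that idea; the task labels and routing table are hypothetical, and the model ids are examples of Together AI's naming scheme, not recommendations:

```python
# Hypothetical routing table -- labels and model choices are illustrative.
MODELS = {
    "chat": "meta-llama/Llama-3.1-70B-Instruct-Turbo",
    "fast": "meta-llama/Llama-3.1-8B-Instruct-Turbo",
    "code": "Qwen/Qwen2.5-Coder-32B-Instruct",
}

def pick_model(task: str) -> str:
    """Resolve a task label to a model id, defaulting to 'chat'.
    The memory namespace never changes, so every model here sees
    the same recalled context."""
    return MODELS.get(task, MODELS["chat"])
```

A request handler would call pick_model(task) and pass the result as the model argument, leaving the recall/store calls untouched.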

Step 4: Node.js Example

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "...",
  baseURL: "https://api.together.xyz/v1"
});

const REM_BASE = "https://api.api.remlabs.ai";
const REM_KEY = "sk-slop-...";

// Store
await fetch(`${REM_BASE}/v1/memory-set`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${REM_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    key: "together-chat",
    value: "User is a data scientist at Netflix. Uses PyTorch.",
    namespace: "ds-user-42"
  })
});

// Recall and generate
const search = await fetch(`${REM_BASE}/v1/memory/search`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${REM_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    query: "user background",
    namespace: "ds-user-42",
    limit: 5
  })
});
const { results } = await search.json();
const context = results.map(r => r.value).join("\n");
```
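The Node snippet stops after building the context string. To finish the round trip, fold the results into an OpenAI-style messages array and pass it to the completion call. A small helper sketch, assuming the { value: string } result shape from the search response above:

```javascript
// Fold recalled memories into an OpenAI-style messages array.
// `results` is the array returned by the /v1/memory/search call above.
function buildMessages(userMsg, results) {
  const messages = [];
  if (results.length > 0) {
    const context = results.map(r => r.value).join("\n");
    messages.push({
      role: "system",
      content: `Context from prior conversations:\n${context}`
    });
  }
  messages.push({ role: "user", content: userMsg });
  return messages;
}
```

Usage would look like: client.chat.completions.create({ model: "meta-llama/Llama-3.1-70B-Instruct-Turbo", messages: buildMessages(userMsg, results) }).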

OpenAI-compatible: Because Together AI uses the OpenAI SDK format, the same code from our OpenAI integration guide works with Together AI -- just change the base_url.

Give your Together AI agent a memory

Free tier. No credit card. Works with every model on Together AI.

Get started free →