Memory Plugin for Microsoft Semantic Kernel

Semantic Kernel's plugin architecture makes it straightforward to extend AI agents with new capabilities. This guide shows how to build a REM Labs memory plugin that gives your Semantic Kernel agents persistent, cross-session recall -- with vector, full-text, and entity graph retrieval built in.

Why Semantic Kernel Needs External Memory

Semantic Kernel provides a flexible orchestration layer for building AI agents in C# and Python. But its built-in memory connectors are limited to vector-only stores. For production use cases -- where you need to recall proper nouns, handle knowledge updates, and search across thousands of past interactions -- you need multi-signal retrieval. That is what REM Labs provides.

Step 1: Install

pip install remlabs-memory semantic-kernel

Step 2: Build the Memory Plugin

from semantic_kernel.functions import kernel_function
from remlabs import RemMemory

class RemMemoryPlugin:
    def __init__(self, api_key: str, namespace: str = "semantic-kernel"):
        self.mem = RemMemory(api_key=api_key)
        self.namespace = namespace

    @kernel_function(description="Search persistent memory for relevant context")
    def recall(self, query: str) -> str:
        results = self.mem.search(query, namespace=self.namespace, limit=5)
        if not results:
            return "No relevant memories found."
        return "\n".join(
            f"- {r['value']} (score: {r['score']:.2f})" for r in results
        )

    @kernel_function(description="Store a fact or observation in persistent memory")
    def remember(self, value: str, tags: str = "") -> str:
        tag_list = [t.strip() for t in tags.split(",") if t.strip()]
        self.mem.store(value=value, namespace=self.namespace, tags=tag_list)
        return f"Stored: {value}"

The plugin exposes two kernel functions: recall for semantic search and remember for storing new facts. Semantic Kernel's planner can invoke these automatically when the agent decides it needs context or wants to persist something.
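You can sanity-check the plugin's formatting and tag-parsing logic without a live REM Labs account or a running kernel. The sketch below mirrors the plugin's methods minus the @kernel_function decorators, and swaps RemMemory for an in-memory stub (the stub and its fixed score are illustrative; the real client scores results server-side):

```python
# Stand-in for RemMemory: keeps records in a list and returns a fixed score.
class StubMemory:
    def __init__(self):
        self.records = []

    def store(self, value, namespace, tags):
        self.records.append({"value": value, "namespace": namespace, "tags": tags})

    def search(self, query, namespace, limit):
        hits = [r for r in self.records if r["namespace"] == namespace]
        return [{"value": r["value"], "score": 0.90} for r in hits[:limit]]

class RemMemoryPlugin:
    """Same logic as the kernel plugin, minus the @kernel_function decorators."""
    def __init__(self, client, namespace="semantic-kernel"):
        self.mem = client
        self.namespace = namespace

    def recall(self, query: str) -> str:
        results = self.mem.search(query, namespace=self.namespace, limit=5)
        if not results:
            return "No relevant memories found."
        return "\n".join(f"- {r['value']} (score: {r['score']:.2f})" for r in results)

    def remember(self, value: str, tags: str = "") -> str:
        tag_list = [t.strip() for t in tags.split(",") if t.strip()]
        self.mem.store(value=value, namespace=self.namespace, tags=tag_list)
        return f"Stored: {value}"

plugin = RemMemoryPlugin(StubMemory())
print(plugin.remember("User prefers Azure.", tags="preferences, infrastructure"))
# → Stored: User prefers Azure.
print(plugin.recall("cloud preference"))
# → - User prefers Azure. (score: 0.90)
```

The comma-separated `tags` string is a deliberate choice: kernel function parameters are passed as flat strings by the model, so the plugin parses the list itself.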

Step 3: Register and Use

import semantic_kernel as sk
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.functions import KernelArguments

kernel = sk.Kernel()
kernel.add_service(OpenAIChatCompletion(
    service_id="chat",
    ai_model_id="gpt-4o",
    api_key="...",
))

# Register the REM memory plugin
kernel.add_plugin(
    RemMemoryPlugin(api_key="sk-slop-..."),
    plugin_name="memory",
)

# The agent can now call memory.recall and memory.remember
settings = OpenAIChatPromptExecutionSettings(
    function_choice_behavior=FunctionChoiceBehavior.Auto()
)
result = await kernel.invoke_prompt(
    "What do you know about the user's deployment preferences? "
    "Use the memory recall function to check.",
    arguments=KernelArguments(settings=settings),
)
print(result)

With automatic function choice enabled, the agent calls memory.recall when it determines context would help, and memory.remember when it encounters facts worth persisting.
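Under the hood, automatic function choice is a loop: each turn, the model either emits a tool call, which the runtime dispatches to the matching plugin function and feeds back, or a final answer, which ends the loop. Here is a simplified, model-free sketch of that dispatch loop (the scripted "model" and the canned recall result are illustrative, not Semantic Kernel internals):

```python
# Illustrative "model": asks for a tool on the first turn, then answers.
def scripted_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "memory.recall",
                              "arguments": {"query": "deployment preferences"}}}
    return {"content": "The user prefers Azure for new deployments."}

def recall(query):
    # Canned result standing in for the plugin's real recall() output.
    return "- User prefers Azure over AWS for new deployments. (score: 0.92)"

TOOLS = {"memory.recall": recall}

def run_auto(prompt, model, max_turns=4):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # model chose to answer directly
        # Dispatch the tool call to the registered plugin function.
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("too many tool turns")

print(run_auto("What are the user's deployment preferences?", scripted_model))
# → The user prefers Azure for new deployments.
```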

Step 4: Direct API Access

from remlabs import RemMemory

mem = RemMemory(api_key="sk-slop-...")

# Store memories outside the kernel
mem.store(
    value="User prefers Azure over AWS for new deployments.",
    namespace="semantic-kernel",
    tags=["preferences", "infrastructure"],
)

# Search within the same namespace
results = mem.search(
    "cloud provider preference",
    namespace="semantic-kernel",
    limit=5,
)
for r in results:
    print(r["value"], r["score"])

Triple-Indexed Retrieval

Every memory stored through REM is automatically indexed three ways: vector embeddings for semantic similarity, full-text indexing for exact keyword matching, and entity graph extraction for structured relationships. Multi-signal fusion retrieval combines all three at query time, reaching 90% on LongMemEval -- far beyond what vector-only connectors achieve.
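Conceptually, fusing the three signals works like reciprocal rank fusion: each index returns its own ranked list, and a memory's fused score sums 1/(k + rank) over every list it appears in, so memories that rank well across signals float to the top. This sketch illustrates the idea only; REM's actual fusion weighting is internal to the service:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked result lists; rankings maps signal name -> ordered memory ids."""
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, mem_id in enumerate(ranked_ids, start=1):
            scores[mem_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Each index votes with its own ordering of memory ids (ids are illustrative).
fused = reciprocal_rank_fusion({
    "vector":    ["m3", "m1", "m7"],   # semantic similarity
    "full_text": ["m1", "m3", "m9"],   # exact keyword match
    "entity":    ["m1", "m7", "m2"],   # graph relationships
})
print(fused[0])  # → m1, the only memory ranked highly by all three signals
```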

Works with C# too: The REM Labs API is a standard REST API. Use HttpClient to call POST /v1/memory/store and POST /v1/memory/search from any .NET application. See the API docs for the full endpoint reference.
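As a language-neutral sketch of those two endpoints, here are request bodies mirrored from the Python client's parameters. The field names and base URL below are assumptions, not the documented wire format; confirm both against the API docs before wiring up HttpClient:

```python
import json

BASE = "https://api.remlabs.example/v1/memory"  # placeholder host, illustrative only

# POST {BASE}/store -- fields assumed from the Python client's store() parameters
store_body = json.dumps({
    "value": "User prefers Azure over AWS for new deployments.",
    "namespace": "semantic-kernel",
    "tags": ["preferences", "infrastructure"],
})

# POST {BASE}/search -- fields assumed from the Python client's search() parameters
search_body = json.dumps({
    "query": "cloud provider preference",
    "namespace": "semantic-kernel",
    "limit": 5,
})

print(store_body)
print(search_body)
```

From .NET, the same JSON bodies go out via HttpClient.PostAsync with your API key in an Authorization header.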

Give your Semantic Kernel agents a memory

Free tier. No credit card. pip install and go.

Get started free →