Benchmark Tool

How good is your AI's memory?

Test any memory API against the LongMemEval benchmark. See how your system stacks up against REM Labs' 97.2% recall accuracy.

How it works

Three steps to your score

We run 10 sample questions from the LongMemEval benchmark against your memory API and compare the results.

1
Enter your API endpoint
Provide your base URL, API key, and the store/recall endpoint paths for your memory system.
2
We send 10 test questions
A sample of LongMemEval questions is sent through your store and recall endpoints to test memory fidelity.
3
See your score vs REM Labs
Get a per-question breakdown and a final comparison score against REM Labs' 97.2% benchmark result.
Run benchmark
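The three-step flow above can be sketched in a few lines. This is a minimal illustration only: the in-memory stand-in API, the keyword-overlap recall, and the substring scoring rule are all assumptions for the sketch, not how the benchmark tool or LongMemEval grading actually works.

```python
import re

class FakeMemoryAPI:
    """Stand-in for a memory system: stores facts, recalls by keyword overlap."""
    def __init__(self):
        self.facts = []

    def store(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, question: str) -> str:
        # Return the stored fact sharing the most words with the question.
        q = set(re.findall(r"\w+", question.lower()))
        return max(self.facts,
                   key=lambda f: len(q & set(re.findall(r"\w+", f.lower()))),
                   default="")

def run_benchmark(api, questions):
    """questions: list of (context_fact, question, expected_answer) tuples."""
    # Step 1-2: push the test material through the store endpoint...
    for fact, _, _ in questions:
        api.store(fact)
    # ...then query the recall endpoint and grade each answer.
    results = []
    for _, question, expected in questions:
        answer = api.recall(question)
        results.append((question, expected, answer,
                        expected.lower() in answer.lower()))
    # Step 3: recall accuracy = correct answers / total questions.
    score = 100 * sum(r[3] for r in results) / len(results)
    return results, score

sample = [
    ("The user's favorite color is teal.", "What is my favorite color?", "teal"),
    ("The user's dog is named Biscuit.", "What is my dog's name?", "Biscuit"),
]
results, score = run_benchmark(FakeMemoryAPI(), sample)
print(f"Recall accuracy: {score:.1f}%")
```

The real tool runs the same store-then-recall loop against your endpoints, just with LongMemEval questions in place of the toy sample.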

Configure your API

Enter your memory API details below.

Your key is never sent anywhere; the entire evaluation is simulated client-side.
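The form above asks for three things: a base URL, an API key, and the store/recall endpoint paths. A config for it might look like the following; every value and field name here is a placeholder, not a real endpoint contract.

```python
# Illustrative config for the benchmark form. All values are placeholders;
# substitute your own memory system's details.
config = {
    "base_url": "https://api.example.com",    # your memory API's base URL
    "api_key": "YOUR_API_KEY",                # kept client-side, never transmitted
    "store_path": "/v1/memories",             # endpoint that writes a memory
    "recall_path": "/v1/memories/search",     # endpoint that answers a recall query
}

# The benchmark would combine base URL and path for each request, e.g.:
print(config["base_url"] + config["store_path"])
```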

Testing question 1 of 10...

Results

Per-question breakdown

Each question was evaluated against the expected LongMemEval answer.

# | Question | Expected Answer | Your Answer | Result
Final score

How you compare

REM Labs: 97.2%
Your API: --
Switch to 97.2% recall → Get API key
Industry comparison

Typical recall accuracy

How different memory architectures perform on LongMemEval-style evaluations, based on published benchmarks and internal testing.

52-67%
Typical Vector DB
Pure embedding similarity search. Good for semantic matching, but it struggles with temporal and multi-hop recall.
60-70%
Typical RAG System
Retrieval-augmented generation with chunked documents. Better context, but still misses nuanced memory queries.
97.2%
REM Labs
Proprietary ensemble pipeline + Dream Engine consolidation. 486 of 500 LongMemEval questions correct.
REM Labs: 97.2%
Mem0: ~66.9%
RAG (avg): ~65%
OpenAI: ~52.9%

Ranges are based on published benchmarks and LongMemEval evaluation methodology (Wu et al.). Individual results may vary based on configuration and data characteristics.
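The percentages above are plain correct-over-total ratios; REM Labs' headline figure, for instance, follows from the 486-of-500 result quoted earlier:

```python
def recall_accuracy(correct: int, total: int) -> float:
    """Recall accuracy as a percentage: correct answers / total questions."""
    return 100 * correct / total

# 486 of 500 LongMemEval questions correct -> 97.2%
print(f"{recall_accuracy(486, 500):.1f}%")
```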

Ready for 97.2% recall?

Add persistent, self-evolving memory to any AI application. Free tier available.

Get API Key
Full Benchmark Results
This benchmark tool runs a simulated client-side evaluation. No data is sent to external servers. Your API credentials are never transmitted. REM Labs' 97.2% score is from the full 500-question LongMemEval evaluation suite. Competitor baselines sourced from published results.