Benchmark Tool

How good is your AI's memory?

Test any memory API against the LongMemEval benchmark. See how your system stacks up against REM Labs' 97.2% recall accuracy.

How it works

Three steps to your score

We run 10 sample questions from the LongMemEval benchmark against your memory API and compare the results.

1
Enter your API endpoint
Provide your base URL, API key, and the store/recall endpoint paths for your memory system.
2
We send 10 test questions
A sample of LongMemEval questions is sent through your store and recall endpoints to test memory fidelity.
3
See your score vs REM Labs
Get a per-question breakdown and a final comparison score against REM Labs' 97.2% benchmark result.
Run benchmark
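The three-step flow above can be sketched in a few lines. This is a minimal illustration only: the in-memory stand-in API, the keyword-overlap recall, and the substring scoring rule are all assumptions for the sketch, not how the benchmark tool or LongMemEval grading actually works.

```python
import re

class FakeMemoryAPI:
    """Stand-in for a memory system: stores facts, recalls by keyword overlap."""
    def __init__(self):
        self.facts = []

    def store(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, question: str) -> str:
        # Return the stored fact sharing the most words with the question.
        q = set(re.findall(r"\w+", question.lower()))
        return max(self.facts,
                   key=lambda f: len(q & set(re.findall(r"\w+", f.lower()))),
                   default="")

def run_benchmark(api, questions):
    """questions: list of (context_fact, question, expected_answer) tuples."""
    # Step 1-2: push the test material through the store endpoint...
    for fact, _, _ in questions:
        api.store(fact)
    # ...then query the recall endpoint and grade each answer.
    results = []
    for _, question, expected in questions:
        answer = api.recall(question)
        results.append((question, expected, answer,
                        expected.lower() in answer.lower()))
    # Step 3: recall accuracy = correct answers / total questions.
    score = 100 * sum(r[3] for r in results) / len(results)
    return results, score

sample = [
    ("The user's favorite color is teal.", "What is my favorite color?", "teal"),
    ("The user's dog is named Biscuit.", "What is my dog's name?", "Biscuit"),
]
results, score = run_benchmark(FakeMemoryAPI(), sample)
print(f"Recall accuracy: {score:.1f}%")
```

The real tool runs the same store-then-recall loop against your endpoints, just with LongMemEval questions in place of the toy sample.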

Configure your API

Enter your memory API details below.

Your key is never sent anywhere; the entire evaluation is simulated client-side.
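The form above asks for three things: a base URL, an API key, and the store/recall endpoint paths. A config for it might look like the following; every value and field name here is a placeholder, not a real endpoint contract.

```python
# Illustrative config for the benchmark form. All values are placeholders;
# substitute your own memory system's details.
config = {
    "base_url": "https://api.example.com",    # your memory API's base URL
    "api_key": "YOUR_API_KEY",                # kept client-side, never transmitted
    "store_path": "/v1/memories",             # endpoint that writes a memory
    "recall_path": "/v1/memories/search",     # endpoint that answers a recall query
}

# The benchmark would combine base URL and path for each request, e.g.:
print(config["base_url"] + config["store_path"])
```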

Testing question 1 of 10...

Results

Per-question breakdown

Each question was evaluated against the expected LongMemEval answer.

# | Question | Expected Answer | Your Answer | Result
Final score

How you compare

REM Labs: 97.2%
Your API: --
Switch to 97.2% recall → Get API key
Industry comparison

Typical recall accuracy

How different memory architectures perform on LongMemEval-style evaluations, based on published benchmarks and internal testing.

52-67%
Typical Vector DB
Pure embedding similarity search. Good for semantic matching, but it struggles with temporal and multi-hop recall.
60-70%
Typical RAG System
Retrieval-augmented generation with chunked documents. Better context, but still misses nuanced memory queries.
97.2%
REM Labs
Proprietary ensemble pipeline + Dream Engine consolidation. 486 of 500 LongMemEval questions correct.
REM Labs: 97.2%
Mem0: ~66.9%
RAG (avg): ~65%
OpenAI: ~52.9%

Ranges are based on published benchmarks and LongMemEval evaluation methodology (Wu et al.). Individual results may vary based on configuration and data characteristics.
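The percentages above are plain correct-over-total ratios; REM Labs' headline figure, for instance, follows from the 486-of-500 result quoted earlier:

```python
def recall_accuracy(correct: int, total: int) -> float:
    """Recall accuracy as a percentage: correct answers / total questions."""
    return 100 * correct / total

# 486 of 500 LongMemEval questions correct -> 97.2%
print(f"{recall_accuracy(486, 500):.1f}%")
```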

Ready for 97.2% recall?

Add persistent, self-evolving memory to any AI application. Free tier available.

Get API Key
Full Benchmark Results
This benchmark tool runs a simulated client-side evaluation. No data is sent to external servers. Your API credentials are never transmitted. REM Labs' 97.2% score is from the full 500-question LongMemEval evaluation suite. Competitor baselines sourced from published results.