Persistent memory for AI coding agents — deployed on Azure, built for the enterprise.
Your coding agent remembers everything. No more re-explaining. No more context loss between sessions.
Based on agentmemory, re-engineered for Azure with multi-tenancy, managed services, and one-click deployment.
Every coding agent forgets everything when the session ends. You waste the first 5 minutes re-explaining your stack. Enterprise Agent Memory fixes this — it silently captures what your agent does, compresses it into searchable memory, builds a knowledge graph, and injects the right context when the next session starts.
Session 1: "Add JWT auth to the API"
Agent writes code, runs tests, fixes bugs
→ agentmemory captures every tool call silently
→ Observations compressed into structured memory via GPT-4o
→ Knowledge graph auto-extracted (entities, relationships)
→ Vectors embedded with text-embedding-3-large (3072 dims)
→ Everything stored in Cosmos DB + indexed in AI Search
Session 2: "Now add rate limiting"
Agent already knows:
✓ Auth uses jose middleware in src/middleware/auth.ts
✓ Tests in test/auth.test.ts cover token validation
✓ You chose jose over jsonwebtoken for Edge compatibility
→ Zero re-explaining. Starts working immediately.
What makes this different from the original agentmemory? This is the Azure enterprise edition — same memory pipeline, but running on managed Azure services with multi-tenant isolation, auto-scaling, and one-click deployment. No SQLite files, no local iii-engine runtime, no single-process limitations.
┌─────────────────────────────┐
│ AI Coding Agents │
│ Claude Code · Cursor · Codex│
│ Gemini CLI · Any MCP client │
└─────────────┬───────────────┘
│ REST API / MCP
▼
┌──────────────────────────────────────┐
│ Azure Container Apps │
│ Fastify v5 · Entra ID JWT Auth │
│ Multi-tenant · Rate Limited │
│ Auto-scales 1 → 10 replicas │
└────┬──────┬──────┬──────┬───────────┘
│ │ │ │
┌──────────┘ │ │ └──────────┐
▼ ▼ ▼ ▼
┌─────────────┐ ┌──────────┐ ┌─────────────┐ ┌─────────────┐
│ Cosmos DB │ │Azure AI │ │Azure OpenAI │ │Blob Storage │
│ (Serverless)│ │ Search │ │ (GPT-4o + │ │ (Archive) │
│ │ │(BM25 + │ │ embedding) │ │ │
│ Sessions │ │ Vector) │ │ │ │ Raw obs │
│ Observations│ │ │ │ Compress │ │ Audit trail │
│ Memories │ │ 3072-dim │ │ Embed │ │ │
│ Graph nodes │ │ vectors │ │ Graph extract│ │ │
│ Graph edges │ │ │ │ │ │ │
│ Audit log │ │ │ │ │ │ │
└─────────────┘ └──────────┘ └─────────────┘ └─────────────┘
│
└──────────────────┐
▼
┌─────────────────┐
│ App Insights │
│ (Monitoring) │
└─────────────────┘
Agent tool call fires
→ Archive raw observation to Blob Storage
→ LLM compress via GPT-4o → structured facts + concepts + narrative
→ Generate vector embedding (text-embedding-3-large, 3072 dimensions)
→ Store compressed observation in Cosmos DB
→ Index in Azure AI Search (BM25 + vector)
→ Extract knowledge graph entities (fire-and-forget)
→ Increment session observation count + audit entry
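The steps above can be sketched as a single orchestration function. This is a minimal illustration, not the project's actual code: the PipelineDeps interface and its method names are hypothetical stand-ins for the real adapters, and the only behavior taken from the description is the step order and the fact that graph extraction is fire-and-forget.

```typescript
// Hypothetical adapter surface; the real adapters may expose different names and types.
interface PipelineDeps {
  archive(raw: string): Promise<void>;                      // Blob Storage
  compress(raw: string): Promise<{ title: string; facts: string[] }>; // GPT-4o
  embed(text: string): Promise<number[]>;                   // text-embedding-3-large
  store(doc: object): Promise<void>;                        // Cosmos DB
  index(doc: object, vector: number[]): Promise<void>;      // Azure AI Search
  extractGraph(doc: object): Promise<void>;                 // knowledge graph
  audit(event: string): Promise<void>;                      // counters + audit entry
}

async function observe(raw: string, deps: PipelineDeps): Promise<void> {
  await deps.archive(raw);
  const compressed = await deps.compress(raw);
  const vector = await deps.embed([compressed.title, ...compressed.facts].join("\n"));
  await deps.store(compressed);
  await deps.index(compressed, vector);

  // Fire-and-forget: the call is started but not awaited, and a failure is
  // logged instead of failing the whole observation.
  void deps.extractGraph(compressed).catch((err) => {
    console.error("graph extraction failed", err);
  });

  await deps.audit("observation.created"); // event name is an assumption
}
```

Because extractGraph is not awaited, a slow or failing extraction does not delay or break the request that captured the observation.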
Enterprise Agent Memory exposes a standard REST API that any agent can call. It also works with the original agentmemory MCP server and plugins.
| Agent | Integration |
|---|---|
| Claude Code | hooks + MCP |
| Codex CLI | hooks + MCP |
| Cursor | MCP server |
| Gemini CLI | MCP server |
| Windsurf | MCP server |
| Cline | MCP server |
| Aider | REST API |
Works with any agent that speaks MCP or HTTP — one API, memories shared across all of them.
- Node.js 18+ and npm
- Azure subscription with the following services provisioned (or use Deploy to Azure):
  - Azure Cosmos DB (NoSQL, serverless)
  - Azure AI Search (Basic or Standard)
  - Azure OpenAI (GPT-4o + text-embedding-3-large)
  - Azure Blob Storage
# Clone
git clone https://github.com/msftse/enterprise-agent-memory.git
cd enterprise-agent-memory
# Install dependencies
npm install
# Configure (see Configuration section below)
cp .env.azure.example .env
# Edit .env with your Azure resource endpoints and keys
# Run in development mode
npm run dev
# Run tests (102 passing)
npm test
# Build for production
npm run build && npm start

The server starts on http://localhost:8080. Open http://localhost:8080/viewer for the built-in dashboard.
The Deploy to Azure option provisions all required Azure services using the project's Bicep templates:
| Resource | What it creates |
|---|---|
| Cosmos DB | Serverless NoSQL account + agentmemory database with 6 containers |
| AI Search | Basic tier search service with vector index (3072 dims) |
| Azure OpenAI | GPT-4o + text-embedding-3-large deployments (optional, requires quota) |
| Blob Storage | LRS storage account for raw observation archive |
| Container Apps | Consumption-plan app with auto-scale (1–10 replicas) |
| App Insights | Application monitoring and telemetry |
| Container Registry | ACR for Docker image hosting |
# 1. Create resource group
az group create --name rg-agentmemory --location westus2
# 2. Deploy infrastructure (Bicep)
az deployment group create \
--resource-group rg-agentmemory \
--template-file infra/main.bicep \
--parameters baseName=agentmem environment=dev
# 3. Build & push Docker image to ACR
az acr build \
--registry <your-acr-name> \
--image agent-memory:latest \
--file Dockerfile .
# 4. Update Container App with new image
az containerapp update \
--name app-<baseName>-dev \
--resource-group rg-agentmemory \
--image <acr>.azurecr.io/agent-memory:latest

The infra/ directory contains 8 Bicep modules for full Azure deployment:
infra/
├── main.bicep # Orchestrator — wires all modules together
└── modules/
├── cosmos.bicep # Cosmos DB NoSQL (serverless)
├── ai-search.bicep # Azure AI Search
├── openai.bicep # Azure OpenAI (conditional)
├── storage.bicep # Blob Storage
├── container-app.bicep # Container Apps + Environment
├── monitoring.bicep # App Insights + Log Analytics
└── networking.bicep # VNet + private endpoints (optional)
┌─────────────────────────────────────────────────────────────────┐
│ OBSERVATION PIPELINE │
│ │
│ Raw Input ──→ Blob Archive ──→ LLM Compress ──→ Embed (3072d) │
│ │ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Compressed Obs │ │
│ │ • title │ │
│ │ • content │ │
│ │ • facts[] │ │
│ │ • concepts[] │ │
│ │ • importance │ │
│ └───────┬───────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ Cosmos DB AI Search Graph Extract │
│ (store) (vector index) (fire & forget) │
└─────────────────────────────────────────────────────────────────┘
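The compressed observation shown in the diagram could be modeled and validated roughly as follows. The field names come from the diagram; the field types and the isCompressedObservation helper are assumptions made for illustration, for example to check JSON returned by the compression step before it is stored.

```typescript
// Field names follow the diagram; the types are assumed, not taken from the source.
interface CompressedObservation {
  title: string;
  content: string;
  facts: string[];
  concepts: string[];
  importance: number;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical type guard: returns true only when every expected field is present
// and has the assumed type.
function isCompressedObservation(value: unknown): value is CompressedObservation {
  if (typeof value !== "object" || value === null) return false;
  const o = value as Record<string, unknown>;
  return (
    typeof o["title"] === "string" &&
    typeof o["content"] === "string" &&
    isStringArray(o["facts"]) &&
    isStringArray(o["concepts"]) &&
    typeof o["importance"] === "number" &&
    Number.isFinite(o["importance"])
  );
}
```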
Entities and relationships are automatically extracted from every observation:
- Nodes: Files, concepts, libraries, functions, people, errors, patterns, projects, decisions
- Edges: imports, uses, depends_on, related_to, caused_by, solves, implements, tested_by
- Deduplication: Nodes matched by name + type + tenantId; edges by source + target + type + tenantId
- Weight: Edges start at 1.0, increment by 0.5 on re-observation (capped at 10)
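The edge deduplication and weighting rules can be sketched with an in-memory map. The constants (1.0, 0.5, 10) and the key fields come from the list above; the GraphEdge shape, the edgeKey and upsertEdge helpers, and the use of a Map in place of Cosmos DB are assumptions for illustration.

```typescript
interface GraphEdge {
  source: string;
  target: string;
  type: string;
  tenantId: string;
  weight: number;
}

const INITIAL_WEIGHT = 1.0;
const WEIGHT_STEP = 0.5;
const MAX_WEIGHT = 10;

// Deduplication key: source + target + type + tenantId.
// JSON.stringify avoids collisions that a plain separator could cause.
function edgeKey(e: Omit<GraphEdge, "weight">): string {
  return JSON.stringify([e.tenantId, e.source, e.target, e.type]);
}

// Insert a new edge at the initial weight, or bump an existing one (capped).
function upsertEdge(store: Map<string, GraphEdge>, e: Omit<GraphEdge, "weight">): GraphEdge {
  const key = edgeKey(e);
  const existing = store.get(key);
  const weight = existing
    ? Math.min(existing.weight + WEIGHT_STEP, MAX_WEIGHT)
    : INITIAL_WEIGHT;
  const next: GraphEdge = { ...e, weight };
  store.set(key, next);
  return next;
}
```

Because tenantId is part of the key, the same relationship observed by two tenants yields two independent edges with separate weights.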
Every record is scoped by tenantId. Every query filters by it. Tenant A never sees Tenant B's data.
Tenant A Tenant B
├── Sessions (scoped) ├── Sessions (scoped)
├── Observations ├── Observations
├── Memories ├── Memories
├── Knowledge Graph ├── Knowledge Graph
└── Search Index (filtered) └── Search Index (filtered)
Authentication uses Microsoft Entra ID JWT tokens. The x-tenant-id header provides tenant scoping. For development, set AUTH_DISABLED=true.
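One way to make "every query filters by tenantId" hard to forget is to build queries through a helper that always adds the filter. The sketch below is an assumption about how this could look, not the project's code: the SqlQuerySpec type is declared locally to mirror the query/parameters shape that the Cosmos DB JavaScript SDK accepts, and the c.tenantId and c.status property names are hypothetical.

```typescript
// Local declarations mirroring the parameterized query shape used by Cosmos DB.
interface SqlParameter {
  name: string;
  value: string | number | boolean;
}
interface SqlQuerySpec {
  query: string;
  parameters: SqlParameter[];
}

// Hypothetical helper: every sessions query is scoped to one tenant.
function sessionsQuery(tenantId: string, status?: string): SqlQuerySpec {
  if (tenantId.trim() === "") {
    throw new Error("tenantId is required for every query");
  }
  const parameters: SqlParameter[] = [{ name: "@tenantId", value: tenantId }];
  let query = "SELECT * FROM c WHERE c.tenantId = @tenantId";
  if (status !== undefined) {
    query += " AND c.status = @status";
    parameters.push({ name: "@status", value: status });
  }
  return { query, parameters };
}
```

Passing values as parameters rather than concatenating them into the query string also keeps a tenant ID supplied in a header from altering the query itself.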
A built-in web viewer is served at /viewer (root / redirects there). No separate process, no extra dependencies — it's a self-contained SPA served directly from the Fastify API.
Features:
| View | What it shows |
|---|---|
| 📊 Dashboard | Stats overview — sessions, observations, memories, graph node counts |
| 💚 Health | Real-time status of all Azure services (Cosmos, AI Search, Blob) |
| 📁 Sessions | Browse sessions with drill-down detail (project, model, tags, obs count) |
| 👁 Observations | Browse by session — view compressed content, facts, concepts, importance |
| 💡 Memories | List memories with strength bars, versioning, concept tags |
| 🔍 Search | Semantic search across all observations and memories |
| 🕸️ Knowledge Graph | Interactive force-directed graph visualization with click-to-inspect |
The viewer supports dark/light theme toggle and auto-detects system preference. The API URL and tenant ID are configurable in the top bar.
This project takes the core concepts from rohitg00/agentmemory and re-engineers them for Azure enterprise deployment.
| | agentmemory (original) | Enterprise Agent Memory (Azure) |
|---|---|---|
| Storage | SQLite on local disk | Azure Cosmos DB (serverless, globally distributed) |
| Vector Search | BM25 + local embeddings (MiniLM) | Azure AI Search (BM25 + vector, 3072-dim text-embedding-3-large) |
| LLM | Any OpenAI-compatible provider | Azure OpenAI (GPT-4o for compression + graph extraction) |
| Multi-tenancy | Single user | Full tenant isolation (Entra ID + tenantId scoping) |
| Scaling | Single process | Container Apps auto-scale (1–10 replicas) |
| Runtime | iii-engine (Rust binary required) | Pure Node.js — no external runtime needed |
| Knowledge Graph | Optional (iii-engine) | Auto-extraction on every observation (fire-and-forget) |
| Auth | HMAC secret | Microsoft Entra ID JWT + RBAC |
| Deployment | npm install / Docker Compose | One-click Azure deploy (Bicep IaC) |
| Compliance | — | GDPR purge endpoint, audit trail in Blob Storage |
| Viewer | Port 3113 (separate process) | Built-in at /viewer (same server, no proxy) |
| Tests | 950+ | 102 (unit + integration, vitest) |
Base URL: https://your-app.azurecontainerapps.io/api/v1
All endpoints (except /health) require:
- Authorization: Bearer token (Entra ID JWT), or set AUTH_DISABLED=true for development
- x-tenant-id: Tenant identifier header
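A client could assemble these headers with a small helper like the one below. The buildRequest function and ApiRequest type are hypothetical; only the header names and the base URL pattern come from this section.

```typescript
interface ApiRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: joins base URL and path, and sets the required headers.
// The token is optional so the same helper works when AUTH_DISABLED=true.
function buildRequest(baseUrl: string, path: string, tenantId: string, token?: string): ApiRequest {
  if (tenantId === "") {
    throw new Error("x-tenant-id is required");
  }
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    "x-tenant-id": tenantId,
  };
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  // Normalize slashes so "…/api/v1/" + "/sessions" does not produce "//".
  const url = baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
  return { url, headers };
}
```

The result can be passed to any HTTP client, for example fetch(req.url, { headers: req.headers }) on Node.js 18+.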
| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new agent session |
| GET | /sessions | List sessions (paginated) |
| GET | /sessions/:id | Get session by ID |
| PATCH | /sessions/:id | Update session metadata |
| POST | /sessions/:id/end | End session (set status to completed) |
| Method | Path | Description |
|---|---|---|
| POST | /observations | Capture observation → compress → embed → store → index → graph |
| GET | /observations/:id | Get observation by ID |
| GET | /sessions/:id/observations | List observations for a session |
| Method | Path | Description |
|---|---|---|
| POST | /memories | Create a memory (with embedding) |
| GET | /memories | List memories (paginated) |
| GET | /memories/:id | Get memory by ID |
| PUT | /memories/:id/evolve | Evolve memory (creates new version) |
| DELETE | /memories/:id | Forget memory (soft delete, sets strength to 0) |
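The evolve and forget semantics in the table can be illustrated with two pure functions. This is a sketch under assumptions: the Memory shape, the supersedes field, and the choice to give each version its own ID are hypothetical; the source only states that evolving creates a new version and that forgetting is a soft delete that sets strength to 0.

```typescript
// Hypothetical memory record; only "version" and "strength" are implied by the API table.
interface Memory {
  id: string;
  tenantId: string;
  content: string;
  version: number;
  strength: number;
  supersedes?: string; // assumed link back to the previous version
}

// Evolve: return a new record instead of mutating the old one, so history is kept.
function evolve(previous: Memory, content: string, newId: string): Memory {
  return {
    ...previous,
    id: newId,
    content,
    version: previous.version + 1,
    supersedes: previous.id,
  };
}

// Forget: soft delete. The record stays in storage with strength 0.
function forget(memory: Memory): Memory {
  return { ...memory, strength: 0 };
}
```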
| Method | Path | Description |
|---|---|---|
| POST | /search | Hybrid search: BM25 + vector + semantic reranking |
curl -X POST https://your-app.azurecontainerapps.io/api/v1/search \
-H "Content-Type: application/json" \
-H "x-tenant-id: my-team" \
-H "Authorization: Bearer $TOKEN" \
-d '{"query": "how does authentication work", "limit": 5}'

| Method | Path | Description |
|---|---|---|
| GET | /graph/nodes | List graph nodes (filter by type) |
| GET | /graph/edges | List graph edges (filter by nodeId) |
| POST | /graph/nodes | Create node |
| POST | /graph/edges | Create edge |
| POST | /graph/traverse | BFS traversal from a start node |
| POST | /graph/extract | Extract entities from a single observation |
| POST | /graph/extract-batch | Extract entities from all observations in a session |
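The BFS traversal behind /graph/traverse could look roughly like the sketch below. It assumes edges are followed in the source → target direction and that the caller supplies a depth limit; the Edge type, the traverse function, and the maxDepth parameter are illustrative names, not the documented request format.

```typescript
interface Edge {
  source: string;
  target: string;
}

// Breadth-first traversal: returns node IDs in visit order, start node first,
// going at most maxDepth hops away from the start.
function traverse(edges: Edge[], start: string, maxDepth: number): string[] {
  // Build an adjacency list once so each hop is a map lookup.
  const adjacency = new Map<string, string[]>();
  for (const { source, target } of edges) {
    const neighbors = adjacency.get(source) ?? [];
    neighbors.push(target);
    adjacency.set(source, neighbors);
  }

  const visited = new Set<string>([start]);
  const order: string[] = [start];
  let frontier: string[] = [start];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor); // the visited set also guards against cycles
          order.push(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

Processing the graph one frontier at a time keeps the depth limit exact and means closer nodes always appear before more distant ones.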
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check: all Azure service statuses (no auth) |
| GET | /admin/metrics | Per-tenant usage metrics |
| DELETE | /admin/tenant/:id | GDPR purge: delete all data for a tenant |
| Service | Purpose | SKU | Scaling |
|---|---|---|---|
| Cosmos DB | Sessions, observations, memories, graph nodes/edges, audit | Serverless | Auto-scales RU/s per request |
| Azure AI Search | Hybrid search (BM25 + 3072-dim vector) | Basic → Standard | Add replicas for throughput, partitions for index size |
| Azure OpenAI | GPT-4o (compression, graph extraction) + text-embedding-3-large | Standard | TPM-based rate limiting |
| Blob Storage | Raw observation archive + audit trail | LRS | Unlimited |
| Container Apps | Stateless API runtime | Consumption | 0.5 vCPU / 1GB → auto-scales to 10 replicas |
| App Insights | Distributed tracing + monitoring | — | — |
| Tier | Users | Monthly Cost (est.) |
|---|---|---|
| Dev | 1–5 | ~$15–30 (serverless Cosmos + basic Search) |
| Team | 5–50 | ~$80–150 (basic Search + moderate Cosmos RU) |
| Enterprise | 50+ | ~$300+ (standard Search + provisioned Cosmos) |
# Required — Azure service endpoints
COSMOS_ENDPOINT=https://your-cosmos.documents.azure.com:443/
AI_SEARCH_ENDPOINT=https://your-search.search.windows.net
STORAGE_ACCOUNT_URL=https://yourstorage.blob.core.windows.net
# Optional — Azure OpenAI (LLM features disabled without this)
AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com
AZURE_OPENAI_API_KEY=your-key # or use Managed Identity
# Optional — key-based auth for local development
COSMOS_KEY=your-cosmos-account-key
STORAGE_ACCOUNT_KEY=your-storage-key
AI_SEARCH_ADMIN_KEY=your-search-admin-key
# Server
PORT=8080 # default: 8080
LOG_LEVEL=info # debug | info | warn | error
AUTH_DISABLED=true # skip Entra ID auth (dev only)

In production, the app uses Managed Identity (DefaultAzureCredential), so no keys are needed. Set COSMOS_KEY / STORAGE_ACCOUNT_KEY / AI_SEARCH_ADMIN_KEY only for local development where RBAC isn't configured.
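How these variables might be read and validated at startup is sketched below. The project itself validates configuration with Zod; this version uses plain TypeScript to stay dependency-free, and the AppConfig shape, the loadConfig function, and the "info" default for LOG_LEVEL are assumptions. The variable names, the 8080 default, and the rule that LLM features are disabled without an Azure OpenAI endpoint come from the listing above.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
const LOG_LEVELS: readonly string[] = ["debug", "info", "warn", "error"];

interface AppConfig {
  cosmosEndpoint: string;
  aiSearchEndpoint: string;
  storageAccountUrl: string;
  llmEnabled: boolean; // false when AZURE_OPENAI_ENDPOINT is not set
  port: number;
  logLevel: LogLevel;
  authDisabled: boolean;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Required endpoints: fail fast and name every missing variable at once.
  const required = ["COSMOS_ENDPOINT", "AI_SEARCH_ENDPOINT", "STORAGE_ACCOUNT_URL"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(", ")}`);
  }

  const port = Number(env["PORT"] ?? "8080");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  const logLevel = env["LOG_LEVEL"] ?? "info"; // default is an assumption
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);
  }

  return {
    cosmosEndpoint: env["COSMOS_ENDPOINT"] as string,
    aiSearchEndpoint: env["AI_SEARCH_ENDPOINT"] as string,
    storageAccountUrl: env["STORAGE_ACCOUNT_URL"] as string,
    llmEnabled: Boolean(env["AZURE_OPENAI_ENDPOINT"]), // graceful degradation
    port,
    logLevel: logLevel as LogLevel,
    authDisabled: env["AUTH_DISABLED"] === "true",
  };
}
```

In a running server this would typically be called once as loadConfig(process.env).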
├── src/
│ ├── types/ # 60+ domain model interfaces (tenantId multi-tenancy)
│ │ ├── models.ts # Session, CompressedObservation, Memory, GraphNode, GraphEdge
│ │ └── api.ts # Request/response types
│ ├── config/ # Zod-validated Azure config with graceful degradation
│ ├── adapters/ # Azure service adapters
│ │ ├── cosmos.adapter.ts # Cosmos DB (key + DefaultAzureCredential)
│ │ ├── ai-search.adapter.ts # AI Search (vector + BM25)
│ │ ├── azure-openai.adapter.ts # OpenAI (compress, embed, graph extract)
│ │ ├── blob-storage.adapter.ts # Blob Storage (archive, audit)
│ │ └── fabric/lakehouse.adapter.ts # Fabric Lakehouse (analytics)
│ ├── engine/ # Core memory pipeline
│ │ ├── observe.ts # 7-step pipeline: archive → compress → embed → store → index → graph → audit
│ │ ├── compress.ts # GPT-4o observation compression
│ │ ├── remember.ts # Memory creation + versioning
│ │ ├── forget.ts # Soft deletion (set strength to 0)
│ │ ├── search.ts # Hybrid search orchestration
│ │ └── graph.ts # Knowledge graph CRUD + entity extraction + deduplication
│ ├── middleware/ # Auth (Entra ID JWT), tenant isolation, rate limiting
│ ├── routes/ # 20+ Fastify route handlers
│ │ ├── sessions.routes.ts
│ │ ├── observations.routes.ts
│ │ ├── memories.routes.ts
│ │ ├── search.routes.ts
│ │ ├── graph.routes.ts
│ │ ├── admin.routes.ts
│ │ └── viewer.routes.ts # Serves built-in dashboard
│ ├── viewer/
│ │ └── index.html # Self-contained SPA dashboard
│ └── index.ts # Fastify server entrypoint
├── infra/ # 8 Bicep modules for Azure deployment
│ ├── main.bicep
│ └── modules/
├── src/__tests__/ # 102 tests (vitest)
│ ├── unit/ # 9 unit test suites
│ └── integration/ # API integration tests
├── docs/
│ ├── PRD.md # Product requirements document
│ └── architecture.excalidraw
├── Dockerfile # Multi-stage Node 22 Alpine build
├── vitest.config.ts
└── package.json
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run tests (npm test; all 102 should pass)
- Commit your changes
- Push to the branch and open a Pull Request
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
- agentmemory by Rohit Ghumare — the original persistent memory system for AI coding agents that inspired this enterprise edition.
- iii engine — the runtime that powers the original agentmemory.
- Built with Azure Cosmos DB, Azure AI Search, Azure OpenAI, and Azure Container Apps.
Built with ❤️ by the Microsoft SE team · Powered by Azure