Local-first MCP server for Cloudflare developer documentation.
It downloads the latest https://developers.cloudflare.com/llms-full.txt, builds a local SQLite index, and serves Cloudflare docs search over MCP Streamable HTTP for multiple local agents.
- Downloads and indexes the latest Cloudflare docs corpus during
setupandsync - Works offline for retrieval after setup completes
- Exposes Cloudflare docs search over HTTP-only MCP
- Preserves original markdown for stored pages
- Combines lexical search and semantic retrieval
- Excludes
404 - Page Not Found*andThird party licenses*pages entirely
Implemented:
- Remote corpus sync from Cloudflare
- Local SQLite storage with FTS5
- Chunked document indexing
- MCP tools for search and markdown fetch
- MCP resource for page markdown
- Prompt telling agents to use this server first
Not implemented yet:
- Local answer generation
- Legacy SSE transport compatibility
- Client-specific auto-install flows
- Node.js 24+
- npm
- Internet access for the initial
setupand futuresync - Enough disk space for:
- the downloaded docs corpus
- the SQLite index
- local Hugging Face model cache
npm installFor a separate GPU box, this repo now includes a CUDA-focused Docker image and Compose example:
docker compose -f docker-compose.cuda.yml up --build -dWhat this does:
- builds a CUDA 12 / Node 24 image from
Dockerfile.cuda - persists the SQLite index and model cache in
./data - binds the MCP server to
0.0.0.0:8787for LAN access - auto-runs
setupon first start before serving
Requirements on the remote machine:
- Docker Engine
- Docker Compose
- NVIDIA Container Toolkit
- a working NVIDIA driver on the host
Useful commands:
docker compose -f docker-compose.cuda.yml logs -f
docker compose -f docker-compose.cuda.yml exec cloudflare-docs-mcp npm run devices
docker compose -f docker-compose.cuda.yml exec cloudflare-docs-mcp npm run status
docker compose -f docker-compose.cuda.yml run --rm cloudflare-docs-mcp syncDefault LAN URL:
http://SERVER_IP:8787/mcp
Health check:
http://SERVER_IP:8787/healthz
This server has no auth. If you bind it to 0.0.0.0, keep it on a trusted LAN or VPN.
If you want Host-header protection while exposing it on your LAN, set CLOUDFLARE_DOCS_MCP_ALLOWED_HOSTS in docker-compose.cuda.yml to the IPs or DNS names clients will use, for example:
127.0.0.1,localhost,192.168.1.50,docsbox
Downloads the latest corpus, warms model assets, and builds the local index.
npm run setupTo explicitly prefer GPU-backed execution providers during setup:
npm run setup -- --device gpuTo explicitly require CUDA:
npm run setup -- --device cudaRefreshes the local index from the latest remote llms-full.txt.
npm run syncYou can also override the model device during sync:
npm run sync -- --device gpuShows local DB/index status and last sync metadata.
npm run statusShows which model devices this local runtime can target, the current configured device, which ONNX Runtime backends the package reports, and which GPU provider libraries are actually installed on disk.
npm run devicesStarts the local MCP server.
npm run serveTo serve using a different device than the configured default:
npm run serve -- --device gpuDefault endpoint:
http://127.0.0.1:8787/mcp
Health endpoint:
http://127.0.0.1:8787/healthz
Start the local MCP server first:
npm run serveDefault server URL:
http://127.0.0.1:8787/mcp
Add the server from the CLI:
codex mcp add cloudflareDocs --url http://127.0.0.1:8787/mcpOr add it manually in ~/.codex/config.toml:
[mcp_servers.cloudflareDocs]
url = "http://127.0.0.1:8787/mcp"Add the server to ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"cloudflareDocs": {
"serverUrl": "http://127.0.0.1:8787/mcp"
}
}
}Add the server from the CLI:
claude mcp add --transport http --scope user cloudflareDocs http://127.0.0.1:8787/mcpOr add it in a project .mcp.json:
{
"mcpServers": {
"cloudflareDocs": {
"type": "http",
"url": "http://127.0.0.1:8787/mcp"
}
}
}Whichever client you use, add an instruction telling the agent to consult this MCP server first for Cloudflare documentation lookups.
Inputs:
querymode:hybrid | keyword | semanticlimitproductoptional product/path filterincludeDeprioritized
Returns ranked results with:
- title
- source URLs
- heading path
- snippet
- ranking info
- resource URI
Inputs:
docIdorurl
Returns the exact stored markdown payload for the page.
Returns the stored markdown for a page.
Reminds agents to use this MCP server before relying on built-in Cloudflare knowledge.
Search is retrieval-first:
- Lexical search: SQLite FTS5 / BM25
- Semantic search: local embeddings
- Fusion: lexical + semantic candidate merge
- Rerank: top candidates reranked locally
The default model configuration is:
- Embeddings:
jinaai/jina-embeddings-v2-base-code - Reranker:
Xenova/ms-marco-MiniLM-L-6-v2 - Device:
cpu
Model execution now supports an explicit device setting.
Supported values:
cpugpucudaautowasmwebgpudmlwebnnwebnn-npuwebnn-gpuwebnn-cpu
For this Node.js project, the practical choices are usually:
cpufor the default, predictable pathgputo prefer a GPU execution provider when one is availablecudato explicitly require CUDA on Linux x64
Important:
gpuis an alias. In Node.js it maps to whichever GPU providers the local runtime exposes.- A device may be selectable by platform but still fail at warmup if the necessary ONNX Runtime GPU binaries are not installed.
onnxRuntimeBackendsreflects package metadata.installedProvidersreflects the provider.sofiles that are actually present innode_modules/onnxruntime-node.setupandsyncvalidate the selected device during model warmup and fail early if the choice is not usable.
For Linux x64 with CUDA 12, install the ONNX Runtime CUDA provider bundle with:
ONNXRUNTIME_NODE_INSTALL_CUDA=v12 npm rebuild onnxruntime-nodeAfter that, confirm the local runtime sees the provider:
npm run devicesLook for "installedProviders": ["cuda", ...].
If CUDA warmup still fails after that, the usual remaining issue is system library loading. On this machine, the runtime needed LD_LIBRARY_PATH to include the CUDA and cuDNN library directories, for example:
export LD_LIBRARY_PATH=/usr/local/cuda-13.2/targets/x86_64-linux/lib:/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATHThen rerun:
npm run setup -- --device cudaIf indexing still crashes on a smaller GPU, lower search.embeddingBatchSize in cloudflare-docs-mcp.config.json. Setup now embeds document chunks in batches instead of sending a whole page at once, and a smaller batch size reduces VRAM pressure during indexing.
Configuration is loaded from:
cloudflare-docs-mcp.config.json
Supported top-level sections:
serverstoragecorpusmodelssearch
Example:
{
"search": {
"embeddingBatchSize": 4
},
"models": {
"device": "cpu",
"embeddingModelId": "jinaai/jina-embeddings-v2-base-code",
"rerankerModelId": "Xenova/ms-marco-MiniLM-L-6-v2"
}
}Useful environment overrides:
CLOUDFLARE_DOCS_MCP_CONFIGCLOUDFLARE_DOCS_MCP_HOSTCLOUDFLARE_DOCS_MCP_PORTCLOUDFLARE_DOCS_MCP_DATA_DIRCLOUDFLARE_DOCS_MCP_MODEL_DEVICE
Example:
CLOUDFLARE_DOCS_MCP_MODEL_DEVICE=gpu npm run setupBuild:
npm run buildTest:
npm testRun CLI directly in dev:
npm run dev -- --helpserverequires an existing local index. Runnpm run setupfirst.- After setup, retrieval does not require internet access.
- If model assets are not present locally, semantic retrieval and reranking depend on setup having downloaded them already.
npm run devicesis the quickest way to see whether this machine exposescpu,cuda, or another target before running setup.setupreusesdata/llms-full.txt.downloadordata/llms-full.txtif one is already present, so repeated runs do not require re-downloading the corpus first.- The checked-in
llms-full.txtfile is only local reference material for development. Runtime setup and sync fetch the latest copy from Cloudflare.