# llmock

An OpenAI-compatible mock server for testing LLM integrations.

## Features

- OpenAI API compatibility with key endpoints (`/models`, `/chat/completions`, `/responses`)
- Default mirror strategy (echoes input as output)
- Tool calling support: trigger-phrase-driven tool call responses when `tools` are present in the request, using `call tool '<name>' with '<json>'`
- Error simulation: trigger-phrase-driven error responses using `raise error <json>` in the last user message
- Streaming support for both Chat Completions and Responses APIs (including `stream_options.include_usage`)
## Quick Start

Run the server with Docker:

```bash
docker container run -p 8000:8000 ghcr.io/modai-systems/llmock:latest
```

Test with this sample request:

```bash
curl http://localhost:8000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

The request simply mirrors the input, so it returns `Hello!`.
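The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server is running on `localhost:8000` as in the Docker command above; `mirror_request` is an illustrative helper, not part of llmock:

```python
import json
import urllib.request

# The same request body the curl example sends.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
}

def mirror_request(base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the mock server and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With a running server and the default MirrorStrategy, the reply's
# choices[0].message.content should echo the input ("Hello!").
```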
## Development Setup

Prerequisites:

- Python 3.14+
- uv (package manager)

Installation:

```bash
uv sync --all-extras
```

Run the server:

```bash
uv run python -m llmock
```

For development with auto-reload:

```bash
uv run uvicorn llmock.app:app --host 0.0.0.0 --port 8000 --reload
```

The server will be available at http://localhost:8000. A health check is available at `/health`.
## Configuration

Edit `config.yaml` to configure available models and response strategies:

```yaml
# Port for the HTTP server (default: 8000)
port: 8000

# API key for authentication (optional - if not set, no auth required)
api-key:

# CORS configuration
cors:
  allow-origins:
    - "http://localhost:8000"

# Ordered list of strategies to try (first non-empty result wins)
# Available: ErrorStrategy, ToolCallStrategy, MirrorStrategy
strategies:
  - ErrorStrategy
  - ToolCallStrategy
  - MirrorStrategy

models:
  - id: "gpt-4o"
    created: 1715367049
    owned_by: "openai"
  - id: "gpt-4o-mini"
    created: 1721172741
    owned_by: "openai"
```

## Environment Variables

You can override values from `config.yaml` using environment variables with the `LLMOCK_` prefix.
Nested keys are joined with underscores, and dashes are converted to underscores.
Examples:

```bash
# Port
export LLMOCK_PORT=9000

# API key (enables authentication)
export LLMOCK_API_KEY=your-secret-api-key

# Lists: always use a JSON array
export LLMOCK_CORS_ALLOW_ORIGINS='["http://localhost:8000","http://localhost:5173"]'

# Models: JSON array of model objects
export LLMOCK_MODELS='[{"id":"my-model","created":1715367049,"owned_by":"custom"},{"id":"other-model","created":1715367049,"owned_by":"custom"}]'
```

Docker example using a custom port and API key:
```bash
docker container run -p 9000:9000 \
  -e LLMOCK_PORT=9000 \
  -e LLMOCK_API_KEY=secret \
  ghcr.io/modai-systems/llmock:latest
```

Notes:

- Lists must be passed as JSON arrays (`[...]`).
- Only keys that exist in `config.yaml` are overridden.
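The key-to-variable mapping can be sketched as a small helper. This is illustrative only; `env_var_name` is not part of llmock:

```python
def env_var_name(*keys: str) -> str:
    """Map a nested config key path to its LLMOCK_ environment variable:
    nested keys are joined with underscores, dashes become underscores."""
    return "LLMOCK_" + "_".join(k.replace("-", "_") for k in keys).upper()

# Matches the overrides shown above:
print(env_var_name("port"))                   # LLMOCK_PORT
print(env_var_name("api-key"))                # LLMOCK_API_KEY
print(env_var_name("cors", "allow-origins"))  # LLMOCK_CORS_ALLOW_ORIGINS
```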
## Tool Calling

When `ToolCallStrategy` is included in the strategies list, llmock watches the last user message for lines matching the pattern:

```
call tool '<name>' with '<json>'
```

- `<name>` is used verbatim; no check against the `tools` list in the request is performed.
- `<json>` is the arguments string passed to the tool (use `'{}'` for no arguments).
- Multiple matching lines produce multiple tool calls.
- If no line matches, the strategy falls through to the next one (e.g. `MirrorStrategy`).

No extra config keys are needed; adding `ToolCallStrategy` to the strategies list is sufficient.
Example with the OpenAI Python client:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "call tool 'calculate' with '{\"expression\": \"6*7\"}'"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculate",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

tool_call = response.choices[0].message.tool_calls[0]
# tool_call.function.name == "calculate"
# tool_call.function.arguments == '{"expression": "6*7"}' (from the trigger phrase)
```

This works on both `/chat/completions` and `/responses` endpoints.
## Error Simulation

When `ErrorStrategy` is included in the strategies list, llmock watches the last user message for lines matching the pattern:

```
raise error <json>
```

The JSON payload must contain:

- `code` (integer): HTTP status code to return
- `message` (string): error message
- `type` (string, optional): OpenAI error type (e.g. `"rate_limit_error"`)
- `error_code` (string, optional): OpenAI error code (e.g. `"rate_limit_exceeded"`)

The first matching line wins. If no line matches, the strategy falls through to the next one.

No extra config keys are needed; adding `ErrorStrategy` to the strategies list is sufficient.
Example:

```python
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": 'raise error {"code": 429, "message": "Rate limit exceeded", "type": "rate_limit_error", "error_code": "rate_limit_exceeded"}'}]
)
```

Only the last user message is checked; system, assistant, and tool messages are ignored. Works on both `/chat/completions` and `/responses` endpoints.
## Testing

```bash
uv run pytest -v
```

```bash
uv run ruff format src tests  # Format code
uv run ruff check src tests   # Lint code
```

## Documentation

- Architecture: see docs/ARCHITECTURE.md
- Decisions: see docs/DECISIONS.md
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests and linting (`uv run pytest && uv run ruff check src tests`)
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.