Ansible collection for deploying Haidra authored services, AI Horde workers, monitoring infrastructure, and supporting applications.
New here? Start with the Quick Start guide — test an AI-Horde code change in ~4 minutes or run the full stack locally in ~6. Want to contribute? See CONTRIBUTING.md.
This collection is intentionally opinionated and is not a general-purpose Ansible toolkit. It targets three audiences:
- The AI Horde team operating the stack.
- Developers contributing to AI Horde services.
- External groups adopting or vendoring the AI Horde stack and seeking reference deployment patterns.
It deliberately does not aim to provide:
- Generic, vendor-neutral deployment abstractions for arbitrary software.
- Replacing mature community roles for broad infrastructure concerns.
- Hiding stack assumptions required by AI Horde topology and workflows.
In short: if you are looking for a general-purpose Ansible collection for deploying this software anywhere, this is not it. If you are looking for a reference deployment of the AI Horde stack, this is exactly it.
Install Ansible (Linux only):
python -m pip install ansible

Ensure your control host can SSH to targets using key-based authentication via
an ssh-agent. If the remote user requires a sudo password, append -K to all
ansible-playbook commands.
Install this collection and its dependencies:
wget https://raw.githubusercontent.com/Haidra-Org/deployments/main/examples/requirements.yml
ansible-galaxy collection install -r requirements.yml

Each role provides its own README with full variable documentation and examples.
Adjust an example inventory with your hostnames, then run the
corresponding example playbook — or build your own site.yml.
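As a sketch, an adjusted inventory might look like the following. The group names, hostnames, and remote user here are illustrative placeholders, not the actual names used by the shipped examples:

```ini
; Hypothetical inventory — substitute your own hosts and the group names
; expected by the example playbook you are running.
[horde_backend]
horde-api.example.com

[horde_workers]
gpu-worker-01.example.com

[all:vars]
ansible_user=deploy
```

You would then run, for example, `ansible-playbook -i inventory.ini site.yml` (adding `-K` if sudo needs a password, as noted above).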
| Role | Description |
|---|---|
| ai_horde | AI Horde backend (Flask + Postgres + Redis) |
| aihorde_frontpage | AiHordeFrontpage (Angular SSR website) |
| horde_model_reference | FastAPI service for AI Horde model metadata |
| artbot | Web frontend for AI Horde |
| artbot_revproxy | HAProxy reverse proxy for Artbot |
| horde_regen_worker | AI Horde worker (Dreamer, Scribe, Alchemist) |
| amd_gpu_drivers | AMD GPU driver and ROCm setup |
| Role | Description |
|---|---|
| horde_monitoring | Mimir + Grafana + S3 storage monitoring stack (Docker Compose) |
| horde_stats_exporter | AI Horde API → Prometheus metrics exporter |
| horde_alloy | Grafana Alloy telemetry collector for app hosts |
See MONITORING.md for the architecture overview, quick start, and how the monitoring roles work together.
| Document | Contents |
|---|---|
| Quick Start | Get running in minutes — 4 tiers from code change to production |
| Contributing | Dev setup, test conventions, PR guidelines |
| Monitoring Guide | Architecture, quick start, troubleshooting |
| Observability Stack | Loki, Tempo, and Alloy deep-dive |
| Backup & Restore | RPO/RTO, backup configuration, restore procedures |
| Credentials | Credential management and rotation |
| Upgrading | Component version upgrade procedures |
| Migration | Host migration runbook (planned and forced) |
The collection ships a two-tier test suite under tests/.
Validate Ansible template rendering, variable defaults, and negative (expected-failure) cases. Run entirely in check mode — no Docker daemon required for the test playbooks themselves.
# All render tests (builds a Docker systemd container per test):
./tests/run_tests.sh
# List all discoverable tests without running them:
./tests/run_tests.sh --list
# By role:
./tests/run_tests.sh monitoring
./tests/run_tests.sh ai_horde
./tests/run_tests.sh regen_worker
./tests/run_tests.sh artbot
./tests/run_tests.sh frontpage
./tests/run_tests.sh full_stack
# Specific test:
./tests/run_tests.sh monitoring/test_full_stack

Every run_tests.sh invocation writes per-test log files and a structured
summary under tests/test-results/<YYYYMMDD-HHMMSS>/:
tests/test-results/20260325-143012/
├── monitoring__test_full_stack.log # full Ansible output
├── monitoring__test_full_stack__idempotency.log # idempotency re-run
├── monitoring__test_runtime_services.log
├── ai_horde__test_deploy.log
└── summary.txt # machine-readable results
The runner prints a colour-coded summary table at the end with one-line failure reasons extracted from the Ansible output:
TEST STATUS DETAILS
────────────────────────────────────────────────────────────────────────────
monitoring/test_full_stack PASS
ai_horde/test_deploy FAIL {"msg": "No package matching 'python3-venv'"}
────────────────────────────────────────────────────────────────────────────
summary.txt is pipe-delimited for scripted analysis:
# FORMAT: STATUS | LABEL | LOG_FILE | REASON
PASS | monitoring/test_full_stack | monitoring__test_full_stack.log |
FAIL | ai_horde/test_deploy | ai_horde__test_deploy.log | {"msg": "No package matching..."}
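Since the format is pipe-delimited, standard text tools work on it directly. A minimal sketch (assuming only the `FORMAT` line documented above) that lists the labels of failed tests:

```shell
# Print the LABEL field of every FAIL line in summary.txt,
# trimming the surrounding padding spaces. The comment line
# ("# FORMAT: ...") never matches /FAIL/ and is skipped.
awk -F'|' '$1 ~ /FAIL/ { gsub(/^ +| +$/, "", $2); print $2 }' summary.txt
```

This could feed a CI step that re-runs only the failed tests, e.g. by looping over the printed labels.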
Every playbook (except runtime and local_deploy tests) is automatically
re-run after the first pass; the idempotency check fails the test if any
task reports changed on the second run.
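Conceptually, the check amounts to scanning the second run's PLAY RECAP for any non-zero changed count. A minimal sketch of that logic (an assumption about the approach, not the runner's exact implementation):

```shell
# Assumed sketch: inspect a second-pass PLAY RECAP line for changed tasks.
recap='web1 : ok=12 changed=0 unreachable=0 failed=0 skipped=3'
if printf '%s\n' "$recap" | grep -Eq 'changed=[1-9]'; then
  echo "idempotency: FAIL"   # at least one task changed on the re-run
else
  echo "idempotency: PASS"   # changed=0 everywhere
fi
```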
Test playbooks support YAML comment markers near the top of the file (within the first 5 lines) to control runner behaviour:
| Marker | Effect |
|---|---|
| # idempotency: skip | Skip the idempotency re-run for this test |
| # requires: docker-daemon | Skip the entire test when the target container has no Docker daemon |
Multi-play tests that intentionally overwrite the same files with different
variable sets (e.g. test_alloy_role.yml) should declare # idempotency: skip.
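For instance, such a test playbook might open like this — both markers land within the first 5 lines, as required. The filename and play content are illustrative, not an actual test in the suite:

```yaml
# idempotency: skip
# requires: docker-daemon
# tests/monitoring/test_hypothetical.yml (illustrative name)
- name: Render configs with variable set A
  hosts: all
  gather_facts: false
  tasks:
    - name: Placeholder task standing in for the real role invocation
      ansible.builtin.debug:
        msg: "set A"
```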
Exercise cross-role coherence and optionally spin up live services.
# Smoke test — config-only, CI-friendly:
./tests/run_tests.sh integration
# Local deploy — starts AI-Horde in Docker:
./tests/integration/local_deploy.sh up
./tests/integration/local_deploy.sh down
# With GPU worker (requires NVIDIA GPU + nvidia-container-toolkit):
./tests/integration/local_deploy.sh up --with-worker

| Role | Render | Negative | Integration | Full-stack |
|---|---|---|---|---|
| horde_monitoring | ✅ | ✅ | — | ✅ |
| ai_horde | ✅ | ✅ | ✅ | ✅ |
| aihorde_frontpage | ✅ | — | — | ✅ |
| horde_regen_worker | ✅ | — | ✅ | — |
| artbot / revproxy | ✅ | — | — | — |
| horde_stats_exporter | — | — | — | ✅ |
| horde_alloy | — | — | — | — |
See also the Quick Start for a use-case driven introduction.
Spins up the complete Horde business stack on one machine: Backend (AI-Horde + Postgres + Redis), Frontend (AiHordeFrontpage), Stats Exporter, and HAProxy as the unified edge router. Monitoring and the GPU worker are optional tiers.
# Core stack (backend + frontpage + exporter + HAProxy):
./tests/full_stack/local_deploy.sh up
# With monitoring (Grafana, Mimir, Prometheus, Alertmanager, Alloy):
./tests/full_stack/local_deploy.sh up --with-monitoring
# With GPU worker (requires NVIDIA GPU):
./tests/full_stack/local_deploy.sh up --with-worker
# With Artbot on a separate port (8080):
./tests/full_stack/local_deploy.sh up --with-artbot
# Everything:
./tests/full_stack/local_deploy.sh up --all
# Tear down (unconditional — stops all tiers):
./tests/full_stack/local_deploy.sh down
# Status:
./tests/full_stack/local_deploy.sh status
# Logs for a specific tier:
./tests/full_stack/local_deploy.sh logs backend
./tests/full_stack/local_deploy.sh logs frontpage
./tests/full_stack/local_deploy.sh logs haproxy
./tests/full_stack/local_deploy.sh logs monitoring
./tests/full_stack/local_deploy.sh logs artbot

Local-deploy layout:
- local-deploy/static/ contains committed overlays/config files used by local deploy scripts.
- local-deploy/runtime/ contains generated configs, cloned sources, and runtime data.
Reset local deploy state safely:
rm -rf local-deploy/runtime

Port assignments (full-stack local deploy):
| Service | Port | Notes |
|---|---|---|
| HAProxy (main) | 80 | Unified edge router |
| HAProxy stats | 8404 | http://localhost:8404/stats |
| AiHordeFrontpage | 8006 | Angular SSR (also via HAProxy on 80) |
| AI-Horde API | 7001 | Direct; also via /api on port 80 |
| Stats Exporter | 9109 | Prometheus metrics |
| Grafana | 3000 | Monitoring dashboards |
| Prometheus | 9090 | Metrics collection |
| Artbot HAProxy | 8080 | Artbot site (--with-artbot) |