Ignite

Get local AI running with guided setup instead of manual runtime wiring.

Ignite is a local runtime manager for GGUF models and llama.cpp. It handles the work between "I have a GPU" and "I want an OpenAI-compatible local endpoint": engine setup, hardware-aware model discovery, downloads, model configuration, process management, swapping, and runtime visibility. No Docker. No Python runtime. One Go binary with the web UI embedded.

Ignite v2 is a complete rewrite in Go. The original Python/Docker version is archived on the v1-archive branch.

The problem

Running local models today means picking a llama.cpp build, compiling it with the right CUDA flags, finding a GGUF that fits your VRAM, writing config, managing processes, and figuring out model swapping. If you have done it before, it takes an afternoon. If you have not, it can take a weekend.

What Ignite does

Build Ignite, run it, and open the local web UI. Ignite detects your GPUs, helps build llama.cpp, shows GGUF models that fit your hardware, downloads files from Hugging Face, and exposes an OpenAI-compatible API at localhost:8091/v1.

After setup, Ignite manages your inference stack:

OpenAI-compatible API - drop-in endpoint for apps that speak OpenAI-style chat, completions, and embeddings
Automatic model loading and swapping - request a configured model by ID or alias and Ignite loads it if it is not already running
Multi-GPU support - assign models to specific GPUs and run models on different GPUs concurrently
Model discovery - search Hugging Face GGUF repos and see hardware fit badges before downloading
Engine management - clone, build, update, and switch llama.cpp backends from the UI
Web dashboard - GPU monitoring, loaded model status, recent activity, endpoint snippets, runtime traffic, config, and playground

Everything runs locally. Ignite uses native llama.cpp subprocesses and keeps config, logs, state, model files, and backend checkouts on your machine.

Quick start

Requirements

NVIDIA GPU
CUDA toolkit for building and running CUDA-enabled llama.cpp builds
Go 1.24+
Node.js 20+ and npm for building the embedded web UI from source

Build and run

git clone https://github.com/Spadav/Ignite.git
cd Ignite
make build
./ignite

First launch walks you through setup. After that, open:

http://localhost:8091

The OpenAI-compatible endpoint is:

http://localhost:8091/v1

Run as a service

sudo cp ignite /usr/local/bin/
sudo tee /etc/systemd/system/ignite.service << 'EOF'
[Unit]
Description=Ignite
After=network.target

[Service]
ExecStart=/usr/local/bin/ignite --config /path/to/ignite.yaml
Restart=always
User=your-username

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable --now ignite

How it works

Ignite manages llama.cpp as native subprocesses. When a request comes in for a configured model, Ignite:

Resolves the model ID or alias from config
Loads the model if it is not already running
Applies runtime-group swap rules for models in the same group
Starts llama-server with the configured flags, GPU assignment, and model file
Waits for health checks, proxies the request, and records request/response timing
Tracks idle time and unloads models after the configured TTL

Models on different GPUs can run concurrently. Models in the same runtime group can swap each other out depending on group settings.
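
The swap and TTL behavior described here can be pictured with a small toy model. This is only an illustrative sketch in Python, not Ignite's actual Go implementation: the class ToyRuntime, its method names, and the second model ID "other-model" are hypothetical, and the swap semantics are an assumption (loading one member of a swap-enabled group unloads the other loaded members). It ignores GPU assignment, health checks, and the persistent group setting.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRuntime:
    """Hypothetical stand-in for a runtime manager; not Ignite's real code."""

    aliases: dict[str, str]            # alias -> model ID
    swap_groups: dict[str, set[str]]   # group name -> member model IDs (swap enabled)
    ttl_seconds: float = 600.0         # mirrors ttl.global in the example config
    last_used: dict[str, float] = field(default_factory=dict)  # loaded model -> last request time

    def resolve(self, name: str) -> str:
        """Map an alias to its model ID; model IDs pass through unchanged."""
        return self.aliases.get(name, name)

    def request(self, name: str, now: float) -> str:
        """Handle one request at time `now`; return the model ID that served it."""
        model_id = self.resolve(name)
        if model_id not in self.last_used:
            # Assumed swap rule: unload other loaded members of this model's groups.
            for members in self.swap_groups.values():
                if model_id in members:
                    for other in members - {model_id}:
                        self.last_used.pop(other, None)
            # A real manager would start llama-server here and wait for health checks.
        self.last_used[model_id] = now
        return model_id

    def unload_idle(self, now: float) -> list[str]:
        """Unload models idle for at least the TTL; return the unloaded IDs."""
        idle = [m for m, t in self.last_used.items() if now - t >= self.ttl_seconds]
        for m in idle:
            del self.last_used[m]
        return idle
```

Passing the clock in as `now` keeps the sketch deterministic: a request for "default" at time 0 loads "my-model", a later request for "other-model" in the same group swaps it out, and unload_idle removes whatever has been idle for the full TTL.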

Configuration

Ignite uses a single YAML config file. The web UI reads and writes it, and you can also edit it by hand.

listen: "0.0.0.0:8091"

backends:
  mainline:
    path: ./llama-backends/mainline
    binary: build-ignite/bin/llama-server
    buildDir: build-ignite
    repo: https://github.com/ggml-org/llama.cpp

activeBackend: mainline
modelsPath: ./models
mmprojectsPath: ./models/mmproj

ttl:
  global: 600

models:
  my-model:
    family: My Models
    profile: Default
    file: some-model-Q4_K_M.gguf
    gpu: GPU-abc123
    args: >-
      --jinja -ngl 99 -c 32768 -fa on
      --cache-type-k q4_0 --cache-type-v q4_0
      --split-mode none --main-gpu 0
    aliases:
      - default

groups:
  main:
    swap: true
    persistent: true
    members:
      - my-model

See ignite.example.yaml for a complete, safe default configuration.
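
If you edit the file by hand, a quick cross-reference check can catch typos before a restart. The sketch below is a hypothetical helper, not part of Ignite: check_config and the three rules it applies (activeBackend must be defined, aliases must not collide, group members must be configured models) are assumptions about what a sane config looks like, and Ignite's own validation may differ. The dictionary stands in for the parsed YAML; in practice you would load the file with a YAML parser first.

```python
# A trimmed dict mirroring the example config above (as a YAML parser would return it).
EXAMPLE = {
    "backends": {"mainline": {"path": "./llama-backends/mainline"}},
    "activeBackend": "mainline",
    "models": {
        "my-model": {"file": "some-model-Q4_K_M.gguf", "aliases": ["default"]},
    },
    "groups": {"main": {"swap": True, "persistent": True, "members": ["my-model"]}},
}


def check_config(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    backends = cfg.get("backends") or {}
    models = cfg.get("models") or {}

    # Assumed rule: the active backend must be one of the defined backends.
    active = cfg.get("activeBackend")
    if active not in backends:
        problems.append(f"activeBackend {active!r} is not defined under backends")

    # Assumed rule: an alias must not shadow another model's ID or alias.
    owners = {model_id: model_id for model_id in models}
    for model_id, model in models.items():
        for alias in model.get("aliases") or []:
            owner = owners.setdefault(alias, model_id)
            if owner != model_id:
                problems.append(
                    f"alias {alias!r} on {model_id!r} already refers to {owner!r}"
                )

    # Assumed rule: every group member must be a configured model ID.
    for group_name, group in (cfg.get("groups") or {}).items():
        for member in group.get("members") or []:
            if member not in models:
                problems.append(f"group {group_name!r} lists unknown model {member!r}")

    return problems
```

Calling check_config(EXAMPLE) returns an empty list for the config shown above; a group that names a model missing from models, or two models sharing an alias, each produce one entry.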

UI

Page	Purpose
Dashboard	Live GPU stats, loaded models, recent requests, and endpoint info
Models	Local GGUF library, Hugging Face discovery with hardware fit badges, and downloads
Config	Per-model settings, launch args, GPU assignment, aliases, and runtime groups
Runtime	Live request/response traffic viewer with timing and token counts
Engines	llama.cpp backend management: clone, build, update, and inspect
Playground	Test configured models with request options, image input, response parsing, and expert JSON
Settings	Global paths, ports, TTL, health checks, about links, and update notification

API

curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "hello"}]}'

curl http://localhost:8091/v1/models

Request any configured model by model ID or alias. Ignite forwards the request to the appropriate llama.cpp server after loading the model if needed.
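
The same calls can be made from code without an SDK. Below is a minimal Python sketch using only the standard library; the function names are hypothetical, and the response handling assumes the standard OpenAI response shapes (choices[0].message.content for chat, a data list of objects with an id for /v1/models), which this README does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8091/v1"  # endpoint from this README


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,  # a configured model ID or alias
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the first choice's message text."""
    req = build_chat_request(model, prompt)
    # A generous timeout leaves room for the model to be loaded on first use.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the standard OpenAI chat completion response shape.
    return body["choices"][0]["message"]["content"]


def model_ids(list_response: dict) -> list[str]:
    """Extract IDs from a /v1/models response (assumed OpenAI list shape)."""
    return [entry["id"] for entry in list_response.get("data", [])]
```

With Ignite running, chat("my-model", "hello") or chat("default", "hello") should return the reply text, since a model can be requested by ID or alias. The 300-second timeout is an arbitrary choice for this sketch.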

Looking for local TTS and STT?

Check out Timbre - a local voice gateway with OpenAI-compatible endpoints and swappable backends.

Development

make backend
make frontend
make test
make build

make build builds the web UI first, embeds it into the Go binary, and writes ./ignite.

Previous version

Ignite v2 is a complete rewrite. The original version is archived on the v1-archive branch.
