A flexible FastAPI service for serving computer vision models used by the Wildbook platform. Supports detection, classification, orientation estimation, embedding extraction, explainability, and part-body assignment across multiple model architectures.

| Type | Architecture | Use Case |
|---|---|---|
| `yolo-ultralytics` | YOLOv11 (Ultralytics) | Object detection |
| `megadetector` | MegaDetector (PytorchWildlife) | Animal/person/vehicle detection |
| `lightnet` | PyDarknet YOLO v2/v3 | Species-specific detection (WBIA legacy models) |
| `efficientnetv2` | EfficientNet-B4 (timm) | Species/viewpoint classification |
| `densenet-orientation` | DenseNet-201 (torchvision) | Orientation classification |
| `miewid` | MiewID transformer | Embedding extraction for re-identification |

POST /predict/
Runs object detection and returns bounding boxes. Works with any detection model type (YOLO, MegaDetector, LightNet).
Request:
{
"model_id": "msv3",
"image_uri": "https://example.com/image.jpg",
"model_params": {
"conf": 0.6,
"imgsz": 640
}
}

`image_uri` accepts:
- URL: `https://example.com/image.jpg` (fetched server-side)
- Local path: `/data/db/images/image.jpg` (read from server filesystem)
- Data URI: `data:image/jpeg;base64,/9j/4AAQ...` (inline base64-encoded image)

Data URIs are supported across all endpoints (/predict/, /classify/, /extract/, /pipeline/).
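
As a minimal sketch of the data URI option, the snippet below encodes raw image bytes and builds a JSON body for `/predict/`. The helper names `to_data_uri` and `build_predict_payload` are hypothetical; only the `model_id`, `image_uri`, and `model_params` fields come from the request format shown above.

```python
import base64
import json


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as an inline data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_predict_payload(model_id: str, image_bytes: bytes, **model_params) -> str:
    """Build a JSON request body for POST /predict/ (hypothetical helper)."""
    payload = {"model_id": model_id, "image_uri": to_data_uri(image_bytes)}
    if model_params:
        # Optional per-request overrides such as conf or imgsz.
        payload["model_params"] = model_params
    return json.dumps(payload)
```

The resulting string can be sent as the request body with any HTTP client, which avoids having to expose the image at a URL or on the server filesystem.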
Response:
{
"bboxes": [[68.0, 134.6, 71.5, 130.7]],
"scores": [0.9054],
"thetas": [0.0],
"class_names": ["dog"],
"class_ids": [16]
}

- `bboxes`: `[x, y, width, height]` in pixels (top-left origin)
- `thetas`: Rotation angle in radians (0.0 for axis-aligned boxes)
- `scores`: Confidence scores (0.0 to 1.0)

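
Many drawing and cropping libraries expect corner coordinates rather than `[x, y, width, height]`. The sketch below converts a box and walks the response arrays together, assuming they are parallel (one entry per detection). The function names are hypothetical.

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def iter_detections(response):
    """Yield one dict per detection from a /predict/ response.

    Assumes bboxes, scores, thetas, and class_names have equal length.
    """
    for bbox, score, theta, name in zip(
        response["bboxes"],
        response["scores"],
        response["thetas"],
        response["class_names"],
    ):
        yield {
            "corners": xywh_to_xyxy(bbox),
            "score": score,
            "theta": theta,
            "class_name": name,
        }
```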
POST /classify/
Runs image classification. Works with EfficientNet and DenseNet orientation models.
Request:
{
"model_id": "efficientnet-classifier",
"image_uri": "https://example.com/image.jpg",
"bbox": [100, 100, 300, 200],
"theta": 0.0
}

- `bbox` (optional): Crop region `[x, y, width, height]` applied before classifying
- `theta` (optional): Rotation angle in radians

Response:
{
"model_id": "efficientnet-classifier",
"predictions": [
{"index": 0, "label": "back", "probability": 0.811}
],
"all_probabilities": [0.811, 0.0, 0.0003, 0.007, 0.457, 0.00003],
"threshold": 0.5,
"bbox": [100, 100, 300, 200],
"theta": 0.0
}

When `parse_compound_labels` is enabled in the model config, predictions include parsed species and viewpoint:
{
"predictions": [
{
"label": "chelonia_mydas:left",
"species": "chelonia_mydas",
"viewpoint": "left",
"probability": 0.92
}
]
}

POST /extract/
Extracts feature embeddings for re-identification using MiewID models.
Request:
{
"model_id": "miewid-msv4.1",
"image_uri": "https://example.com/image.jpg",
"bbox": [50, 50, 200, 200],
"theta": 0.0
}

Response:
{
"model_id": "miewid-msv4.1",
"embeddings": [0.1234, -0.5678, 0.9012],
"embeddings_shape": [1, 512],
"bbox": [50, 50, 200, 200],
"theta": 0.0
}

POST /explain/
Generates visual explanations of what features two images share, using PAIR-X.
Request:
{
"image1_uris": ["image_a.jpg"],
"bb1": [[100, 100, 300, 200]],
"theta1": [0.0],
"image2_uris": ["image_b.jpg"],
"bb2": [[0, 0, 0, 0]],
"theta2": [0.0],
"model_id": "miewid-msv3",
"algorithm": "pairx",
"visualization_type": "lines_and_colors",
"layer_key": "backbone.blocks.3",
"k_lines": 20,
"k_colors": 5,
"crop_bbox": false
}

- `bb` of `[0,0,0,0]` means no crop (use full image)
- `visualization_type`: `lines_and_colors`, `only_lines`, or `only_colors`
- `layer_key`: Earlier layers (e.g. `backbone.blocks.1`) focus on specific points; later layers focus on broad areas
- Returns a list of numpy arrays (images)

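
A small client-side builder can catch malformed `/explain/` requests before they are sent. This is a sketch only: `build_explain_request` is a hypothetical helper, and the check that both image lists have the same length is an assumption about how the pairs are matched up, not documented behavior.

```python
VISUALIZATION_TYPES = {"lines_and_colors", "only_lines", "only_colors"}
NO_CROP = [0, 0, 0, 0]  # per the notes above, this means "use the full image"


def build_explain_request(
    image1_uris,
    image2_uris,
    model_id,
    visualization_type="lines_and_colors",
    layer_key="backbone.blocks.3",
    k_lines=20,
    k_colors=5,
):
    """Build a request body for POST /explain/ with full-image boxes."""
    if visualization_type not in VISUALIZATION_TYPES:
        raise ValueError(f"unknown visualization_type: {visualization_type!r}")
    if len(image1_uris) != len(image2_uris):
        # Assumption: images are compared pairwise by position.
        raise ValueError("image1_uris and image2_uris must have equal length")
    n = len(image1_uris)
    return {
        "image1_uris": list(image1_uris),
        "bb1": [list(NO_CROP) for _ in range(n)],
        "theta1": [0.0] * n,
        "image2_uris": list(image2_uris),
        "bb2": [list(NO_CROP) for _ in range(n)],
        "theta2": [0.0] * n,
        "model_id": model_id,
        "algorithm": "pairx",
        "visualization_type": visualization_type,
        "layer_key": layer_key,
        "k_lines": k_lines,
        "k_colors": k_colors,
        "crop_bbox": False,
    }
```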
POST /pipeline/
Runs detection, then classifies and extracts embeddings for each detected region above a confidence threshold.
Request:
{
"predict_model_id": "msv3",
"classify_model_id": "efficientnet-classifier",
"extract_model_id": "miewid-msv4.1",
"image_uri": "https://example.com/image.jpg",
"bbox_score_threshold": 0.5,
"predict_model_params": {"conf": 0.6}
}

| Parameter | Required | Description |
|---|---|---|
| `predict_model_id` | yes | Detection model (YOLO, MegaDetector, or LightNet) |
| `classify_model_id` | yes | Classification model (EfficientNet) |
| `extract_model_id` | yes | Embedding model (MiewID) |
| `image_uri` | yes | URL, local path, or `data:` URI |
| `orientation_model_id` | no | DenseNet orientation model; when provided, orientation is estimated for each detection and included in results |
| `bbox_score_threshold` | no | Minimum detection confidence (default: 0.5, range: 0.0-1.0) |
| `predict_model_params` | no | Override detection model parameters |

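
The control flow of the pipeline can be sketched with the three models passed in as plain callables. This is an illustration under stated assumptions, not the service's implementation: `run_pipeline` and its callable signatures are hypothetical, and whether the threshold comparison is inclusive is a guess.

```python
def run_pipeline(image, detect, classify, extract, bbox_score_threshold=0.5):
    """Detect, filter by score, then classify and embed each kept region.

    detect(image) is assumed to return a list of dicts with "bbox",
    "theta", and "score"; classify and extract take (image, bbox, theta).
    """
    detections = detect(image)
    # Assumption: detections at exactly the threshold are kept.
    kept = [d for d in detections if d["score"] >= bbox_score_threshold]

    results = []
    for det in kept:
        results.append(
            {
                "bbox": det["bbox"],
                "theta": det["theta"],
                "bbox_score": det["score"],
                "classification": classify(image, det["bbox"], det["theta"]),
                "embedding": extract(image, det["bbox"], det["theta"]),
            }
        )

    return {
        "total_predictions": len(detections),
        "filtered_predictions": len(kept),
        "pipeline_results": results,
    }
```

Raising `bbox_score_threshold` reduces the number of classification and embedding calls, since both run once per kept detection.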
When orientation_model_id is provided, each result includes an orientation field:
{
"pipeline_results": [
{
"bbox": [68.0, 134.6, 71.5, 130.7],
"theta": 0.0,
"bbox_score": 0.91,
"detection_class": "elephant+head",
"classification": {"class": "elephant:left", "probability": 0.99, "class_id": 2},
"orientation": {"label": "left", "probability": 0.95},
"embedding": [0.1234, -0.5678],
"embedding_shape": [1, 512]
}
]
}

Response:
{
"image_uri": "https://example.com/image.jpg",
"models_used": {
"predict_model_id": "msv3",
"classify_model_id": "efficientnet-classifier",
"extract_model_id": "miewid-msv4.1"
},
"total_predictions": 15,
"filtered_predictions": 3,
"pipeline_results": [
{
"bbox": [68.0, 134.6, 71.5, 130.7],
"theta": 0.0,
"bbox_score": 0.9054,
"detection_class": "dog",
"detection_class_id": 16,
"classification": {
"class": "back",
"probability": 0.811,
"class_id": 0
},
"embedding": [0.1234, -0.5678],
"embedding_shape": [1, 512]
}
]
}

POST /assign/
Matches "part" annotations (e.g. lion+head) to "body" annotations using geometric features and species-specific scikit-learn classifiers.
Request:
{
"species": "lion",
"annotations": [
{"aid": 1, "bbox": [100, 50, 200, 300], "theta": 0.0, "viewpoint": "left", "is_part": false},
{"aid": 2, "bbox": [120, 60, 80, 80], "theta": 0.0, "viewpoint": "left", "is_part": true}
],
"image_width": 1024,
"image_height": 768,
"cutoff_score": 0.5
}

Response:
{
"assigned_pairs": [
{"part_aid": 2, "body_aid": 1, "score": 0.87}
],
"unassigned_aids": []
}

The assignment algorithm computes geometric features (IoU, distances, containment, aspect ratios, viewpoint matches) for every (part, body) pair, scores them with the species classifier, then greedily assigns highest-scoring pairs.
Supported species include wild dog (default fallback), lion, zebra (Grevy's, plains), sea turtles, hyena, and others.
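
To make the greedy step concrete, here is a minimal sketch that computes one of the geometric features (IoU on `[x, y, width, height]` boxes) and then assigns pairs in descending score order. The scores are taken as given, since the species classifiers are not shown here. The one-part-per-body constraint and the contents of `unassigned_aids` are assumptions; the function names are hypothetical.

```python
def iou_xywh(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_assign(scored_pairs, all_aids, cutoff_score=0.5):
    """Greedily match parts to bodies from (part_aid, body_aid, score) tuples.

    Assumption: each part and each body appears in at most one pair.
    """
    used = set()
    assigned = []
    for part_aid, body_aid, score in sorted(
        scored_pairs, key=lambda p: p[2], reverse=True
    ):
        if score < cutoff_score:
            break  # remaining pairs score even lower
        if part_aid in used or body_aid in used:
            continue
        used.update((part_aid, body_aid))
        assigned.append({"part_aid": part_aid, "body_aid": body_aid, "score": score})
    unassigned = [aid for aid in all_aids if aid not in used]
    return {"assigned_pairs": assigned, "unassigned_aids": unassigned}
```

Greedy matching is simple and fast but not globally optimal; an early high-scoring pair can block a combination with a higher total score.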
GET /health
Returns service health including GPU status, CUDA availability, and loaded model count.
For backward compatibility with Wildbook, ml-service provides endpoints that mimic WBIA's async job queue pattern. Wildbook can point to ml-service as a drop-in replacement for WBIA's detection/labeling pipeline without changing its HTTP client code.
- Submit job: `POST /api/engine/detect/cnn/` returns a `jobid`
- Poll status: `GET /api/engine/job/status/?jobid=X` returns `{"jobstatus": "completed"}`
- Fetch result: `GET /api/engine/job/result/?jobid=X` returns detection results

All responses are wrapped in WBIA's standard envelope:
{
"status": {"success": true, "code": "", "message": "", "cache": -1},
"response": "<data>"
}

POST /api/engine/detect/cnn/
POST /api/engine/detect/cnn/yolo/
POST /api/engine/detect/cnn/lightnet/
Request:
{
"image_uuid_list": ["/path/to/image1.jpg", "/path/to/image2.jpg"],
"model_tag": "detect-hyaena",
"labeler_model_tag": "labeler-hyaena",
"use_labeler_species": true,
"sensitivity": 0.3,
"nms_thresh": 0.4,
"assigner_algo": null,
"callback_url": null
}

| Parameter | Description |
|---|---|
| `image_uuid_list` | List of image file paths or URLs |
| `model_tag` | Detection model ID (must be loaded in model config) |
| `labeler_model_tag` | Optional classification model for viewpoint/species labeling |
| `viewpoint_model_tag` | Alias for `labeler_model_tag` |
| `use_labeler_species` | If true, override detection species with labeler's species prediction |
| `sensitivity` | Minimum detection confidence threshold |
| `nms_thresh` | NMS threshold |
| `assigner_algo` | If set, run part-body assignment after detection |
| `callback_url` | Accepted but not used (Wildbook polls instead) |

Response (immediate): job ID string wrapped in WBIA envelope.
GET /api/engine/job/status/?jobid=<jobid>
Response:
{
"status": {"success": true, "code": "", "message": "", "cache": -1},
"response": {"jobstatus": "completed"}
}

Job statuses: `received`, `working`, `completed`, `exception`, `unknown`.
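
A client-side polling loop for this pattern might look like the sketch below. It unwraps the WBIA envelope, waits for `completed`, and then fetches the result. The transport is injected as `get_json(path, params)`, a hypothetical callable that performs the GET and returns the decoded JSON, so the loop stays independent of any particular HTTP library. Treating `exception` and `unknown` as failures is an assumption.

```python
import time


def unwrap(envelope):
    """Return the "response" payload, raising if the envelope reports failure."""
    status = envelope["status"]
    if not status["success"]:
        raise RuntimeError(f"request failed: {status['code']} {status['message']}")
    return envelope["response"]


def wait_for_job(jobid, get_json, poll_interval=1.0, timeout=300.0, sleep=time.sleep):
    """Poll the job status until it completes, then return the job result."""
    deadline = time.monotonic() + timeout
    while True:
        status = unwrap(get_json("/api/engine/job/status/", {"jobid": jobid}))
        jobstatus = status["jobstatus"]
        if jobstatus == "completed":
            return unwrap(get_json("/api/engine/job/result/", {"jobid": jobid}))
        if jobstatus in ("exception", "unknown"):
            # Assumption: neither state can recover, so stop polling.
            raise RuntimeError(f"job {jobid} ended with status {jobstatus!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {jobid} still {jobstatus!r} after {timeout}s")
        sleep(poll_interval)
```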
GET /api/engine/job/result/?jobid=<jobid>
Response:
{
"status": {"success": true, "code": "", "message": "", "cache": -1},
"response": {
"json_result": {
"image_uuid_list": ["/path/to/image1.jpg"],
"results_list": [
[
{
"id": 1,
"uuid": "a1b2c3d4-...",
"xtl": 120,
"ytl": 45,
"left": 120,
"top": 45,
"width": 200,
"height": 150,
"theta": 0.0,
"confidence": 0.92,
"class": "hyaena",
"species": "hyaena",
"viewpoint": "left",
"quality": null,
"multiple": false,
"interest": false
}
]
],
"score_list": [0.0],
"has_assignments": false
}
}
}

Each entry in `results_list` is a list of annotation dicts for the corresponding image. Annotation fields match WBIA's format:

| Field | Description |
|---|---|
| `xtl`, `ytl` / `left`, `top` | Top-left corner (pixels) |
| `width`, `height` | Bounding box dimensions |
| `theta` | Rotation angle (radians) |
| `confidence` | Detection score |
| `class`, `species` | Detected/labeled species |
| `viewpoint` | Viewpoint label (set by labeler, null if no labeler) |
| `quality`, `multiple`, `interest` | WBIA-compatible flags (defaults) |

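
For clients that also use the native endpoints, it can be handy to map WBIA-style annotations back to the `[x, y, width, height]` convention. The sketch below does that for a `json_result` payload; both function names and the output field names are hypothetical.

```python
def annotation_to_detection(ann):
    """Map one WBIA-style annotation dict to a compact detection dict."""
    return {
        "bbox": [ann["xtl"], ann["ytl"], ann["width"], ann["height"]],
        "theta": ann["theta"],
        "score": ann["confidence"],
        "species": ann["species"],
        "viewpoint": ann.get("viewpoint"),  # None when no labeler was used
    }


def flatten_results(json_result):
    """Yield (image, detection) pairs across all images in a job result."""
    for image, annotations in zip(
        json_result["image_uuid_list"], json_result["results_list"]
    ):
        for ann in annotations:
            yield image, annotation_to_detection(ann)
```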
When labeler_model_tag is provided, the endpoint runs a combined pipeline per image:
- Detect with `model_tag`: get bounding boxes
- Label with `labeler_model_tag`: classify each detection for viewpoint (and optionally species)
- Assign (if `assigner_algo` is set): match parts to bodies

Images are loaded once and passed through all pipeline steps as bytes.
GET /api/engine/job/
Returns a list of all job IDs. The job store is bounded to 10,000 entries with LRU eviction of completed jobs.
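
One way to get a bounded store that only evicts completed jobs is an `OrderedDict` kept in recency order. The class below is a sketch of that idea, not the service's actual job store; the class name, method names, and the choice to let the store temporarily exceed its bound when no completed job is available are all assumptions.

```python
from collections import OrderedDict


class JobStore:
    """Bounded job store that evicts the least recently used completed jobs."""

    def __init__(self, max_jobs=10_000):
        self.max_jobs = max_jobs
        self._jobs = OrderedDict()  # oldest entries first

    def put(self, jobid, status, result=None):
        self._jobs[jobid] = {"status": status, "result": result}
        self._jobs.move_to_end(jobid)  # mark as most recently used
        self._evict()

    def get(self, jobid):
        job = self._jobs.get(jobid)
        if job is not None:
            self._jobs.move_to_end(jobid)
        return job

    def job_ids(self):
        return list(self._jobs)

    def _evict(self):
        # Walk from oldest to newest, dropping completed jobs until the
        # store fits. Unfinished jobs are never evicted in this sketch.
        for jobid in list(self._jobs):
            if len(self._jobs) <= self.max_jobs:
                break
            if self._jobs[jobid]["status"] == "completed":
                del self._jobs[jobid]
```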
Models are configured in app/model_config.json:
{
"models": [
{
"model_id": "msv3",
"model_type": "yolo-ultralytics",
"model_path": "/path/to/detect.yolov11.msv3.pt",
"imgsz": 640,
"conf": 0.5
},
{
"model_id": "mdv6",
"model_type": "megadetector",
"model_path": "/path/to/mdv6-yolov10-e.pt",
"imgsz": 1280,
"conf": 0.1,
"iou": 0.45
},
{
"model_id": "detect-hyaena",
"model_type": "lightnet",
"config_path": "/path/to/detect.lightnet.hyaena.v0.py",
"weight_path": "/path/to/detect.lightnet.hyaena.v0.weights",
"conf": 0.1,
"nms_thresh": 0.4
},
{
"model_id": "efficientnet-classifier",
"model_type": "efficientnetv2",
"checkpoint_path": "/path/to/vplabeler-msv3.pt",
"img_size": 512,
"threshold": 0.5
},
{
"model_id": "labeler-seaturtle",
"model_type": "efficientnetv2",
"checkpoint_path": "/path/to/classifier.seaturtle.v0.pth",
"img_size": 512,
"threshold": 0.5,
"model_arch": "tf_efficientnet_b4_ns",
"multi_label": true,
"parse_compound_labels": true
},
{
"model_id": "orientation-seaturtle",
"model_type": "densenet-orientation",
"checkpoint_path": "/path/to/orientation.seaturtle.v0.pth",
"img_size": 224
},
{
"model_id": "miewid-msv4.1",
"model_type": "miewid",
"checkpoint_path": "/path/to/miew_id.msv4_1_main.bin",
"imgsz": 440
}
]
}

YOLO Ultralytics (`yolo-ultralytics`):
- `model_path`: Path or URL to `.pt` weights
- `imgsz`: Input image size (default: 640)
- `conf`: Confidence threshold (default: 0.5)
- `dilation_factors`: Optional `[x, y]` bbox dilation

MegaDetector (`megadetector`):
- `model_path`: Path or URL to `.pt` weights
- `imgsz`: Input image size (default: 1280)
- `conf`: Confidence threshold (default: 0.1)
- `iou`: IoU threshold for NMS (default: 0.45)

LightNet (`lightnet`):
- `config_path`: Path or URL to `.py` HyperParameters config
- `weight_path`: Path or URL to `.weights` binary
- `conf`: Confidence threshold (default: 0.1)
- `nms_thresh`: NMS threshold (default: 0.4)
- `batch_size`: Batch size for multi-image inference (default: 192)

EfficientNet (`efficientnetv2`):
- `checkpoint_path`: Path or URL to checkpoint
- `img_size`: Input image size (default: 512)
- `threshold`: Classification threshold (default: 0.5)
- `model_arch`: timm architecture name (default: `tf_efficientnet_b4_ns`)
- `label_map`: Optional dict of `{index: "label"}` (otherwise loaded from checkpoint)
- `n_classes`: Optional explicit class count
- `multi_label`: Use sigmoid + threshold (true) or softmax + argmax (false) (default: true)
- `parse_compound_labels`: Split labels on `:` into species/viewpoint fields (default: false)

DenseNet Orientation (`densenet-orientation`):
- `checkpoint_path`: Path or URL to checkpoint (format: `{"state": state_dict, "classes": [...]}`)
- `img_size`: Input image size (default: 224)
- `label_map`: Optional explicit label map (otherwise loaded from checkpoint `classes` key)

MiewID (`miewid`):
- `checkpoint_path`: Path or URL to model binary
- `imgsz`: Input image size (default: 440)

All path parameters accept URLs, which are downloaded and cached on startup.
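
The effect of `multi_label`, `threshold`, and `parse_compound_labels` on classifier output can be illustrated in plain Python. This is a sketch of the decoding step only, working on raw logits; the function `decode` is hypothetical, and whether the threshold comparison is inclusive is an assumption.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, label_map, multi_label=True, threshold=0.5,
           parse_compound_labels=False):
    """Turn logits into prediction dicts for the selected classes."""
    if multi_label:
        # Independent per-class probabilities; keep every class over threshold.
        probs = [sigmoid(v) for v in logits]
        picked = [i for i, p in enumerate(probs) if p >= threshold]
    else:
        # Mutually exclusive classes; keep only the most likely one.
        probs = softmax(logits)
        picked = [max(range(len(probs)), key=probs.__getitem__)]

    predictions = []
    for i in picked:
        pred = {"index": i, "label": label_map[i], "probability": probs[i]}
        if parse_compound_labels and ":" in pred["label"]:
            species, viewpoint = pred["label"].split(":", 1)
            pred["species"] = species
            pred["viewpoint"] = viewpoint
        predictions.append(pred)
    return predictions
```

With `multi_label` enabled, several labels (or none) can be returned for one image, which is why `all_probabilities` need not sum to 1.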
- Python 3.10+ (3.12 recommended)
- NVIDIA GPU with CUDA 12.1+ drivers (for GPU inference)
- NVIDIA Container Toolkit (for Docker GPU access)
- Docker and Docker Compose v2 (for containerized deployment)
cd docker
cp _env .env

Edit `.env` and set `MODELS_DIR` to the directory containing your model weight files:
# Required: directory with .pt, .weights, .bin, .pth model files
MODELS_DIR=/data0/models
# Optional overrides
# GPU_ID=0 # which GPU (default: 0)
# DEVICE=cuda # cuda or cpu (default: cuda)
# WORKERS=1 # uvicorn workers (default: 1, use 1 for GPU)
# ML_SERVICE_PORT=6050 # host port (default: 6050)
# DATA_DB_DIR=/data/db # shared image path with WBIA/Wildbook

Edit `app/model_config.json` to list the models you want to load. Paths in the config should use `/models/` (the container mount point for `MODELS_DIR`):
{
"models": [
{
"model_id": "msv3",
"model_type": "yolo-ultralytics",
"model_path": "/models/detect.yolov11.msv3.pt",
"imgsz": 640,
"conf": 0.5
}
]
}

Model weights can also be URLs; they are downloaded on first startup.
cd docker
docker compose up --build -d

The service starts on port 6050 (or `ML_SERVICE_PORT` if set). Check health:

curl http://localhost:6050/health

Watch logs:

docker compose logs -f ml-service

Stop the service:

docker compose down

| Container path | Source | Purpose |
|---|---|---|
| `/models/` | `MODELS_DIR` | Model weight files (read-only) |
| `/datasets/` | `MODELS_DIR` | Alias for `/models/` (backward compatibility with existing configs) |
| `/app/app/model_config.json` | `app/model_config.json` | Model configuration (read-only) |
| `/data/db/` | `DATA_DB_DIR` | Shared image directory with WBIA/Wildbook (optional) |

To run on CPU (e.g. for testing), set DEVICE=cpu in .env and remove the deploy.resources.reservations.devices block from docker-compose.yml, or use:
DEVICE=cpu docker compose up --build

If Wildbook and ml-service run on the same host, create a shared Docker network so Wildbook can reach ml-service by container name:

docker network create shared_net

Then add to `docker-compose.yml`:

services:
  ml-service:
    networks:
      - shared_net
networks:
  shared_net:
    external: true

Wildbook can then call `http://ml-service:6050/api/engine/detect/cnn/`.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Start the server:
# Development (auto-reload on code changes)
python3 -m app.main --device cuda --host 0.0.0.0 --port 6050 --reload
# Production
python3 -m app.main --device cuda --host 0.0.0.0 --port 6050 --workers 1

Or with uvicorn directly:

uvicorn app.main:app --host 0.0.0.0 --port 6050

| Flag | Default | Description |
|---|---|---|
| `--device` | `cuda` | PyTorch device: `cuda`, `cpu`, or `mps` |
| `--host` | `0.0.0.0` | Bind address |
| `--port` | `8888` | Listen port |
| `--workers` | `1` | Uvicorn worker count (use 1 for GPU to avoid VRAM contention) |
| `--reload` | off | Auto-reload on code changes (development only) |
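
For reference, a command-line parser with the same flags and defaults could be written with `argparse` as below. How `app.main` actually parses its arguments is not shown in this document, so treat `build_parser` as a hypothetical stand-in that only mirrors the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="app.main")
    parser.add_argument("--device", default="cuda", choices=["cuda", "cpu", "mps"],
                        help="PyTorch device")
    parser.add_argument("--host", default="0.0.0.0", help="Bind address")
    parser.add_argument("--port", type=int, default=8888, help="Listen port")
    parser.add_argument("--workers", type=int, default=1,
                        help="Uvicorn worker count (use 1 for GPU)")
    parser.add_argument("--reload", action="store_true",
                        help="Auto-reload on code changes (development only)")
    return parser
```

Note that the documented default port is 8888, while the examples and the Docker setup pass or map 6050 explicitly.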