Deployment stuck in "running" forever after the build OOMs/saturates the host - no way to cancel or clear the queue without rebooting

### To Reproduce

Environment: a multi-server setup - Dokploy panel on the manager node, app deployed to a remote server. Small nodes, no swap.

1. Create a Docker Compose application whose docker-compose.yml has TWO services that both `build:` from the SAME Dockerfile (a common pattern: a Next.js app with a `web` service and a `worker` service sharing one image).
2. Deploy it. Dokploy runs `docker compose ... up -d --build`, and Compose/BuildKit builds BOTH services in parallel - i.e. two simultaneous `npm ci` + `next build` from the same Dockerfile.
3. On a resource-constrained host with no swap, the two parallel builds exhaust CPU/RAM. The host goes to ~100% CPU and becomes unreachable (SSH times out during banner exchange), and the build process is killed.
4. Regain access (only possible via a hard reboot from the hosting dashboard).
5. Observe the deployment in the Dokploy UI: it is stuck in "running" state and never finishes.

Under the hood after the crash:
- Postgres `deployment` table has a row stuck at `status = running`.
- Redis still has the BullMQ job for it: `bull:deployments:<id>` present, listed in `bull:deployments:active`, with a leftover `bull:deployments:<id>:lock`. The worker that held it was killed, so the job never completes and is not recovered.

There is no documented UI/CLI way to cancel or clear this. The only workarounds I found:
- reboot the server, or
- manually: run `UPDATE deployment SET status='error' WHERE status='running'` in the dokploy Postgres, delete the `bull:deployments:*` keys in the dokploy Redis, then run `docker service update --force dokploy`. Note that this UPDATE touches every row marked running, not only the stuck one.
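Because the manual UPDATE above matches every row with `status='running'`, a narrower selection may be safer when other deploys are legitimately in progress. The following is only a sketch of that selection logic, not Dokploy code: the `DeploymentRow` shape, the `started_at` field, and the one-hour cutoff are assumptions made for illustration. Only the `status` column and its `running` value come from the observations above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DeploymentRow:
    # Hypothetical row shape; only `status` is confirmed by this report.
    deployment_id: str
    status: str
    started_at: datetime


def select_stale(rows, now, max_age):
    """Return IDs of rows still 'running' that started more than max_age ago."""
    return [
        row.deployment_id
        for row in rows
        if row.status == "running" and now - row.started_at > max_age
    ]


# Illustrative data: one stuck deploy, one recent deploy, one finished deploy.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    DeploymentRow("a", "running", now - timedelta(hours=3)),
    DeploymentRow("b", "running", now - timedelta(minutes=5)),
    DeploymentRow("c", "done", now - timedelta(hours=6)),
]
stale = select_stale(rows, now, timedelta(hours=1))
print(stale)  # ['a']
```

Only the IDs returned here would then be marked as errored and have their queue entries removed, leaving the recent deploy untouched.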

### Current vs. Expected behavior

Current: After a build is killed because it exhausted host resources, the deployment stays "running" indefinitely. The BullMQ job remains active+locked and is never marked stalled/failed. There is no working "Cancel" path in the UI, so the queue is wedged until the server is rebooted or Redis/Postgres are edited by hand. New deploys queue behind the stuck one.

Expected:
1. An orphaned/stalled deployment whose worker died should be auto-recovered (marked failed via BullMQ stalled-job handling) and/or there should be a reliable "Cancel deployment" button in the UI that works even when the build process/worker is gone.
2. Builds should have guardrails so a single deploy cannot take down the host: e.g. build an image only ONCE when multiple compose services share the same build/Dockerfile (instead of building every service in parallel), and/or a configurable build concurrency / memory limit.

### Provide environment information

```bash
Operating System:
OS: Ubuntu 24.04 (manager node and target node)
Arch: x86_64
Dokploy version: 0.29.4
VPS Provider: Hetzner (manager) + Another Provider (target)
Manager node: 4 vCPU / 7.6 GB RAM / no swap
Target node:  2 vCPU / 1.9 GB RAM / no swap
```

### Which area(s) are affected? (Select all that apply)

Remote server, Application

### Are you deploying the applications where Dokploy is installed or on a remote server?

Remote server

### Additional context

- The host became unreachable specifically during the build step of a redeploy; CPU hit 100% and SSH timed out at banner exchange.
- Trigger appears to be parallel builds of two compose services that share one Dockerfile (BuildKit builds both at once). Switching the compose so the image builds once (one service `build:`, the other reuses the same `image:` tag) greatly reduces the build footprint.
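The build-once rewrite described above can be sketched as a transformation on a parsed compose file. This is an illustration under stated assumptions, not how Dokploy or Compose behaves today: `build_key`, `share_builds`, and the `local/shared` image prefix are hypothetical names, and the grouping only compares `context`, `dockerfile`, `target`, and `args`. It overwrites any `image:` already set on a deduplicated service, and it does not address pull behaviour or start ordering for the service that reuses the image.

```python
import copy


def build_key(build):
    """Normalize a Compose `build:` value (string or mapping) to a comparable key."""
    if isinstance(build, str):
        build = {"context": build}  # short form: the string is the context path
    args = build.get("args") or {}
    # `args` may be a mapping or a list of "KEY=value" strings.
    args_key = tuple(sorted(args.items())) if isinstance(args, dict) else tuple(sorted(args))
    return (
        build.get("context", "."),
        build.get("dockerfile", "Dockerfile"),
        build.get("target"),
        args_key,
    )


def share_builds(compose, image_prefix="local/shared"):
    """Keep `build:` on the first service of each group; others reuse its image tag."""
    result = copy.deepcopy(compose)
    first_image = {}
    for name, service in result.get("services", {}).items():
        build = service.get("build")
        if build is None:
            continue
        key = build_key(build)
        if key in first_image:
            del service["build"]
            service["image"] = first_image[key]
        else:
            service.setdefault("image", f"{image_prefix}-{name}")
            first_image[key] = service["image"]
    return result


# Illustrative input: `web` and `worker` share one Dockerfile, as in this report.
compose = {
    "services": {
        "web": {"build": {"context": ".", "dockerfile": "Dockerfile"}},
        "worker": {"build": ".", "command": "node worker.js"},
    }
}
rewritten = share_builds(compose)
print(rewritten["services"]["worker"])
```

After the rewrite only `web` carries a `build:` section, so a single image build is requested instead of two parallel ones.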
- Neither /docs/core nor /docs/core/troubleshooting documents how to cancel/clear a stuck deployment, which is why this is hard to recover from.

### Will you send a PR to fix it?

No

Deployment stuck in "running" forever after the build OOMs/saturates the host - no way to cancel or clear the queue without rebooting #4461