30 changes: 30 additions & 0 deletions docs/en/advanced/delta-weight-sync.md
Expand Up @@ -4,6 +4,7 @@
- [Quick Start](#quick-start)
- [Mode vs Transport](#mode-vs-transport)
- [How It Works](#how-it-works)
- [Publish-Only Disk Delta](#publish-only-disk-delta)
- [Encoding Choice](#encoding-choice)
- [Why Not Colocated](#why-not-colocated)

Expand Down Expand Up @@ -92,6 +93,35 @@ For both transports, the receiver ends up calling the same `_apply_delta_payload

Selective overwrite has no arithmetic — the receiver writes the trainer's exact bytes at changed positions — so it's lossless by construction and there's no notion of drift to fight with periodic base re-syncs.

## Publish-Only Disk Delta

The disk path above pushes each version to known engines: rank 0 calls every engine's `update_weights_from_disk(load_format="delta")` and the sync ends when all engines acknowledge. That requires stable engine handles. When the serving side is an elastic fleet that consumes published versions on its own schedule — e.g. behind an [opaque HTTP rollout endpoint](external-rollout-engines.md#opaque-http-rollout-endpoint) — invert the direction with publish-only mode:

```bash
--update-weight-mode delta
--update-weight-transport disk
--update-weight-delta-publish-only
--custom-delta-publish-path my_pkg.publish.publish_delta
--update-weight-delta-keep-files
```

Instead of firing per-engine RPCs, rank 0 invokes your publish hook once per sync, after every delta file has been written and the optional `--custom-delta-pre-push-path` hook has committed:

```python
def publish_delta(args, version_dir: str, files: list[str], weight_version: str, rollout_engines) -> list | None:
    ...  # e.g. upload version_dir to object storage, then announce weight_version
```

Returned Ray ObjectRefs are awaited before the version counts as settled. Behavior differences from the direct disk path:

- **One complete version per sync.** Direct disk transport publishes at each pass boundary so receivers can overlap apply with later encoding; publish-only defers everything to finalize, so external consumers never observe a partially published version.
- **Publish wait is configurable.** By default, `--update-weight-delta-publish-wait=next-sync` leaves the dispatched publish in flight across the next training step and settles it at the start of the next sync (or on disconnect). A failed publish therefore surfaces one sync late, on rank 0. Set `--update-weight-delta-publish-wait=sync` when the publish hook should block `update_weights`, for example because it polls an external rollout fleet until enough replicas report the new version ready.
- **Engines are left alone.** Generation is not paused, caches are not flushed, and no update RPCs are issued; consumers decide when to pick up a version. If the rollout endpoint supports request-level weight constraints, attach them from a `--custom-rollout-request-hook-path` hook so that requests routed to lagging replicas fail and retry before doing unusable rollout compute.
- **No cleanup.** slime cannot know when consumers finish reading a version, so `--update-weight-delta-keep-files` is required and version-directory lifecycle belongs to you (e.g. the publish hook can prune old versions once uploaded).
- **No-op versions still publish.** If a sync produces no changed bytes, the hook is still called with an empty file list so consumers' version counters can advance.

`--update-weight-delta-root` optionally names a root directory for publish-side metadata; it defaults to the parent of `--update-weight-disk-dir` and is passed through to hooks via `args`.
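
As an illustration of the publish hook contract described above, here is a minimal sketch that copies one complete version into a local publish directory, announces it through a pointer file, and prunes old versions. It is not slime's implementation: `publish_root` and `publish_keep_versions` are hypothetical attributes invented for this example, the `LATEST.json` layout is an assumption, and the sketch accepts entries in `files` either as absolute paths or as names relative to `version_dir` because the source does not say which form is used.

```python
import json
import os
import shutil
from pathlib import Path


def publish_delta(args, version_dir, files, weight_version, rollout_engines):
    """Hypothetical publish hook: stage, publish, announce, then prune."""
    # Assumption: these two attributes are invented for this sketch, not slime flags.
    root = Path(getattr(args, "publish_root", "/tmp/slime-published"))
    keep = max(1, int(getattr(args, "publish_keep_versions", 2)))
    root.mkdir(parents=True, exist_ok=True)

    version = str(weight_version)
    staging = root / f".staging-{version}"
    final = root / version

    # Copy into a staging directory first so consumers never see a partial version.
    shutil.rmtree(staging, ignore_errors=True)
    staging.mkdir()
    for name in files:  # empty for a no-op version; the version is still published
        src = Path(name) if os.path.isabs(name) else Path(version_dir) / name
        shutil.copy2(src, staging / src.name)
    shutil.rmtree(final, ignore_errors=True)
    os.replace(staging, final)  # rename the finished directory into place

    # Announce the version by atomically replacing a small pointer file.
    pointer = root / "LATEST.json"
    history = json.loads(pointer.read_text())["history"] if pointer.exists() else []
    history = [v for v in history if v != version] + [version]
    expired, history = history[:-keep], history[-keep:]
    tmp = root / "LATEST.json.tmp"
    tmp.write_text(json.dumps({"weight_version": version, "history": history}))
    os.replace(tmp, pointer)

    # Version lifecycle belongs to the hook: drop versions that fell out of history.
    for old in expired:
        shutil.rmtree(root / old, ignore_errors=True)
    return None  # all work is done synchronously, so there is nothing to await
```

Returning `None` here means the hook finishes its work before it returns; a hook that dispatches remote work would instead return the Ray ObjectRefs that slime awaits. Keeping only the last few versions is safe only if consumers are known to have moved on, which this sketch does not check.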

## Encoding Choice

`--update-weight-encoding` picks how positions are packed. All three encodings share the same on-wire layout (`__positions__` uint8 blob + `__values__` tensor + per-param manifest); the decoder dispatches on the metadata.
30 changes: 27 additions & 3 deletions docs/en/advanced/external-rollout-engines.md
Expand Up @@ -2,13 +2,15 @@

An external rollout engine is an SGLang engine that is not launched by the slime training job. Another system deploys and owns the engine lifecycle; slime connects to those engines during training, registers a router, and syncs updated actor weights when needed.

This page is a roadmap. Use it to decide when to use `--rollout-external-engine-addrs`, when to stay with `--sglang-config`, and which weight-update path to pick for external deployments.
This page is a roadmap. Use it to decide when to use `--rollout-external-engine-addrs`, when to use `--rollout-http-endpoint-url`, when to stay with `--sglang-config`, and which weight-update path to pick for external deployments.

## Where To Start

| Goal | Recommended entry point |
| :--- | :--- |
| Engines are already launched externally and slime should only connect for rollout | `--rollout-external-engine-addrs` |
| Rollout serving is an elastic fleet behind a single HTTP URL, with no stable per-engine handles | `--rollout-http-endpoint-url` |
| The serving side pulls published weight versions instead of receiving direct update RPCs | `--update-weight-delta-publish-only`, see [Publish-Only Disk Delta](delta-weight-sync.md#publish-only-disk-delta) |
| slime should still launch engines, but you need PD disaggregation, multi-model serving, heterogeneous server groups, or per-group overrides | [SGLang Config](sglang-config.md) |
| Trainer and external engines can form an NCCL group | Default `--update-weight-mode full --update-weight-transport nccl` |
| Trainer and external engines cannot form an NCCL group, but can see the same filesystem path | `--update-weight-mode full --update-weight-transport disk` |
Expand Down Expand Up @@ -38,6 +40,27 @@ slime queries each engine's `/server_info` or `/get_server_info` endpoint and in

This path fits deployments where serving is owned outside the training job: a separate inference cluster, a separate Ray cluster, manually warmed SGLang engines, or a rollout service managed by another orchestrator.

## Opaque HTTP Rollout Endpoint

`--rollout-external-engine-addrs` still assumes SGLang engines with stable addresses: slime queries `/server_info` per engine, registers each one with a router, and pushes weight updates to known engine handles. Some deployments cannot offer that contract — for example a serverless or autoscaled inference fleet behind one URL, where workers come and go and no worker-management API is exposed. For those, point slime at the endpoint directly:

```bash
python train.py \
--rollout-http-endpoint-url https://rollout.example.com \
...
```

In this mode slime launches no engines and no router, and assumes nothing about the endpoint beyond the generation route: rollout requests POST to `{url}/generate`, and `get_model_url(args, ...)` in custom rollout functions resolves to the endpoint as well. No rollout GPUs are allocated in the placement group, `/server_info` is never queried, and slime fault tolerance does not manage the fleet — recovery is the endpoint operator's job. `--rollout-http-endpoint-url` and `--rollout-external-engine-addrs` are mutually exclusive.

Two companion flags adapt the default SGLang rollout to an endpoint that lacks router APIs:

- `--rollout-http-endpoint-abort-strategy {cancel-only,router-workers}`: how `abort` behaves between rollouts. `cancel-only` (the default when an endpoint URL is set) cancels slime's local pending generation tasks without calling the router's worker-list or per-worker abort APIs. `router-workers` keeps the existing router-based abort and remains the default otherwise. Note that `cancel-only` does not collect partial samples, so it does not compose with `--partial-rollout`.
- `--custom-rollout-request-hook-path`: optional hook called before each default SGLang `/generate` request. Signature: `def hook(args, sample, request) -> None | dict`. The `request` dict contains `url`, `payload`, `headers`, `max_retries`, `retry_sleep`, `rollout_id`, and `evaluation`; mutate it in place or return a dict of updates.

Use the request hook for rollout-endpoint admission control. For example, a hook may attach `"weight_version": {"exact_version": <ready_version>}` or `"weight_version": {"min_required_version": <minimum_version>}` and increase `max_retries`/`retry_sleep`. Those request fields avoid wasted rollout compute when an opaque router sends the request to a replica that has not loaded a usable version yet. They do not define slime's off-policy or staleness semantics; the trainer schedule and loss/correction path still decide which versions are valid.
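
A request hook along these lines could look like the following sketch. It mutates `request` in place, which is one of the two styles the hook contract allows. The attribute `min_rollout_weight_version` is a hypothetical name invented here for wherever your job tracks the minimum usable version, and the retry values are arbitrary; whether the endpoint honours `weight_version` depends on the endpoint, not on slime.

```python
def attach_min_weight_version(args, sample, request):
    """Hypothetical request hook: require a minimum weight version per request."""
    # Assumption: `min_rollout_weight_version` is an attribute invented for this sketch.
    minimum = getattr(args, "min_rollout_weight_version", None)
    if minimum is None:
        return None  # no constraint known yet; leave the request unchanged

    # Ask the endpoint to reject replicas that have not loaded this version.
    request["payload"]["weight_version"] = {"min_required_version": minimum}

    # Give lagging replicas time to catch up; never lower limits set elsewhere.
    request["max_retries"] = max(request.get("max_retries", 0), 20)
    request["retry_sleep"] = max(request.get("retry_sleep", 0), 2.0)
    return None  # request was mutated in place
```

It would be wired in with `--custom-rollout-request-hook-path` pointing at the function's import path.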

For weight sync, an elastic fleet usually cannot receive per-engine `update_weights_from_disk` RPCs either. Combine the endpoint with publish-only delta sync, where the trainer publishes each complete weight version through a custom hook and the serving side consumes it on its own schedule — see [Publish-Only Disk Delta](delta-weight-sync.md#publish-only-disk-delta). If request-level minimum-version retry is enough, leave publish-only in its default pipelined mode. If the publish hook polls rollout-fleet status and you want the next rollout dispatch to wait for that readiness threshold, set `--update-weight-delta-publish-wait=sync`.
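
For the blocking case, the publish hook has to poll until the fleet reports readiness. The sketch below shows one way to structure that, assuming two callables you would supply yourself: `announce`, which uploads and announces a version, and `ready_fraction`, which returns the fraction of replicas that report the given version. Both are hypothetical, as are the threshold and timeout defaults.

```python
import time


def wait_until_ready(ready_fraction, threshold=0.8, timeout_s=600.0, poll_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `ready_fraction()` until it reaches `threshold` or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        fraction = ready_fraction()
        if fraction >= threshold:
            return fraction
        if clock() >= deadline:
            raise TimeoutError(f"only {fraction:.0%} of replicas ready after {timeout_s}s")
        sleep(poll_s)


def make_blocking_publish(announce, ready_fraction, **wait_kwargs):
    """Build a publish hook that announces a version, then blocks on fleet readiness."""
    def publish_delta(args, version_dir, files, weight_version, rollout_engines):
        # Hypothetical callable: upload version_dir and announce weight_version.
        announce(version_dir, files, weight_version)
        # Block until enough replicas report this version (hypothetical status call).
        wait_until_ready(lambda: ready_fraction(weight_version), **wait_kwargs)
        return None  # nothing is left in flight for slime to await
    return publish_delta
```

Because the hook is resolved from an import path, the built function would need to be bound to a module-level name, for example `publish_delta = make_blocking_publish(my_announce, my_ready_fraction)`. A hook like this only gates the next rollout when combined with `--update-weight-delta-publish-wait=sync`.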

## Relationship With `--sglang-config`

`--rollout-external-engine-addrs` and `--sglang-config` are mutually exclusive because they own different boundaries:
Expand Down Expand Up @@ -108,8 +131,9 @@ For encoding choices, wire layout, receiver-side selective overwrite, and tuning
- External engines can use an independent SGLang environment; they do not need the slime or Megatron training environment.
- Disk transport supports different GPU models or vendors between training and rollout, as long as SGLang supports the target hardware and model format.
- Disk transport requires trainer and SGLang engines to see the same `--update-weight-disk-dir` path; a path visible only to the trainer is not enough.
- External engines are not recovered by slime fault tolerance; their lifecycle belongs to the external deployment system.
- `--sglang-config` and `--rollout-external-engine-addrs` are mutually exclusive.
- External engines are not recovered by slime fault tolerance; their lifecycle belongs to the external deployment system. The same applies to fleets behind `--rollout-http-endpoint-url`.
- `--sglang-config` and `--rollout-external-engine-addrs` are mutually exclusive, as are `--rollout-external-engine-addrs` and `--rollout-http-endpoint-url`.
- An opaque HTTP endpoint only needs to serve the generation route; worker-management APIs are never called. If the fleet cannot accept direct weight-update RPCs, use publish-only delta sync.
- Delta mode does not support `--colocate`, because colocated sync uses CUDA IPC handles and delta encoding does not reduce the actual transfer.

## Related Work
Expand Down
22 changes: 21 additions & 1 deletion docs/en/get_started/customization.md
Expand Up @@ -28,6 +28,7 @@ Below is a summary of all available customization interfaces and their purposes.
| [`--custom-megatron-init-path`](#17-megatron-hooks) | Custom initialization after Megatron setup. |
| [`--custom-megatron-before-log-prob-hook-path`](#17-megatron-hooks) | Custom logic before log probability computation. |
| [`--custom-megatron-before-train-step-hook-path`](#17-megatron-hooks) | Custom logic before each training step. |
| [`--custom-rollout-request-hook-path`](#19-rollout-request-hook---custom-rollout-request-hook-path) | Customize each default SGLang `/generate` request before dispatch. |

## Agentic workflows through customization interfaces

Expand Down Expand Up @@ -457,6 +458,25 @@ Stabilize MoE RL training by recording and replaying expert routing decisions to
| `--use-routing-replay` | Forward-backward routing consistency in training. ([arXiv:2507.18071](https://arxiv.org/abs/2507.18071)) |
| `--use-rollout-routing-replay` | R3: Replay routing from rollout during training. Supported by slime's default `sglang_router` path. ([arXiv:2510.11370](https://arxiv.org/abs/2510.11370)) |

---

### 19. Rollout Request Hook (`--custom-rollout-request-hook-path`)

**Signature**:
```python
def hook(args, sample, request) -> None | dict
```

**Purpose**: Customize each default SGLang rollout `/generate` request before it
is sent. `request` contains `url`, `payload`, `headers`, `max_retries`,
`retry_sleep`, `rollout_id`, and `evaluation`. Mutate it in place or return a
dict of updates.

This hook is useful for external rollout providers that need request-level
admission control, for example adding `payload["weight_version"]` so a request
routed to a lagging replica fails and retries before doing unusable rollout
compute.
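
The hook can also return a dict of updates instead of mutating `request`. The sketch below uses that style to add an authorization header. The environment variable name is hypothetical, and because the source does not specify how returned updates are merged, the sketch returns a complete `headers` dict rather than relying on a deep merge.

```python
import os


def add_auth_header(args, sample, request):
    """Hypothetical request hook: return header updates instead of mutating."""
    token = os.environ.get("ROLLOUT_API_TOKEN")  # hypothetical variable name
    if not token:
        return None  # nothing to change

    # Copy the existing headers so the returned value is complete on its own.
    headers = dict(request.get("headers") or {})
    headers["Authorization"] = f"Bearer {token}"
    return {"headers": headers}
```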

## Testing Custom Function Paths

slime also provides CPU-only contract tests for customization interfaces. These tests resolve components through import-path strings, so they can validate both built-in hooks and user-defined implementations passed through the same CLI arguments used by training.
Expand All @@ -470,7 +490,7 @@ The tests live under `tests/plugin_contracts/` and are grouped by hook shape:
- `tests/plugin_contracts/test_plugin_path_loading_contracts.py`
Covers `--eval-function-path`, `--custom-rm-path`, `--dynamic-sampling-filter-path`, `--buffer-filter-path`, `--data-source-path`, `--rollout-sample-filter-path`, and `--rollout-all-samples-process-path`
- `tests/plugin_contracts/test_plugin_runtime_hook_contracts.py`
Covers `--custom-rollout-log-function-path`, `--custom-eval-rollout-log-function-path`, `--custom-reward-post-process-path`, `--custom-convert-samples-to-train-data-path`, and `--rollout-data-postprocess-path`
Covers `--custom-rollout-log-function-path`, `--custom-eval-rollout-log-function-path`, `--custom-reward-post-process-path`, `--custom-convert-samples-to-train-data-path`, `--rollout-data-postprocess-path`, and `--custom-rollout-request-hook-path`

Run all customization contract tests locally:

30 changes: 30 additions & 0 deletions docs/zh/advanced/delta-weight-sync.md
Expand Up @@ -4,6 +4,7 @@
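- [Quick Start](#快速开始)
- [Mode vs Transport](#同步模式与传输方式)
- [How It Works](#工作原理)
- [Publish-Only Disk Delta](#publish-only-disk-delta)
- [Encoding Choice](#encoding-choice)
- [Why Not Colocated](#为何不支持-colocated)

Expand Up @@ -88,6 +89,35 @@ Delta NCCL and delta disk share the same send pipeline and the same wire layout

Selective overwrite involves no arithmetic: the receiver writes the trainer's exact bytes at the changed positions. It is therefore lossless by construction, has no numerical drift, and needs no periodic base sync.

## Publish-Only Disk Delta

The disk path above pushes each version to known engines: rank 0 calls every engine's `update_weights_from_disk(load_format="delta")`, and the sync ends only after all engines acknowledge. That requires stable engine handles. When the serving side is an elastic fleet that consumes published versions on its own schedule, for example behind an [opaque HTTP rollout endpoint](external-rollout-engines.md#opaque-http-rollout-endpoint), use publish-only mode to invert the direction:

```bash
--update-weight-mode delta
--update-weight-transport disk
--update-weight-delta-publish-only
--custom-delta-publish-path my_pkg.publish.publish_delta
--update-weight-delta-keep-files
```

Instead of issuing per-engine RPCs, rank 0 calls your publish hook once per sync, after all delta files have been written and the optional `--custom-delta-pre-push-path` hook has committed:

```python
def publish_delta(args, version_dir: str, files: list[str], weight_version: str, rollout_engines) -> list | None:
    ...  # e.g. upload version_dir to object storage, then announce weight_version
```

Returned Ray ObjectRefs are awaited before the version counts as complete. Behavior differences from the direct disk path:

- **One complete version per sync.** Direct disk transport publishes at each pass boundary so that the receiver's apply overlaps with later encoding; publish-only defers all publishing to finalize, so external consumers never see a half-published version.
- **Publish wait is configurable.** By default, `--update-weight-delta-publish-wait=next-sync` keeps the dispatched publish in flight during the next training step and settles it at the start of the next sync (or on disconnect). A failed publish therefore surfaces one sync late, on rank 0. If the publish hook polls the external rollout fleet and you want the next rollout dispatch to wait until enough replicas are ready, set `--update-weight-delta-publish-wait=sync`.
- **Engines are left alone.** Generation is not paused, caches are not flushed, and no update RPCs are issued; consumers decide when to pull a new version. If the rollout endpoint supports request-level weight constraints, attach them in a `--custom-rollout-request-hook-path` hook so that requests routed to lagging replicas fail and retry early instead of generating unusable samples.
- **No cleanup.** slime cannot know when consumers finish reading a version, so `--update-weight-delta-keep-files` is required and the version-directory lifecycle is your responsibility (for example, the publish hook can prune old versions once the upload completes).
- **Empty deltas still publish.** If a sync changes no bytes, the hook is still called with an empty file list so that consumers' version counters can advance.

`--update-weight-delta-root` optionally names a root directory for publish-side metadata; it defaults to the parent of `--update-weight-disk-dir` and is passed through to hooks via `args`.

## Encoding Choice

`--update-weight-encoding` decides how positions are packed. All three encodings share the same wire layout (`__positions__` uint8 blob + `__values__` tensor + per-param manifest), and the decoder dispatches on the metadata.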