fix(vllm_model): preserve required_prefix_token_ids#1545
Conversation
Signed-off-by: Ajay Mittur <amittur@nvidia.com>
075781e to
46bc5c0
Compare
|
Thanks for the PR! Could you compare to here? #1294 I think that I am not sure that this top level field is sent as-is from Gym today, hence 1294 'auto-derives' it. do you have an example of token id mismatch or motivation for this PR? |
|
Thanks for taking a look. Yes gym does not send the field in the top level and the API between Nemo Gym and Nemo RL was a little bit unclear since the field is not set as a private or internal variable (like Discussed this with @bxyu-nvidia offline and think it would be better to keep a single source of origin ( Happy to close this if we don't want to allow sending |
|
Thanks @ajaymittur
Are you saying that your harness does not inject the fields Here is an example of how we have to similarly handle this for other harness such as Hermes Agent NousResearch/hermes-agent@main...nemo-gym-changes I do think we need a better solution than these changes on all harnesses (as it is not possible for all, such as claude code), maybe solution is token id caching or auto derive or something |
My harness was using them to construct I do like the token caching and auto derive directions but that would probably require a larger design discussion. I think ProRL does something similar with their ManagedSession which makes integrating harnesses easy according to their docs https://github.com/NVIDIA-NeMo/ProRL-Agent-Server/tree/stable/src/polar/agent#the-harness-contract We can also make the API interface between Nemo RL and Gym clearer in the docs. |
|
Closing in favor of #1554 |
Summary
Preserve
required_prefix_token_idswhen Gym proxies Chat Completions requests to vLLM and when it falls back to/tokenizeto recover prompt token IDs.This field can be passed through OpenAI
extra_bodyby RL agents that need exact multi-turn token continuity. Without declaring the field in Gym's request model, Pydantic drops it duringmodel_dump(exclude_unset=True). Without forwarding it to/tokenize, Gym can still compute unrepaired prompt token IDs even if generation used the repaired prefix.Why
NeMo-RL checks that generated multi-turn trajectories remain token-contiguous with previous messages. For tool-use trajectories, decoded assistant/tool-call text can retokenize differently from the original generated token IDs.
NeMo-RL's vLLM OpenAI preprocessing hook uses
_replace_prefix_tokensto splice the exact prior model-emitted prefix back into the templated prompt. Gym needs to preserve the externally suppliedrequired_prefix_token_idsfield so that both generation and tokenize fallback use the same prefix repair path.Changes
required_prefix_token_idstoNeMoGymChatCompletionCreateParamsNonStreaming.required_prefix_token_idsinto the vLLM/tokenizefallback used when top-levelprompt_token_idsare absent from the chat completion response.Related
Related to the NeMo-RL vLLM worker prefix-token repair path:
https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/generation/vllm/vllm_worker_async.py