Change enforce_eager default to False #1818
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Code Review
This pull request changes the default value of enforce_eager from True to False in InferenceEngineConfig within skyrl/train/config/config.py to enable CUDA graphs by default for higher performance. Consequently, various training run scripts under examples/train/ have been updated to either explicitly set ENFORCE_EAGER=false or to pass the $ENFORCE_EAGER variable to the generator's inference engine configuration. For DAPO-related recipes, ENFORCE_EAGER remains true with updated comments indicating a TODO to reproduce results with it set to false. There are no review comments, and I have no additional feedback to provide.
What does this PR do?
It's been a while since #569. Our internal multi-turn RL training runs now use enforce_eager=False. We previously defaulted to True because of lower reward with DAPO last year. I believe it is safe to switch the default to False now. We also have better rollout-train mismatch corrections now, such as geometric sequence masking.

Since our current DAPO reproduction curves all use enforce_eager=true, I've retained that in the DAPO scripts. For scripts that aren't exact DAPO reproductions, enforce_eager can be left at false; I've made this change for many GSM8k scripts.

We should also reproduce the DAPO recipe again with CUDA graphs enabled.
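
For reviewers, here is a minimal sketch of how the new default and a script-level override interact. It is illustrative only: the InferenceEngineConfig below is a one-field stand-in for the class in skyrl/train/config/config.py, and parse_bool and config_from_env are hypothetical helpers, not functions from the repo. The real scripts pass $ENFORCE_EAGER through to the generator's inference engine configuration rather than reading the environment in Python.

```python
from dataclasses import dataclass


@dataclass
class InferenceEngineConfig:
    # Hypothetical stand-in for the real class; only the field touched by
    # this PR is shown. False means CUDA graphs are enabled by default.
    enforce_eager: bool = False


def parse_bool(value: str) -> bool:
    """Parse a shell-style boolean such as the ENFORCE_EAGER script variable."""
    normalized = value.strip().lower()
    if normalized in ("true", "1", "yes"):
        return True
    if normalized in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def config_from_env(env: dict) -> InferenceEngineConfig:
    """Build the config, letting ENFORCE_EAGER override the default if set."""
    if "ENFORCE_EAGER" in env:
        return InferenceEngineConfig(enforce_eager=parse_bool(env["ENFORCE_EAGER"]))
    return InferenceEngineConfig()


# A GSM8k-style script that sets nothing picks up the new default.
default_cfg = config_from_env({})
# A DAPO reproduction script keeps eager mode by setting the variable.
dapo_cfg = config_from_env({"ENFORCE_EAGER": "true"})
print(default_cfg.enforce_eager, dapo_cfg.enforce_eager)
```

Under these assumptions, a script that does not mention ENFORCE_EAGER gets CUDA graphs, while the DAPO recipes keep their current behavior until they are re-run with the new default.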