[chore]Upgrade vllm to 0.23.0 by SumanthRH · Pull Request #1800 · NovaSky-AI/SkyRL

SumanthRH · 2026-06-17T23:01:39Z

No description provided.

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

gemini-code-assist

Code Review

This pull request updates several dependencies in pyproject.toml. Specifically, it upgrades vllm from 0.20.2 to 0.23.0 and the flashinfer packages (flashinfer-python, flashinfer-jit-cache, and flashinfer-cubin) from 0.6.8.post1 to 0.6.11.post2 across the fsdp and megatron environments. It also removes the constraint-dependencies block and introduces override-dependencies for the updated flashinfer packages to resolve version conflicts with Megatron-Bridge. There are no review comments, so I have no additional feedback to provide.


Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

SumanthRH · 2026-06-18T01:36:43Z

We still can't migrate to the native vllm weight sync APIs because vllm-project/vllm#42577 is not merged yet. For now I am working around the limitations by renaming the worker wrap functions so they don't conflict with the native start_weight_update and finish_weight_update methods that I added in vllm-project/vllm#39212.
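
To illustrate why the rename matters, here is a minimal sketch, assuming the worker wrap is combined with the worker class through inheritance. NativeWorker, CollidingWrap, PrefixedWrap and the two combined classes are hypothetical stand-ins, not vLLM or SkyRL code; only the method names come from this thread.

```python
class NativeWorker:
    """Hypothetical stand-in for a worker that ships native weight-sync hooks."""

    def start_weight_update(self):
        return "native start"

    def finish_weight_update(self):
        return "native finish"


class CollidingWrap:
    """Hypothetical wrap that reuses a native method name."""

    def start_weight_update(self):
        return "custom start"


class PrefixedWrap:
    """Hypothetical wrap that uses prefixed names instead."""

    def skyrl_start_weight_update(self):
        return "custom start"

    def skyrl_finish_weight_update(self):
        return "custom finish"


class ShadowedWorker(CollidingWrap, NativeWorker):
    """The wrap comes first in the MRO, so it hides the native method."""


class SafeWorker(PrefixedWrap, NativeWorker):
    """Prefixed names leave the native methods reachable."""


if __name__ == "__main__":
    print(ShadowedWorker().start_weight_update())
    print(SafeWorker().start_weight_update())
    print(SafeWorker().skyrl_start_weight_update())
```

With the prefixed names, both the native hooks and the custom ones stay callable on the same object, which is the property the rename is after.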

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

SumanthRH · 2026-06-18T01:47:15Z

        logger.info(f"Exporting `SKYRL_RAY_PG_TIMEOUT_IN_S` to ray runtime env: {pg_timeout}")
        env_vars["SKYRL_RAY_PG_TIMEOUT_IN_S"] = pg_timeout

+    # Health-check timeout for the inference server actor. Forwarded so `VLLMServerActor.start`


This is unrelated to the upgrade, but a good fix.

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH · 2026-06-18T06:22:32Z

GPU CI failures for 020ec92:

Non-Megatron

=========================== short test summary info ============================
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestWeightUpdateFlow::test_update_weights_flow[no_pd] - aiohttp.client_exceptions.ClientResponseError: 500, message="Call to collective_rpc method failed: Worker failed with error 'start_weight_update must be called before update_weights.', please check the stack trace above for the root cause", url='http://10.0.74.103:8000/update_weights'
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestWeightUpdateFlow::test_update_weights_flow[pd_1P1D_non_colocated] - aiohttp.client_exceptions.ClientResponseError: 500, message="Call to collective_rpc method failed: Worker failed with error 'start_weight_update must be called before update_weights.', please check the stack trace above for the root cause", url='http://10.0.74.103:8100/update_weights'
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestColocatedIpcWeightUpdateFlow::test_update_weights_ipc - aiohttp.client_exceptions.ClientResponseError: 500, message="Call to collective_rpc method failed: Worker failed with error 'start_weight_update must be called before update_weights.', please check the stack trace above for the root cause", url='http://10.0.74.103:8000/update_weights'
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_load_weights_preserves_moe_forward - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_multichunk_reload_preserves_moe_forward - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
FAILED tests/backends/skyrl_train/gpu/gpu_ci/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e[non_colocated_nccl_fsdp_vllm_dp] - ray.exceptions.RayTaskError(RuntimeError): ray::VLLMServerActor.start() (pid=82517, ip=10.0.74.103, actor_id=6729dfaed99d2436317174754b000000, repr=<skyrl.backends.skyrl_train.inference_servers.vllm_server_actor.VLLMServerActor object at 0x7ef13eccea20>)

The failures tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_load_weights_preserves_moe_forward and tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_multichunk_reload_preserves_moe_forward are unrelated.
tests/backends/skyrl_train/gpu/gpu_ci/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e[non_colocated_nccl_fsdp_vllm_dp] failed due to an Address already in use error; I was not able to repro on 4x H100s.
The other failures are fixed by migrating the tests to use the new skyrl_start_weight_update and skyrl_finish_weight_update methods: 1ef68fb
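
The error string in the first three failures points to an ordering requirement: update_weights is rejected unless a start call has run first. Below is a toy model of such a guard, offered as a sketch only. ToyWorker and its internals are hypothetical and are not the vLLM or SkyRL implementation; only the method names and the error message are taken from the log above.

```python
class ToyWorker:
    """Hypothetical worker that only accepts weights inside an update window."""

    def __init__(self):
        self._updating = False
        self.weights = {}

    def start_weight_update(self):
        # Open the update window.
        self._updating = True

    def update_weights(self, chunk):
        # Reject chunks that arrive outside the window.
        if not self._updating:
            raise RuntimeError(
                "start_weight_update must be called before update_weights."
            )
        self.weights.update(chunk)

    def finish_weight_update(self):
        # Close the window again.
        self._updating = False


if __name__ == "__main__":
    worker = ToyWorker()
    worker.start_weight_update()
    worker.update_weights({"layer.0": 1.0})
    worker.finish_weight_update()
    print(worker.weights)
```

In a model like this, a caller that invokes a different start method than the one the guard tracks never opens the window, so every later update_weights call fails the same way.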

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH · 2026-06-18T06:34:22Z

Megatron

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_megatron_extractor_consistency.py::test_megatron_extractor_iteration_order_consistency[qwen3_5_35b_a3b_mm_moe] - ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::_ProbeMegatronRefWorker.__init__() (pid=11084, ip=10.0.56.206, actor_id=711b2fe07b59b440f7ae6c8d03000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_megatron_extractor_consistency.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7e5bfa710b00>)
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
           ^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The actor with name _ProbeMegatronRefWorker failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:

ray::_ProbeMegatronRefWorker.__init__() (pid=11084, ip=10.0.56.206, actor_id=711b2fe07b59b440f7ae6c8d03000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_megatron_extractor_consistency.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7e5bfa710b00>)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_megatron_extractor_consistency.py", line 36, in <module>
    from tests.backends.skyrl_train.gpu.utils import init_worker_with_type
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/utils.py", line 36, in <module>
    from skyrl.backends.skyrl_train.inference_servers.setup import create_inference_servers
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/setup.py", line 20, in <module>
    from .utils import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/utils.py", line 7, in <module>
    from skyrl.backends.skyrl_train.inference_servers.new_inference_worker_wrap import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/new_inference_worker_wrap.py", line 31, in <module>
    from skyrl.backends.skyrl_train.inference_servers.layerwise_reload import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/layerwise_reload.py", line 37, in <module>
    from vllm.model_executor.model_loader.reload.meta import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in <module>
    from vllm.model_executor.model_loader.bitsandbytes_loader import BitsAndBytesModelLoader
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/model_loader/bitsandbytes_loader.py", line 24, in <module>
    from vllm.lora.utils import is_moe_model
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/lora/utils.py", line 17, in <module>
    from vllm.lora.layers import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/lora/layers/__init__.py", line 15, in <module>
    from vllm.lora.layers.fused_moe import FusedMoE3DWithLoRA, FusedMoEWithLoRA
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/lora/layers/fused_moe.py", line 13, in <module>
    from vllm.model_executor.layers.fused_moe import FusedMoE
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/__init__.py", line 21, in <module>
    from vllm.model_executor.layers.fused_moe.layer import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 50, in <module>
    from vllm.model_executor.layers.fused_moe.unquantized_fused_moe_method import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 27, in <module>
    from vllm.model_executor.layers.fused_moe.oracle.unquantized import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/oracle/unquantized.py", line 14, in <module>
    from vllm.model_executor.layers.fused_moe.all2all_utils import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/all2all_utils.py", line 46, in <module>
    from .prepare_finalize.nixl_ep import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/prepare_finalize/nixl_ep.py", line 11, in <module>
    from vllm.distributed.device_communicators.all2all import NixlEPAll2AllManager
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/distributed/device_communicators/all2all.py", line 22, in <module>
    if has_flashinfer_nvlink_two_sided():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 189, in has_flashinfer_nvlink_two_sided
    mod = _get_submodule(module_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 86, in _get_submodule
    return importlib.import_module(module_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/flashinfer/comm/__init__.py", line 1, in <module>
    from .cuda_ipc import CudaRTLibrary, create_shared_buffer, free_shared_buffer
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 194, in <module>
    cudart = CudaRTLibrary()
             ^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 134, in __init__
    f = getattr(self.lib, func.name)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 392, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 397, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: /home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so: undefined symbol: cudaDeviceReset
FAILED tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_sft_packing_parity.py::test_sft_packing_cp_logprob_parity - ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::_ParityProbeWorkerBase.__init__() (pid=98386, ip=10.0.56.206, actor_id=1ef9255b16a4f2b3ce10a3e23a000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_sft_packing_parity.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7c3159929d30>)
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
           ^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The actor with name _ParityProbeWorkerBase failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:

ray::_ParityProbeWorkerBase.__init__() (pid=98386, ip=10.0.56.206, actor_id=1ef9255b16a4f2b3ce10a3e23a000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_sft_packing_parity.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7c3159929d30>)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_sft_packing_parity.py", line 39, in <module>
    from tests.backends.skyrl_train.gpu.utils import ray_init_for_tests
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/utils.py", line 36, in <module>
    from skyrl.backends.skyrl_train.inference_servers.setup import create_inference_servers
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/setup.py", line 20, in <module>
    from .utils import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/utils.py", line 7, in <module>
    from skyrl.backends.skyrl_train.inference_servers.new_inference_worker_wrap import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/new_inference_worker_wrap.py", line 31, in <module>
    from skyrl.backends.skyrl_train.inference_servers.layerwise_reload import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/layerwise_reload.py", line 37, in <module>
    from vllm.model_executor.model_loader.reload.meta import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in <module>
    from vllm.model_executor.model_loader.bitsandbytes_loader import BitsAndBytesModelLoader
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/model_loader/bitsandbytes_loader.py", line 24, in <module>
    from vllm.lora.utils import is_moe_model
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/lora/utils.py", line 17, in <module>
    from vllm.lora.layers import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/lora/layers/__init__.py", line 15, in <module>
    from vllm.lora.layers.fused_moe import FusedMoE3DWithLoRA, FusedMoEWithLoRA
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/lora/layers/fused_moe.py", line 13, in <module>
    from vllm.model_executor.layers.fused_moe import FusedMoE
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/__init__.py", line 21, in <module>
    from vllm.model_executor.layers.fused_moe.layer import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 50, in <module>
    from vllm.model_executor.layers.fused_moe.unquantized_fused_moe_method import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 27, in <module>
    from vllm.model_executor.layers.fused_moe.oracle.unquantized import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/oracle/unquantized.py", line 14, in <module>
    from vllm.model_executor.layers.fused_moe.all2all_utils import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/all2all_utils.py", line 46, in <module>
    from .prepare_finalize.nixl_ep import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/prepare_finalize/nixl_ep.py", line 11, in <module>
    from vllm.distributed.device_communicators.all2all import NixlEPAll2AllManager
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/distributed/device_communicators/all2all.py", line 22, in <module>
    if has_flashinfer_nvlink_two_sided():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 189, in has_flashinfer_nvlink_two_sided
    mod = _get_submodule(module_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 86, in _get_submodule
    return importlib.import_module(module_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/flashinfer/comm/__init__.py", line 1, in <module>
    from .cuda_ipc import CudaRTLibrary, create_shared_buffer, free_shared_buffer
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 194, in <module>
    cudart = CudaRTLibrary()
             ^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 134, in __init__
    f = getattr(self.lib, func.name)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 392, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 397, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: /home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so: undefined symbol: cudaDeviceReset

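These seem to be crashes at import time due to the upgrade: both tracebacks end in flashinfer/comm/cuda_ipc.py, where CudaRTLibrary looks up its functions in tilelang/lib/libcudart_stub.so, which does not define cudaDeviceReset.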
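
The last frames of both tracebacks show a ctypes attribute lookup raising AttributeError for an undefined symbol. A minimal sketch of the same kind of check, using only the standard-library ctypes module, is below. The helper name missing_symbols and the symbol hypothetical_missing_symbol are made up for illustration, and the demo assumes a POSIX system where ctypes.CDLL(None) opens the current process.

```python
import ctypes


def missing_symbols(lib, names):
    """Return the names that `lib` (a ctypes.CDLL handle) does not export."""
    missing = []
    for name in names:
        try:
            # Attribute access on a CDLL performs the symbol lookup and
            # raises AttributeError when the symbol is undefined.
            getattr(lib, name)
        except AttributeError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: POSIX only. CDLL(None) exposes the symbols already loaded
    # into this process, which include the C library's getpid.
    process = ctypes.CDLL(None)
    print(missing_symbols(process, ["getpid", "hypothetical_missing_symbol"]))
```

Running the same helper against a handle for a candidate CUDA runtime library with the name cudaDeviceReset would show whether that particular file can satisfy the lookup that fails in the logs.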

This is related to the fix for SKIP_TENSORS for Nemotron/Mamba models, which is no longer needed since vllm-project/vllm#42481 has made it into vllm 0.23.0.

SumanthRH added 2 commits June 17, 2026 23:01

vllm 0.23.0

05de2ce

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

Merge remote-tracking branch 'origin/main' into upgrade-vllm-0.23.0

eea532a

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH added run_train_gpu_ci run_train_megatron_gpu_ci run_train_old_inference_gpu_ci run_train_megatron_gpu_ci_models labels Jun 17, 2026

SumanthRH changed the title ~~[chore] Upgrade vllm to 0.23.0~~ [chore][DNM] Upgrade vllm to 0.23.0 Jun 17, 2026

SumanthRH marked this pull request as ready for review June 17, 2026 23:59

SumanthRH added run_train_gpu_ci run_train_megatron_gpu_ci run_train_megatron_gpu_ci_models run_train_old_inference_gpu_ci and removed run_train_gpu_ci run_train_megatron_gpu_ci run_train_megatron_gpu_ci_models run_train_old_inference_gpu_ci labels Jun 17, 2026

gemini-code-assist Bot reviewed Jun 18, 2026


rename the weight update APIs

020ec92

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

SumanthRH added run_train_gpu_ci run_train_megatron_gpu_ci run_train_megatron_gpu_ci_models run_train_old_inference_gpu_ci and removed run_train_gpu_ci run_train_megatron_gpu_ci run_train_megatron_gpu_ci_models run_train_old_inference_gpu_ci labels Jun 18, 2026

SumanthRH changed the title ~~[chore][DNM] Upgrade vllm to 0.23.0~~ [chore]Upgrade vllm to 0.23.0 Jun 18, 2026

x

2044995

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

SumanthRH commented Jun 18, 2026


SumanthRH and others added 2 commits June 18, 2026 02:07

Merge remote-tracking branch 'origin/main' into upgrade-vllm-0.23.0

b4f6049

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

migrate weight sync tests

1ef68fb

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH removed run_train_gpu_ci run_train_megatron_gpu_ci run_train_megatron_gpu_ci_models run_train_old_inference_gpu_ci labels Jun 18, 2026

x

d3fd990

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH added run_train_megatron_gpu_ci run_train_gpu_ci run_h100_gpu_ci Run H100 GPU CI labels Jun 18, 2026

[chore]Upgrade vllm to 0.23.0 #1800
SumanthRH wants to merge 7 commits into main from upgrade-vllm-0.23.0
