[chore]Upgrade vllm to 0.23.0#1800

Open
SumanthRH wants to merge 7 commits into main from upgrade-vllm-0.23.0
@SumanthRH SumanthRH commented Jun 17, 2026

Member

No description provided.

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates several dependencies in pyproject.toml. Specifically, it upgrades vllm from 0.20.2 to 0.23.0 and the flashinfer packages (flashinfer-python, flashinfer-jit-cache, and flashinfer-cubin) from 0.6.8.post1 to 0.6.11.post2 across the fsdp and megatron environments. It also removes the constraint-dependencies block and introduces override-dependencies for the updated flashinfer packages to resolve version conflicts with Megatron-Bridge. There are no review comments, so I have no additional feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
@SumanthRH

Member Author

We still can't migrate to the native vllm weight sync APIs because vllm-project/vllm#42577 is not merged yet. For now I am just hacking around the limitations and renaming the worker wrap functions to avoid conflicts with the native start_weight_update and finish_weight_update methods that I added in vllm-project/vllm#39212.
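
A minimal sketch of the kind of name collision the rename avoids. The classes here are hypothetical stand-ins, not the actual vllm worker or the SkyRL worker wrap, and `find_name_conflicts` is an illustrative helper that does not exist in either project. It only shows that public methods defined on a wrapper shadow same-named methods on the wrapped class, and that a `skyrl_` prefix removes the overlap.

```python
def find_name_conflicts(base_cls: type, extension_cls: type) -> list[str]:
    """Return public callables defined on extension_cls that base_cls also has."""
    return sorted(
        name
        for name, attr in vars(extension_cls).items()
        if not name.startswith("_") and callable(attr) and hasattr(base_cls, name)
    )


class NativeWorker:
    """Hypothetical stand-in for a worker that ships its own weight update methods."""

    def start_weight_update(self) -> None: ...

    def finish_weight_update(self) -> None: ...


class OldWrap:
    """Hypothetical wrapper before the rename: same method names as the worker."""

    def start_weight_update(self) -> None: ...

    def finish_weight_update(self) -> None: ...


class RenamedWrap:
    """Hypothetical wrapper after the rename: prefixed names, no overlap."""

    def skyrl_start_weight_update(self) -> None: ...

    def skyrl_finish_weight_update(self) -> None: ...


print(find_name_conflicts(NativeWorker, OldWrap))
# → ['finish_weight_update', 'start_weight_update']
print(find_name_conflicts(NativeWorker, RenamedWrap))
# → []
```

A check like this could run in a unit test so that a future upstream method with a matching name is caught early, assuming the wrapper and worker classes are importable in the test environment.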

@SumanthRH SumanthRH changed the title [chore][DNM] Upgrade vllm to 0.23.0 [chore]Upgrade vllm to 0.23.0 Jun 18, 2026
x
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
logger.info(f"Exporting `SKYRL_RAY_PG_TIMEOUT_IN_S` to ray runtime env: {pg_timeout}")
env_vars["SKYRL_RAY_PG_TIMEOUT_IN_S"] = pg_timeout

# Health-check timeout for the inference server actor. Forwarded so `VLLMServerActor.start`

Member Author

this is unrelated to upgrade but a good fix

SumanthRH and others added 2 commits June 18, 2026 02:07
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH

Member Author

GPU CI failures for 020ec92:

Non-Megatron

=========================== short test summary info ============================
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestWeightUpdateFlow::test_update_weights_flow[no_pd] - aiohttp.client_exceptions.ClientResponseError: 500, message="Call to collective_rpc method failed: Worker failed with error 'start_weight_update must be called before update_weights.', please check the stack trace above for the root cause", url='http://10.0.74.103:8000/update_weights'
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestWeightUpdateFlow::test_update_weights_flow[pd_1P1D_non_colocated] - aiohttp.client_exceptions.ClientResponseError: 500, message="Call to collective_rpc method failed: Worker failed with error 'start_weight_update must be called before update_weights.', please check the stack trace above for the root cause", url='http://10.0.74.103:8100/update_weights'
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::TestColocatedIpcWeightUpdateFlow::test_update_weights_ipc - aiohttp.client_exceptions.ClientResponseError: 500, message="Call to collective_rpc method failed: Worker failed with error 'start_weight_update must be called before update_weights.', please check the stack trace above for the root cause", url='http://10.0.74.103:8000/update_weights'
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_load_weights_preserves_moe_forward - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
FAILED tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_multichunk_reload_preserves_moe_forward - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
FAILED tests/backends/skyrl_train/gpu/gpu_ci/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e[non_colocated_nccl_fsdp_vllm_dp] - ray.exceptions.RayTaskError(RuntimeError): ray::VLLMServerActor.start() (pid=82517, ip=10.0.74.103, actor_id=6729dfaed99d2436317174754b000000, repr=<skyrl.backends.skyrl_train.inference_servers.vllm_server_actor.VLLMServerActor object at 0x7ef13eccea20>)
  1. The failures tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_load_weights_preserves_moe_forward and tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_weight_sync.py::test_worker_wrap_multichunk_reload_preserves_moe_forward are unrelated.
  2. tests/backends/skyrl_train/gpu/gpu_ci/test_policy_local_engines_e2e.py::test_policy_local_engines_e2e[non_colocated_nccl_fsdp_vllm_dp] failed due to an Address already in use error; I was not able to repro on 4x H100s.
  3. The other failures are fixed by migrating the tests to use the new skyrl_start_weight_update and skyrl_finish_weight_update methods: 1ef68fb
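
For context on the error message in the first three failures, here is a minimal sketch of a call-order guard of that shape. `WeightUpdateSession` is a hypothetical class written for illustration; it is not the vllm or SkyRL implementation, and only the error string is taken from the log above.

```python
class WeightUpdateSession:
    """Hypothetical guard enforcing start -> update -> finish ordering."""

    def __init__(self) -> None:
        self._in_progress = False
        self.applied: list[str] = []

    def start_weight_update(self) -> None:
        # Opens the update window; update_weights is only valid after this.
        self._in_progress = True

    def update_weights(self, names: list[str]) -> None:
        if not self._in_progress:
            raise RuntimeError("start_weight_update must be called before update_weights.")
        # Record which (illustrative) weight names were applied.
        self.applied.extend(names)

    def finish_weight_update(self) -> None:
        # Closes the update window.
        self._in_progress = False


session = WeightUpdateSession()
try:
    session.update_weights(["layer0.weight"])
except RuntimeError as err:
    print(f"rejected: {err}")

session.start_weight_update()
session.update_weights(["layer0.weight"])
session.finish_weight_update()
print(session.applied)
```

Under this assumption, a test that calls a renamed start method on one object while the guard lives on another would hit exactly this error, which is consistent with the fix of migrating the tests to the new method names.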

x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH

Member Author

Megatron

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_megatron_extractor_consistency.py::test_megatron_extractor_iteration_order_consistency[qwen3_5_35b_a3b_mm_moe] - ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::_ProbeMegatronRefWorker.__init__() (pid=11084, ip=10.0.56.206, actor_id=711b2fe07b59b440f7ae6c8d03000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_megatron_extractor_consistency.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7e5bfa710b00>)
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
           ^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The actor with name _ProbeMegatronRefWorker failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:

ray::_ProbeMegatronRefWorker.__init__() (pid=11084, ip=10.0.56.206, actor_id=711b2fe07b59b440f7ae6c8d03000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_megatron_extractor_consistency.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7e5bfa710b00>)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_megatron_extractor_consistency.py", line 36, in <module>
    from tests.backends.skyrl_train.gpu.utils import init_worker_with_type
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/utils.py", line 36, in <module>
    from skyrl.backends.skyrl_train.inference_servers.setup import create_inference_servers
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/setup.py", line 20, in <module>
    from .utils import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/utils.py", line 7, in <module>
    from skyrl.backends.skyrl_train.inference_servers.new_inference_worker_wrap import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/new_inference_worker_wrap.py", line 31, in <module>
    from skyrl.backends.skyrl_train.inference_servers.layerwise_reload import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/layerwise_reload.py", line 37, in <module>
    from vllm.model_executor.model_loader.reload.meta import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in <module>
    from vllm.model_executor.model_loader.bitsandbytes_loader import BitsAndBytesModelLoader
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/model_loader/bitsandbytes_loader.py", line 24, in <module>
    from vllm.lora.utils import is_moe_model
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/lora/utils.py", line 17, in <module>
    from vllm.lora.layers import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/lora/layers/__init__.py", line 15, in <module>
    from vllm.lora.layers.fused_moe import FusedMoE3DWithLoRA, FusedMoEWithLoRA
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/lora/layers/fused_moe.py", line 13, in <module>
    from vllm.model_executor.layers.fused_moe import FusedMoE
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/__init__.py", line 21, in <module>
    from vllm.model_executor.layers.fused_moe.layer import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 50, in <module>
    from vllm.model_executor.layers.fused_moe.unquantized_fused_moe_method import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 27, in <module>
    from vllm.model_executor.layers.fused_moe.oracle.unquantized import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/oracle/unquantized.py", line 14, in <module>
    from vllm.model_executor.layers.fused_moe.all2all_utils import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/all2all_utils.py", line 46, in <module>
    from .prepare_finalize.nixl_ep import (
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/prepare_finalize/nixl_ep.py", line 11, in <module>
    from vllm.distributed.device_communicators.all2all import NixlEPAll2AllManager
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/distributed/device_communicators/all2all.py", line 22, in <module>
    if has_flashinfer_nvlink_two_sided():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 189, in has_flashinfer_nvlink_two_sided
    mod = _get_submodule(module_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 86, in _get_submodule
    return importlib.import_module(module_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/flashinfer/comm/__init__.py", line 1, in <module>
    from .cuda_ipc import CudaRTLibrary, create_shared_buffer, free_shared_buffer
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 194, in <module>
    cudart = CudaRTLibrary()
             ^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 134, in __init__
    f = getattr(self.lib, func.name)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 392, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 397, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: /home/ray/.cache/uv/builds-v0/.tmpeD5Hm0/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so: undefined symbol: cudaDeviceReset
FAILED tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_sft_packing_parity.py::test_sft_packing_cp_logprob_parity - ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::_ParityProbeWorkerBase.__init__() (pid=98386, ip=10.0.56.206, actor_id=1ef9255b16a4f2b3ce10a3e23a000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_sft_packing_parity.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7c3159929d30>)
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
           ^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The actor with name _ParityProbeWorkerBase failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:

ray::_ParityProbeWorkerBase.__init__() (pid=98386, ip=10.0.56.206, actor_id=1ef9255b16a4f2b3ce10a3e23a000000, repr=<tests.backends.skyrl_train.gpu.gpu_ci.megatron.test_sft_packing_parity.FunctionActorManager._create_fake_actor_class.<locals>.TemporaryActor object at 0x7c3159929d30>)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_sft_packing_parity.py", line 39, in <module>
    from tests.backends.skyrl_train.gpu.utils import ray_init_for_tests
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/tests/backends/skyrl_train/gpu/utils.py", line 36, in <module>
    from skyrl.backends.skyrl_train.inference_servers.setup import create_inference_servers
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/setup.py", line 20, in <module>
    from .utils import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/utils.py", line 7, in <module>
    from skyrl.backends.skyrl_train.inference_servers.new_inference_worker_wrap import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/new_inference_worker_wrap.py", line 31, in <module>
    from skyrl.backends.skyrl_train.inference_servers.layerwise_reload import (
  File "/tmp/ray/session_2026-06-18_01-57-00_051060_3905/runtime_resources/working_dir_files/s3_anyscale-production-data-cld-hxkifz7xa22mwicp21nzkds1lw_org_xc6lv84h3d7m9dljcc17esfw2i_cld_hxkifz7xa22mwicp21nzkds1lw_runtime_env_packages_pkg_ac7e43fb69029bd7138bb6093bbf4e84/skyrl/backends/skyrl_train/inference_servers/layerwise_reload.py", line 37, in <module>
    from vllm.model_executor.model_loader.reload.meta import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in <module>
    from vllm.model_executor.model_loader.bitsandbytes_loader import BitsAndBytesModelLoader
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/model_loader/bitsandbytes_loader.py", line 24, in <module>
    from vllm.lora.utils import is_moe_model
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/lora/utils.py", line 17, in <module>
    from vllm.lora.layers import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/lora/layers/__init__.py", line 15, in <module>
    from vllm.lora.layers.fused_moe import FusedMoE3DWithLoRA, FusedMoEWithLoRA
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/lora/layers/fused_moe.py", line 13, in <module>
    from vllm.model_executor.layers.fused_moe import FusedMoE
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/__init__.py", line 21, in <module>
    from vllm.model_executor.layers.fused_moe.layer import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 50, in <module>
    from vllm.model_executor.layers.fused_moe.unquantized_fused_moe_method import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 27, in <module>
    from vllm.model_executor.layers.fused_moe.oracle.unquantized import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/oracle/unquantized.py", line 14, in <module>
    from vllm.model_executor.layers.fused_moe.all2all_utils import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/all2all_utils.py", line 46, in <module>
    from .prepare_finalize.nixl_ep import (
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/prepare_finalize/nixl_ep.py", line 11, in <module>
    from vllm.distributed.device_communicators.all2all import NixlEPAll2AllManager
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/distributed/device_communicators/all2all.py", line 22, in <module>
    if has_flashinfer_nvlink_two_sided():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 189, in has_flashinfer_nvlink_two_sided
    mod = _get_submodule(module_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/vllm/utils/flashinfer.py", line 86, in _get_submodule
    return importlib.import_module(module_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/flashinfer/comm/__init__.py", line 1, in <module>
    from .cuda_ipc import CudaRTLibrary, create_shared_buffer, free_shared_buffer
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 194, in <module>
    cudart = CudaRTLibrary()
             ^^^^^^^^^^^^^^^
  File "/home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 134, in __init__
    f = getattr(self.lib, func.name)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 392, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/ctypes/__init__.py", line 397, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: /home/ray/.cache/uv/builds-v0/.tmppbyIOh/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so: undefined symbol: cudaDeviceReset

These seem to be crashes at import time due to the upgrade.

This is related to the fix for SKIP_TENSORS for Nemotron/Mamba models, which is no longer needed since vllm-project/vllm#42481 has made it into vllm 0.23.0.
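
Both tracebacks above end in ctypes failing to resolve cudaDeviceReset from a libcudart_stub.so. A minimal sketch of how one might probe a shared library for required symbols before anything imports it; `missing_symbols` is a hypothetical helper, and the demo inspects the current process rather than a real CUDA library so that it runs anywhere on a POSIX system.

```python
import ctypes


def missing_symbols(lib_path, symbols):
    """Return the names in `symbols` that the library does not export.

    Passing lib_path=None inspects symbols already visible to the current
    process (POSIX only); pass a real path to check a specific .so file.
    """
    lib = ctypes.CDLL(lib_path)
    # ctypes raises AttributeError for an unresolved symbol, so hasattr
    # returns False for exactly the names that are missing.
    return [name for name in symbols if not hasattr(lib, name)]


# Demo against the current process: malloc should resolve, the made-up name should not.
print(missing_symbols(None, ["malloc", "skyrl_example_symbol_that_does_not_exist"]))
```

To apply this to the failure above, one would pass the path of the stub library reported in the traceback and the list `["cudaDeviceReset"]`; which library ends up being loaded in the Megatron environment is something this sketch does not determine.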
