[Core] Cast multimodal input in hf processor #18862

Open
wants to merge 4 commits into
base: main
from

Conversation

lgeiger
Contributor

@lgeiger lgeiger commented May 28, 2025

This is a follow-up to #18756 that instead does the multimodal input casting directly as part of the huggingface preprocessing.
I think this is a bit cleaner and has two advantages: it moves the blocking cast/copy off the main thread, and it does the conversion before serialisation, which also reduces the amount of data that needs to be serialised and deserialised. @DarkLight1337 Let me know if you see any disadvantages of doing this.
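
To make the serialisation point concrete, here is a rough back-of-the-envelope sketch. It assumes the preprocessor emits float32 values and the model runs in a 16-bit dtype; neither is stated above. It uses the standard library's IEEE half-precision format as a stand-in for whatever dtype the model actually uses, so it only illustrates the payload size, not vLLM's real serialisation path.

```python
import struct

# 1024 sample values standing in for preprocessed pixel data (illustrative only).
values = [0.25 * i for i in range(1024)]

# Serialise the same values as 32-bit floats and as 16-bit (IEEE half) floats.
as_float32 = struct.pack(f"<{len(values)}f", *values)
as_float16 = struct.pack(f"<{len(values)}e", *values)

# Casting before serialising halves the number of bytes that cross the
# process boundary in this toy setup.
print(len(as_float32), len(as_float16))
```

If the cast happens after deserialisation instead, the larger float32 payload is what gets sent and received.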

Before:
Screenshot 2025-05-28 at 23 48 02
After:
Screenshot 2025-05-28 at 23 48 46

I ran some quick benchmarks, and for some models I'm seeing a very small improvement in throughput (0.5%-0.7%).


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the multi-modality (related to multi-modality, #4194), speculative-decoding, v1, and tpu (related to Google TPUs) labels May 28, 2025
@DarkLight1337
Member

Overall this approach does make more sense than what I originally did in #18756, thanks!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 29, 2025 14:42
@github-actions github-actions bot added the ready (only add when PR is ready to merge/full CI is needed) label May 29, 2025
@DarkLight1337
Member

DarkLight1337 commented May 29, 2025

Can you fix the failing tests? Looks like you need to import torch for real inside the function

auto-merge was automatically disabled May 29, 2025 16:46

Head branch was pushed to by a user without write access

@lgeiger lgeiger force-pushed the non-blocking-casting branch from e61670b to 0292449 Compare May 29, 2025 16:47
@lgeiger
Contributor Author

lgeiger commented May 29, 2025

Can you fix the failing tests? Looks like you need to import torch for real inside the function

Ah sorry, this should be fixed in 0292449. It looks like the type annotation needs to be a string to support older versions of Python.
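
For readers unfamiliar with the pattern, here is a minimal sketch of what a string annotation plus a real import inside the function looks like. It is not the code from the commit: `fractions.Fraction` stands in for `torch` so the snippet runs without extra dependencies, and `halve` is a hypothetical function.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; this branch never runs at runtime.
    from fractions import Fraction


def halve(value: "Fraction") -> "Fraction":
    # The annotations above are plain strings, so Python never needs to
    # resolve the name "Fraction" when the function is defined.
    # The real import happens here, the first time the function is called.
    from fractions import Fraction

    return Fraction(value) / 2


print(halve(3))
```

The quoted annotation keeps the module importable even when the annotated name is not available at definition time.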

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 29, 2025 16:50
auto-merge was automatically disabled May 29, 2025 18:50

Head branch was pushed to by a user without write access

@lgeiger
Contributor Author

lgeiger commented May 29, 2025

It looks like not all items in the dict are tensors, which breaks CI. 9d0d47c should fix that.
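
A rough sketch of the kind of guard this implies: cast only the values that are floating-point tensors and pass everything else through unchanged. This is not the code from 9d0d47c. `FakeTensor` is a hypothetical stand-in for `torch.Tensor` so the snippet runs without torch, `cast_floating_tensors` is a made-up name, and restricting the cast to floating-point values is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor (values plus a dtype name)."""

    values: tuple
    dtype: str  # e.g. "float32" or "int64"

    def is_floating_point(self) -> bool:
        return self.dtype.startswith(("float", "bfloat"))

    def to(self, dtype: str) -> "FakeTensor":
        return replace(self, dtype=dtype)


def cast_floating_tensors(data: dict[str, Any], dtype: str) -> dict[str, Any]:
    """Cast floating-point tensors to `dtype`; leave all other items untouched."""
    out: dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, FakeTensor) and value.is_floating_point():
            out[key] = value.to(dtype)
        else:
            # Integer tensors, plain ints, lists, etc. are passed through as-is.
            out[key] = value
    return out


batch = {
    "pixel_values": FakeTensor((0.5, 0.25), "float32"),
    "input_ids": FakeTensor((1, 2, 3), "int64"),
    "num_frames": 4,
}
print(cast_floating_tensors(batch, "bfloat16"))
```

Calling `.to(...)` unconditionally on every value would raise on the plain integer, which is the kind of failure described above.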

@DarkLight1337
Member

Looks like V1 test is hanging in this PR, can you investigate it?

@lgeiger lgeiger force-pushed the non-blocking-casting branch from 9d0d47c to b48017f Compare May 30, 2025 09:35
@lgeiger
Contributor Author

lgeiger commented May 30, 2025

Looks like V1 test is hanging in this PR, can you investigate it?

I rebased onto main; let's see if this fixes it or gives a more actionable error message.

@lgeiger
Contributor Author

lgeiger commented May 30, 2025

Looks like the v1-tests are still failing. I'm waiting for Hugging Face to approve my model access request; then I can try to reproduce this locally.

@mergify mergify bot added the ci/build label May 30, 2025
@lgeiger lgeiger force-pushed the non-blocking-casting branch from 95deab1 to b1344b4 Compare May 30, 2025 15:57
@lgeiger
Contributor Author

lgeiger commented May 30, 2025

It looks like the tests deadlock because vLLM uses fork as its default multiprocessing start method.

On main, this test already emits a deprecation warning on Python 3.12:

tests/v1/engine/test_async_llm.py::test_load[engine_args1-prompt1-RequestOutputKind.FINAL_ONLY]
  /usr/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=5020) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.htm

and the huggingface tokenizers library already complains about not being compatible with fork:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

On main this doesn't cause any deadlocks, but with this change it does. I can reproduce this locally as well.
Running the tests one by one makes them pass:

pytest "tests/v1/engine/test_async_llm.py::test_load[engine_args0-Hello my name is Robert and-RequestOutputKind.DELTA]"
pytest "tests/v1/engine/test_async_llm.py::test_load[engine_args0-Hello my name is Robert and-RequestOutputKind.FINAL_ONLY]"
pytest tests/v1/engine/test_async_llm.py::test_load[engine_args1-prompt1-RequestOutputKind.DELTA]
pytest tests/v1/engine/test_async_llm.py::test_load[engine_args1-prompt1-RequestOutputKind.FINAL_ONLY]

Whereas running them in one go deadlocks:

pytest tests/v1/engine/test_async_llm.py::test_load

Setting VLLM_WORKER_MULTIPROC_METHOD=spawn or VLLM_WORKER_MULTIPROC_METHOD=forkserver fixes the deadlock.

I'm not sure how best to deal with this. I guess switching vLLM's default multiprocessing method would be a larger change.
For now I added VLLM_WORKER_MULTIPROC_METHOD=spawn to the failing tests in b1344b4.
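
As an illustration of scoping such an override to a single test, here is a minimal sketch using only the standard library. It is not the change from b1344b4. `worker_start_method` is a hypothetical helper, and it assumes the variable is read when the engine is created, so the engine would need to be constructed inside the `with` block.

```python
import os
from contextlib import contextmanager
from unittest import mock

# The three values mentioned in this thread.
_ALLOWED = {"fork", "spawn", "forkserver"}


@contextmanager
def worker_start_method(method: str):
    """Temporarily set VLLM_WORKER_MULTIPROC_METHOD for the enclosed block."""
    if method not in _ALLOWED:
        raise ValueError(f"unexpected start method: {method!r}")
    # patch.dict restores the previous environment on exit, even on errors.
    with mock.patch.dict(os.environ, {"VLLM_WORKER_MULTIPROC_METHOD": method}):
        yield


with worker_start_method("spawn"):
    # A test body would build the engine here (assumption, see above).
    print(os.environ["VLLM_WORKER_MULTIPROC_METHOD"])
```

Restoring the environment afterwards keeps the override from leaking into other tests in the same pytest process.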

@DarkLight1337
Member

Why would this PR cause the tests to fail when they weren't before though?

@lgeiger
Contributor Author

lgeiger commented May 30, 2025

Why would this PR cause the tests to fail when they weren't before though?

You're right, it's surprising, especially since we don't actually change where the huggingface preprocessor is called.
Even on main the tokenizer warns about "Disabling parallelism to avoid deadlocks..." when using fork, so maybe we just got lucky before 🤷

@DarkLight1337
Member

cc @njhill any idea?

@njhill
Member

njhill commented May 30, 2025

The tokenizers warning is a red herring and shouldn't be an issue. I don't think we should change the mp method in the test as a workaround.

If you can repro locally, you could check where things may be stuck by running with env var PYTHONFAULTHANDLER=1 and then sending a SIGABRT to the front-end and back-end procs, which will dump all of the thread stacks.
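
The mechanics of that suggestion can be tried in isolation with a small, self-contained sketch. It assumes a POSIX system and uses a sleeping child process as a stand-in for a hung front-end or back-end process; `dump_stacks_of_hung_child` is a hypothetical helper, not part of vLLM.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a hung process: announce readiness, then block for a while.
CHILD = "import time; print('ready', flush=True); time.sleep(60)"


def dump_stacks_of_hung_child() -> str:
    """Start a child with PYTHONFAULTHANDLER=1, abort it, and return its stderr."""
    # PYTHONFAULTHANDLER=1 enables the faulthandler module at interpreter
    # startup, which installs a handler that dumps thread stacks on SIGABRT.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child is actually running
    proc.send_signal(signal.SIGABRT)  # same effect as `kill -ABRT <pid>`
    _, stderr = proc.communicate(timeout=30)
    return stderr


print(dump_stacks_of_hung_child())
```

With a real hang, the same signal would be sent to the pids of the already-running processes instead of to a freshly started child.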

@lgeiger
Contributor Author

lgeiger commented May 31, 2025

If you can repro locally, you could check where things may be stuck by running with env var PYTHONFAULTHANDLER=1 and then sending a SIGABRT to the front-end and back-end procs, which will dump all of the thread stacks.

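Here are the dumps from both processes. The first is from the forked engine core (back-end) process, whose innermost frame is in the transformers video processor; the second is from the front-end, which is waiting in wait_for_engine_startup: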

Current thread 0x00007b3bc3ce3080 (most recent call first):
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 240 in _prepare_input_videos
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 263 in preprocess
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 197 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/models/qwen2_vl/processing_qwen2_vl.py", line 146 in __call__
  File "/home/ubuntu/vllm/vllm/inputs/registry.py", line 162 in call_hf_processor
  File "/home/ubuntu/vllm/vllm/model_executor/models/qwen2_vl.py", line 1018 in _call_hf_processor
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1290 in _apply_hf_processor_text_mm
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1360 in _apply_hf_processor_mm_only
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1399 in _apply_hf_processor_main
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1552 in _cached_apply_hf_processor
  File "/home/ubuntu/vllm/vllm/multimodal/processing.py", line 1786 in apply
  File "/home/ubuntu/vllm/vllm/multimodal/profiling.py", line 168 in _get_dummy_mm_inputs
  File "/home/ubuntu/vllm/vllm/multimodal/profiling.py", line 255 in get_mm_max_tokens
  File "/home/ubuntu/vllm/vllm/multimodal/registry.py", line 131 in get_max_tokens_per_item_by_modality
  File "/home/ubuntu/vllm/vllm/multimodal/registry.py", line 157 in get_max_tokens_per_item_by_nonzero_modality
  File "/home/ubuntu/vllm/vllm/v1/core/encoder_cache_manager.py", line 124 in _compute_encoder_budget_multimodal
  File "/home/ubuntu/vllm/vllm/v1/core/encoder_cache_manager.py", line 94 in compute_encoder_budget
  File "/home/ubuntu/vllm/vllm/v1/worker/gpu_model_runner.py", line 127 in __init__
  File "/home/ubuntu/vllm/vllm/v1/worker/gpu_worker.py", line 144 in init_device
  File "/home/ubuntu/vllm/vllm/worker/worker_base.py", line 604 in init_device
  File "/home/ubuntu/vllm/vllm/utils.py", line 2601 in run_method
  File "/home/ubuntu/vllm/vllm/executor/uniproc_executor.py", line 56 in collective_rpc
  File "/home/ubuntu/vllm/vllm/executor/uniproc_executor.py", line 46 in _init_executor
  File "/home/ubuntu/vllm/vllm/executor/executor_base.py", line 52 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 74 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 398 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 499 in run_engine_core
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 71 in _launch
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/lib/python3.12/multiprocessing/context.py", line 282 in _Popen
  File "/usr/lib/python3.12/multiprocessing/process.py", line 121 in start
  File "/home/ubuntu/vllm/vllm/v1/utils.py", line 223 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 461 in _init_engines_direct
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 404 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 693 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 126 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 191 in from_engine_args
  File "/home/ubuntu/vllm/tests/v1/engine/test_async_llm.py", line 95 in test_load
  File "/usr/lib/python3.12/asyncio/events.py", line 88 in _run
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1987 in _run_once
  File "/usr/lib/python3.12/asyncio/base_events.py", line 641 in run_forever
  File "/usr/lib/python3.12/asyncio/base_events.py", line 674 in run_until_complete
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 773 in inner
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 508 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 337 in _main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/ubuntu/vllm/.venv/bin/pytest", line 10 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, PIL._imaging, regex._regex, markupsafe._speedups, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, 
scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, 
pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, zmq.backend.cython._zmq, PIL._imagingft, msgspec._core, _cffi_backend, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, sentencepiece._sentencepiece, PIL._imagingmath, vllm.cumem_allocator, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box (total: 205)
Thread 0x00007b3aa61fe6c0 (most recent call first):
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/zmq/utils/garbage.py", line 46 in run
  File "/usr/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007b3bc3ce3080 (most recent call first):
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/zmq/sugar/poll.py", line 106 in poll
  File "/home/ubuntu/vllm/vllm/v1/utils.py", line 311 in wait_for_engine_startup
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 488 in _wait_for_engine_startup
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 473 in _init_engines_direct
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 404 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/core_client.py", line 693 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 126 in __init__
  File "/home/ubuntu/vllm/vllm/v1/engine/async_llm.py", line 191 in from_engine_args
  File "/home/ubuntu/vllm/tests/v1/engine/test_async_llm.py", line 95 in test_load
  File "/usr/lib/python3.12/asyncio/events.py", line 88 in _run
  File "/usr/lib/python3.12/asyncio/base_events.py", line 1987 in _run_once
  File "/usr/lib/python3.12/asyncio/base_events.py", line 641 in run_forever
  File "/usr/lib/python3.12/asyncio/base_events.py", line 674 in run_until_complete
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 773 in inner
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py", line 508 in runtest
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 337 in _main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/ubuntu/vllm/.venv/bin/pytest", line 10 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, PIL._imaging, regex._regex, markupsafe._speedups, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, 
scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, 
pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, zmq.backend.cython._zmq, PIL._imagingft, msgspec._core, _cffi_backend, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, sentencepiece._sentencepiece, PIL._imagingmath (total: 195)
Aborted (core dumped)
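
Since these dumps are long, a small helper for pulling out the innermost frame of each thread can make them easier to compare. This is only a sketch: `innermost_frames` is a hypothetical function, and it assumes the plain-text layout shown above (a thread header line followed by indented `File "...", line N in func` lines).

```python
import re

# Matches 'Thread 0x... (most recent call first):' and the 'Current thread' variant.
HEADER = re.compile(r"^(?:Current thread|Thread) (0x[0-9a-f]+) \(most recent call first\):$")
# Matches one indented frame line of the dump.
FRAME = re.compile(r'^\s+File "(?P<path>.+)", line (?P<line>\d+) in (?P<func>.+)$')


def innermost_frames(dump: str) -> list[tuple[str, str, int, str]]:
    """Return (thread id, file, line, function) for the top frame of each thread."""
    result = []
    lines = dump.splitlines()
    for i, line in enumerate(lines):
        header = HEADER.match(line)
        if header and i + 1 < len(lines):
            # The line right after a header is the most recent call.
            frame = FRAME.match(lines[i + 1])
            if frame:
                result.append(
                    (header.group(1), frame["path"], int(frame["line"]), frame["func"])
                )
    return result


sample = (
    "Current thread 0x00007b3bc3ce3080 (most recent call first):\n"
    '  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 240 in _prepare_input_videos\n'
    '  File "/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/transformers/video_processing_utils.py", line 263 in preprocess\n'
)
print(innermost_frames(sample))
```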

lgeiger added 4 commits May 31, 2025 01:41
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
@lgeiger lgeiger force-pushed the non-blocking-casting branch from b1344b4 to 5d5c4b4 Compare May 31, 2025 01:42
@njhill
Member

njhill commented May 31, 2025

Labels
ci/build, multi-modality (related to multi-modality, #4194), ready (only add when PR is ready to merge/full CI is needed), speculative-decoding, tpu (related to Google TPUs), v1
Projects
Status: In Progress
3 participants