
idefics2: Sizes of tensors must match except in dimension 0. Expected size 448 but got size 447 for tensor number 2 in the list. #2056

Closed
pseudotensor opened this issue Jun 12, 2024 · 2 comments · Fixed by #2065

Comments

@pseudotensor

System Info

tgi 2.0.4

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Unclear cause at the moment.

idefics2 fails for some images:

2024-06-12T02:32:54.141044Z ERROR text_generation_launcher: Method Prefill encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 257, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 129, in Prefill
    batch = self.model.batch_type.from_pb_processor(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 235, in from_pb_processor
    batch_tokenized_inputs, image_inputs = cls.batch_tokenized_inputs(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 208, in batch_tokenized_inputs
    "pixel_values": torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 448 but got size 447 for tensor number 2 in the list.

2024-06-12T02:32:54.141401Z ERROR health:health:health: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
2024-06-12T02:32:54.261535Z ERROR batch{batch_size=1}:prefill:prefill{id=4 size=1}:prefill{id=4 size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
2024-06-12T02:32:54.758240Z ERROR batch{batch_size=1}:prefill:clear_cache{batch_id=Some(4)}:clear_cache{batch_id=Some(4)}: text_generation_client: router/client/src/lib.rs:33: Server error: transport error
2024-06-12T02:32:54.758272Z ERROR compat_generate{default_return_full_text=false compute_type=Extension(ComputeType("1-nvidia-h100-80gb-hbm3"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.01), repetition_penalty: Some(1.07), frequency_penalty: None, top_k: Some(1), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: ["<end_of_utterance>", "</s>", "Assistant:", "User:"], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: Some(1), top_n_tokens: None, grammar: None }}:async_stream:generate_stream:infer:send_error: text_generation_router::infer: router/src/infer.rs:865: Request failed during generation: Server error: CANCELLED
2024-06-12T02:32:54.876523Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
  warnings.warn(
Exception ignored in: <function Server.__del__ at 0x70cf9f7195a0>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_server.py", line 194, in __del__
    cygrpc.schedule_coro_threadsafe(
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 120, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 112, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 436, in create_task
    self._check_closed()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'AioServer.shutdown' was never awaited
Task exception was never retrieved
future: <Task finished name='HandleExceptions[/generate.v2.TextGenerationService/Prefill]' coro=<<coroutine without __name__>()> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 129, in Prefill
    batch = self.model.batch_type.from_pb_processor(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 235, in from_pb_processor
    batch_tokenized_inputs, image_inputs = cls.batch_tokenized_inputs(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 208, in batch_tokenized_inputs
    "pixel_values": torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 448 but got size 447 for tensor number 2 in the list.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 257, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 702, in _handle_exceptions
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 689, in grpc._cython.cygrpc._handle_exceptions
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 821, in _handle_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 554, in _handle_unary_unary_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 408, in _finish_handler_with_unary_response
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 28, in intercept
    exit(1)
  File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: 1 rank=0
2024-06-12T02:32:54.970948Z ERROR text_generation_launcher: Shard 0 crashed
2024-06-12T02:32:54.970957Z  INFO text_generation_launcher: Terminating webserver
2024-06-12T02:32:54.970968Z  INFO text_generation_launcher: Waiting for webserver to gracefully shutdown
2024-06-12T02:32:54.971025Z  INFO text_generation_router::server: router/src/server.rs:1739: signal received, starting graceful shutdown
2024-06-12T02:32:55.071044Z  INFO text_generation_launcher: webserver terminated
2024-06-12T02:32:55.071051Z  INFO text_generation_launcher: Shutting down shards
Error: ShardFailed
2024-06-12T02:32:55.769235Z  INFO text_generation_launcher: Args {
    model_id: "HuggingFaceM4/idefics2-8b",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: Some(
        1,
    ),
    quantize: None,
    speculate: None,
    dtype: None,
    trust_remote_code: true,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 10,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        4096,
    ),
    max_total_tokens: Some(
        8192,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        32768,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "d150871c26df",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
}
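
For context, the failing call is the `torch.cat` over per-image `pixel_values` tensors in `vlm_causal_lm.py`. The underlying PyTorch constraint is easy to reproduce in isolation (a minimal sketch, not TGI code):

```python
import torch

# torch.cat requires every dimension except the concatenation dim (0 here)
# to match across all tensors; a one-pixel difference is enough to fail.
a = torch.zeros(1, 3, 448, 448)
b = torch.zeros(1, 3, 448, 447)  # one spatial dim off by one
torch.cat([a, b], dim=0)
# RuntimeError: Sizes of tensors must match except in dimension 0.
# Expected size 448 but got size 447 for tensor number 1 in the list.
```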

Expected behavior

No failure.

@LysandreJik (Member)

Hey @pseudotensor, thanks a lot for opening an issue! Would you mind sharing here the command that led to this error, maybe alongside an image (or a batch of images) that leads to this problem? I'd be happy to explore this issue if you can help me out here.

danieldk added a commit that referenced this issue Jun 13, 2024
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056). Images were processed separately and
then concatenated. However, this can fail for images with different sizes.

Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
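
The idea of the fix can be illustrated with the Hugging Face image-processor API (a hedged sketch, not the actual TGI patch; the model id is the one from this issue):

```python
from PIL import Image
from transformers import AutoImageProcessor

# Sketch of the fix's idea: hand ALL images of a batch to the image processor
# in one call, so it can pad/resize them to compatible shapes, instead of
# processing each image separately and torch.cat-ing mismatched tensors.
processor = AutoImageProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")

images = [Image.new("RGB", (640, 480)), Image.new("RGB", (641, 480))]  # differing sizes
batch = processor(images=images, return_tensors="pt")
print(batch["pixel_values"].shape)  # one consistently shaped batch tensor
```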
danieldk added a commit that referenced this issue Jun 14, 2024 (same commit message as above)
@tctrautman

I'm seeing the same error when I pass multiple images with different dimensions to the /generate endpoint. Thankfully it looks like @danieldk's #2065 should fix this!
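
For reference, a request of that shape might look like the following (a hypothetical sketch assuming TGI's markdown-image prompt convention; the endpoint and image URLs are placeholders):

```python
import requests

# Two images with different dimensions embedded in one prompt via the
# ![](url) convention TGI uses for vision-language models.
payload = {
    "inputs": "![](https://example.com/wide.png)![](https://example.com/tall.png)"
              "Describe both images.",
    "parameters": {"max_new_tokens": 64},
}
resp = requests.post("http://localhost:8080/generate", json=payload)
print(resp.json())
```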

danieldk added a commit that referenced this issue Jun 17, 2024 (same commit message as above)
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this issue Sep 26, 2024 (same commit message as above, citing huggingface#2056)