
idefics2: Sizes of tensors must match except in dimension 0. Expected size 448 but got size 447 for tensor number 2 in the list. #2056

Closed
pseudotensor opened this issue Jun 12, 2024 · 2 comments · Fixed by #2065

Comments

@pseudotensor

System Info

tgi 2.0.4

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Unclear cause at the moment.

idefics2 fails for some images:

2024-06-12T02:32:54.141044Z ERROR text_generation_launcher: Method Prefill encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 257, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 129, in Prefill
    batch = self.model.batch_type.from_pb_processor(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 235, in from_pb_processor
    batch_tokenized_inputs, image_inputs = cls.batch_tokenized_inputs(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 208, in batch_tokenized_inputs
    "pixel_values": torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 448 but got size 447 for tensor number 2 in the list.

2024-06-12T02:32:54.141401Z ERROR health:health:health: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
2024-06-12T02:32:54.261535Z ERROR batch{batch_size=1}:prefill:prefill{id=4 size=1}:prefill{id=4 size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
2024-06-12T02:32:54.758240Z ERROR batch{batch_size=1}:prefill:clear_cache{batch_id=Some(4)}:clear_cache{batch_id=Some(4)}: text_generation_client: router/client/src/lib.rs:33: Server error: transport error
2024-06-12T02:32:54.758272Z ERROR compat_generate{default_return_full_text=false compute_type=Extension(ComputeType("1-nvidia-h100-80gb-hbm3"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.01), repetition_penalty: Some(1.07), frequency_penalty: None, top_k: Some(1), top_p: Some(0.999), typical_p: Some(0.95), do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: ["<end_of_utterance>", "</s>", "Assistant:", "User:"], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: Some(1), top_n_tokens: None, grammar: None }}:async_stream:generate_stream:infer:send_error: text_generation_router::infer: router/src/infer.rs:865: Request failed during generation: Server error: CANCELLED
2024-06-12T02:32:54.876523Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:658: UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
  warnings.warn(
Exception ignored in: <function Server.__del__ at 0x70cf9f7195a0>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_server.py", line 194, in __del__
    cygrpc.schedule_coro_threadsafe(
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 120, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 112, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 436, in create_task
    self._check_closed()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'AioServer.shutdown' was never awaited
Task exception was never retrieved
future: <Task finished name='HandleExceptions[/generate.v2.TextGenerationService/Prefill]' coro=<<coroutine without __name__>()> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 129, in Prefill
    batch = self.model.batch_type.from_pb_processor(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 235, in from_pb_processor
    batch_tokenized_inputs, image_inputs = cls.batch_tokenized_inputs(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 208, in batch_tokenized_inputs
    "pixel_values": torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 448 but got size 447 for tensor number 2 in the list.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 257, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 702, in _handle_exceptions
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 689, in grpc._cython.cygrpc._handle_exceptions
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 821, in _handle_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 554, in _handle_unary_unary_rpc
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi", line 408, in _finish_handler_with_unary_response
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 28, in intercept
    exit(1)
  File "/opt/conda/lib/python3.10/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: 1 rank=0
2024-06-12T02:32:54.970948Z ERROR text_generation_launcher: Shard 0 crashed
2024-06-12T02:32:54.970957Z  INFO text_generation_launcher: Terminating webserver
2024-06-12T02:32:54.970968Z  INFO text_generation_launcher: Waiting for webserver to gracefully shutdown
2024-06-12T02:32:54.971025Z  INFO text_generation_router::server: router/src/server.rs:1739: signal received, starting graceful shutdown
2024-06-12T02:32:55.071044Z  INFO text_generation_launcher: webserver terminated
2024-06-12T02:32:55.071051Z  INFO text_generation_launcher: Shutting down shards
Error: ShardFailed
2024-06-12T02:32:55.769235Z  INFO text_generation_launcher: Args {
    model_id: "HuggingFaceM4/idefics2-8b",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: Some(
        1,
    ),
    quantize: None,
    speculate: None,
    dtype: None,
    trust_remote_code: true,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 10,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: Some(
        4096,
    ),
    max_total_tokens: Some(
        8192,
    ),
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        32768,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "d150871c26df",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/data",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    cors_allow_origin: [],
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
}
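
For context, the failing call is the `torch.cat` over per-image `pixel_values` tensors in `vlm_causal_lm.py`. The underlying PyTorch constraint is easy to reproduce in isolation (a minimal sketch, not TGI code):

```python
import torch

# torch.cat requires every dimension except the concatenation dim (0 here)
# to match across all tensors; a one-pixel difference is enough to fail.
a = torch.zeros(1, 3, 448, 448)
b = torch.zeros(1, 3, 448, 447)  # one spatial dim off by one
torch.cat([a, b], dim=0)
# RuntimeError: Sizes of tensors must match except in dimension 0.
# Expected size 448 but got size 447 for tensor number 1 in the list.
```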

Expected behavior

No failure.

@LysandreJik (Member)

Hey @pseudotensor, thanks a lot for opening an issue! Would you mind sharing here the command that led to this error, maybe alongside an image (or a batch of images) that leads to this problem? I'd be happy to explore this issue if you can help me out here.

danieldk added a commit that referenced this issue Jun 13, 2024
When a batch contained images of different sizes during prefill, the
server would fail (see e.g. #2056). Images were processed separately and
then concatenated. However, this can fail for images with different sizes.

Fix this by preprocessing all images in the batch together, so that the
image processor can ensure that all image tensors have compatible sizes.
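
The idea of the fix can be illustrated with the Hugging Face image-processor API (a hedged sketch, not the actual TGI patch; the model id is the one from this issue):

```python
from PIL import Image
from transformers import AutoImageProcessor

# Sketch of the fix's idea: hand ALL images of a batch to the image processor
# in one call, so it can pad/resize them to compatible shapes, instead of
# processing each image separately and torch.cat-ing mismatched tensors.
processor = AutoImageProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")

images = [Image.new("RGB", (640, 480)), Image.new("RGB", (641, 480))]  # differing sizes
batch = processor(images=images, return_tensors="pt")
print(batch["pixel_values"].shape)  # one consistently shaped batch tensor
```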
danieldk added a commit that referenced this issue Jun 14, 2024 (same commit message as above)
@tctrautman

I'm seeing the same error when I pass multiple images with different dimensions to the /generate endpoint. Thankfully it looks like @danieldk's #2065 should fix this!
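
For reference, a request of that shape might look like the following (a hypothetical sketch assuming TGI's markdown-image prompt convention; the endpoint and image URLs are placeholders):

```python
import requests

# Two images with different dimensions embedded in one prompt via the
# ![](url) convention TGI uses for vision-language models.
payload = {
    "inputs": "![](https://example.com/wide.png)![](https://example.com/tall.png)"
              "Describe both images.",
    "parameters": {"max_new_tokens": 64},
}
resp = requests.post("http://localhost:8080/generate", json=payload)
print(resp.json())
```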

danieldk added a commit that referenced this issue Jun 17, 2024 (same commit message as above)
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this issue Sep 26, 2024 (same commit message as above, citing huggingface#2056)