Conversation

@zhyncs (Member) commented Aug 7, 2024:

Fix for the linker failure below when compiling from source. cc @yzh119

```
/usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/crti.o: in function `_init':
    (.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
    /flashinfer/python/build/temp.linux-x86_64-cpython-310/csrc/batch_decode.o: in function `__cudaUnregisterBinaryUtil()':
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x1d7): relocation truncated to fit: R_X86_64_PC32 against `.bss'
    /flashinfer/python/build/temp.linux-x86_64-cpython-310/csrc/batch_decode.o: in function `std::string::_Rep::_M_dispose(std::allocator<char> const&) [clone .part.0]':
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x1e3): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__libc_single_threaded@@GLIBC_2.32' defined in .bss section in /lib/x86_64-linux-gnu/libc.so.6
    /flashinfer/python/build/temp.linux-x86_64-cpython-310/csrc/batch_decode.o: in function `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) [clone .constprop.0]':
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x237): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::cerr@@GLIBCXX_3.4' defined in .bss section in /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x25b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `std::cerr@@GLIBCXX_3.4' defined in .bss section in /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
    /flashinfer/python/build/temp.linux-x86_64-cpython-310/csrc/batch_decode.o: in function `void* flashinfer::AlignedAllocator::aligned_alloc<void>(unsigned long, unsigned long, std::string) [clone .constprop.0]':
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x30a): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x324): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x3cf): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x402): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `VTT for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x46e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/11/libstdc++.so
    tmpxft_00000886_00000000-6_batch_decode.cudafe1.cpp:(.text+0x48c): additional relocation overflows omitted from the output
    build/lib.linux-x86_64-cpython-310/flashinfer/_kernels.cpython-310-x86_64-linux-gnu.so: PC-relative offset overflow in PLT entry for `PyDict_DelItemString'
    collect2: error: ld returned 1 exit status
    error: command '/usr/bin/x86_64-linux-gnu-g++' failed with exit code 1
    [end of output]
```

@zhyncs requested a review from @yzh119 on August 7, 2024 at 07:09.
Review thread on README.md (outdated):
```bash
git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
cd flashinfer/python
# workaround for undefined symbol `__gmon_start__' on A100
```
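
The diff excerpt above cuts off before the build command itself. As a minimal sketch of how such a workaround typically looks, assuming it restricts compilation to a single CUDA architecture via PyTorch's standard `TORCH_CUDA_ARCH_LIST` variable (the exact line this PR adds is not visible in this excerpt):

```bash
# Hypothetical completion of the snippet above: compile only for sm_80 (A100)
# so the resulting binary stays small enough for the linker.
TORCH_CUDA_ARCH_LIST="8.0" pip install -e .
```
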
@yzh119 (Collaborator):

Can you provide a link to an existing issue (if any)?

@zhyncs (Member, Author):

I encountered this issue when compiling FlashInfer from source on an A100 machine with 8 devices (https://www.runpod.io). The image used was pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04, and I did not file a separate issue for it. The method in this PR is a workaround that I verified on RunPod.

@yzh119 (Collaborator):

Oh, I just noticed your error message. That error appears when the binary size is too large, which I should fix in the coming days. Limiting the target CUDA arch is one option to reduce binary size, and it doesn't apply only to A100; I suppose you might encounter the same issue on other GPU instances.

@zhyncs (Member, Author):

I found similar issues and questions on forums via Google, but their solutions didn't resolve my issue. Ref: https://www.google.com/search?q=R_X86_64_REX_GOTPCRELX%20against%20undefined%20symbol%20%60__gmon_start__%27

@zhyncs (Member, Author):

> I suppose you might encounter the same issue for other GPU instances.

Yes 😂

@yzh119 (Collaborator):

The fundamental solution is to break the CUDAExtension into multiple submodules and compile each of them into a shared object of reasonable size. cc @Yard1, as you might be interested.

@zhyncs (Member, Author):

Currently, the workflow running in the FlashInfer repository is functioning properly. For most users (who are not FlashInfer developers), using the wheel compiled by the workflow is sufficient. For developers like me who need to modify code and compile FlashInfer themselves, perhaps this is an acceptable workaround until the 'break the CUDAExtension into multiple submodules' work you mentioned is implemented.
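
For context, installing the prebuilt wheel rather than compiling from source looks roughly like this; the index URL and the CUDA/torch version tags here are assumptions, so check the FlashInfer installation docs for the current ones:

```bash
# Hypothetical example: install a prebuilt FlashInfer wheel instead of
# compiling from source (index URL and version tags assumed; see the docs).
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/
```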

@yzh119 (Collaborator):

So my suggestion here is to keep this note, but mention that it's for reducing binary size; don't say it's only for A100.
The following function in torch can help users identify their device capability:
https://pytorch.org/docs/stable/generated/torch.cuda.get_device_capability.html#torch.cuda.get_device_capability
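
For example, a quick way to look up the right value, assuming a working PyTorch installation (`get_device_capability` returns a `(major, minor)` tuple, e.g. `(8, 0)` on A100, which corresponds to the arch string `"8.0"`):

```bash
# Print the compute capability of the current GPU, e.g. (8, 0) on an A100,
# then pass it as TORCH_CUDA_ARCH_LIST="8.0" when building.
python -c "import torch; print(torch.cuda.get_device_capability())"
```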

@zhyncs (Member, Author):

ok

@zhyncs (Member, Author):

@yzh119 Could you check whether the new changes look alright? Thanks.

@yzh119 (Collaborator) left a review comment:

I'll merge this first and then organize an FAQ page for these issues. Thank you for your contribution.

@yzh119 merged commit ddc1f09 into flashinfer-ai:main on Aug 7, 2024.
@zhyncs deleted the doc branch on August 7, 2024 at 08:12.
@zhyncs added the documentation label on Aug 27, 2024.