UCX SEGV in osc_ucx_component.c #5083

Closed
@gpaulsen

@xinzhao3 @jladd-mlnx
Many IBM tests on v3.1.x and on master have been failing for a number of weeks with a runtime crash in the OSC UCX component. The trace below shows a failed assertion that aborts the process (signal 6) rather than a true segv.

I believe this should be easy to reproduce, though I'm not sure where the value of the 'flavor' argument is coming from.

I think we should either block v3.1.x or disable the ucx osc component for v3.1.x until we figure this out, given how easy it is to hit this issue.

aint: osc_ucx_component.c:246: int mem_map(void **, size_t, ucp_mem_h *, ompi_osc_ucx_module_t *,
int): Assertion `flavor == 2 || flavor == 1' failed.
[c656f6n05:122836] *** Process received signal ***
[c656f6n05:122836] Signal: Aborted (6)
[c656f6n05:122836] Signal code:  (-6)
[c656f6n05:122836] [ 0] [0x3fff9fcd0478]
[c656f6n05:122836] [ 1] aint: osc_ucx_component.c:246: int mem_map(void **, size_t, ucp_mem_h *,
ompi_osc_ucx_module_t *, int): Assertion `flavor == 2 || flavor == 1' failed.
[c656f6n05:122835] *** Process received signal ***
[c656f6n05:122835] Signal: Aborted (6)
[c656f6n05:122835] Signal code:  (-6)
[c656f6n05:122835] [ 0] [0x3fffa46f0478]
[c656f6n05:122835] [ 1] /lib64/libc.so.6(abort+0x280)[0x3fff9f530d70]
[c656f6n05:122836] [ 2] /lib64/libc.so.6(abort+0x280)[0x3fffa3f50d70]
[c656f6n05:122835] [ 2] /lib64/libc.so.6(+0x348a4)[0x3fff9f5248a4]
[c656f6n05:122836] [ 3] /lib64/libc.so.6(+0x348a4)[0x3fffa3f448a4]
[c656f6n05:122835] [ 3] /lib64/libc.so.6(__assert_fail+0x64)[0x3fff9f524994]
[c656f6n05:122836] [ 4] /lib64/libc.so.6(__assert_fail+0x64)[0x3fffa3f44994
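
As a rough sketch of what the failing assertion appears to check: the code below assumes that the literals 1 and 2 correspond to MPI_WIN_FLAVOR_CREATE and MPI_WIN_FLAVOR_ALLOCATE, which is how Open MPI's mpi.h numbers the window flavors as far as I know. The trace does not confirm this. The enum, the helper names, and mem_map_guard are hypothetical stand-ins, not the actual osc_ucx_component.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MPI window flavor constants.
 * ASSUMPTION: numeric values follow Open MPI's mpi.h; verify against
 * your own headers before relying on them. */
enum win_flavor {
    WIN_FLAVOR_CREATE   = 1,
    WIN_FLAVOR_ALLOCATE = 2,
    WIN_FLAVOR_DYNAMIC  = 3,
    WIN_FLAVOR_SHARED   = 4
};

/* Same condition as the assertion text in the trace, without aborting. */
static bool flavor_passes_assert(int flavor)
{
    return flavor == 2 || flavor == 1;
}

/* Human-readable name for a flavor value, for logging while debugging. */
static const char *flavor_name(int flavor)
{
    switch (flavor) {
    case WIN_FLAVOR_CREATE:   return "create";
    case WIN_FLAVOR_ALLOCATE: return "allocate";
    case WIN_FLAVOR_DYNAMIC:  return "dynamic";
    case WIN_FLAVOR_SHARED:   return "shared";
    default:                  return "unknown";
    }
}

/* Hypothetical guard that behaves like the assert reported at
 * osc_ucx_component.c:246: it aborts for any flavor other than 1 or 2
 * when assertions are enabled (NDEBUG not defined). */
static void mem_map_guard(int flavor)
{
    assert(flavor == 2 || flavor == 1);
    (void)flavor; /* silence unused warning when NDEBUG is defined */
}
```

Under these assumptions, mem_map_guard(WIN_FLAVOR_CREATE) returns normally, while mem_map_guard(WIN_FLAVOR_DYNAMIC) aborts in a debug build. That would be consistent with a window of some other flavor (for example dynamic or shared) reaching mem_map, but it is only a guess until the flavor value is logged at the call site.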
