Allow systems without NUMA support to start up #7524
Ravi-Kappiyoor
left a comment
As mentioned in the PR description, without these changes any machine that hits these issues cannot boot and/or create a workspace. These changes should therefore not affect anyone who can currently boot and create a workspace; they should only make it easier for machines that are not currently supported to start up (with suitable warning messages that hopefully indicate that things are in a weird state).
src/util/shmem/fd_numa_linux.c
-    FD_LOG_WARNING(( "opendir( \"%s\" ) failed (%i-%s)", path, errno, fd_io_strerror( errno ) ));
-    return 0UL;
+    FD_LOG_WARNING(( "opendir( \"%s\" ) failed (%i-%s) - assuming a single NUMA node", path, errno, fd_io_strerror( errno ) ));
+    return 1UL;
Without this change, fd_shmem_private_boot (run as part of fd_boot) would fail due to:
ulong numa_cnt = fd_numa_node_cnt();
if( FD_UNLIKELY( !((1UL<=numa_cnt) & (numa_cnt<=FD_SHMEM_NUMA_MAX)) ) )
  FD_LOG_ERR(( "fd_shmem: unexpected numa_cnt %lu (expected in [1,%lu])", numa_cnt, FD_SHMEM_NUMA_MAX ));
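The fallback described above can be sketched in isolation. This is a minimal illustration, not the Firedancer implementation: `demo_numa_node_cnt` is a hypothetical name, the logging goes to `stderr` instead of `FD_LOG_WARNING`, and treating a directory with no `node<N>` entries as a single node is an extra assumption of this sketch.

```c
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts entries named "node<digits>" under path
   (on Linux this is typically /sys/devices/system/node).  If the
   directory cannot be opened, it warns and reports 1 node so that a
   caller checking for a count in [1,max] does not abort. */
static unsigned long
demo_numa_node_cnt( char const * path ) {
  DIR * dir = opendir( path );
  if( !dir ) {
    fprintf( stderr, "opendir( \"%s\" ) failed (%i-%s) - assuming a single NUMA node\n",
             path, errno, strerror( errno ) );
    return 1UL;
  }

  unsigned long cnt = 0UL;
  struct dirent * ent;
  while( (ent = readdir( dir )) ) {
    if( strncmp( ent->d_name, "node", 4 ) ) continue;          /* not a node entry */
    if( !isdigit( (unsigned char)ent->d_name[4] ) ) continue;  /* e.g. "nodefoo"   */
    cnt++;
  }
  closedir( dir );

  /* Assumption of this sketch: no node entries also means one node. */
  return cnt ? cnt : 1UL;
}
```

With this shape, a range check like the one quoted above passes on systems without the sysfs NUMA directory, and the warning records that the count was assumed.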
-    FD_LOG_WARNING(( "No numa node found in \"%s\"", path ));
-    return ULONG_MAX;
+    FD_LOG_WARNING(( "No numa node found in \"%s\" - assuming NUMA node 0", path ));
+    return 0UL;
Without this change, fd_shmem_private_boot (run as part of fd_boot) would fail due to:
ulong numa_idx = fd_numa_node_idx( cpu_idx );
if( FD_UNLIKELY( numa_idx>=FD_SHMEM_NUMA_MAX ) )
  FD_LOG_ERR(( "fd_shmem: unexpected numa idx (%lu) for cpu idx %lu", numa_idx, cpu_idx ));
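A rough sketch of the same idea for the per-CPU lookup follows. It assumes the lookup scans a CPU's sysfs directory (typically `/sys/devices/system/cpu/cpu<N>`) for a `node<M>` entry. The helper name `demo_numa_node_idx` and the `stderr` logging are hypothetical stand-ins, and the real function may differ in detail.

```c
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the M of the first "node<M>" entry found
   in cpu_dir.  If the directory is missing or has no such entry, it
   warns and returns 0 instead of an out-of-range sentinel, so a caller
   checking idx<max does not abort. */
static unsigned long
demo_numa_node_idx( char const * cpu_dir ) {
  DIR * dir = opendir( cpu_dir );
  if( !dir ) {
    fprintf( stderr, "opendir( \"%s\" ) failed (%i-%s) - assuming NUMA node 0\n",
             cpu_dir, errno, strerror( errno ) );
    return 0UL;
  }

  unsigned long idx   = 0UL;
  int           found = 0;
  struct dirent * ent;
  while( (ent = readdir( dir )) ) {
    if( strncmp( ent->d_name, "node", 4 ) ) continue;
    if( !isdigit( (unsigned char)ent->d_name[4] ) ) continue;
    idx   = strtoul( ent->d_name+4, NULL, 10 );  /* parse the digits after "node" */
    found = 1;
    break;
  }
  closedir( dir );

  if( !found ) {
    fprintf( stderr, "No numa node found in \"%s\" - assuming NUMA node 0\n", cpu_dir );
    return 0UL;
  }
  return idx;
}
```

Returning 0 keeps the index consistent with a node count of 1, which is the value the other fallback reports.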
  long rc = syscall( SYS_get_mempolicy, mode, nodemask, maxnode, addr, flags );
  if ( rc && errno == ENOSYS ) {
    FD_LOG_WARNING(( "System appears to not support NUMA - unable to get mempolicy. Attempting to continue..." ));
    return 0L;
Without this change, we are unable to create a workspace due to the following in fd_shmem_create_multi_flags:
if( FD_UNLIKELY( fd_numa_get_mempolicy( &orig_mempolicy, orig_nodemask, FD_SHMEM_NUMA_MAX, NULL, 0UL ) ) ) {
  FD_LOG_WARNING(( "fd_numa_get_mempolicy failed (%i-%s)", errno, fd_io_strerror( errno ) ));
  ERROR( done );
}
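The ENOSYS handling used for the three syscall wrappers can be illustrated with a standalone sketch of the `get_mempolicy` case. The wrapper name `demo_get_mempolicy` is hypothetical, logging goes to `stderr`, and this assumes a Linux target where `SYS_get_mempolicy` is defined. It is meant to show the pattern, not to reproduce the PR's exact code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: issues the raw get_mempolicy syscall and treats
   ENOSYS (kernel built without NUMA support) as a soft success.  Any
   other failure is passed through with errno intact. */
static long
demo_get_mempolicy( int *           mode,
                    unsigned long * nodemask,
                    unsigned long   maxnode,
                    void *          addr,
                    unsigned long   flags ) {
  long rc = syscall( SYS_get_mempolicy, mode, nodemask, maxnode, addr, flags );
  if( rc && errno==ENOSYS ) {
    fprintf( stderr, "System appears to not support NUMA - unable to get mempolicy. "
                     "Attempting to continue...\n" );
    /* Outputs are left untouched here, so callers of this sketch should
       pre-initialize mode and nodemask. */
    return 0L;
  }
  return rc;
}
```

Only ENOSYS is downgraded. Errors such as EINVAL or EFAULT still reach the caller, so a check like the one quoted above keeps failing on systems that do support NUMA but reject the request.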
  long rc = syscall( SYS_set_mempolicy, mode, nodemask, maxnode );
  if ( rc && errno == ENOSYS ) {
    FD_LOG_WARNING(( "System appears to not support NUMA - unable to set mempolicy. Attempting to continue..." ));
    return 0L;
Without this change, we are unable to create a workspace due to the following in fd_shmem_create_multi_flags:
if( FD_UNLIKELY( fd_numa_set_mempolicy( MPOL_BIND | MPOL_F_STATIC_NODES, nodemask, FD_SHMEM_NUMA_MAX ) ) ) {
  FD_LOG_WARNING(( "fd_numa_set_mempolicy failed (%i-%s)", errno, fd_io_strerror( errno ) ));
  ERROR( unmap );
}
  long rc = syscall( SYS_mbind, addr, len, mode, nodemask, maxnode, flags );
  if ( rc && errno == ENOSYS ) {
    FD_LOG_WARNING(( "System appears to not support NUMA - unable to bind memory. Attempting to continue..." ));
    return 0L;
Without this change, we are unable to create a workspace due to the following in fd_shmem_create_multi_flags:
if( FD_UNLIKELY( fd_numa_mbind( sub_shmem, sub_sz, MPOL_BIND, nodemask, FD_SHMEM_NUMA_MAX, MPOL_MF_MOVE|MPOL_MF_STRICT ) ) ) {
  FD_LOG_WARNING(( "sub[%lu]: fd_numa_mbind(\"%s\",%lu KiB,MPOL_BIND,1UL<<%lu,MPOL_MF_MOVE|MPOL_MF_STRICT) failed (%i-%s)",
                   sub_idx, path, sub_sz>>10, sub_numa_idx, errno, fd_io_strerror( errno ) ));
  ERROR( unmap );
}
What hardware / environment is this running on where the kernel NUMA APIs are not working? Is this a virtual machine?
Yes. Specifically, this is to try and get "Shelby-in-a-box" working. The idea is:
These changes allow the system to get off the ground and start running (after a number of Shelby-specific changes as well...)
Each of the changes below turns something that would otherwise have crashed right at startup into a warning. I.e., it should not affect anyone running Firedancer today (unless they are unable to start, in which case it might help them...).