@Ravi-Kappiyoor
Contributor

Each change below turns a condition that would otherwise crash right at startup into a non-fatal warning. It should therefore not affect anyone running Firedancer today (unless they are currently unable to start, in which case it might help them).

Contributor Author

@Ravi-Kappiyoor left a comment

As mentioned in the PR description, without these changes any machine that hits these issues is unable to boot and/or create a workspace. These changes should therefore not affect anyone who can currently boot and create a workspace. They should only make it easier for machines that are not currently supported to potentially start up, with warning messages that hopefully indicate that things are in an unusual state.

- FD_LOG_WARNING(( "opendir( \"%s\" ) failed (%i-%s)", path, errno, fd_io_strerror( errno ) ));
- return 0UL;
+ FD_LOG_WARNING(( "opendir( \"%s\" ) failed (%i-%s) - assuming a single NUMA node", path, errno, fd_io_strerror( errno ) ));
+ return 1UL;

Without this change, fd_shmem_private_boot (run as part of fd_boot) would fail due to:

  ulong numa_cnt = fd_numa_node_cnt();
  if( FD_UNLIKELY( !((1UL<=numa_cnt) & (numa_cnt<=FD_SHMEM_NUMA_MAX)) ) )
    FD_LOG_ERR(( "fd_shmem: unexpected numa_cnt %lu (expected in [1,%lu])", numa_cnt, FD_SHMEM_NUMA_MAX ));
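
To illustrate the fallback, here is a minimal standalone sketch of a node counter that never returns 0. It is not Firedancer's `fd_numa_node_cnt`: the names `sketch_numa_node_cnt` and `sketch_is_node_entry` are hypothetical, `fprintf` stands in for `FD_LOG_WARNING`, and treating an empty directory the same as a failed `opendir` is an extra assumption that goes beyond the diff above.

```c
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name has the form "node<digits>", 0 otherwise. */
static int
sketch_is_node_entry( char const * name ) {
  if( strncmp( name, "node", 4UL ) ) return 0;
  if( !name[4] ) return 0;
  for( char const * p=name+4; *p; p++ ) {
    if( !isdigit( (unsigned char)*p ) ) return 0;
  }
  return 1;
}

/* Hypothetical helper: counts node<N> entries under dir_path.  If the
   directory cannot be opened, or (assumption beyond the PR) contains
   no node entries, it warns and reports a single NUMA node so that a
   caller's 1<=numa_cnt range check still passes. */
unsigned long
sketch_numa_node_cnt( char const * dir_path ) {
  DIR * dir = opendir( dir_path );
  if( !dir ) {
    fprintf( stderr, "opendir( \"%s\" ) failed (%i-%s) - assuming a single NUMA node\n",
             dir_path, errno, strerror( errno ) );
    return 1UL;
  }

  unsigned long cnt = 0UL;
  struct dirent * ent;
  while( (ent = readdir( dir )) ) {
    if( sketch_is_node_entry( ent->d_name ) ) cnt++;
  }
  closedir( dir );

  if( !cnt ) {
    fprintf( stderr, "no node entries in \"%s\" - assuming a single NUMA node\n", dir_path );
    return 1UL;
  }
  return cnt;
}
```

On Linux, NUMA nodes are conventionally exposed under `/sys/devices/system/node`, so `sketch_numa_node_cnt( "/sys/devices/system/node" )` would be a typical call. Whether Firedancer reads exactly that path is not shown in this PR.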

- FD_LOG_WARNING(( "No numa node found in \"%s\"", path ));
- return ULONG_MAX;
+ FD_LOG_WARNING(( "No numa node found in \"%s\" - assuming NUMA node 0", path ));
+ return 0UL;

Without this change, fd_shmem_private_boot (run as part of fd_boot) would fail due to:

    ulong numa_idx = fd_numa_node_idx( cpu_idx );
    if( FD_UNLIKELY( numa_idx>=FD_SHMEM_NUMA_MAX) )
      FD_LOG_ERR(( "fd_shmem: unexpected numa idx (%lu) for cpu idx %lu", numa_idx, cpu_idx ));
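
A minimal sketch of the same idea for the per-CPU lookup: scan a CPU's sysfs directory for a `node<N>` entry and fall back to node 0 when none is found. The name `sketch_numa_node_idx` is hypothetical, `fprintf` stands in for `FD_LOG_WARNING`, and treating an unreadable directory the same as a missing node entry is an assumption that the diff above does not show.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns N for the first "node<N>" entry found
   in cpu_dir.  If no such entry exists (or, as an assumption, the
   directory cannot be read), warns and returns 0 rather than an
   out-of-range sentinel, so a numa_idx<max check in the caller
   still passes. */
unsigned long
sketch_numa_node_idx( char const * cpu_dir ) {
  DIR * dir = opendir( cpu_dir );
  if( dir ) {
    struct dirent * ent;
    while( (ent = readdir( dir )) ) {
      char const * name = ent->d_name;
      if( !strncmp( name, "node", 4UL ) && isdigit( (unsigned char)name[4] ) ) {
        unsigned long idx = strtoul( name+4, NULL, 10 );
        closedir( dir );
        return idx;
      }
    }
    closedir( dir );
  }
  fprintf( stderr, "No numa node found in \"%s\" - assuming NUMA node 0\n", cpu_dir );
  return 0UL;
}
```

On a typical Linux host the directory for CPU 0 would be `/sys/devices/system/cpu/cpu0`, which contains a `node<N>` entry when the kernel has NUMA support.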

long rc = syscall( SYS_get_mempolicy, mode, nodemask, maxnode, addr, flags );
if ( rc && errno == ENOSYS ) {
FD_LOG_WARNING(( "System appears to not support NUMA - unable to get mempolicy. Attempting to continue..." ));
return 0L;

Without this change, we are unable to create a workspace due to the following in fd_shmem_create_multi_flags:

  if( FD_UNLIKELY( fd_numa_get_mempolicy( &orig_mempolicy, orig_nodemask, FD_SHMEM_NUMA_MAX, NULL, 0UL ) ) ) {
    FD_LOG_WARNING(( "fd_numa_get_mempolicy failed (%i-%s)", errno, fd_io_strerror( errno ) ));
    ERROR( done );
  }
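
The wrapper pattern can be sketched standalone as follows. This is an illustration under assumptions, not the PR's code: `sketch_get_mempolicy` is a hypothetical name and `fprintf` stands in for `FD_LOG_WARNING`. Only `ENOSYS` (syscall not implemented) is swallowed. Any other failure, such as `EPERM` from a container seccomp profile, is still reported to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper around the raw get_mempolicy syscall.  Returns
   0 on success, and also 0 (after a warning) if the kernel reports
   ENOSYS.  In the ENOSYS case *mode and *nodemask are left untouched,
   so callers should initialize them beforehand.  Any other failure
   returns -1 with errno set by the kernel. */
long
sketch_get_mempolicy( int *           mode,
                      unsigned long * nodemask,
                      unsigned long   maxnode,
                      void *          addr,
                      unsigned long   flags ) {
  long rc = syscall( SYS_get_mempolicy, mode, nodemask, maxnode, addr, flags );
  if( rc && errno==ENOSYS ) {
    fprintf( stderr, "System appears to not support NUMA - unable to get mempolicy. "
                     "Attempting to continue...\n" );
    return 0L;
  }
  return rc;
}
```

A caller that previously aborted on any nonzero return now proceeds on kernels without NUMA support, while genuine errors still surface through `errno`.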

long rc = syscall( SYS_set_mempolicy, mode, nodemask, maxnode );
if ( rc && errno == ENOSYS ) {
FD_LOG_WARNING(( "System appears to not support NUMA - unable to set mempolicy. Attempting to continue..." ));
return 0L;

Without this change, we are unable to create a workspace due to the following in fd_shmem_create_multi_flags:

    if( FD_UNLIKELY( fd_numa_set_mempolicy( MPOL_BIND | MPOL_F_STATIC_NODES, nodemask, FD_SHMEM_NUMA_MAX ) ) ) {
      FD_LOG_WARNING(( "fd_numa_set_mempolicy failed (%i-%s)", errno, fd_io_strerror( errno ) ));
      ERROR( unmap );
    }

long rc = syscall( SYS_mbind, addr, len, mode, nodemask, maxnode, flags );
if ( rc && errno == ENOSYS ) {
FD_LOG_WARNING(( "System appears to not support NUMA - unable to bind memory. Attempting to continue..." ));
return 0L;

Without this change, we are unable to create a workspace due to the following in fd_shmem_create_multi_flags:

    if( FD_UNLIKELY( fd_numa_mbind( sub_shmem, sub_sz, MPOL_BIND, nodemask, FD_SHMEM_NUMA_MAX, MPOL_MF_MOVE|MPOL_MF_STRICT ) ) ) {
      FD_LOG_WARNING(( "sub[%lu]: fd_numa_mbind(\"%s\",%lu KiB,MPOL_BIND,1UL<<%lu,MPOL_MF_MOVE|MPOL_MF_STRICT) failed (%i-%s)",
                       sub_idx, path, sub_sz>>10, sub_numa_idx, errno, fd_io_strerror( errno ) ));
      ERROR( unmap );
    }

@mmcgee-jump
Contributor

What hardware / environment is this running on where the kernel NUMA APIs are not working? Is this a virtual machine?

@Ravi-Kappiyoor
Contributor Author

> What hardware / environment is this running on where the kernel NUMA APIs are not working? Is this a virtual machine?

Yes. Specifically, this is to get "Shelby-in-a-box" working. The idea is:

  • For developers who wish to build against Shelby, we want to provide an easy "localnet" that they can spin up and tear down to test local development changes
  • To do so, we spin up Ubuntu containers that run the client
  • It turns out that Docker Desktop for Mac runs the containers inside a Linux VM
    • However, this VM is missing key components, e.g., any form of NUMA support

These changes allow the system to get off the ground and start running (after a number of Shelby-specific changes as well).
