
Make EESSI-extend support accelerator installations #27

Merged
merged 19 commits into from
Jul 7, 2025
4 changes: 2 additions & 2 deletions .github/workflows/scripts/verify_eessi_environment.py
Expand Up @@ -45,8 +45,8 @@ def check_env_endswith(var1, var2):
check_env_equals("EESSI_ACCELERATOR_TARGET_OVERRIDE", "EESSI_ACCEL_SUBDIR")
# special case is where EESSI_ACCELERATOR_TARGET_OVERRIDE may not match the final
# accelerator architecture chosen.
# In CI we set FINAL_ACCELERATOR_PATH_EXPECTED to allow us to compare against an expected value.
check_env_equals("EESSI_ACCELERATOR_TARGET", "FINAL_ACCELERATOR_PATH_EXPECTED")
# In CI we set FINAL_ACCELERATOR_TARGET_EXPECTED to allow us to compare against an expected value.
check_env_equals("EESSI_ACCELERATOR_TARGET", "FINAL_ACCELERATOR_TARGET_EXPECTED")
# verify the software paths that should exist
check_env_endswith("EESSI_SOFTWARE_PATH", "EESSI_SOFTWARE_SUBDIR")
check_env_endswith("EESSI_SITE_SOFTWARE_PATH", "EESSI_SOFTWARE_SUBDIR")
Expand Down
87 changes: 85 additions & 2 deletions .github/workflows/tests_eessi_extend_module.yml
Expand Up @@ -105,7 +105,8 @@ jobs:
export EESSI_PROJECT_INSTALL="$MY_INSTALLATION_PATH"
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild
# check some specific envvars
check_env_var "EASYBUILD_INSTALLPATH" "$MY_INSTALLATION_PATH/versions/$EESSI_VERSION/software/linux/$EESSI_SOFTWARE_SUBDIR"
export EXPECTED_INSTALLATION_PATH="$MY_INSTALLATION_PATH/versions/$EESSI_VERSION/software/linux/$EESSI_SOFTWARE_SUBDIR"
check_env_var "EASYBUILD_INSTALLPATH" "$EXPECTED_INSTALLATION_PATH"
check_env_var "EASYBUILD_UMASK" "002"
check_env_var "EASYBUILD_GROUP_WRITABLE_INSTALLDIR" "1"
# unload and check the environment is clean again
Expand All @@ -118,10 +119,92 @@ jobs:
mkdir -p $EESSI_USER_INSTALL # must exist
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild
# check some specific envvars
check_env_var "EASYBUILD_INSTALLPATH" "$MY_INSTALLATION_PATH/$USER/versions/$EESSI_VERSION/software/linux/$EESSI_SOFTWARE_SUBDIR"
export EXPECTED_INSTALLATION_PATH="$MY_INSTALLATION_PATH/$USER/versions/$EESSI_VERSION/software/linux/$EESSI_SOFTWARE_SUBDIR"
check_env_var "EASYBUILD_INSTALLPATH" "$EXPECTED_INSTALLATION_PATH"
check_env_var "EASYBUILD_UMASK" "077"
# unload and check the environment is clean again
module unload EESSI-extend
check_disallowed_env_prefix EASYBUILD_
unset EESSI_USER_INSTALL

- name: Run tests for EESSI-extend in the various GPU scenarios
run: |
export MY_INSTALLATION_PATH=/tmp/easybuild

# Define a function to check the values of environment variables
# and another that checks an environment does not contain environment
# variables matching a certain pattern
source .github/workflows/scripts/test_utils.sh

# Set an environment variable to use when we want to target accelerators
export STORED_EESSI_ACCELERATOR_TARGET_OVERRIDE="accel/nvidia/cc80"
export STORED_CUDA_CC="8.0"

# Let's start from a clean slate
module purge
export EESSI_ACCELERATOR_TARGET_OVERRIDE=$STORED_EESSI_ACCELERATOR_TARGET_OVERRIDE
module load EESSI/${{matrix.EESSI_VERSION}}
# Access the installed EESSI-extend
module use "$MY_INSTALLATION_PATH"/modules/all
check_disallowed_env_prefix EASYBUILD_

# Configure for CVMFS install
export EESSI_CVMFS_INSTALL=1
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild
check_env_var "EASYBUILD_INSTALLPATH" "$EESSI_SOFTWARE_PATH" # installation path should be the same unless we ask for an explicit GPU installation
check_env_var "EASYBUILD_CUDA_COMPUTE_CAPABILITIES" "$STORED_CUDA_CC"
export EESSI_ACCELERATOR_INSTALL=1
Review comment (PR author): The bot needs to set both this variable (EESSI_ACCELERATOR_INSTALL) and EESSI_ACCELERATOR_TARGET_OVERRIDE to configure EESSI-extend correctly for a GPU installation.

module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild # reload for an actual GPU installation
check_env_var "EASYBUILD_INSTALLPATH" "${EESSI_SOFTWARE_PATH}/${EESSI_ACCELERATOR_TARGET_OVERRIDE}"
# unload and make sure the environment is clean again
module unload EESSI-extend
check_disallowed_env_prefix EASYBUILD_
unset EESSI_ACCELERATOR_INSTALL
unset EESSI_CVMFS_INSTALL

# Now configure for a site
export EESSI_SITE_INSTALL=1
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild
check_env_var "EASYBUILD_INSTALLPATH" "$EESSI_SITE_SOFTWARE_PATH" # installation path should be the same unless we ask for an explicit GPU installation
check_env_var "EASYBUILD_CUDA_COMPUTE_CAPABILITIES" "$STORED_CUDA_CC"
export EESSI_ACCELERATOR_INSTALL=1
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild # reload for an actual GPU installation
check_env_var "EASYBUILD_INSTALLPATH" "${EESSI_SITE_SOFTWARE_PATH}/${EESSI_ACCELERATOR_TARGET_OVERRIDE}"
# unload and make sure the environment is clean again
module unload EESSI-extend
check_disallowed_env_prefix EASYBUILD_
unset EESSI_ACCELERATOR_INSTALL
unset EESSI_SITE_INSTALL

# Now for a project
export EESSI_PROJECT_INSTALL="$MY_INSTALLATION_PATH"
export EXPECTED_INSTALLATION_PATH="$MY_INSTALLATION_PATH/versions/$EESSI_VERSION/software/linux/$EESSI_SOFTWARE_SUBDIR"
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild
check_env_var "EASYBUILD_INSTALLPATH" "$EXPECTED_INSTALLATION_PATH" # installation path should be the same unless we ask for an explicit GPU installation
check_env_var "EASYBUILD_CUDA_COMPUTE_CAPABILITIES" "$STORED_CUDA_CC"
export EESSI_ACCELERATOR_INSTALL=1
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild # reload for an actual GPU installation
check_env_var "EASYBUILD_INSTALLPATH" "$EXPECTED_INSTALLATION_PATH" # installation path should be the same for project case
# unload and make sure the environment is clean again
module unload EESSI-extend
check_disallowed_env_prefix EASYBUILD_
unset EESSI_ACCELERATOR_INSTALL
unset EESSI_PROJECT_INSTALL

# Now for a user
export EESSI_USER_INSTALL="$MY_INSTALLATION_PATH/$USER"
mkdir -p $EESSI_USER_INSTALL # must exist
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild
# check some specific envvars
export EXPECTED_INSTALLATION_PATH="$MY_INSTALLATION_PATH/$USER/versions/$EESSI_VERSION/software/linux/$EESSI_SOFTWARE_SUBDIR"
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild
check_env_var "EASYBUILD_INSTALLPATH" "$EXPECTED_INSTALLATION_PATH" # installation path should be the same unless we ask for an explicit GPU installation
check_env_var "EASYBUILD_CUDA_COMPUTE_CAPABILITIES" "$STORED_CUDA_CC"
export EESSI_ACCELERATOR_INSTALL=1
module load EESSI-extend/${{matrix.EESSI_VERSION}}-easybuild # reload for an actual GPU installation
check_env_var "EASYBUILD_INSTALLPATH" "$EXPECTED_INSTALLATION_PATH" # installation path should be the same for user case
# unload and make sure the environment is clean again
module unload EESSI-extend
check_disallowed_env_prefix EASYBUILD_
unset EESSI_ACCELERATOR_INSTALL
unset EESSI_USER_INSTALL
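
The test steps above rely on two helpers sourced from `.github/workflows/scripts/test_utils.sh`, whose bodies are not part of this diff. As a rough sketch of what such helpers could look like (the implementations below are assumptions for illustration, not the repository's actual code):

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the helpers sourced from test_utils.sh.
# The real implementations are not shown in this diff and may differ.

# Succeed only if the variable named by $1 has exactly the value $2.
check_env_var() {
    local var_name="$1" expected="$2"
    local actual="${!var_name-}"   # indirect expansion; empty if unset
    if [[ "$actual" != "$expected" ]]; then
        echo "ERROR: $var_name='$actual' (expected '$expected')" >&2
        return 1
    fi
    echo "OK: $var_name='$actual'"
}

# Succeed only if no exported variable name starts with the prefix $1.
check_disallowed_env_prefix() {
    local prefix="$1"
    local matches
    # compgen -e lists the names of all exported variables
    matches=$(compgen -e | grep "^${prefix}" || true)
    if [[ -n "$matches" ]]; then
        echo "ERROR: found variables with prefix '$prefix':" >&2
        echo "$matches" >&2
        return 1
    fi
    echo "OK: no variables with prefix '$prefix'"
}
```

With helpers of this shape, each scenario in the workflow reduces to "load the module, assert the expected values, unload, assert that nothing with the `EASYBUILD_` prefix is left behind".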
6 changes: 3 additions & 3 deletions .github/workflows/tests_eessi_module.yml
Expand Up @@ -165,9 +165,9 @@ jobs:
include:
# For each override we expect a specific path (which may differ from the original due to overrides)
- EESSI_ACCELERATOR_TARGET_OVERRIDE: accel/nvidia/cc80
FINAL_ACCELERATOR_PATH_EXPECTED: accel/nvidia/cc80
FINAL_ACCELERATOR_TARGET_EXPECTED: accel/nvidia/cc80
- EESSI_ACCELERATOR_TARGET_OVERRIDE: accel/nvidia/cc77 # deliberately chose a non-existent CUDA capability
FINAL_ACCELERATOR_PATH_EXPECTED: accel/nvidia/cc70 # this reverts to the fallback case (which does exist)
FINAL_ACCELERATOR_TARGET_EXPECTED: accel/nvidia/cc70 # this reverts to the fallback case (which does exist)

steps:
- name: Check out software-layer repository
Expand All @@ -193,7 +193,7 @@ jobs:
# Set our accelerator path overrides according to our matrix
if [[ "${{matrix.EESSI_ACCELERATOR_TARGET_OVERRIDE}}" != "none" ]]; then
export EESSI_ACCELERATOR_TARGET_OVERRIDE=${{matrix.EESSI_ACCELERATOR_TARGET_OVERRIDE}}
export FINAL_ACCELERATOR_PATH_EXPECTED=${{matrix.FINAL_ACCELERATOR_PATH_EXPECTED}}
export FINAL_ACCELERATOR_TARGET_EXPECTED=${{matrix.FINAL_ACCELERATOR_TARGET_EXPECTED}}
fi

# Turn on debug output in case we want to take a look
Expand Down
34 changes: 25 additions & 9 deletions EESSI-extend-easybuild.eb
Expand Up @@ -35,6 +35,11 @@ description = """
If both EESSI_USER_INSTALL and EESSI_PROJECT_INSTALL are defined, both sets of
installations are exposed, but new installations are created as user
installations.

Strict installation path checking is enforced by EESSI for EESSI and site
installations involving accelerators. In these cases, if you wish to create an
accelerator installation you must set the environment variable
EESSI_ACCELERATOR_INSTALL (and load/reload this module).
Review comment (PR author): This new environment variable affects the build scripts: it must be set whenever we expect to do an accelerator installation.

"""

toolchain = SYSTEM
Expand Down Expand Up @@ -78,8 +83,21 @@ if (mode() == "load") then
end
end
working_dir = os.getenv("WORKING_DIR") or pathJoin("/tmp", os.getenv("USER"))

-- Gather the EPREFIX to use as a sysroot
sysroot = os.getenv("EESSI_EPREFIX")

-- Check if we have GPU capabilities and configure CUDA compute capabilities
eessi_accelerator_target = os.getenv("EESSI_ACCELERATOR_TARGET")
if (eessi_accelerator_target ~= nil) then
cuda_compute_capability = string.match(eessi_accelerator_target, "^accel/nvidia/cc([0-9][0-9])$")
if (cuda_compute_capability ~= nil) then
easybuild_cuda_compute_capabilities = cuda_compute_capability:sub(1, 1) .. "." .. cuda_compute_capability:sub(2, 2)
else
LmodError("Incorrect value for $EESSI_ACCELERATOR_TARGET: " .. eessi_accelerator_target)
end
end

-- Use an installation prefix that we _should_ have write access to
if (os.getenv("EESSI_CVMFS_INSTALL") ~= nil) then
-- Make sure no other EESSI install environment variables are set
Expand All @@ -88,22 +106,20 @@ if (os.getenv("EESSI_CVMFS_INSTALL") ~= nil) then
end
eessi_cvmfs_install = true
easybuild_installpath = os.getenv("EESSI_SOFTWARE_PATH")
eessi_accelerator_target = os.getenv("EESSI_ACCELERATOR_TARGET")
if (eessi_accelerator_target ~= nil) then
cuda_compute_capability = string.match(eessi_accelerator_target, "^nvidia/cc([0-9][0-9])$")
if (cuda_compute_capability ~= nil) then
easybuild_installpath = pathJoin(easybuild_installpath, 'accel', eessi_accelerator_target)
Review comment (PR author): This was actually wrong: archdetect returns paths like accel/nvidia/cc80 (see https://github.com/EESSI/software-layer-scripts/blob/main/tests/archdetect/nvidia-smi/1xa100.output). We were at least consistent in our error: the build script set EESSI_ACCELERATOR_TARGET directly rather than setting EESSI_ACCELERATOR_TARGET_OVERRIDE, which would have affected the behaviour of archdetect.

easybuild_cuda_compute_capabilities = cuda_compute_capability:sub(1, 1) .. "." .. cuda_compute_capability:sub(2, 2)
else
LmodError("Incorrect value for $EESSI_ACCELERATOR_TARGET: " .. eessi_accelerator_target)
end
-- enforce accelerator subdirectory usage for CVMFS installs (only if an accelerator install is requested)
if (eessi_accelerator_target ~= nil) and (cuda_compute_capability ~= nil) and (os.getenv("EESSI_ACCELERATOR_INSTALL") ~= nil) then
easybuild_installpath = pathJoin(easybuild_installpath, eessi_accelerator_target)
end
elseif (os.getenv("EESSI_SITE_INSTALL") ~= nil) then
-- Make sure no other EESSI install environment variables are set
if ((os.getenv("EESSI_PROJECT_INSTALL") ~= nil) or (os.getenv("EESSI_USER_INSTALL") ~= nil)) then
LmodError("You cannot use EESSI_SITE_INSTALL in combination with any other EESSI_*_INSTALL environment variables")
end
easybuild_installpath = os.getenv("EESSI_SITE_SOFTWARE_PATH")
-- enforce accelerator subdirectory usage for site installs (only if an accelerator install is requested)
if (eessi_accelerator_target ~= nil) and (cuda_compute_capability ~= nil) and (os.getenv("EESSI_ACCELERATOR_INSTALL") ~= nil) then
easybuild_installpath = pathJoin(easybuild_installpath, eessi_accelerator_target)
end
else
-- Deal with user and project installs
project_install = os.getenv("EESSI_PROJECT_INSTALL")
Expand Down
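
The module logic in this hunk is Lua, but the two decisions it makes can be sketched in shell for quick experimentation. This is only an illustration: the function names are hypothetical, and unlike the Lua code (which treats any set `EESSI_ACCELERATOR_INSTALL`, even an empty one, as a request) the sketch treats an empty value as "not requested". It applies to the CVMFS and site cases only; project and user installs keep their installation path.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the Lua logic in shell. Function names are hypothetical.

# Convert e.g. "accel/nvidia/cc80" to "8.0"; fail on any other shape.
cuda_cc_from_target() {
    local target="$1"
    if [[ "$target" =~ ^accel/nvidia/cc([0-9])([0-9])$ ]]; then
        echo "${BASH_REMATCH[1]}.${BASH_REMATCH[2]}"
    else
        echo "Incorrect value for \$EESSI_ACCELERATOR_TARGET: $target" >&2
        return 1
    fi
}

# Print the install path: the accelerator subdirectory is appended only
# when a valid target is known AND an accelerator install is requested.
accel_installpath() {
    local base="$1" target="$2" accel_install="$3"
    if [[ -n "$target" ]]; then
        # an invalid target is an error, as in the module
        cuda_cc_from_target "$target" >/dev/null || return 1
        if [[ -n "$accel_install" ]]; then
            echo "$base/$target"
            return 0
        fi
    fi
    echo "$base"
}

# Example with a made-up base path
accel_installpath "/example/software" "accel/nvidia/cc80" "1"
```

Keeping the compute-capability parsing separate from the path decision matches the refactoring in this hunk: `EASYBUILD_CUDA_COMPUTE_CAPABILITIES` is derived whenever a target is known, while the install path changes only on explicit request.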
10 changes: 10 additions & 0 deletions EESSI-install-software.sh
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,16 @@ if [[ ! -z ${EESSI_DEV_PROJECT} ]]; then
echo ">> \$EESSI_PROJECT_INSTALL set to ${EESSI_PROJECT_INSTALL}"
fi

# If we have EESSI_ACCELERATOR_TARGET_OVERRIDE set (and non-empty), then this implies building for a GPU target
# (this must be set _before_ we load EESSI-extend).
# We also check that EESSI_ACCELERATOR_TARGET is set, since EESSI_ACCELERATOR_TARGET_OVERRIDE must
# be set before the EESSI module is loaded in order to set accelerator information.
if [[ -n "$EESSI_ACCELERATOR_TARGET_OVERRIDE" && -z "$EESSI_ACCELERATOR_TARGET" ]]; then
fatal_error "EESSI module should've set EESSI_ACCELERATOR_TARGET ($EESSI_ACCELERATOR_TARGET) when EESSI_ACCELERATOR_TARGET_OVERRIDE ($EESSI_ACCELERATOR_TARGET_OVERRIDE) is exported."
elif [[ -n "$EESSI_ACCELERATOR_TARGET_OVERRIDE" ]]; then
export EESSI_ACCELERATOR_INSTALL=1
fi

echo "DEBUG: before loading EESSI-extend // EASYBUILD_INSTALLPATH='${EASYBUILD_INSTALLPATH}'"
source $TOPDIR/load_eessi_extend_module.sh ${EESSI_VERSION}
echo "DEBUG: after loading EESSI-extend // EASYBUILD_INSTALLPATH='${EASYBUILD_INSTALLPATH}'"
Expand Down
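
The consistency check added in this hunk can be exercised in isolation. The sketch below wraps the same two conditions in a function so each case can be tried by hand; `decide_accelerator_install` is a hypothetical name, and the real script calls `fatal_error` rather than returning a status.

```bash
#!/usr/bin/env bash
# Sketch only: decide_accelerator_install is a hypothetical wrapper
# around the two conditions used in EESSI-install-software.sh.

# Prints "1" when an accelerator install should be requested, prints
# nothing when no override is given, and fails when the override is set
# but the EESSI module did not produce an accelerator target.
decide_accelerator_install() {
    local override="$1" target="$2"
    if [[ -n "$override" && -z "$target" ]]; then
        echo "EESSI_ACCELERATOR_TARGET is empty although EESSI_ACCELERATOR_TARGET_OVERRIDE='$override'" >&2
        return 1
    elif [[ -n "$override" ]]; then
        echo 1
    fi
}

# Example: override and detected target are both present
decide_accelerator_install "accel/nvidia/cc80" "accel/nvidia/cc80"
```

Note that the target and the override need not be equal for the check to pass: as the CI matrix in this PR shows, a non-existent override can resolve to a fallback target.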
10 changes: 8 additions & 2 deletions bot/build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -162,8 +162,14 @@ export EESSI_SOFTWARE_SUBDIR_OVERRIDE
echo "bot/build.sh: EESSI_SOFTWARE_SUBDIR_OVERRIDE='${EESSI_SOFTWARE_SUBDIR_OVERRIDE}'"

# determine accelerator target (if any) from .architecture in ${JOB_CFG_FILE}
export EESSI_ACCELERATOR_TARGET=$(cfg_get_value "architecture" "accelerator")
echo "bot/build.sh: EESSI_ACCELERATOR_TARGET='${EESSI_ACCELERATOR_TARGET}'"
ACCEL_OVERRIDE=$(cfg_get_value "architecture" "accelerator")
if [[ -n "$ACCEL_OVERRIDE" ]]; then
# bot job config does not include accel subdirectory
export EESSI_ACCELERATOR_TARGET_OVERRIDE="accel/$ACCEL_OVERRIDE"
else
export EESSI_ACCELERATOR_TARGET_OVERRIDE=""
fi
echo "bot/build.sh: EESSI_ACCELERATOR_TARGET_OVERRIDE='${EESSI_ACCELERATOR_TARGET_OVERRIDE}'"

# get EESSI_OS_TYPE from .architecture.os_type in ${JOB_CFG_FILE} (default: linux)
EESSI_OS_TYPE=$(cfg_get_value "architecture" "os_type")
Expand Down
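
As a small worked example of the mapping in this hunk: the bot's job config stores the accelerator without the leading `accel/` directory, so the override has to add it. The helper name below is hypothetical, and the config lookup (`cfg_get_value`) is replaced by a plain argument so the snippet runs on its own.

```bash
#!/usr/bin/env bash
# Sketch only: accel_override_from_cfg is a hypothetical helper; the value
# normally returned by cfg_get_value is passed in as an argument.

# Map a job-config accelerator value (e.g. "nvidia/cc80") to the override
# format expected by the EESSI module (e.g. "accel/nvidia/cc80").
# An empty input yields an empty override.
accel_override_from_cfg() {
    local accel="$1"
    if [[ -n "$accel" ]]; then
        echo "accel/$accel"
    fi
}

EESSI_ACCELERATOR_TARGET_OVERRIDE="$(accel_override_from_cfg "nvidia/cc80")"
export EESSI_ACCELERATOR_TARGET_OVERRIDE
echo "EESSI_ACCELERATOR_TARGET_OVERRIDE='${EESSI_ACCELERATOR_TARGET_OVERRIDE}'"
```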
4 changes: 2 additions & 2 deletions run_in_compat_layer_env.sh
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,8 @@ fi
if [ ! -z ${EESSI_SOFTWARE_SUBDIR_OVERRIDE} ]; then
INPUT="export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${EESSI_SOFTWARE_SUBDIR_OVERRIDE}; ${INPUT}"
fi
if [ ! -z ${EESSI_ACCELERATOR_TARGET} ]; then
INPUT="export EESSI_ACCELERATOR_TARGET=${EESSI_ACCELERATOR_TARGET}; ${INPUT}"
if [ ! -z ${EESSI_ACCELERATOR_TARGET_OVERRIDE} ]; then
Review comment (PR author): @trz42 This is why I was asking where these environment variables get set; this should be using the override mechanism.
INPUT="export EESSI_ACCELERATOR_TARGET_OVERRIDE=${EESSI_ACCELERATOR_TARGET_OVERRIDE}; ${INPUT}"
fi
if [ ! -z ${EESSI_CVMFS_REPO_OVERRIDE} ]; then
INPUT="export EESSI_CVMFS_REPO_OVERRIDE=${EESSI_CVMFS_REPO_OVERRIDE}; ${INPUT}"
Expand Down
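
The pattern used in `run_in_compat_layer_env.sh` is to prepend an `export NAME=value;` statement to the command string for every override variable that is set. A generic sketch of that pattern follows; `prepend_exports` is a hypothetical helper, not part of the script, and like the original it does not quote values, so values containing spaces would need extra care.

```bash
#!/usr/bin/env bash
# Sketch only: prepend_exports is a hypothetical generalisation of the
# repeated if-blocks in run_in_compat_layer_env.sh.

# Usage: prepend_exports "<command string>" VAR_NAME...
# For each named variable that is set and non-empty, prepend
# "export NAME=value; " to the command string, then print the result.
prepend_exports() {
    local input="$1"
    shift
    local name
    for name in "$@"; do
        if [[ -n "${!name-}" ]]; then
            input="export ${name}=${!name}; ${input}"
        fi
    done
    echo "$input"
}

# Example with an illustrative command string
export EESSI_ACCELERATOR_TARGET_OVERRIDE="accel/nvidia/cc80"
prepend_exports "./some_command.sh" EESSI_ACCELERATOR_TARGET_OVERRIDE
```

Forwarding the override rather than the final target is the point of this change: the EESSI module inside the compat layer environment can then resolve the target itself.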