
Addition of generic pre-build and post-build hooks #9892


Describe the feature request
The ability to call shell scripts just before and just after each package is built.

At IOG, our use case for these is build caching, particularly in CI. Since the pre- and post-build hooks are just shell scripts, there are probably other uses for them as well.

Additional context
At IOG we have a number of very large Haskell projects with deep dependency trees that can take a long time to build in CI.

The obvious answer to long build times is caching of build products. A previous attempt at such caching was made, but that solution was not satisfactory, because the cache was keyed on ${CPU}-${OS}-${GHC_VERSION}-${hash-of-dependencies}. The first three components are obvious. The problem is hash-of-dependencies: if a single high-level dependency changes, there is no cache hit and everything is built from scratch. As it turns out, this is the most common scenario.
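
For concreteness, such a key could be computed roughly as follows (a sketch only; I am assuming here that the dependency hash is taken over cabal's install plan in dist-newstyle/cache/plan.json, which may not match the original implementation exactly):

#!/usr/bin/env bash

CPU=$(uname -m)
OS=$(uname -s)
GHC_VERSION=$(ghc --numeric-version)
# Hash the whole install plan, so a change to any single dependency
# changes the key and invalidates the entire cache.
DEPS_HASH=$(sha256sum dist-newstyle/cache/plan.json | cut -d ' ' -f 1)
echo "${CPU}-${OS}-${GHC_VERSION}-${DEPS_HASH}"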

A better caching solution is one where the caching is done on individual dependencies rather than on all the dependencies as one huge blob. Caching individual dependencies means that when a high-level dependency changes, there is a very high likelihood that all the lower-level dependencies will still be found in the cache.

My initial implementation of this package-level caching was a simple wrapper around cabal that used rsync to fetch and save the cache over ssh to another machine. This proved highly effective: I was able to populate the cache from one machine and use it from another (both machines running Debian Linux).
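
That wrapper looked roughly like the following sketch (cache.example.com is a placeholder for the ssh-reachable cache machine; the real wrapper did more careful error handling):

#!/usr/bin/env bash
set -e

CACHE_HOST="cache.example.com"
STORE_DIR="$HOME/.cabal/store"

# Fetch whatever is already cached; tolerate a cold (empty) cache.
rsync -az "$CACHE_HOST:cabal-cache/store/" "$STORE_DIR/" || true

cabal build "$@"

# Push newly built packages back so other machines get cache hits.
rsync -az "$STORE_DIR/" "$CACHE_HOST:cabal-cache/store/"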

However, @angerman came up with an even better solution, which required adding the ability to run shell scripts before and after the build of each individual package. Using this feature (we have rough patches against cabal HEAD and version 3.10.3.0), we are able to use our own Amazon S3 storage for the cache. We do not propose to make this S3 storage public (for obvious security reasons), but any organization like ours, or any individual, could use their own S3 storage. I also have a working pair of pre- and post-build hooks that use ssh to a different machine as the storage backend.

The naive patch against HEAD (error handling could be improved, and the hook names could perhaps be made configurable) is:

diff --git a/cabal-install/src/Distribution/Client/ProjectBuilding/UnpackedPackage.hs b/cabal-install/src/Distribution/Client/ProjectBuilding/UnpackedPackage.hs
index 065334d5c..570c9b18c 100644
--- a/cabal-install/src/Distribution/Client/ProjectBuilding/UnpackedPackage.hs
+++ b/cabal-install/src/Distribution/Client/ProjectBuilding/UnpackedPackage.hs
@@ -678,7 +678,22 @@ buildAndInstallUnpackedPackage
           runConfigure
         PBBuildPhase{runBuild} -> do
           noticeProgress ProgressBuilding
-          runBuild
+          -- Run preBuildHook. If it exits with 0, we assume the build products
+          -- were restored from the cache and skip the build. If not, run the build.
+          code <- rawSystemExitCode verbosity (Just srcdir) "preBuildHook" [
+              (unUnitId $ installedUnitId rpkg)
+            , (getSymbolicPath srcdir)
+            , (getSymbolicPath builddir)
+            ] `catchIO` (\_ -> return (ExitFailure 10))
+          when (code /= ExitSuccess) $ do
+            runBuild
+            -- Not sure whether we should care about a failed postBuildHook.
+            void $ rawSystemExitCode verbosity (Just srcdir) "postBuildHook" [
+                (unUnitId $ installedUnitId rpkg)
+              , (getSymbolicPath srcdir)
+              , (getSymbolicPath builddir)
+              ] `catchIO` (\_ -> return (ExitFailure 10))
+
         PBHaddockPhase{runHaddock} -> do
           noticeProgress ProgressHaddock
           runHaddock
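
With this patch, a hook is simply an executable named preBuildHook or postBuildHook found on PATH, run from the package's source directory with three arguments. A minimal stub that just records what it is given (the argument order follows the rawSystemExitCode calls above):

#!/usr/bin/env bash

# Arguments passed by the patched cabal:
#   $1 - unit id of the package being built
#   $2 - package source directory
#   $3 - package build directory
echo "hook: unit=$1 srcdir=$2 builddir=$3" >&2
# From preBuildHook, a non-zero exit tells cabal to build the package
# itself; the exit code of postBuildHook is ignored.
exit 1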

An example of our preBuildHook script (kept in ~/.cabal/iog-hooks, with the S3 credentials pulled from s3-credentials.bash; the aws executable comes from the awscli package) is as follows:

#!/usr/bin/env bash

# shellcheck disable=SC1091,SC2046
. $(dirname "${BASH_SOURCE[0]}")/s3-credentials.bash

CACHE_KEY="$1.tar.gz"

# Check whether the artifact exists in S3 (with endpoint and credentials).
if aws s3 ls s3://"$CACHE_BUCKET"/"$CACHE_KEY" --endpoint-url "$AWS_ENDPOINT" > /dev/null 2>&1; then
  echo "S3 hit       $1"
  # Stream the artifact from S3 and unpack it into the current directory
  # (cabal runs the hook from the package's source directory).
  aws s3 cp s3://"$CACHE_BUCKET"/"$CACHE_KEY" - --endpoint-url "$AWS_ENDPOINT" | tar -xz
else
  echo "S3 miss      $1"
  exit 1
fi
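
The corresponding postBuildHook is essentially the inverse: pack up the build products and upload them under the same key. A sketch, assuming the same s3-credentials.bash convention and that the whole working directory (which the pre-build hook above also extracts into) is archived:

#!/usr/bin/env bash

# shellcheck disable=SC1091,SC2046
. $(dirname "${BASH_SOURCE[0]}")/s3-credentials.bash

CACHE_KEY="$1.tar.gz"

echo "S3 push      $1"
# Archive the build products and stream them straight to S3.
tar -cz . | aws s3 cp - s3://"$CACHE_BUCKET"/"$CACHE_KEY" --endpoint-url "$AWS_ENDPOINT"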

To use the cache, I run cabal with the hooks directory on PATH:

PATH=$PATH:$HOME/.cabal/iog-hooks cabal 
