Packaging: Add pex_binary BUILD metadata for building st2 venv #6307

Open: wants to merge 11 commits into master

@cognifloyd (Member) commented Feb 27, 2025

st2-packages currently uses different methods to create the /opt/stackstorm/st2 venv: dh_virtualenv on Debian and other tools on EL.

Now that we are moving to Pants, which uses lockfiles generated by PEX (Python EXecutable), PEX is the simplest way to build the /opt/stackstorm/st2 venv. (Note: "pex" is the name of both the tool and the archive files it produces.) A pex file is essentially an executable archive that bundles wheels and runs Python code. For us, the pex will basically be a self-extracting virtualenv.
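In the zipapp layout this PR uses, a pex is a single zip archive with a boot preamble (a shebang line, or a small /bin/sh script when sh_boot=True) prepended to it, so ordinary zip tooling can look inside a built pex. A minimal sketch, assuming the build manifest is stored as a JSON file named PEX-INFO at the archive root; `read_pex_info` is a hypothetical helper, not part of pex:

```python
import json
import zipfile
from pathlib import Path
from typing import Any, Dict


def read_pex_info(pex_path: Path) -> Dict[str, Any]:
    """Return the parsed PEX-INFO manifest from a zipapp-layout pex.

    zipfile locates the central directory from the end of the file, so it
    can read the archive even with the boot preamble prepended to it.
    """
    with zipfile.ZipFile(pex_path) as zf:
        return json.loads(zf.read("PEX-INFO"))
```

For example, `read_pex_info(Path("dist/packaging/st2-py311.pex"))` should return a dict describing the bundled requirements and interpreter constraints; the exact keys depend on the pex version that built it.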

The pants docs have a nice overview of PEX, including how to use it via pants.

In future PRs, I plan to add BUILD metadata to embed this pex package in rpm/deb files (built by pants+nfpm). The rpm/deb will use a post-install scriptlet that runs the pex file to generate the venv.

pex_binary(...) BUILD metadata

To generate a pex package, we need to add a pex_binary target that depends on all of our python_distribution targets. This list of dependencies replaces the in-requirements.txt file in st2-packages and the st2client dep which gets injected in the Makefile.

From st2/packaging/BUILD.venv, lines 35 to 57 in 5a2ad11:

```python
pex_binary(
    name="st2.pex",
    dependencies=[
        # this should depend on all python_distribution targets
        "//st2actions",
        "//st2api",
        "//st2auth",
        "//st2client",
        "//st2common",
        "//st2reactor",
        "//st2stream",
        "//st2tests",
        "//contrib/runners/action_chain_runner",
        "//contrib/runners/announcement_runner",
        "//contrib/runners/http_runner",
        "//contrib/runners/inquirer_runner",
        "//contrib/runners/local_runner",
        "//contrib/runners/noop_runner",
        "//contrib/runners/orquesta_runner",
        "//contrib/runners/python_runner",
        "//contrib/runners/remote_runner",
        "//contrib/runners/winrm_runner",
    ],
```
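This dependency list is maintained by hand, so a new runner could be forgotten. As a hypothetical aid (not part of this PR), a small script could regenerate the expected list for diffing against BUILD.venv. The sketch assumes the eight top-level components are fixed and that every runner lives in its own `*_runner` directory directly under contrib/runners/; `ST2_COMPONENTS` and `st2_pex_dependencies` are illustrative names:

```python
from pathlib import Path
from typing import List

# Hypothetical: the top-level st2 components listed in BUILD.venv.
ST2_COMPONENTS = [
    "st2actions", "st2api", "st2auth", "st2client",
    "st2common", "st2reactor", "st2stream", "st2tests",
]


def st2_pex_dependencies(repo_root: Path) -> List[str]:
    """Build the expected dependency addresses for the st2 pex_binary target."""
    runners_dir = repo_root / "contrib" / "runners"
    # Only directories named *_runner count; stray files are ignored.
    runners = sorted(
        p.name for p in runners_dir.iterdir()
        if p.is_dir() and p.name.endswith("_runner")
    )
    return [f"//{name}" for name in ST2_COMPONENTS] + [
        f"//contrib/runners/{name}" for name in runners
    ]
```

Sorting the runner names keeps the output in the same alphabetical order the BUILD file already uses, so a plain text diff is enough to spot a missing entry.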

Several options control various aspects of the generated pex. I tried to document what each option does and why it is set the way it is. See the pex_binary docs for more about each option.

From st2/packaging/BUILD.venv, lines 59 to 67 in 5a2ad11:

```python
    execution_mode="venv",
    layout="zipapp",  # zipapp creates a single file, loose and packed create directories
    sh_boot=True,  # faster startup time (only relevant for unpacking the pex)
    include_tools=True,  # include pex.tools to populate a venv from the pex
    # TODO: To improve docker layer caching, we could break this into 2 pexes
    # one w/ include_requirements=False and the other w/ include_requirements=True.
    include_requirements=True,  # include third party requirements
    include_sources=False,  # already includes our wheels, skipping wheel-owned sources
    venv_hermetic_scripts=False,  # do not add -sE to script shebangs
```
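After running the pex, one way to confirm that non-hermetic scripts were produced is to scan the venv's bin/ directory for shebangs that still pass `-sE` (`-s` skips the user site-packages directory; `-E` ignores PYTHON* environment variables such as PYTHONPATH). A minimal sketch; `hermetic_scripts` is a hypothetical helper, and it assumes the hermetic form appends ` -sE` after the interpreter path on the shebang line:

```python
from pathlib import Path
from typing import List


def hermetic_scripts(venv_bin: Path) -> List[str]:
    """List scripts in a venv's bin/ whose shebang passes -sE to python."""
    flagged = []
    for script in sorted(venv_bin.iterdir()):
        if not script.is_file():
            continue  # skip subdirectories
        with script.open("rb") as f:
            first_line = f.readline()
        # Binaries and sourced files (e.g. activate) have no "#!" line.
        if first_line.startswith(b"#!") and b" -sE" in first_line:
            flagged.append(script.name)
    return flagged
```

With `venv_hermetic_scripts=False` and `--non-hermetic-scripts`, `hermetic_scripts(Path("/opt/stackstorm/st2/bin"))` would be expected to return an empty list.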

The include_tools=True option is what lets the pex generate a virtualenv: we configure the pex to run a Python script, and that script uses PEX_TOOLS to build the virtualenv. The script is set with the executable option:

```python
    executable="build_st2_venv.py",  # included by dependency inference
```

This is the part of that script that uses PEX_TOOLS to build the venv:

```python
env = {"PEX_TOOLS": "1"}
cmd = [
    get_pex_path(),
    "venv",
    "--force",  # remove and replace the venv if it exists
    "--non-hermetic-scripts",  # do not add -sE to python shebang
    # st2-packages has a note about python symlinks breaking pack install.
    # uncomment this if that proves to still be an issue.
    # "--copies",  # pack install follows python symlinks to find bin dir
    "--system-site-packages",
    "--compile",  # pre-compile all pyc files
    "--prompt=st2",
    str(st2_venv_path),
]
pretty_cmd = "".join(k + "=" + v + " " for k, v in env.items()) + " ".join(cmd)
print(f"Now running: {pretty_cmd}", file=sys.stderr)
result = subprocess.call(cmd, env=env)
```
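The excerpt calls get_pex_path() without showing it. Below is a minimal guess at such a helper, plus a shell-quoted variant of pretty_cmd built on `shlex.quote`. Both are hypothetical sketches rather than the PR's code; `get_pex_path` assumes pex sets the `PEX` environment variable to the path of the running pex file, and `format_cmd` is an illustrative name:

```python
import os
import shlex
import sys
from typing import Mapping, Sequence


def get_pex_path() -> str:
    """Hypothetical stand-in for the get_pex_path() helper called above.

    Assumes pex exports the running pex file's path in the PEX environment
    variable; falls back to sys.argv[0] otherwise.
    """
    return os.environ.get("PEX", sys.argv[0])


def format_cmd(env: Mapping[str, str], cmd: Sequence[str]) -> str:
    """Shell-quoted variant of pretty_cmd, so paths with spaces stay copy-pasteable."""
    assignments = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(assignments + [shlex.quote(arg) for arg in cmd])
```

One design note on the excerpt itself: `subprocess.call(cmd, env=env)` gives the child process only the variables in `env`, not the parent's environment. If the child ever needs inherited variables (PATH, PEX_ROOT), `env={**os.environ, **env}` is the usual way to merge them.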

The build_st2_venv.py script actually has access to all of the st2 dependencies. Technically, we could import st2 code as well, but when I tried that, pants inferred dependencies on conf files that should not be in the pex. Instead, I used oslo_config to extract [system].base_path from our config file(s) or the ST2_SYSTEM__BASE_PATH env var, adding only the bare minimum needed for that. Note that this explicitly supports building the venv before the st2 conf files are in place; the code in st2common does not.

```python
import os
from pathlib import Path
from typing import List, Optional

from oslo_config import cfg


def get_st2_base_path(args: Optional[List[str]] = None) -> Path:
    st2_config_path = (
        os.environ.get("ST2_CONFIG_PATH", os.environ.get("ST2_CONF"))
        or "/etc/st2/st2.conf"
    )
    cfg.CONF.register_opts(
        [cfg.StrOpt("base_path", default="/opt/stackstorm")], group="system"
    )
    try:
        cfg.CONF(args=args, default_config_files=[st2_config_path], use_env=False)
    except cfg.ConfigFilesNotFoundError:
        pass  # conf files may not be installed yet; fall back to defaults
    st2_base_path = os.environ.get(
        "ST2_SYSTEM__BASE_PATH", cfg.CONF["system"]["base_path"]
    )
    return Path(st2_base_path)
```
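The lookup order is easier to see (and to unit test) without oslo.config. The sketch below swaps in the standard library's `configparser` and ignores command-line args, so it only approximates `get_st2_base_path`; `resolve_st2_base_path` and the two constants are hypothetical names. It keeps the same precedence: the ST2_SYSTEM__BASE_PATH env var wins, then `[system] base_path` from the config file, then the /opt/stackstorm default, and a missing config file is not an error:

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "/etc/st2/st2.conf"
DEFAULT_BASE_PATH = "/opt/stackstorm"


def resolve_st2_base_path(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate get_st2_base_path using configparser instead of oslo.config."""
    env = os.environ if environ is None else environ
    config_path = (
        env.get("ST2_CONFIG_PATH", env.get("ST2_CONF")) or DEFAULT_CONFIG_PATH
    )
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)  # silently skips files that do not exist
    # fallback covers both a missing [system] section and a missing option
    base_path = parser.get("system", "base_path", fallback=DEFAULT_BASE_PATH)
    return Path(env.get("ST2_SYSTEM__BASE_PATH", base_path))
```

Passing an explicit mapping instead of reading `os.environ` directly is what makes the precedence testable without touching the real environment.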

Also, the unpacked virtualenv ends up with copies of the build_st2_venv.py script because of how pants runs pex. There isn't a good way to configure that behavior, so I added some logic to remove the script immediately after unpacking the venv.

```python
from pathlib import Path


def tidy_venv(st2_venv_path: Path) -> None:
    """Clean up and remove this script from the venv.

    Unfortunately, the way pants uses pex, this script ends up in the venv.
    """
    for path in (st2_venv_path / "lib").glob("python*"):
        script_path = path / "site-packages" / "packaging" / "build_st2_venv.py"
        if script_path.exists():
            script_path.unlink()
        script_path = path / "site-packages" / "__pex_executable__.py"
        if script_path.exists():
            script_path.unlink()
    # and remove the reference to this script
    main_path = st2_venv_path / "__main__.py"
    main_path.write_text(main_path.read_text().replace("__pex_executable__", ""))
```

I believe the st2-packages repo also adds another metadata file in /opt/stackstorm/st2, but we can add that later if we actually need it. This script will be the place to add any st2-specific metadata files to the venv.

The last thing to point out is how the BUILD file "parametrizes" the pex_binary target so that there is a separate pex file for each Python minor version we support (3.8, 3.9, 3.10, and 3.11). I tried several approaches before landing on this one, which feels the most concise and grokable to me. It calls a custom function that defines each parametrization with as little repetition as possible:

From st2/packaging/BUILD.venv, lines 68 to 72 in 5a2ad11:

```python
    # 1 parametrize group per python minor version in [DEFAULT].st2_interpreter_constraints in pants.toml
    **_pex_py3("8", constraint="CPython>=3.8.1,<3.9"),
    **_pex_py3("9"),
    **_pex_py3("10"),
    **_pex_py3("11"),
```

Here is our helper function that calls the pants parametrize() function. The first argument names the parametrization group (py38, py39, py310, py311). The output_path determines where under dist/ the built pex is placed, and interpreter_constraints narrows our constraints to a single Python minor version.

From st2/packaging/BUILD.venv, lines 25 to 32 in 5a2ad11:

```python
def _pex_py3(minor: str, constraint: str = ""):
    if not constraint:
        constraint = f"CPython==3.{minor}.*"
    return parametrize(
        f"py3{minor}",
        output_path=f"${{spec_path_normalized}}/st2-py3{minor}.pex",
        interpreter_constraints=[constraint],
    )
```
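To see what the four `_pex_py3` calls expand to, here is a sketch that runs outside Pants. The `parametrize` defined here is a hypothetical stand-in that only records each group's field values in a dict; the real Pants builtin works differently. `_pex_py3` is copied unchanged from the excerpt:

```python
from typing import Any, Dict


# Hypothetical stand-in for the Pants `parametrize` builtin, only so this
# sketch runs in plain Python: it maps the group name to its field values.
def parametrize(group: str, **fields: Any) -> Dict[str, Dict[str, Any]]:
    return {group: fields}


def _pex_py3(minor: str, constraint: str = "") -> Dict[str, Dict[str, Any]]:
    if not constraint:
        constraint = f"CPython==3.{minor}.*"
    return parametrize(
        f"py3{minor}",
        # "{{" and "}}" escape the braces, leaving a literal ${spec_path_normalized}
        output_path=f"${{spec_path_normalized}}/st2-py3{minor}.pex",
        interpreter_constraints=[constraint],
    )


groups = {
    **_pex_py3("8", constraint="CPython>=3.8.1,<3.9"),
    **_pex_py3("9"),
    **_pex_py3("10"),
    **_pex_py3("11"),
}
for name, fields in groups.items():
    print(name, fields["output_path"], fields["interpreter_constraints"])
```

It prints one line per group, for example `py311 ${spec_path_normalized}/st2-py311.pex ['CPython==3.11.*']`. As I understand the Pants output_path docs, `${spec_path_normalized}` expands to the target's directory with slashes replaced by dots, which for packaging/ is just `packaging`; that is how the py311 pex lands at dist/packaging/st2-py311.pex.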

Building the pex packages

Finally, to build all pex packages (which end up under dist/packaging/), run:

```shell
pants package packaging/::
```

Or to build just the one for python 3.11 (which ends up as dist/packaging/st2-py311.pex), run:

```shell
pants package packaging:st2.pex@parametrize=py311
```

Then to use the pex, just run it (sudo might not be required if your user has write permissions in /opt/stackstorm):

```shell
sudo dist/packaging/st2-py311.pex
```

Now we can treat st2.pex as a self-extracting venv installer.

Use a preamble file, which pex executes before its bootstrap code, to make the pex just build /opt/stackstorm/st2 instead of teaching all the installers about pex-tools. Other packaging metadata will go in packaging/BUILD later.
@cognifloyd cognifloyd added this to the pants milestone Feb 27, 2025
@cognifloyd cognifloyd self-assigned this Feb 27, 2025
@pull-request-size pull-request-size bot added the size/L PR that changes 100-499 lines. Requires some effort to review. label Feb 27, 2025
The preamble method was hacky and had some uncomfortable sharp edges. The pex cache does not know about the preamble, which can break running the pex to unpack itself in some cases (like if you use PEX_TOOLS to inspect the pex before unpacking it).

So, switch to a Python script that runs when the pex is executed. Thanks to pex, this script has access to the full venv before we have even unpacked it, so we can safely use deps like oslo.config. However, using a pex entry_point/script/executable causes the script to be included in the extracted venv. If we import any st2 code, pants may add other sources or conf files that we do not want in the pex (other than via the wheels of our sources). So, I left a note warning against importing st2 code.
@cognifloyd cognifloyd marked this pull request as ready for review February 27, 2025 17:59
Labels: enhancement, pantsbuild, size/L, st2-packages