Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix just argument splitting on pass-through recipes with complex args #4961

Merged
merged 7 commits into from
Sep 23, 2024

Conversation

sarayourfriend
Copy link
Collaborator

@sarayourfriend sarayourfriend commented Sep 19, 2024

Description

This PR fixes a problem we've run into several times, and never really found a suitable ("nice") solution for.

Without these changes, when we pass just's collected arguments lists (+args and *args type things), the underlying bash call that just makes will cause arguments to be split on any space and follow difficult to track rules about quotation mark escaping. Commands like this, were simply impossible, without abstract levels of quote escaping:

ov env DC_USER="airflow"  just exec scheduler airflow dags trigger staging_audio_data_refresh --conf '{"index_suffix": "init"}'

ov just api/test -k "test_image_proxy or test_check_dead_links"

The --conf argument would have its quotes stripped and turn into {"index_suffix": "init"}, passed to the underlying command, which just executes using the default shell (bash in the case of ov). Bash would interpret this as two separate positional arguments, {index_suffix: and init}, with quotation marks stripped following bash's quotation handling logic.

By using just's positional-arguments setting on these recipes, we're able to pass arguments through using bash's argument array expansion, which preserves quoted arguments as single individual arguments, rather than splitting on all strings.

With these changes, the commands above work perfectly and exactly as expected. The quoted arguments are preserved as a single argument, rather than split when just compiles the recipe template which drops the argument groups.

I chose not to enable positional-arguments globally in each justfile, despite using it almost everywhere with gather args, because I didn't want to cause side effects on any recipes that weren't written with that particular setting in mind. Plus, setting it explicitly on each recipe should make it easier for folks to understand why those recipes behave the way they do compared to others (or a new one they are writing), and will give them something easy to search for in the just documentation. The global setting would be slightly harder to find and know that it was the effective difference.

Testing Instructions

Check out main and try to run the commands above to see what kind of error you get back. This helps make sure you know what goes wrong without these changes. Then, check out this branch, and run the commands above again, and they should work.

CI should pass, which is a decent indicator that I haven't broken anything 😅

Make sure to take an extra look at the recipes that use positional arguments for a final gather argument, but that also have individual named just arguments. For example, the k6/run recipe. The $@ split should capture only the positional arguments that aren't handled as regular just template variables.

You can test this by echoing the final command instead of running it, to see the expanded arguments. For example, the diff below yields this output when running the k6/run recipe (via the root justfile alias k6):

$ ov j k6 frontend static-en --summary-export ./k6-summary.json -e scenario_vus=1 -e scenario_iterations=1 --console-output ./k6-summary.txt
pnpm run build --input src/frontend/static-en.test.ts
k6 run --summary-export ./k6-summary.json -e scenario_vus=1 -e scenario_iterations=1 --console-output ./k6-summary.txt ./dist/frontend/static-en.test.js

Here we can see that the pnpm run build command has the correct namespace and test-file interpolated. Likewise in the k6 run command. The k6 run command also needs to have the additional CLI args passed through between run and the script location. This is achieved with "${@:3}", which takes a slice of the $@ array starting at the 4th position forward. Recall that bash automatically drops from $@ what would be assigned to $0 (that is, the command that was run), unless you address it directly. However, the index still remains. If you echo "${@:0}" and `"${@:3}" in the recipe, you will see these lines respectively

run frontend static-en --summary-export ./k6-summary.json -e scenario_vus=1 -e scenario_iterations=1 --console-output ./k6-summary.txt

--summary-export ./k6-summary.json -e scenario_vus=1 -e scenario_iterations=1 --console-output ./k6-summary.txt

As you can see, while $@ on its own does not include argv0, when slicing it as an array, you still need to consider it when deciding the index to slice from. As such, the third argument passed to just will be in index 3 (position 4) of $@ when sliced, so we slice from 3 onward.

Here is a diff you can use to work off and explore the concepts above to better understand $@ if you would like
diff --git a/packages/js/k6/justfile b/packages/js/k6/justfile
index dc6c2e609..1d390b056 100644
--- a/packages/js/k6/justfile
+++ b/packages/js/k6/justfile
@@ -1,5 +1,7 @@
 # Build and run K6 script by namespace and scenario
 [positional-arguments]
-run namespace scenario *extra_args:
-    pnpm run build --input src/{{ namespace }}/{{ scenario }}.test.ts
-    k6 run "${@:3}" ./dist/{{ namespace }}/{{ scenario }}.test.js
+@run namespace scenario *extra_args:
+    echo "${@:0}"
+    echo "${@:3}"
+    echo pnpm run build --input src/{{ namespace }}/{{ scenario }}.test.ts
+    echo k6 run "${@:3}" ./dist/{{ namespace }}/{{ scenario }}.test.js

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (ov just catalog/generate-docs for catalog
    PRs) or the media properties generator (ov just catalog/generate-docs media-props
    for the catalog or ov just api/generate-docs for the API) where applicable.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@sarayourfriend sarayourfriend added 🟧 priority: high Stalls work on the project or its dependents 🛠 goal: fix Bug fix 🤖 aspect: dx Concerns developers' experience with the codebase 🧱 stack: infra Related to the Terraform config and other infrastructure labels Sep 19, 2024
@sarayourfriend sarayourfriend requested review from a team as code owners September 19, 2024 23:53
@sarayourfriend sarayourfriend requested review from AetherUnbound, stacimc, obulat and dhruvkb and removed request for a team September 19, 2024 23:53
@openverse-bot openverse-bot added 🧱 stack: api Related to the Django API 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: documentation Related to Sphinx documentation 🧱 stack: frontend Related to the Nuxt frontend 🧱 stack: mgmt Related to repo management and automations labels Sep 19, 2024
@sarayourfriend sarayourfriend removed 🧱 stack: api Related to the Django API 🧱 stack: frontend Related to the Nuxt frontend 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: mgmt Related to repo management and automations 🧱 stack: documentation Related to Sphinx documentation labels Sep 19, 2024
@openverse-bot openverse-bot added 🧱 stack: api Related to the Django API 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: documentation Related to Sphinx documentation 🧱 stack: frontend Related to the Nuxt frontend 🧱 stack: mgmt Related to repo management and automations labels Sep 19, 2024
Comment on lines -7 to -9
IS_PROD := env_var_or_default("PROD", env_var_or_default("IS_PROD", ""))
# `PROD_ENV` can be "ingestion_server" or "catalog"
PROD_ENV := env_var_or_default("PROD_ENV", "")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables were used long ago when we used to deploy the ingestion server and catalog using just. We don't do that any more, and they just add noise/complexity to this file now, so are good to remove.

p package script +args="":
pnpm --filter {{ package }} run {{ script }} {{ args }}
[positional-arguments]
p package script *args:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A handful of scripts used +args="", which works, but is technically redundant. *args expresses the same thing (optional variadic arguments to gather), and perhaps much more intentionally, especially to relay that they are optional, rather than the + prefix which indicates at least one is required (a requirement nullified by the default blank that is passed).

@sarayourfriend
Copy link
Collaborator Author

Hmmm, I don't know why those render-release-drafter scripts fail in CI. They pass locally 🤔. I thought at first it was because the sh implementation in the GitHub runner was different than the one in ov, but setting it to bash hasn't fixed it. I will give it a deeper look now.

Copy link
Collaborator

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully tested everything yet, but on an ov just recreate, although recreate runs apparently successfully, I see this error:

just up "--force-recreate --build"
env COMPOSE_PROFILES="api,ingestion_server,frontend,catalog" docker compose -f docker-compose.yml "$@"
unknown flag: --force-recreate --build
error: Recipe `dc` failed on line 204 with exit code 16

I also get errors trying to run the catalog tests, which I was able to fix by removing the single quotes in L106 for the _mount-test recipe.

@sarayourfriend
Copy link
Collaborator Author

@stacimc recreate fixed in e834229 and catalog/test fixed in f97a5e6

It's surprising the lengths we ended up going to work around this in just!

Copy link

Full-stack documentation: https://docs.openverse.org/_preview/4961

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

@stacimc
Copy link
Collaborator

stacimc commented Sep 20, 2024

This looks so close -- I've not tested everything but everything I typically use, including passing additional arguments to the various test commands, the documentation recipes for generating new IPs etc... The only trouble I'm running into is with ov j catalog/generate-docs:

airflow command error: argument GROUP_OR_COMMAND: invalid choice: "bash -c 'python catalog/utilities/dag_doc_gen/dag_doc_generation.py && chmod 666 /opt/airflow/catalog/utilities/dag_doc_gen/DAGs.md'" (choose from 'cheat-sheet', 'config', 'connections', 'dag-processor', 'dags', 'db', 'info', 'jobs', 'kerberos', 'plugins', 'pools', 'providers', 'roles', 'rotate-fernet-key', 'scheduler', 'standalone', 'sync-perm', 'tasks', 'triggerer', 'users', 'variables', 'version', 'webserver'), see help above.

@sarayourfriend
Copy link
Collaborator Author

Yes, it's failed in CI as well. I'm working on a fix... shouldn't be long, sorry! I will draft this until CI is passing.

@sarayourfriend sarayourfriend marked this pull request as draft September 20, 2024 01:39
@sarayourfriend sarayourfriend marked this pull request as ready for review September 20, 2024 02:33
@sarayourfriend
Copy link
Collaborator Author

Okay, this is ready to go now, hopefully!

Copy link
Collaborator

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! What an improvement, I hope this will save us many headaches in the future :)

@sarayourfriend sarayourfriend merged commit 081d174 into main Sep 23, 2024
58 checks passed
@sarayourfriend sarayourfriend deleted the fix/justfile-recipe-nesting-hell branch September 23, 2024 22:49
ipython *args: up-deps
env DC_USER="airflow" just ../run \
--workdir /opt/airflow/catalog/dags \
{{ SERVICE }} \
bash -c \'ipython {{ args }}\'
bash -c "ipython ${@:2}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarayourfriend why was ${@:2} used here? There doesn't appear to be a positional argument that we need to ignore, and it seems like the first argument after the command provided would be swallowed (for instance ov just catalog/ipython --TerminalInteractiveShell.editing_mode=vi would no longer work as expected because the editing_mode argument would be ignored). Was that intentional?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: documentation Related to Sphinx documentation 🧱 stack: frontend Related to the Nuxt frontend 🧱 stack: infra Related to the Terraform config and other infrastructure 🧱 stack: mgmt Related to repo management and automations
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants