Transition GeoIPS settings to a config file #782

Open
biosafetylvl5 opened this issue Sep 16, 2024 · 0 comments

Description

  • GeoIPS currently uses environment variables to configure its settings.
  • This issue proposes replacing environment variables with config files as the default way to configure settings.
  • Environment variables will still exist and will override options specified in config files.
  • Changes will be backward compatible.
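
The precedence described above can be sketched as follows. This is a minimal illustration, not the GeoIPS implementation: the function name resolve_setting, the DEFAULTS dictionary, and all paths are assumptions made for the example.

```python
import os

# Hypothetical built-in defaults; the real names and values live in base_paths.py.
DEFAULTS = {"GEOIPS_OUTDIRS": "/tmp/geoips_outdirs"}


def resolve_setting(name, file_settings, environ=None):
    """Return a setting using the precedence: environment > config file > default."""
    environ = os.environ if environ is None else environ
    if name in environ:
        # An environment variable always wins, which keeps existing setups working.
        return environ[name]
    if name in file_settings:
        return file_settings[name]
    return DEFAULTS[name]


# file_settings stands in for values already parsed from a config file.
file_settings = {"GEOIPS_OUTDIRS": "/data/geoips/outdirs"}
print(resolve_setting("GEOIPS_OUTDIRS", file_settings, environ={}))
print(resolve_setting("GEOIPS_OUTDIRS", file_settings,
                      environ={"GEOIPS_OUTDIRS": "/scratch/one_off_run"}))
```

Because the environment is checked first, a user who already exports GEOIPS_OUTDIRS would see no change in behavior under this scheme.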

Background and Motivation

  • Environmental variables are used to set settings via base_paths.py
  • base_paths.py exports those into a dictionary for easy access by the rest of the codebase
  • Not all of the codebase interacts with the dictionary directly.
  • At the time of writing, there are 986 uses of $-prefixed environment variables in the code outside of base_paths.py. This count does not include variables accessed via os or uses in plugin repos. The distribution by variable is:
$GEOIPS_OPERATIONAL_USER
	 occurs in the geoips code base 2 times
$GEOIPS_OUTDIRS
	 occurs in the geoips code base 68 times
$GEOIPS_PACKAGES_DIR
	 occurs in the geoips code base 456 times
$GEOIPS_BASEDIR
	 occurs in the geoips code base 13 times
$GEOIPS_TESTDATA_DIR
	 occurs in the geoips code base 410 times
$GEOIPS_DEPENDENCIES_DIR
	 occurs in the geoips code base 17 times
$HOME
	 occurs in the geoips code base 14 times
$GEOIPS_VERS
	 occurs in the geoips code base 5 times
$TCWWW
	 occurs in the geoips code base 1 times

Environmental variables are used in a lot of places. Here are a few examples of how environmental variables are used outside of base_paths.py:

  1. Paths in Python files: tests/unit_tests/commandline/test_geoips_run.py: "$GEOIPS_TESTDATA_DIR/test_data_noaa_aws/data/goes16/20200918/1950/*",
  2. Paths in shell files: tests/utils/copy_diffs_for_eval.sh:echo " $GEOIPS_TESTDATA_DIR/$repo_name"
  3. In the CI: .github/workflows/new-brassy-note.yaml: echo "Version is $GEOIPS_VERS"
  4. In the docs: docs/source/starter/installation.rst: if [[ "$GEOIPS_VERS" == "" ]]; then
  5. In Dockerfiles: Dockerfile: && mkdir -p $GEOIPS_OUTDIRS
  6. In comments: geoips/plugins/modules/filename_formatters/tc_clean_fname.py: Base directory, defaults to $TCWWW.
  7. In console output: utils/memtrack.sh: echo "Making $GEOIPS_OUTDIRS/memory_logs"

The counts above were crudely generated using grep and wc with this script:

declare -a arr=('BASE_PATH' 'GEOIPS_DOCS_URL' 'GEOIPS_OPERATIONAL_USER' 'GEOIPS_OUTDIRS' 'GEOIPS_PACKAGES_DIR' 'GEOIPS_BASEDIR' 'GEOIPS_TESTDATA_DIR' 'GEOIPS_DEPENDENCIES_DIR' 'PRESECTORED_DATA_PATH' 'PREREAD_DATA_PATH' 'PREREGISTERED_DATA_PATH' 'PRECALCULATED_DATA_PATH' 'CLEAN_IMAGERY_PATH' 'ANNOTATED_IMAGERY_PATH' 'FINAL_DATA_PATH' 'PREGENERATED_GEOLOCATION_PATH' 'GEOIPS_COPYRIGHT' 'GEOIPS_COPYRIGHT_ABBREVIATED' 'GEOIPS_RCFILE' 'TC_TEMPLATE' 'DEFAULT_QUEUE' 'BOXNAME' 'HOME' 'SCRATCH' 'LOCALSCRATCH' 'SHAREDSCRATCH' 'GEOIPS_ANCILDAT' 'GEOIPS_ANCILDAT_AUTOGEN' 'LOGDIR' 'GEOIPSDATA' 'GEOIPS_VERS' 'TCWWW' 'TCWWW_URL' 'PUBLICWWW' 'PUBLICWWW_URL' 'PRIVATEWWW' 'PRIVATEWWW_URL' 'TC_DECKS_DB' 'TC_DECKS_DIR')

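# First pass: count lines that mention each variable name in any form.
# The explicit "." keeps the recursive search portable to non-GNU grep.
for env_var in "${arr[@]}"
do
	number_times_used="$(grep -rs -- "$env_var" . | grep -v '\.pyc' | wc -l)"
	echo "$env_var"
	echo -e "\t occurs in the geoips code base $number_times_used times"
done
# Second pass: count only shell-style uses with a literal leading "$".
# -F makes grep treat the pattern as a fixed string rather than a regex.
for env_var in "${arr[@]}"
do
	number_times_used="$(grep -rsF -- "\$$env_var" . | grep -v '\.pyc' | wc -l)"
	echo "\$$env_var"
	echo -e "\t occurs in the geoips code base $number_times_used times"
done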

The full output of this script is included below.
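
For anyone who wants to repeat the count without grep, here is a rough Python equivalent of the second loop. It is only a sketch: count_matching_lines is a hypothetical helper, it matches literal strings, and its totals may differ slightly from grep's (for example, in how binary files are counted).

```python
from pathlib import Path


def count_matching_lines(root, needle, skip_suffixes=(".pyc",)):
    """Count lines under root that contain needle as a literal string."""
    total = 0
    for path in Path(root).rglob("*"):
        # Skip directories and compiled Python files, as the shell script does.
        if not path.is_file() or path.suffix in skip_suffixes:
            continue
        try:
            with path.open(encoding="utf-8", errors="ignore") as handle:
                total += sum(1 for line in handle if needle in line)
        except OSError:
            # Unreadable files are ignored, similar to grep -s.
            continue
    return total
```

For example, count_matching_lines(".", "$GEOIPS_VERS") run from the repository root would count lines containing the shell-style reference.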

Config Files v. Environmental Variables

  1. Config files are stored on disk, easily copied, and versioned. Environment variables are tied to the runtime and can be lost when the environment resets.
  2. Config files support comments and are human-readable. Environment variables lack inline documentation and require external reference.
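  3. Config files can store secrets more securely: access can be limited with file permissions, and the values are not automatically shared with other processes running in the same session, as exported environment variables are.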
  4. Config files can be tracked in version control, providing a change history. Environment variables lack this capability natively.
  5. Config files are easy to swap. Managing environment variables across multiple environments usually ends up relying on auxiliary files, which effectively become config files anyway.
  6. Config files organize multiple settings efficiently. Environment variables become hard to manage as their number grows.
  7. Config files combined with environment variables give both persistence (the file) and one-time overrides (the variable).
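
To make points 2 and 5 concrete, here is a small sketch using Python's standard configparser module. The INI format, the [paths] section, and the key names are assumptions chosen for illustration; this issue does not yet settle on a file format.

```python
import configparser

# Hypothetical development profile; comments live next to the settings they explain.
DEV_CONFIG = """\
[paths]
# Products are written here during development
geoips_outdirs = /home/dev/geoips_outdirs
"""

# Hypothetical operational profile with the same keys and different values.
OPS_CONFIG = """\
[paths]
geoips_outdirs = /ops/geoips/outdirs
"""


def load_paths(config_text):
    """Parse the [paths] section into a dict keyed by upper-case setting name."""
    # interpolation=None keeps any "%" characters in paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {key.upper(): value for key, value in parser["paths"].items()}


# Swapping environments is just a matter of loading a different file.
print(load_paths(DEV_CONFIG)["GEOIPS_OUTDIRS"])
print(load_paths(OPS_CONFIG)["GEOIPS_OUTDIRS"])
```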

Alternative Solutions

  • Continue to use the environmental variables, change nothing.

Proposed plan:

  1. Refactor base_paths.py
  2. Draft config file
  3. Decide where config files should go (see https://github.com/tox-dev/platformdirs)
  4. Refactor base_paths.py to read from a config file and set environment variables at runtime
  5. Add geoips command to set environment variables
  6. Refactor code base to get variables from config file
  7. Convert shell scripts to use the geoips set environmental variables command
  8. Stop setting environment variables via base_paths.py; read settings from the config file and use the file value only if the corresponding environment variable is not set.
  9. Update docs to point at config files instead of instructing users to set env variables
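
Steps 3 and 5 of the plan could look roughly like the sketch below. The helper names default_config_path and render_exports, the file name config.ini, and the fallback directory logic are all assumptions for illustration. platformdirs is used only if it is installed; its user_config_dir function returns a per-user config directory for the given application name.

```python
import os
import shlex
from pathlib import Path


def default_config_path(app_name="geoips", filename="config.ini"):
    """Return a per-user config file path, preferring platformdirs when available."""
    try:
        from platformdirs import user_config_dir  # optional third-party dependency
        base = Path(user_config_dir(app_name))
    except ImportError:
        # Fallback that follows the XDG convention used on Linux.
        xdg = os.environ.get("XDG_CONFIG_HOME")
        base = (Path(xdg) if xdg else Path.home() / ".config") / app_name
    return base / filename


def render_exports(settings):
    """Render settings as shell export lines, quoting values safely."""
    return "\n".join(
        f"export {name}={shlex.quote(str(value))}"
        for name, value in sorted(settings.items())
    )


print(default_config_path())
print(render_exports({"GEOIPS_OUTDIRS": "/data/my outputs"}))
```

A shell script could then evaluate the printed export lines from whichever geoips subcommand is eventually added, instead of reading a hand-maintained file of exports.
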

Full script output for the entire repository:

BASE_PATH
	 occurs in the geoips code base 16 times
GEOIPS_DOCS_URL
	 occurs in the geoips code base 3 times
GEOIPS_OPERATIONAL_USER
	 occurs in the geoips code base 6 times
GEOIPS_OUTDIRS
	 occurs in the geoips code base 126 times
GEOIPS_PACKAGES_DIR
	 occurs in the geoips code base 537 times
GEOIPS_BASEDIR
	 occurs in the geoips code base 22 times
GEOIPS_TESTDATA_DIR
	 occurs in the geoips code base 489 times
GEOIPS_DEPENDENCIES_DIR
	 occurs in the geoips code base 41 times
PRESECTORED_DATA_PATH
	 occurs in the geoips code base 4 times
PREREAD_DATA_PATH
	 occurs in the geoips code base 4 times
PREREGISTERED_DATA_PATH
	 occurs in the geoips code base 4 times
PRECALCULATED_DATA_PATH
	 occurs in the geoips code base 7 times
CLEAN_IMAGERY_PATH
	 occurs in the geoips code base 8 times
ANNOTATED_IMAGERY_PATH
	 occurs in the geoips code base 13 times
FINAL_DATA_PATH
	 occurs in the geoips code base 6 times
PREGENERATED_GEOLOCATION_PATH
	 occurs in the geoips code base 5 times
GEOIPS_COPYRIGHT
	 occurs in the geoips code base 15 times
GEOIPS_COPYRIGHT_ABBREVIATED
	 occurs in the geoips code base 10 times
GEOIPS_RCFILE
	 occurs in the geoips code base 6 times
TC_TEMPLATE
	 occurs in the geoips code base 6 times
DEFAULT_QUEUE
	 occurs in the geoips code base 5 times
BOXNAME
	 occurs in the geoips code base 2 times
HOME
	 occurs in the geoips code base 19 times
SCRATCH
	 occurs in the geoips code base 11 times
LOCALSCRATCH
	 occurs in the geoips code base 5 times
SHAREDSCRATCH
	 occurs in the geoips code base 4 times
GEOIPS_ANCILDAT
	 occurs in the geoips code base 21 times
GEOIPS_ANCILDAT_AUTOGEN
	 occurs in the geoips code base 5 times
LOGDIR
	 occurs in the geoips code base 4 times
GEOIPSDATA
	 occurs in the geoips code base 4 times
GEOIPS_VERS
	 occurs in the geoips code base 35 times
TCWWW
	 occurs in the geoips code base 21 times
TCWWW_URL
	 occurs in the geoips code base 4 times
PUBLICWWW
	 occurs in the geoips code base 11 times
PUBLICWWW_URL
	 occurs in the geoips code base 4 times
PRIVATEWWW
	 occurs in the geoips code base 11 times
PRIVATEWWW_URL
	 occurs in the geoips code base 4 times
TC_DECKS_DB
	 occurs in the geoips code base 10 times
TC_DECKS_DIR
	 occurs in the geoips code base 6 times
$BASE_PATH
	 occurs in the geoips code base 0 times
$GEOIPS_DOCS_URL
	 occurs in the geoips code base 0 times
$GEOIPS_OPERATIONAL_USER
	 occurs in the geoips code base 2 times
$GEOIPS_OUTDIRS
	 occurs in the geoips code base 68 times
$GEOIPS_PACKAGES_DIR
	 occurs in the geoips code base 456 times
$GEOIPS_BASEDIR
	 occurs in the geoips code base 13 times
$GEOIPS_TESTDATA_DIR
	 occurs in the geoips code base 410 times
$GEOIPS_DEPENDENCIES_DIR
	 occurs in the geoips code base 17 times
$PRESECTORED_DATA_PATH
	 occurs in the geoips code base 0 times
$PREREAD_DATA_PATH
	 occurs in the geoips code base 0 times
$PREREGISTERED_DATA_PATH
	 occurs in the geoips code base 0 times
$PRECALCULATED_DATA_PATH
	 occurs in the geoips code base 0 times
$CLEAN_IMAGERY_PATH
	 occurs in the geoips code base 0 times
$ANNOTATED_IMAGERY_PATH
	 occurs in the geoips code base 0 times
$FINAL_DATA_PATH
	 occurs in the geoips code base 0 times
$PREGENERATED_GEOLOCATION_PATH
	 occurs in the geoips code base 0 times
$GEOIPS_COPYRIGHT
	 occurs in the geoips code base 0 times
$GEOIPS_COPYRIGHT_ABBREVIATED
	 occurs in the geoips code base 0 times
$GEOIPS_RCFILE
	 occurs in the geoips code base 0 times
$TC_TEMPLATE
	 occurs in the geoips code base 0 times
$DEFAULT_QUEUE
	 occurs in the geoips code base 0 times
$BOXNAME
	 occurs in the geoips code base 0 times
$HOME
	 occurs in the geoips code base 14 times
$SCRATCH
	 occurs in the geoips code base 0 times
$LOCALSCRATCH
	 occurs in the geoips code base 0 times
$SHAREDSCRATCH
	 occurs in the geoips code base 0 times
$GEOIPS_ANCILDAT
	 occurs in the geoips code base 0 times
$GEOIPS_ANCILDAT_AUTOGEN
	 occurs in the geoips code base 0 times
$LOGDIR
	 occurs in the geoips code base 0 times
$GEOIPSDATA
	 occurs in the geoips code base 0 times
$GEOIPS_VERS
	 occurs in the geoips code base 5 times
$TCWWW
	 occurs in the geoips code base 1 times
$TCWWW_URL
	 occurs in the geoips code base 0 times
$PUBLICWWW
	 occurs in the geoips code base 0 times
$PUBLICWWW_URL
	 occurs in the geoips code base 0 times
$PRIVATEWWW
	 occurs in the geoips code base 0 times
$PRIVATEWWW_URL
	 occurs in the geoips code base 0 times
$TC_DECKS_DB
	 occurs in the geoips code base 0 times
$TC_DECKS_DIR
	 occurs in the geoips code base 0 times

When the script is run on only the geoips/ directory (i.e., the real codebase):

BASE_PATH
	 occurs in the geoips code base 4 times
GEOIPS_DOCS_URL
	 occurs in the geoips code base 3 times
GEOIPS_OPERATIONAL_USER
	 occurs in the geoips code base 4 times
GEOIPS_OUTDIRS
	 occurs in the geoips code base 47 times
GEOIPS_PACKAGES_DIR
	 occurs in the geoips code base 11 times
GEOIPS_BASEDIR
	 occurs in the geoips code base 9 times
GEOIPS_TESTDATA_DIR
	 occurs in the geoips code base 8 times
GEOIPS_DEPENDENCIES_DIR
	 occurs in the geoips code base 6 times
PRESECTORED_DATA_PATH
	 occurs in the geoips code base 4 times
PREREAD_DATA_PATH
	 occurs in the geoips code base 4 times
PREREGISTERED_DATA_PATH
	 occurs in the geoips code base 4 times
PRECALCULATED_DATA_PATH
	 occurs in the geoips code base 7 times
CLEAN_IMAGERY_PATH
	 occurs in the geoips code base 4 times
ANNOTATED_IMAGERY_PATH
	 occurs in the geoips code base 9 times
FINAL_DATA_PATH
	 occurs in the geoips code base 4 times
PREGENERATED_GEOLOCATION_PATH
	 occurs in the geoips code base 5 times
GEOIPS_COPYRIGHT
	 occurs in the geoips code base 11 times
GEOIPS_COPYRIGHT_ABBREVIATED
	 occurs in the geoips code base 6 times
GEOIPS_RCFILE
	 occurs in the geoips code base 4 times
TC_TEMPLATE
	 occurs in the geoips code base 4 times
DEFAULT_QUEUE
	 occurs in the geoips code base 5 times
BOXNAME
	 occurs in the geoips code base 2 times
HOME
	 occurs in the geoips code base 7 times
SCRATCH
	 occurs in the geoips code base 11 times
LOCALSCRATCH
	 occurs in the geoips code base 5 times
SHAREDSCRATCH
	 occurs in the geoips code base 4 times
GEOIPS_ANCILDAT
	 occurs in the geoips code base 9 times
GEOIPS_ANCILDAT_AUTOGEN
	 occurs in the geoips code base 5 times
LOGDIR
	 occurs in the geoips code base 4 times
GEOIPSDATA
	 occurs in the geoips code base 4 times
GEOIPS_VERS
	 occurs in the geoips code base 9 times
TCWWW
	 occurs in the geoips code base 17 times
TCWWW_URL
	 occurs in the geoips code base 4 times
PUBLICWWW
	 occurs in the geoips code base 7 times
PUBLICWWW_URL
	 occurs in the geoips code base 4 times
PRIVATEWWW
	 occurs in the geoips code base 7 times
PRIVATEWWW_URL
	 occurs in the geoips code base 4 times
TC_DECKS_DB
	 occurs in the geoips code base 10 times
TC_DECKS_DIR
	 occurs in the geoips code base 6 times
$BASE_PATH
	 occurs in the geoips code base 0 times
$GEOIPS_DOCS_URL
	 occurs in the geoips code base 0 times
$GEOIPS_OPERATIONAL_USER
	 occurs in the geoips code base 0 times
$GEOIPS_OUTDIRS
	 occurs in the geoips code base 9 times
$GEOIPS_PACKAGES_DIR
	 occurs in the geoips code base 2 times
$GEOIPS_BASEDIR
	 occurs in the geoips code base 1 times
$GEOIPS_TESTDATA_DIR
	 occurs in the geoips code base 0 times
$GEOIPS_DEPENDENCIES_DIR
	 occurs in the geoips code base 0 times
$PRESECTORED_DATA_PATH
	 occurs in the geoips code base 0 times
$PREREAD_DATA_PATH
	 occurs in the geoips code base 0 times
$PREREGISTERED_DATA_PATH
	 occurs in the geoips code base 0 times
$PRECALCULATED_DATA_PATH
	 occurs in the geoips code base 0 times
$CLEAN_IMAGERY_PATH
	 occurs in the geoips code base 0 times
$ANNOTATED_IMAGERY_PATH
	 occurs in the geoips code base 0 times
$FINAL_DATA_PATH
	 occurs in the geoips code base 0 times
$PREGENERATED_GEOLOCATION_PATH
	 occurs in the geoips code base 0 times
$GEOIPS_COPYRIGHT
	 occurs in the geoips code base 0 times
$GEOIPS_COPYRIGHT_ABBREVIATED
	 occurs in the geoips code base 0 times
$GEOIPS_RCFILE
	 occurs in the geoips code base 0 times
$TC_TEMPLATE
	 occurs in the geoips code base 0 times
$DEFAULT_QUEUE
	 occurs in the geoips code base 0 times
$BOXNAME
	 occurs in the geoips code base 0 times
$HOME
	 occurs in the geoips code base 3 times
$SCRATCH
	 occurs in the geoips code base 0 times
$LOCALSCRATCH
	 occurs in the geoips code base 0 times
$SHAREDSCRATCH
	 occurs in the geoips code base 0 times
$GEOIPS_ANCILDAT
	 occurs in the geoips code base 0 times
$GEOIPS_ANCILDAT_AUTOGEN
	 occurs in the geoips code base 0 times
$LOGDIR
	 occurs in the geoips code base 0 times
$GEOIPSDATA
	 occurs in the geoips code base 0 times
$GEOIPS_VERS
	 occurs in the geoips code base 0 times
$TCWWW
	 occurs in the geoips code base 1 times
$TCWWW_URL
	 occurs in the geoips code base 0 times
$PUBLICWWW
	 occurs in the geoips code base 0 times
$PUBLICWWW_URL
	 occurs in the geoips code base 0 times
$PRIVATEWWW
	 occurs in the geoips code base 0 times
$PRIVATEWWW_URL
	 occurs in the geoips code base 0 times
$TC_DECKS_DB
	 occurs in the geoips code base 0 times
$TC_DECKS_DIR
	 occurs in the geoips code base 0 times

biosafetylvl5 self-assigned this Sep 16, 2024