Airflow variables and connection management: review and moving forward #4865
Thanks for the large degree of context here; I'll provide my thoughts when I have time!
Thanks for starting this discussion, and linking the previous ones; my question came from knowing that variables set via the Admin UI take lower precedence than environment variables (the opposite of how we'd prefer it to work), and recalling that we had discussed how to handle that in the past but not remembering what our conclusion was!
If I'm understanding correctly that the proposal here is that we only make changes to Airflow variables using the Ansible definitions, i.e., never through the Admin UI, I have some strong reservations. Particularly for variables like the
I agree that maintaining an approval process and history in code for some variables (Airflow connections, contact email, etc.) is important. Some of our variables (again,
At least twice the topic of how best to manage Airflow variables has come up in conversations between me, @stacimc, and @AetherUnbound. Most recently, Staci brought it up in this comment thread on a PR introducing a new variable:
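To understand Staci's question, and the general question I want to resolve in this discussion, recall that Airflow has three distinct methods for managing variables. In order of precedence, they are (order retrieved from this Astronomer.io forum thread):
1. A secrets backend, if one is configured
2. Environment variables (named AIRFLOW_VAR_<KEY>)
3. The metastore (the Airflow database, which is what the Admin UI edits)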
We do not have a secrets backend defined, so the two options available to us are environment variables and the Airflow UI.
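To make the precedence concrete, including the concern raised earlier in this thread (an Admin UI edit being silently shadowed by an environment variable), here is a toy Python model of the lookup order. It does not call Airflow at all: resolve_variable and the plain dicts standing in for each source are hypothetical, and only the AIRFLOW_VAR_<KEY> naming follows Airflow's environment-variable convention.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Prefix that Airflow's environment-variable backend looks for.
ENV_PREFIX = "AIRFLOW_VAR_"


def resolve_variable(
    key: str,
    secrets_backend: Optional[Mapping[str, str]],
    metastore: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Toy model of the order in which variable sources are consulted.

    The first source containing the key wins; later sources are never read.
    """
    # 1. A configured secrets backend (pass None when there is none, as for us).
    if secrets_backend is not None and key in secrets_backend:
        return secrets_backend[key]
    # 2. An environment variable named AIRFLOW_VAR_<KEY>, with the key upper-cased.
    env_value = environ.get(ENV_PREFIX + key.upper())
    if env_value is not None:
        return env_value
    # 3. The metastore, i.e. whatever was saved through the Admin UI.
    return metastore.get(key)


if __name__ == "__main__":
    env = {"AIRFLOW_VAR_EXAMPLE_KEY": "set-in-ansible"}
    ui = {"example_key": "edited-in-ui"}
    # The environment value shadows the UI edit.
    print(resolve_variable("example_key", None, ui, env))  # set-in-ansible
```

This is why, without a secrets backend, anything we set as an environment variable always wins over the Admin UI for the same key.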
In previous discussions, we decided to use the metastore, configured through the UI (private infrastructure repository issue). That may well still be the best path forward. However, I wanted to revisit the "through the UI" part of that decision in light of Staci's question, and because the previous decision was made before we had set up Ansible as a way of interacting with the Airflow API.
A bit of background to set the stage... feel free to skip this if you understand why using a UI to manage application configuration is not always ideal.
Generally speaking, we prefer to configure the environment variables of our services in code. That is part of our Infrastructure as Code (IaC) practice. The primary motivators for IaC for environment variables are:
We use IaC to manage all environment variables/application (or runtime) configuration for the following services:
For Kibana and Elasticsearch, we manage a combination of environment variables and configuration files; there is precedent for, and sometimes a necessity of, keeping both for various reasons.
We use IaC to manage some variables for Airflow, but not all. Of the variable-management methods listed above, we use only environment variables for our IaC-managed variables. We manage the following as environment variables for Airflow in our Airflow Ansible role:
getenv in the catalogue code to find these). I've opened an issue to address these.
Everything else, including API keys for the providers we ingest from and more recent configuration settings that might previously have been environment-variable-only, is managed through the Airflow UI. The rest of this discussion assumes we've written our code (or updated it in #4863) so that we can always reliably retrieve variables from the metastore.
Given that variables configured by all three methods can be retrieved through Variable.get and related APIs, we are free to decide how best to manage variables.
As noted above, we have already decided to use the metastore to configure Airflow Variables, and I don't think we need to revisit that discussion. That is, we certainly do not need to move variables towards configuration only via the environment (the opposite of #4863). Additionally, we do not need to invest in the complexity of a remote secrets backend now (regardless of whether we ever do).
Instead, I propose we continue to use the Metastore (in line with previous decisions), but start managing them using the Airflow REST API, via our Ansible role for Airflow.
The Airflow REST API exposes endpoints for managing variables and connections:
Using these endpoints, we can augment our existing Ansible role with the ability to sync variables and connections, even if no other changes exist, and bypass the Airflow restart step in these cases.
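For reference, here is a sketch of what calls to the variable endpoints could look like from Python using only the standard library. It assumes the Airflow webserver has HTTP basic auth enabled for the REST API (for example via the airflow.api.auth.backend.basic_auth backend); the helper names and the example host are hypothetical. The helpers only build urllib.request.Request objects, so nothing is sent until one is passed to urllib.request.urlopen.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any


def _basic_auth(username: str, password: str) -> str:
    # Assumption: the webserver accepts HTTP basic auth for the REST API.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def upsert_variable_request(
    base_url: str, variable: dict[str, Any], username: str, password: str
) -> urllib.request.Request:
    """Build POST /api/v1/variables carrying the full body (key, value, description)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/variables",
        data=json.dumps(variable).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": _basic_auth(username, password),
        },
    )


def delete_variable_request(
    base_url: str, key: str, username: str, password: str
) -> urllib.request.Request:
    """Build DELETE /api/v1/variables/{key}, URL-encoding the key."""
    quoted = urllib.parse.quote(key, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/variables/{quoted}",
        method="DELETE",
        headers={"Authorization": _basic_auth(username, password)},
    )
```

Usage would be along the lines of urllib.request.urlopen(upsert_variable_request("https://airflow.example.com", {"key": "SILENCED_SLACK_NOTIFICATIONS", "value": "{}", "description": None}, user, password)), with the real host and credentials coming from our secrets.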
We would define connections and variables in the Airflow secrets group_vars as a YAML structure mirroring the REST API's structure. For example, the SILENCED_SLACK_NOTIFICATIONS variable would be defined like so in group_vars.
Connections would look like this:
Because these YAML structures mirror the REST API's interfaces for variables/connections, it is trivial to export our current configuration to this format.
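As a rough illustration of that export, here is a sketch that takes variables as returned one at a time by GET /api/v1/variables/{key} (the per-key endpoint includes the value) and builds a group_vars-style structure. The top-level airflow_variables name is hypothetical, not an existing setting in our role. It prints JSON because JSON is valid YAML 1.2, which avoids a third-party YAML dependency.

```python
import json
from typing import Any


def to_group_vars(variables: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a hypothetical group_vars structure from per-variable API responses.

    Each input dict is assumed to look like the body of
    GET /api/v1/variables/{key}: {"key": ..., "value": ..., "description": ...}.
    """
    return {
        # "airflow_variables" is an illustrative name, not an existing role setting.
        "airflow_variables": [
            {
                "key": v["key"],
                "value": v["value"],
                "description": v.get("description"),
            }
            # Sort by key so repeated exports produce stable, diffable output.
            for v in sorted(variables, key=lambda v: v["key"])
        ]
    }


if __name__ == "__main__":
    fetched = [
        {"key": "b_key", "value": "2", "description": None},
        {"key": "a_key", "value": "1", "description": "example"},
    ]
    # JSON is valid YAML 1.2, so this output could be pasted into group_vars.
    print(json.dumps(to_group_vars(fetched), indent=2))
```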
The role tasks would go something like this:
/api/v1/variables with the variable. The POST request works both for updating an existing variable and for creating a new one (provided you always supply the entire body of the POST request, which we will).
/api/v1/variables/{{ variable.key }} for each one.

After this, the answer to Staci's original question will be: always manage variables using the Ansible definitions.
@WordPress/openverse-catalog what do y'all think of this? It's an attempt to bring the management of Airflow configuration, secrets, etc., closer in line with the rest of our applications.
Footnotes
That said, whether a change to the environment variables of a service requires review before it is applied is up to the Opener making or proposing the change. In some situations (e.g., after a discussion), we might apply changes before review. In others, an emergent situation requires changes from an SRE perspective. In other words, review is a notable feature of, but not always a necessary part of, environment variable management. Mostly the goal is visibility before or after the fact, whether or not a critical review is required.