Skip to content

Conversation

@topherinternational
Copy link
Contributor

This change adds support for mongodb+srv:// URIs when the MongoHook interprets the Connection parameters, while maintaining the existing option of an srv query parameter.

Current Behavior

If you are:

  • using the MongoHook AND
  • using the SRV/seed list feature in your MongoDB cluster AND
  • creating your Connection from a URI string starting with mongodb+srv://

you need to manually change your connection URI by adding a query parameter srv=true (the URI scheme is in fact ignored completely).

If you do not add the query param, the hook will munge the scheme to mongodb://, attempt to use the SRV locator as a direct service location, and fail to connect properly.

New Behavior

After this change:

  • Connections created with URIs beginning with mongodb+srv://, or with conn_type=mongodb+srv, will cause the MongoHook to interpret the connection as an SRV locator
  • srv=true forces SRV behavior, as before

Tests have been added to verify the new behavior with both uri and conn_type arguments.

Justification

My organization uses the Mongo-supported form of mongodb+srv:// which we get from the service vendor who hosts our MongoDB instances. (The introduction of SRV to Mongo was blogged here).

The lack of support for the standard URI scheme means that every time we add or change a MongoDB URL in our Airflow connections, the Ops team can't simply copy the URL from the service vendor to a config file or secret; they have to manually add the query param, or rope in the relevant dev team to do so (or worse, we all have to respond to a production failure and then decide under pressure who's going to modify the URL).

Merging this change will make coordinating Mongo clients in our Airflow stack simpler and more robust.

@boring-cyborg
Copy link

boring-cyborg bot commented Aug 10, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@eladkal
Copy link
Contributor

eladkal commented Aug 10, 2024

Connections created with URIs beginning with mongodb+srv://, or with conn_type=mongodb+srv, will cause the MongoHook to interpret the connection as an SRV locator

Please let's not do that. Adding a new connection type will cause more confusion (consider that we have dozens of providers, we must make sure we don't overwhelm users with options).

Specifically for the mongo case, can you please explain further why srv=True isn't enough to deduce about the URI? Is it because the current behavior isn't wire as you expect? We can introduce breaking changes if we think it's best (preferably with deprecating first)

Also, I see that srv=True needs to be defined in the extra which makes it less visable. We can set it as boolean field in the connection form like we do for Snowflake:

"insecure_mode": BooleanField(
label=lazy_gettext("Insecure mode"), description="Turns off OCSP certificate checks"
),

@topherinternational
Copy link
Contributor Author

@eladkal ah I see what you are concerned about...in my org we use environment variables to pass connection params, so we don't use the connections UI at all. Yes I totally agree that we don't want to add connection types that increase confusion.

My aim here is to prevent future devs (including my own) from hitting the same issue we did, naively passing a "mongodb+srv://" URI as an env var and getting a cryptic pymongo error. It would be better if we got a legible error from the hook or operator code.

Would it be an acceptable approach to tighten up the connection validation inside MongoHook? I.e., if the connection.conn_type isn't mongo raise an error, if the conn_type has "srv" in it error/warn the user about using srv=true? (Currently the hook doesn't examine the conn_type at all, this seems loose.)

And if so, would that validation be considered a breaking change in the provider? Previously-working connections might fail to validate and thus stop working.

Also, I see that srv=True needs to be defined in the extra which makes it less visable. We can set it as boolean field in the connection form like we do for Snowflake

That is a good idea, I'm not very front-end-y but I might try to do that in a separate MR.

@topherinternational
Copy link
Contributor Author

Closing in favor of #41717.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants