-
Notifications
You must be signed in to change notification settings - Fork 26.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Option to set the tracking URI for MLflowCallback. #29032
Feature: Option to set the tracking URI for MLflowCallback. #29032
Conversation
…:seanswyi/transformers into feature/add-mlflow-uri-to-mlflowcallback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding! Changes LGTM
Third-party integrations are maintained by their original contributors, rather than the transformers team. So let's get a review from @noise-field
Sounds good, thanks for the feedback @amyeroberts! I just made another minor change in the docstring and committed it: the default value of |
Is there any way to rerun tests? The failed Some tests failed!
============================= FAILURES SHORT STACK =============================
_ LayoutLMTokenizationTest.test_add_tokens [type (microsoft/layoutlm-base-uncased)] _
self = <huggingface_hub.utils._http.UniqueRequestIdAdapter object at 0x7f75b9f8c730>
request = <PreparedRequest [GET]>, stream = True
timeout = Timeout(connect=10, read=10, total=None), verify = True, cert = None
proxies = OrderedDict([('no', '127.0.0.1,localhost,circleci-internal-outer-build-agent')])
try:
resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False,
assert_same_host=False,
preload_content=False,
decode_content=False,
retries=self.max_retries,
timeout=timeout,
chunked=chunked,
)
...
except (_SSLError, _HTTPError) as e:
if isinstance(e, _SSLError):
# This branch is for urllib3 versions earlier than v1.22
raise SSLError(e, request=request)
elif isinstance(e, ReadTimeoutError):
> raise ReadTimeout(e, request=request)
E requests.exceptions.ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: e8e1debd-e8f7-45eb-b23d-fa63400a0c0b)')
../.pyenv/versions/3.8.12/lib/python3.8/site-packages/requests/adapters.py:532: ReadTimeout
Exited with code exit status 255 |
@seanswyi I can re-run tests for you. I've just set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick turnaround!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@seanswyi As everything is passing and both @muellerzr and I approve, let's just merge and if @noise-field has any comments we can do a follow-up PR. Thanks again for adding! |
I'm an MLflow maintainer. Found this PR while I'm investigating https://github.com/mlflow-automation/mlflow/actions/runs/7949278234/job/21702921711#step:15:2462. What if we run this code like this? import mlflow
import os
assert "MLFLOW_TRACKING_URI" not in os.environ
mlflow.set_tracking_uri("sqlite:///my.db")
# train
...
# Since MLFLOW_TRACKING_URI is not set, the tracking URI is set to an empty string, which is undesired in this case. |
@harupy Could you elaborate a bit on why you think that would be better? The reason I wrote the initial PR the way I did is because to me it made sense to allow the user to keep Please let me know if I'm misunderstanding or missing something. I haven't been able to fully look into the failed tests you provided but it seems like there's an issue between this PR and the autologging module? |
True, but there may be users who use the code like my snippet (we do something similar in our tests, no |
@harupy If you believe that only calling the function when the environment variable exists is better then how about changing the current code to something like this? if "MLFLOW_TRACKING_URI" in os.environ:
self._tracking_uri = os.environ["MLFLOW_TRACKING_URI"]
logger.debug(f"MLflow tracking URI is set to {self._tracking_uri}")
self._ml_flow.set_tracking_uri(self._tracking_uri)
else:
logger.debug(f"Environment variable `MLFLOW_TRACKING_URI` is not provided and therefore will not be explicitly set.") Just curious, is there any reason why a temporary SQLite DB is used? It seems like the documentation would suggest that setting a tracking URI isn't necessary and that the data will just be stored locally at |
Yeah that looks good to me.
|
…ace#29032) * Added option to set tracking URI for MLflowCallback. * Added option to set tracking URI for MLflowCallback. * Changed to in docstring.
Hi! I'm here because this PR broke my code :) I think there might be a compatibility break here? In my code I call
means that the tracking_uri is getting overriden to be "" because I didn't set the env var. I can change my implementation to be compatible with the change that was merged, but others may have similar issues if their implementation was similar to mine. |
Hi @njbrake! Thanks for the comment, that's something that I missed earlier. The MLflow API seems to provide a Maybe I could incorporate this in the other PR (#29096) that addresses @harupy's previous comment as well? (cc. @amyeroberts) |
@seanswyi Yes please! |
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com> Signed-off-by: Arthur Jenoudet <arthur.jenoudet@databricks.com>
* Added option to set tracking URI for MLflowCallback. * Added option to set tracking URI for MLflowCallback. * Changed to in docstring.
What does this PR do?
Previously, the MLflowCallback was only able to set MLflow experiments or runs. This PR adds the option to also set the tracking URI.
Fixes #28961
Before submitting
Pull Request section?
to it if that's the case. Option to set the tracking URI for MLflowCallback. #28961
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.