Enable parallel logging for private logging servers with high latency #18967

Open
OlfwayAdbayIgbay opened this issue Nov 8, 2023 · 1 comment
Labels
discussion In a discussion stage feature Is an improvement or enhancement logger: mlflow

Comments


OlfwayAdbayIgbay commented Nov 8, 2023

Description & Motivation

I am using a private MLflow server with very high latency. As a result, my training process is stuck in logging 99% of the time.

In my mind, this should be easy to fix: run the logging in a background thread parallel to the trainer, so the trainer keeps training while the log calls complete.
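To make the idea concrete, here is a minimal sketch of background logging using only the Python standard library. It is not Lightning's or MLflow's API: `BackgroundLogger`, `log_fn`, and `slow_log` are hypothetical names, and `slow_log` simply sleeps to stand in for a high-latency server call.

```python
import queue
import threading
import time


class BackgroundLogger:
    """Hypothetical helper: forwards metrics to a slow `log_fn` from a worker thread."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, log_fn):
        self._log_fn = log_fn
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log_metrics(self, metrics, step):
        # Returns immediately; the slow call happens on the worker thread.
        self._queue.put((dict(metrics), step))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            metrics, step = item
            try:
                self._log_fn(metrics, step)
            except Exception as exc:
                # A failed log call should not kill the worker.
                print(f"logging failed at step {step}: {exc}")

    def close(self):
        # Let the worker drain everything still queued, then stop it.
        self._queue.put(self._STOP)
        self._worker.join()


received = []


def slow_log(metrics, step):
    time.sleep(0.05)  # assumption: stand-in for a high-latency server round trip
    received.append((step, metrics))


logger = BackgroundLogger(slow_log)
start = time.perf_counter()
for step in range(5):
    logger.log_metrics({"loss": 1.0 / (step + 1)}, step)  # does not block
enqueue_seconds = time.perf_counter() - start
logger.close()  # blocks until all queued metrics have been sent
```

The training loop only pays for a queue insert, and `close()` is where the waiting happens. A single worker keeps metrics in step order. An unbounded queue can grow without limit if the server is slower than training for long periods, so a real implementation would likely need a size cap or a drop policy.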

Pitch

Add a trainer flag "log_in_parallel" that detaches logging from the training process, so training continues before logging has finished.

Alternatives

An alternative would be to store all the logging data offline and only synchronize it in predetermined intervals, such as every 10 epochs.
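This alternative could look roughly like the sketch below, which buffers metrics in memory and hands them to the server in one batch every N epochs. `BufferedLogger`, `flush_fn`, and `on_epoch_end` are hypothetical names chosen for illustration, not part of Lightning or MLflow; here `flush_fn` just collects the batches in a list.

```python
class BufferedLogger:
    """Hypothetical helper: keeps metrics in memory and sends them in batches."""

    def __init__(self, flush_fn, flush_every_n_epochs=10):
        self._flush_fn = flush_fn
        self._flush_every = flush_every_n_epochs
        self._buffer = []

    def log_metrics(self, metrics, epoch):
        # Cheap local append; nothing is sent to the server here.
        self._buffer.append((epoch, dict(metrics)))

    def on_epoch_end(self, epoch):
        # Epochs are 0-indexed, so epoch 9 closes the first block of 10.
        if (epoch + 1) % self._flush_every == 0:
            self.flush()

    def flush(self):
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []  # start a fresh list; the sent batch is left intact


batches = []  # assumption: stand-in for one slow request per batch
buffered = BufferedLogger(batches.append, flush_every_n_epochs=10)
for epoch in range(25):
    buffered.log_metrics({"loss": 1.0 / (epoch + 1)}, epoch)
    buffered.on_epoch_end(epoch)
buffered.flush()  # send the leftover epochs 20..24 at the end of training
```

With 25 epochs this makes three slow calls instead of 25. The trade-off is that metrics appear on the server with a delay, and anything still in the buffer is lost if the process crashes before a flush. With MLflow, `flush_fn` could plausibly wrap a batched call such as `MlflowClient.log_batch`, though that wiring is untested here.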

Additional context

99% is only a slight hyperbole; this is what my GPU usage looks like:
image

cc @Borda

@OlfwayAdbayIgbay OlfwayAdbayIgbay added feature Is an improvement or enhancement needs triage Waiting to be triaged by maintainers labels Nov 8, 2023
@awaelchli
Contributor

Hey @OlfwayAdbayIgbay

In my opinion, this should be discussed and addressed by MLflow directly. Other logging frameworks already do that. It is not something that the training framework (in this case Lightning) should implement. The goal of Lightning is only to provide simple wrappers around the loggers to standardize the interface. If a feature is present in one logger and not in another, that's something the user has to consider when choosing the integration.

@awaelchli awaelchli added discussion In a discussion stage logger: mlflow and removed needs triage Waiting to be triaged by maintainers labels Nov 8, 2023