Description & Motivation
I am using a private MLFlow server which has very high latency. As a result, my training process is stuck waiting on logging 99% of the time.
In my mind, this should be easy to fix: simply run the logging in a background thread, in parallel with the trainer, so that training can continue while the logging finishes.
Pitch
Add a trainer flag "log_in_parallel" that detaches logging from the training process and lets training continue before the logging has finished.
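To illustrate the idea, here is a rough sketch of the kind of wrapper I mean. The class name `AsyncLoggerWrapper` is hypothetical, not an existing Lightning API; a real implementation would presumably subclass `lightning.pytorch.loggers.Logger` and handle errors and backpressure properly.

```python
import queue
import threading


class AsyncLoggerWrapper:
    """Forwards log_metrics calls to a background thread so the training loop never blocks."""

    def __init__(self, logger):
        self._logger = logger          # e.g. a slow MLFlowLogger instance
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        # Background thread: pushes queued metrics to the (slow) tracking server.
        while True:
            item = self._queue.get()
            if item is None:           # sentinel inserted by finalize()
                break
            metrics, step = item
            self._logger.log_metrics(metrics, step)

    def log_metrics(self, metrics, step=None):
        # Called from the training loop; returns immediately.
        self._queue.put((metrics, step))

    def finalize(self, status):
        # Flush everything that is still queued, then close the wrapped logger.
        self._queue.put(None)
        self._worker.join()
        self._logger.finalize(status)

    def __getattr__(self, name):
        # Delegate everything else (name, version, log_hyperparams, ...) to the wrapped logger.
        return getattr(self._logger, name)
```

Usage would then be something like `Trainer(logger=AsyncLoggerWrapper(MLFlowLogger(...)))`, assuming the Trainer accepts such a duck-typed wrapper.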
Alternatives
An alternative would be to buffer all logging data offline and only synchronize it with the server at predetermined intervals, such as every 10 epochs.
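A rough sketch of that variant, again purely for illustration (`BufferedLoggerWrapper` is a hypothetical name; for MLFlow specifically, the flush could likely be collapsed into far fewer requests via MlflowClient.log_batch):

```python
class BufferedLoggerWrapper:
    """Buffers metrics locally and only talks to the server every `flush_every` calls."""

    def __init__(self, logger, flush_every=100):
        self._logger = logger
        self._flush_every = flush_every
        self._buffer = []

    def log_metrics(self, metrics, step=None):
        self._buffer.append((metrics, step))
        if len(self._buffer) >= self._flush_every:
            self.flush()

    def flush(self):
        # All the slow round-trips happen here, in one burst, instead of on every step.
        for metrics, step in self._buffer:
            self._logger.log_metrics(metrics, step)
        self._buffer.clear()

    def finalize(self, status):
        self.flush()
        self._logger.finalize(status)

    def __getattr__(self, name):
        return getattr(self._logger, name)
```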
Additional context
99% is only a slight hyperbole; this is what my GPU usage looks like:
[screenshot of GPU utilization]
cc @Borda
In my opinion, this should be discussed and addressed by MLFlow directly; other logging frameworks already do this. It is not something that the training framework (in this case Lightning) should implement. The goal of Lightning is only to provide simple wrappers around the loggers to standardize the interface. If a feature is present in one logger and not in another, that is something the user has to consider when choosing the integration.