-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhance] Distinguish training/testing saving directory #1036
[Enhance] Distinguish training/testing saving directory #1036
Conversation
mmengine/runner/runner.py
Outdated
'the value of task type must be one of ["train", "val", "test"]') | ||
previous_log_dir = self._log_dir | ||
self._log_dir = '_'.join([self._log_dir, task_type]) | ||
shutil.move(previous_log_dir, self._log_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, will the FileHandler
in the logger still save logs to the old log file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can fix this if it should modify the log file in FileHandles, and I wonder whether the save_dir in visualizer should modify in the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all opened files should be closed and redirected to the new directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we close the files opened by TensorboardVisBackend
, when we want to use it once again, we must to create a new instance of TensorboardVisBackend
and it may create two events.out file in the log dir. Is this what we expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emm, resuming a visualization seems weird and dirty more or less. A more reasonable workaround is that we should lazy initialize the visualizer in test
, train
, val
rather than __init__
, but I'm not sure will it has a side effect.
Besides, if I first run runner.train()
, and then run runer.test()
, does it mean we need to create a new visualizer and logger during test
? What is the lifespan of the logger and visualization? Maybe we need to rethink it carefully. What do you think of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can record which function has been called and change the log dir in magic fuc __del__
when the instance of runner is finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about this: We maintain the current logic of runner.train
, keeping the output log file names unchanged. However, we add a condition in runner.test
. If the train_loop has already been initialized (indicating that testing is being conducted immediately after training), we will not change the log path. On the other hand, if the train_loop has not been initialized (meaning the test is being run directly), we can simply rename the folder at the end of the testing process. This way, there is no need to modify the output paths for FileHandler and Visualizer. Does this approach sound reasonable to you?
7966b63
to
c5c3424
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1036 +/- ##
=======================================
Coverage ? 77.78%
=======================================
Files ? 139
Lines ? 11307
Branches ? 2278
=======================================
Hits ? 8795
Misses ? 2118
Partials ? 394
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
Distinguish training/testing saving directory.
Modification
add suffix for log_dir in Runner after call train() val() test() functions.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist