oke464 (Collaborator) reviewed Sep 12, 2025

oke464 left a comment:
After discussions with @filipekstrm, we think setting the parameters persistent_workers, pin_memory, and num_workers to the PyTorch defaults and adding a description in the README might be the best option. @rartino, what do you think?
| "epochs": 1000, | ||
| "val_interval": 1, | ||
| "num_workers": 0, | ||
| "pin_memory": True, |
Collaborator

Suggested change:
```diff
-"pin_memory": True,
+"pin_memory": False,
```
| "val_interval": 1, | ||
| "num_workers": 0, | ||
| "pin_memory": True, | ||
| "persistent_workers": True, |
Collaborator

Suggested change:
```diff
-"persistent_workers": True,
+"persistent_workers": False,
```
Warning: using logger ```none``` will not save any checkpoints (or anything else), but can be used for, e.g., debugging.

```diff
-This command will use the default values for all other parameters, which are the ones used in the paper.
+This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ```num_workers```; if not set, it defaults to 0. However, in our experience, increasing it can substantially speed up training.**
```
Collaborator

Suggested change:
```diff
-This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ```num_workers```; if not set, it defaults to 0. However, in our experience, increasing it can substantially speed up training.**
+This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ``num_workers``, ``persistent_workers``, and ``pin_memory``. However, in our experience, increasing ``num_workers`` and setting ``persistent_workers=True`` and ``pin_memory=True`` can substantially speed up training.** The optimal ``num_workers`` value depends on your system; we have used the maximum value suggested by the PyTorch warning.
```
To train a WyckoffDiff model on WBM, a minimal example is

```diff
-python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb]
+python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS]
```
Collaborator

Suggested change:
```diff
-python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS]
+python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS] --persistent_workers [True/False] --pin_memory [True/False]
```
Trying to make training faster by improving dataloading. This has been developed on an RTX 3090 and an A100, and experiences are hence from there:

- `--num_workers`, which is passed to dataloaders. The default is 0 (which is also the default in `DataLoader`, and hence what is used currently), but increasing it can make training substantially faster.
- `--pin_memory`, which is passed to dataloaders. The default is True, which is the opposite of the PyTorch default.
- `--persistent_workers`, which is passed to dataloaders if `--num_workers > 0`. The default is True, which is the opposite of the PyTorch default.
- A `Data` instance is now created when getting the data (i.e., in the `get` method of `WyckoffDataset`). This instance only contains the bare minimum information necessary (e.g., it does not include the matrix `x`, as that is not used).

I am a little unsure about using True as the default for `--pin_memory` and `--persistent_workers`, as False is the default in PyTorch. On the other hand, I think they help improve speed for our use case, and hence they can default to True in our codebase. For `--num_workers`, however, I think the suitable value is system-specific, and hence I left 0 as the default. I did, however, include it in the training command example in the README, together with a note.
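A minimal sketch of the flag handling described above (hypothetical code, not the actual main.py; the `str2bool` helper and `dataloader_kwargs` function are assumptions for illustration), showing how `persistent_workers` can be forwarded only when `--num_workers > 0`:

```python
import argparse

def str2bool(value: str) -> bool:
    # Accept "True"/"False"-style CLI values, as in the README example
    return value.lower() in ("true", "1", "yes")

def dataloader_kwargs(args: argparse.Namespace) -> dict:
    # Collect the dataloader-related flags into DataLoader keyword arguments
    kwargs = {
        "num_workers": args.num_workers,
        "pin_memory": args.pin_memory,
    }
    # persistent_workers=True is only legal when num_workers > 0,
    # so only forward it when worker processes are actually used
    if args.num_workers > 0:
        kwargs["persistent_workers"] = args.persistent_workers
    return kwargs

parser = argparse.ArgumentParser()
parser.add_argument("--num_workers", type=int, default=0)
parser.add_argument("--pin_memory", type=str2bool, default=True)
parser.add_argument("--persistent_workers", type=str2bool, default=True)

args = parser.parse_args(["--num_workers", "4", "--persistent_workers", "False"])
print(dataloader_kwargs(args))
# → {'num_workers': 4, 'pin_memory': True, 'persistent_workers': False}
```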