enable ffcv with PL #11538
Comments
@ethanwharris This could be built into Flash.
This is related to #10696. When we support arbitrary iterables, ffcv should be supported automatically. Maybe not with all the features we support for basic torch loaders (I haven't looked at ffcv's details so far), but definitely usable for training.
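The "arbitrary iterables" idea above can be sketched in plain Python: the Trainer only needs something it can iterate over, so any object implementing `__iter__` (and optionally `__len__` for progress reporting) can stand in for a `DataLoader`. A minimal, framework-free sketch; the `BatchIterable` class and its batch contents are illustrative, not part of any library:

```python
class BatchIterable:
    """Any iterable of batches can act as a dataloader-like object."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def __iter__(self):
        # Yield fixed-size chunks; FFCV's Loader satisfies this same
        # iteration protocol, which is why it can plug in directly.
        for i in range(0, len(self.samples), self.batch_size):
            yield self.samples[i:i + self.batch_size]

    def __len__(self):
        # Number of batches (ceiling division); useful for progress bars.
        return -(-len(self.samples) // self.batch_size)


loader = BatchIterable(list(range(10)), batch_size=4)
batches = list(loader)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```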
Any update on this?
FFCV should be perfectly usable with no modifications to the PL source code after the 1.6 release. Before, some patching of the loops was necessary to disable PL's pre-fetching, as it conflicted with the pre-fetching done by FFCV. Now, PL avoids pre-fetching whenever possible. From my own benchmarks, the speed-up from FFCV carries over fully when used with PL. We'll be releasing public benchmarks in the future. We'll also explore making our internal data processing more flexible, so that maybe we can provide an opinionated integration.
A caveat I've noticed is that you'll want to remove the `ToDevice` transform from the end of your FFCV pipelines, since PL handles device placement itself. Features like fault-tolerance do not support FFCV. Note that there's a lot of surface for bugs here, and I think most of the battle-testing has been done on map-style datasets. If you encounter any memory or speed issues, please open a separate issue and ping me on it. Cheers!
Ok, awesome! Thanks so much!
Second question (sorry for the delayed follow-up): is the recommended pattern to wrap FFCV data loading in a LightningDataModule, or should I just pass the FFCV loaders directly to the Trainer?
It's up to you; both will work. The advantage of the `LightningDataModule` is that it keeps all of your data handling encapsulated in one reusable component.
Hi, do we need to write a custom FitLoop to define how we want to pre-fetch the batch, as documented in the 1.6.0 release page, to be compatible with the FFCV dataloader?
@leejiahe You don't; that was just to showcase the loop customization. It shouldn't be necessary.
Do you mean we should remove all `ToDevice(device)` transforms? I'm a bit confused, because I've mostly been able to ignore manual device placement since switching to PyTorch Lightning.
I have actually been having a lot of trouble getting this to work with DDP and multiple GPUs. What is the best way to get the GPU rank from a LightningDataModule? FFCV's loader seems pretty opinionated about these transforms and really wants to move the data to the device itself.
How did it go in the end, @codestar12?
No, only if it's the last operation in the pipeline. Otherwise, you'll want to keep it, because doing further operations with the data already on device can be faster.
You can use `self.trainer.local_rank` from within the `LightningDataModule`.
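To make the rank plumbing concrete, here is a framework-free sketch of the pattern: once the Trainer attaches itself to the datamodule, the datamodule can read the local rank when building its loaders and derive the target device from it. The classes below are illustrative stand-ins, not the real PL objects:

```python
class FakeTrainer:
    """Stand-in for pl.Trainer; exposes the rank of this process."""

    def __init__(self, local_rank):
        self.local_rank = local_rank


class FFCVDataModule:
    """Stand-in for a LightningDataModule that builds FFCV loaders."""

    def __init__(self):
        self.trainer = None  # PL attaches the Trainer before fit starts

    def train_dataloader(self):
        # The device passed to a ToDevice(...) transform would be
        # derived from the rank of the current process.
        rank = self.trainer.local_rank if self.trainer else 0
        return f"cuda:{rank}"


dm = FFCVDataModule()
dm.trainer = FakeTrainer(local_rank=1)
device = dm.train_dataloader()  # "cuda:1"
```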
I basically just removed a lot of the FFCV transforms from the pipeline and kept only the data preprocessing bit. It looks like @carmocca has the answer, though.
Please add an example somewhere in Lightning or Bolts of how to use Lightning in conjunction with FFCV.
Do I understand correctly from the docs, though, that you can't use FFCV + Lightning when using distributed training?
It was working the last time I tried it. What errors do you see? FFCV requires quite a bit of configuration to work as you'd expect. This comment contains the script I used in the past: #15598 (review)
Ah, I had seen your example, but thought it was incomplete.
No, Lightning will not insert it automatically for FFCV. You might need to initialize the FFCV `Loader` with `distributed=True` yourself.
A team has introduced ffcv, which applies optimizations at the dataloader level.
As long as it can be a drop-in replacement for a DataLoader, it should bring these benefits to any PyTorch or Lightning script.
i.e., the FFCV loader can be swapped in wherever a `DataLoader` is used.
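A sketch of what such a drop-in swap could look like, based on FFCV's documented `Loader` API; the `.beton` path and the `image`/`label` pipeline fields are illustrative assumptions, not taken from this issue:

```python
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.transforms import ToTensor

# Before: loader = DataLoader(dataset, batch_size=64, num_workers=8)
# After: the FFCV Loader reads a pre-written .beton file instead.
loader = Loader(
    "/path/to/dataset.beton",  # illustrative path to a prepared dataset
    batch_size=64,
    num_workers=8,
    order=OrderOption.RANDOM,
    pipelines={
        "image": [SimpleRGBImageDecoder(), ToTensor()],
        "label": [IntDecoder(), ToTensor()],
    },
)

# Because Loader is iterable, it can be handed to PL directly:
# trainer.fit(model, train_dataloaders=loader)
```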
Let's make sure that when users want to get the benefits of both FFCV and PL, they can do that!
cc @Borda @rohitgr7 @Felonious-Spellfire @justusschock @awaelchli @ninginthecloud @otaj @tchaton @carmocca