enable ffcv with PL #11538

williamFalcon · 2022-01-19T10:39:01Z

A team introduced ffcv which does optimizations in the dataloader level.
As long as it can be a drop in replacement for dataloader it should add the benefits to any PyTorch and Lightning script.

ie:

# current
data = Dataloader(...)

trainer.fit(..., data)

# 10x faster with ffcv
data = FFCVDataloader

trainer.fit(..., data)

Let's make sure when users want to get the benefits of both FFCV and PL they can do that!
cc @Borda @rohitgr7 @Felonious-Spellfire @justusschock @awaelchli @ninginthecloud @otaj @tchaton @carmocca

tchaton · 2022-01-20T06:51:04Z

@ethanwharris This could be built-in within Flash.

paantya · 2022-01-20T16:10:07Z

libffcv/ffcv#72

justusschock · 2022-01-20T17:07:07Z

This is related to #10696.

When we support arbitrary iterables, ffcv should be supported automatically. Maybe not all features we support with basic torch loaders (haven't looked at ffcv details so far), but definitely usable for training.

siddk · 2022-03-17T01:46:52Z

Any update on this?

carmocca · 2022-03-17T15:28:39Z

FFCV should be perfectly usable with no modifications to the PL source code after the 1.6 release.

Before, some patching to the loops was necessary to disable PL's pre-fetching as it conflicted with the pre-fetching done by FFCV. Now, PL avoids pre-fetching whenever it is possible.

From my own benchmarks, the speed up from FFCV applies perfectly when used with PL. We'll be releasing public benchmarks in the future.

We'll also explore making or internal data processing more flexible so that maybe we can provide an opinionated LightningDataModule integration that makes switching existing LightningDataModules easier.

A caveat I've noticed are that you'll want to remove the ToDevice(device) from the label pipeline if it's the last operation in your transformations (example) as PL already takes care of moving the tensors to the correct devices.

Features like fault-tolerance do not support FFCV.

Note that there's a lot of surface for bugs here and I think most of the battle-testing has been done on map-style datasets. If you encounter any memory or speed issues, please open a separate issue and you can ping me on it.

Cheers!

siddk · 2022-03-17T18:50:08Z

Ok - awesome! Thanks so much!

siddk · 2022-03-25T20:57:42Z

Second question (sorry for the delayed follow-up); is the recommended pattern to couple FFCV dataloading within a LightningDataModule, or should I just pass the FFCV loaders directly to the Trainer?

carmocca · 2022-03-29T03:39:53Z

It's up to you, but both will work.

The advantage of the LightningDataModule is that you can put the FFCV specific data-writing code (https://docs.ffcv.io/writing_datasets.html#writing-a-dataset-to-ffcv-format) inside the setup hook and also save in the instance state any specific arguments for your FFCV Loaders:
(https://docs.ffcv.io/making_dataloaders.html#dataset-ordering) inside *_dataloader methods

leejiahe · 2022-03-30T09:44:24Z

Hi,

Do we need to write a custom FitLoop to define how we want to pre-fetch the batch, as documented in the 1.6.0 release page to be compatible with FFCV dataloader?

carmocca · 2022-03-30T12:21:19Z

@leejiahe You don't, this was just to showcase the customization. But it shouldn't be necessary

codestar12 · 2022-04-12T18:27:24Z

A caveat I've noticed are that you'll want to remove the ToDevice(device) from the label pipeline if it's the last operation in your transformations (example) as PL already takes care of moving the tensors to the correct devices.

do you want to remove all ToDevice(device)? I'm a bit confused because I've mostly been able to ignore manually controlling device placement since switching to PyTorch lightning.

codestar12 · 2022-04-14T18:02:01Z

I have actually been having a lot of trouble getting this to work with DDP and multiple GPUs. What is the best way to get the GPU rank from a lighting data module? Their data loader seems pretty opinionated about these transforms and really wants to move it to the device.

Data-drone · 2022-06-11T08:18:00Z

How did you go in the end @codestar12?

carmocca · 2022-06-14T16:09:09Z

do you want to remove all ToDevice(device)?

No, only if it's the last operation in the pipeline. Otherwise, you'll want to keep it because doing more operations with the data on device could be faster.

What is the best way to get the GPU rank from a lighting data module?

You can use self.trainer.local_rank

codestar12 · 2022-06-14T17:56:45Z

How did you go in the end @codestar12?

I basically just removed a lot of FFCV transforms from the pipeline and just kept the data preprocessing bit. It looks like @carmocca finally has the answer though

paantya · 2022-06-28T16:16:56Z

Please add somewhere in the Lightning or bolt an example of how to use the Lightning in conjunction with FFCV

rmchurch · 2022-12-15T21:11:38Z

Do I understand right from the docs though that you can't use FFCV + Lightning when using ddp? I pass the distributed arg to the ffcv loader and get errors when running with Lightning. Is this a fundamental issue preventing, or could multi-GPU training with FFCV + Lightning potentially be implemented some day?

carmocca · 2022-12-16T00:20:03Z

It was working last time I tried it. What errors do you see? FFCV requires quite a bit of configuration to work as you expect. This comment contains the script I used in the past: #15598 (review)

rmchurch · 2022-12-16T01:43:35Z

Ah, I had seen your example, but thought it was incomplete because for gpus>1 it was marked FIXME: distributed, and distributed=False input into FFCV Loader. Thinking it was old code, I was trying distributed=True, and getting the init_process_group not initialized error. But is it the case that I should put distributed=False into the FFCV Loader still, and Lightning will insert the DistributedSampler? If that is the case, perhaps the paragraph below in the docs should be rephrased, or additional detail given that ddp will work with FFCV, but you have to leave distributed=False.

In a distributed multi-GPU setting (ddp), Lightning automatically replaces the DataLoader’s sampler with its distributed counterpart. This makes sure that each GPU sees a different part of the dataset. As sampling can be implemented in arbitrary ways with custom iterables, there is no way for Lightning to know, how to replace the sampler.

carmocca · 2022-12-20T03:00:15Z

No, Lightning will not insert it automatically for FFCV. You might need to initialize the Loader class under one of the *_dataloader hooks so that the process groups are initialized before you create it.

williamFalcon added the feature Is an improvement or enhancement label Jan 19, 2022

carmocca self-assigned this Jan 19, 2022

carmocca added 3rd party Related to a 3rd-party data handling Generic data-related topic performance labels Jan 19, 2022

carmocca added this to the 1.6 milestone Jan 19, 2022

Borda mentioned this issue Jan 20, 2022

add ffcv data loader #11553

Closed

paantya mentioned this issue Jan 20, 2022

support pytorch-lightning libffcv/ffcv#72

Open

This was referenced Jan 25, 2022

Support no pre-fetching #11606

Merged

Move data fetcher ownership to the loops #11621

Merged

carmocca mentioned this issue Feb 24, 2022

Do not prefetch when possible #12101

Merged

11 tasks

Borda modified the milestones: 1.6, 1.6.x Mar 21, 2022

salelkafrawy mentioned this issue Jul 15, 2022

FFCV with CIFAR10 #13668

Closed

carmocca modified the milestones: pl:1.6.x, pl:future Jul 28, 2022

carmocca added docs Documentation related and removed feature Is an improvement or enhancement performance labels Oct 31, 2022

Borda assigned justusschock and unassigned carmocca Oct 31, 2022

carmocca modified the milestones: future, v1.9 Oct 31, 2022

justusschock mentioned this issue Nov 9, 2022

FCCV Docs #15598

Merged

12 tasks

lexierule closed this as completed in #15598 Nov 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable ffcv with PL #11538

enable ffcv with PL #11538

williamFalcon commented Jan 19, 2022 •

edited by github-actions bot

Loading

tchaton commented Jan 20, 2022

paantya commented Jan 20, 2022

justusschock commented Jan 20, 2022 •

edited

Loading

siddk commented Mar 17, 2022

carmocca commented Mar 17, 2022 •

edited

Loading

siddk commented Mar 17, 2022

siddk commented Mar 25, 2022

carmocca commented Mar 29, 2022

leejiahe commented Mar 30, 2022

carmocca commented Mar 30, 2022

codestar12 commented Apr 12, 2022

codestar12 commented Apr 14, 2022

Data-drone commented Jun 11, 2022

carmocca commented Jun 14, 2022

codestar12 commented Jun 14, 2022

paantya commented Jun 28, 2022

rmchurch commented Dec 15, 2022

carmocca commented Dec 16, 2022

rmchurch commented Dec 16, 2022

carmocca commented Dec 20, 2022

enable ffcv with PL #11538

enable ffcv with PL #11538

Comments

williamFalcon commented Jan 19, 2022 • edited by github-actions bot Loading

tchaton commented Jan 20, 2022

paantya commented Jan 20, 2022

justusschock commented Jan 20, 2022 • edited Loading

siddk commented Mar 17, 2022

carmocca commented Mar 17, 2022 • edited Loading

siddk commented Mar 17, 2022

siddk commented Mar 25, 2022

carmocca commented Mar 29, 2022

leejiahe commented Mar 30, 2022

carmocca commented Mar 30, 2022

codestar12 commented Apr 12, 2022

codestar12 commented Apr 14, 2022

Data-drone commented Jun 11, 2022

carmocca commented Jun 14, 2022

codestar12 commented Jun 14, 2022

paantya commented Jun 28, 2022

rmchurch commented Dec 15, 2022

carmocca commented Dec 16, 2022

rmchurch commented Dec 16, 2022

carmocca commented Dec 20, 2022

williamFalcon commented Jan 19, 2022 •

edited by github-actions bot

Loading

justusschock commented Jan 20, 2022 •

edited

Loading

carmocca commented Mar 17, 2022 •

edited

Loading