Pytorch Dataloader issue #3410

Open
gganduu opened this issue Nov 5, 2021 · 1 comment

gganduu commented Nov 5, 2021

When using PyTorch, I can define a class that extends torch.utils.data.Dataset and wrap it in a torch.utils.data.DataLoader. In addition, to further process a batch of data, we can pass a collate_fn function when building the DataLoader:

class myDataset(torch.utils.data.Dataset):
    ...

    def __getitem__(self, idx):
        ...

        return img, label
    ...

    @staticmethod
    def collate_fn(batch):
        ...

        return img, label

train_loader = torch.utils.data.DataLoader(
    dataset,
    batchsize,
    ...
    collate_fn=myDataset.collate_fn
)

This is a standard DataLoader definition in PyTorch, and I successfully trained my YOLOv5 model using az.

But when I try to return more values, for example the image names in addition to the image and label:

class myDataset(torch.utils.data.Dataset):
    ...

    def __getitem__(self, idx):
        ...

        return img, label, img_names
    ...

and then read the batches with:

for epoch in range(epoches):
    for idx, (img, label, img_names) in enumerate(train_loader):
        ...
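For reference, a minimal sketch of a collate function that carries the extra field through. A DataLoader calls its collate function with a list of per-sample tuples and returns whatever that function returns. The names here (`collate_with_names`, the toy samples) are illustrative assumptions, and plain Python lists stand in for tensors so the snippet runs without PyTorch; in real code the images would typically be combined with something like torch.stack.

```python
def collate_with_names(batch):
    """Collate a list of (img, label, img_name) samples into three batch fields.

    Hypothetical helper: plain Python lists stand in for image/label tensors.
    """
    # zip(*batch) transposes [(img, label, name), ...] into three tuples.
    imgs, labels, img_names = zip(*batch)
    return list(imgs), list(labels), img_names


# Toy samples shaped like the dataset's return value (img, label, img_name).
samples = [
    ([0.1, 0.2], 0, "a.jpg"),
    ([0.3, 0.4], 1, "b.jpg"),
]
img, label, img_names = collate_with_names(samples)
```

Note that the image names come back as a tuple, which matches the img_names tuple described in this report.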

This works well when I train the model without az. But when I use az local mode, img_names is passed to my loss function as the target!

So my loss function receives the prediction correctly, but it receives the img_names tuple instead of label.

Contributor

hkvision commented Nov 5, 2021

Hi @gganduu, for this issue and #3409: since your dataloader and loss are not standard ones, you need to rewrite the training loop to make it work.
To be more specific, you need to write a class that extends TrainingOperator and overrides the train_batch method here: https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/src/bigdl/orca/learn/pytorch/training_operator.py#L220, and pass the new operator class as an argument when creating the estimator: https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/src/bigdl/orca/learn/pytorch/estimator.py#L45

In train_batch, change

*features, target = batch

to

img, label, img_names = batch
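The star-unpacking quoted above explains the symptom: the last element of the batch is always bound to target, and everything before it is collected into features. A small pure-Python illustration, with placeholder strings standing in for the real tensors:

```python
# Placeholder values stand in for the image tensor, label tensor, and file names.
batch = ("img_tensor", "label_tensor", ("a.jpg", "b.jpg"))

# Default unpacking: the LAST element of the batch becomes the target.
*features, target = batch
# features == ["img_tensor", "label_tensor"], target == ("a.jpg", "b.jpg")

# Explicit unpacking for a three-field batch keeps the label and names apart.
img, label, img_names = batch
```

With a three-field batch, the default form hands the image names to the loss function and treats the label as a second model input.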

Unfortunately, since the YOLO pipeline is not a standard one and is a bit complicated, you will need to modify some code to customize the implementation. Want to give it a try?
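A rough sketch of what such a customization could look like. `BaseOperator` below is a simplified stand-in written for this example, not BigDL's actual TrainingOperator: the method signature, the attribute names (`model`, `criterion`), and the returned dictionary are all assumptions, so check the linked training_operator.py for the real interface. Plain Python callables stand in for the model and the loss.

```python
class BaseOperator:
    """Simplified stand-in for a training operator (NOT the real BigDL class)."""

    def __init__(self, model, criterion):
        self.model = model
        self.criterion = criterion

    def train_batch(self, batch):
        # Default behaviour: the last element of the batch is the target.
        *features, target = batch
        output = self.model(*features)
        return {"train_loss": self.criterion(output, target)}


class NamedBatchOperator(BaseOperator):
    """Hypothetical subclass for batches shaped (img, label, img_names)."""

    def train_batch(self, batch):
        # Unpack explicitly so img_names never reaches the loss function.
        img, label, img_names = batch
        output = self.model(img)
        return {
            "train_loss": self.criterion(output, label),
            "img_names": img_names,
        }


def toy_model(img):
    """Toy model: doubles every input value."""
    return [2 * x for x in img]


def toy_loss(output, target):
    """Toy loss: sum of absolute differences."""
    return sum(abs(o - t) for o, t in zip(output, target))


op = NamedBatchOperator(toy_model, toy_loss)
result = op.train_batch(([1, 2], [2, 5], ("a.jpg", "b.jpg")))
```

With the real library, the subclass would extend TrainingOperator and be passed in when creating the estimator, as described above.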
