Add backend switch for IterDataPipe.__getstate__ #341

Open
@ejguan

> I see. I can confirm that we will rely on this DILL_AVAILABLE flag anywhere in the TorchData project to determine whether dill is available.

The problem is that we will need to patch every single module where this flag is imported. You cannot patch it where it is defined; you have to patch it where it is used. If you look above, we are not patching ._utils.serialization but rather .datapipes.datapipe, because this is where the flag is used. If we now need to use this flag in multiple modules, we need to patch all of them. This is very brittle.
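To illustrate why the patch has to target the consuming module, here is a minimal, self-contained sketch. The module names `demo_serialization` and `demo_datapipe` and the `backend()` helper are hypothetical stand-ins built in memory; they are not the actual TorchData modules. The point is only that `from module import FLAG` creates a second binding in the importing module, so patching the defining module does not reach it.

```python
import sys
import types
from unittest import mock

# Hypothetical module that defines the flag.
serialization = types.ModuleType("demo_serialization")
serialization.DILL_AVAILABLE = True
sys.modules["demo_serialization"] = serialization

# Hypothetical module that imports the flag by name and uses it.
datapipe = types.ModuleType("demo_datapipe")
sys.modules["demo_datapipe"] = datapipe
exec(
    "from demo_serialization import DILL_AVAILABLE\n"
    "def backend():\n"
    "    return 'dill' if DILL_AVAILABLE else 'pickle'\n",
    datapipe.__dict__,
)

# Patching where the flag is defined does not change the copy
# that demo_datapipe already bound at import time.
with mock.patch("demo_serialization.DILL_AVAILABLE", False):
    patched_at_definition = datapipe.backend()

# Patching where the flag is used does take effect.
with mock.patch("demo_datapipe.DILL_AVAILABLE", False):
    patched_at_use = datapipe.backend()

print(patched_at_definition, patched_at_use)
```

Every additional module that imports the flag this way would need its own `mock.patch` target, which is the brittleness described above.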

> It's doable, but I am not sure we want to do so, because the goal of automatically using dill is to reduce the work users need to do to figure out whether a DataPipe with a lambda function is serializable.

Not sure I understand. If we just keep the same detection as we have now, users that don't care should not see any difference. If dill is available, it will be picked up, and otherwise pickle will be used. But it would give users the option to enforce a particular backend if they need to. Without this option, the environment you use has an effect on the functionality and there is no way to change that. I don't think this is good design.
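A backend switch along these lines could look roughly like the sketch below. This is not an existing TorchData API; `set_backend`, `get_backend`, and `dill_available` are hypothetical names chosen for illustration. The default keeps the current behavior (use dill if it is installed, otherwise pickle), while an explicit override lets the user enforce one backend regardless of the environment.

```python
import importlib
import importlib.util
import pickle

_VALID_BACKENDS = ("pickle", "dill")
_override = None  # None means "auto-detect", i.e. the current behavior


def dill_available():
    """Return True if the dill package can be imported in this environment."""
    return importlib.util.find_spec("dill") is not None


def set_backend(name):
    """Force a serialization backend, or pass None to restore auto-detection."""
    global _override
    if name is not None and name not in _VALID_BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    if name == "dill" and not dill_available():
        raise RuntimeError("dill was requested but is not installed")
    _override = name


def get_backend():
    """Return the module (pickle or dill) that should do the serialization."""
    name = _override or ("dill" if dill_available() else "pickle")
    return importlib.import_module(name)


# Users who do nothing get auto-detection; users who care can pin a backend.
set_backend("pickle")
payload = get_backend().dumps({"a": 1})
set_backend(None)
```

Because every consumer would call `get_backend()` instead of importing a flag by name, there is a single place to override in tests.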

Even if you don't do it for the users, think about how you want to test pickle vs. dill yourself. Right now the only option is to have two separate workflows, one with dill installed and one without.
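With an explicit backend choice, both code paths could be exercised from a single environment. The sketch below assumes nothing about TorchData itself; `available_backends` and `can_serialize` are hypothetical helpers that loop over whichever backends are installed and try to serialize a lambda with each one.

```python
import importlib
import importlib.util
import pickle


def available_backends():
    """List the serialization backends importable in this environment."""
    names = ["pickle"]
    if importlib.util.find_spec("dill") is not None:
        names.append("dill")
    return names


def can_serialize(obj, backend_name):
    """Try to serialize obj with the named backend and report success."""
    backend = importlib.import_module(backend_name)
    try:
        backend.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# One run covers every installed backend; no second workflow is needed
# as long as dill is installed in the test environment.
results = {
    name: can_serialize(lambda x: x + 1, name) for name in available_backends()
}
print(results)
```

The standard pickle module cannot serialize a lambda, so the `"pickle"` entry is expected to be `False`; the `"dill"` entry only appears when dill is installed.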

Originally posted by @pmeier in pytorch/vision#5711 (comment)
