Description
I see. I can confirm that we will rely on this `DILL_AVAILABLE` flag everywhere in the TorchData project to determine whether `dill` is available.
The problem is that we would need to patch every single module where this flag is imported. You cannot patch the place where it is defined, only the place where it is used. If you look above, we are not patching `._utils.serialization` but rather `.datapipes.datapipe`, because that is where the flag is used. If we now need to use this flag in multiple modules, we have to patch all of them. This is very brittle.
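To illustrate why the patch has to target the module where the flag is used, here is a minimal, self-contained sketch. The module names `hypothetical_serialization` and `hypothetical_datapipe` are made-up stand-ins for `._utils.serialization` and `.datapipes.datapipe`; they are built in memory (hence the `exec`) only so the snippet runs as a single file. Because `from ... import DILL_AVAILABLE` copies the value into the importing module's namespace, patching the defining module leaves the consumer's copy untouched.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for the module that defines the flag (think `._utils.serialization`).
serialization = types.ModuleType("hypothetical_serialization")
serialization.DILL_AVAILABLE = True  # pretend dill is installed
sys.modules["hypothetical_serialization"] = serialization

# Hypothetical stand-in for a module that uses the flag (think `.datapipes.datapipe`).
datapipe = types.ModuleType("hypothetical_datapipe")
sys.modules["hypothetical_datapipe"] = datapipe
exec(
    "from hypothetical_serialization import DILL_AVAILABLE\n"
    "def backend():\n"
    "    return 'dill' if DILL_AVAILABLE else 'pickle'\n",
    datapipe.__dict__,
)

# Patching where the flag is defined does NOT change the copy bound in the consumer.
with mock.patch("hypothetical_serialization.DILL_AVAILABLE", False):
    patched_definition = datapipe.backend()

# Patching where the flag is used does.
with mock.patch("hypothetical_datapipe.DILL_AVAILABLE", False):
    patched_usage = datapipe.backend()

print(patched_definition, patched_usage)  # dill pickle
```

Every additional module that does `from ... import DILL_AVAILABLE` gets its own copy, which is why each one would need its own patch.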
It's doable, but I am not sure we want to do so, because the goal of automatically using `dill` is to spare users the work of figuring out whether a DataPipe containing a lambda function is serializable.
Not sure I understand. If we just keep the same detection as we have now, users who don't care should not see any difference: if `dill` is available, it will be picked up, and otherwise `pickle` will be used. But it would give users the option to enforce a particular backend if they need to. Without this option, the environment you use affects the functionality and there is no way to change that. I don't think this is good design.
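One possible shape for such an option, sketched under assumptions: keep the current auto-detection as the default and let users override it. `set_serialization_backend`, `get_serialization_backend`, and `dumps` are hypothetical names, not existing TorchData APIs. Because callers ask `get_serialization_backend()` at call time instead of importing a copy of the flag, there is a single place to change (or patch in tests).

```python
import importlib.util
import pickle

# Current behavior: dill is used automatically if it is importable.
DILL_AVAILABLE = importlib.util.find_spec("dill") is not None

# Hypothetical user override; None means "auto-detect".
_backend_override = None


def set_serialization_backend(name):
    """Hypothetical helper: force "pickle" or "dill", or pass None to restore auto-detection."""
    global _backend_override
    if name not in (None, "pickle", "dill"):
        raise ValueError(f"unknown serialization backend: {name!r}")
    if name == "dill" and not DILL_AVAILABLE:
        raise RuntimeError("the dill backend was requested, but dill is not installed")
    _backend_override = name


def get_serialization_backend():
    """Return the selected backend, honoring the override before falling back to auto-detection."""
    if _backend_override is not None:
        return _backend_override
    return "dill" if DILL_AVAILABLE else "pickle"


def dumps(obj):
    """Serialize obj with whichever backend is currently selected."""
    if get_serialization_backend() == "dill":
        import dill  # only imported when dill is actually selected
        return dill.dumps(obj)
    return pickle.dumps(obj)
```

Users who never call `set_serialization_backend` see the same behavior as today, while anyone who needs a specific backend can request it explicitly and gets a clear error if it is unavailable.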
Even if you don't do it for the users, think about how you want to test `pickle` vs `dill` yourself. Right now the only option is to have two separate workflows, one with `dill` installed and one without.
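If the backend could be chosen explicitly, a single environment with `dill` installed could exercise both code paths. A minimal sketch using the standard `unittest` module, assuming a hypothetical `roundtrip(obj, backend)` helper rather than any real TorchData function; the `dill` checks are skipped when `dill` is not installed.

```python
import importlib.util
import pickle
import unittest

# Same detection as today: dill is only usable if it is importable.
DILL_AVAILABLE = importlib.util.find_spec("dill") is not None


def roundtrip(obj, backend):
    """Hypothetical helper: serialize and deserialize obj with an explicitly chosen backend."""
    if backend == "dill":
        import dill  # imported lazily so the pickle path works without dill
        return dill.loads(dill.dumps(obj))
    return pickle.loads(pickle.dumps(obj))


class SerializationBackendTest(unittest.TestCase):
    def test_plain_data_roundtrips_with_both_backends(self):
        for backend in ("pickle", "dill"):
            with self.subTest(backend=backend):
                if backend == "dill" and not DILL_AVAILABLE:
                    self.skipTest("dill is not installed")
                self.assertEqual(roundtrip({"a": [1, 2, 3]}, backend), {"a": [1, 2, 3]})

    def test_lambda_needs_dill(self):
        # The standard pickle module serializes functions by reference, so a local lambda fails.
        with self.assertRaises((pickle.PicklingError, AttributeError)):
            roundtrip(lambda x: x + 1, "pickle")
        if not DILL_AVAILABLE:
            self.skipTest("dill is not installed")
        self.assertEqual(roundtrip(lambda x: x + 1, "dill")(1), 2)
```

Run it with `python -m unittest`; with `dill` installed, both backends are covered in one workflow.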
Originally posted by @pmeier in pytorch/vision#5711 (comment)