In the released training code, I don't see a trainable task decoder to preserve the semantics of the latent.
Does the open-sourced code actually implement the pipeline diagram shown in the paper?
Furthermore, if there is no task decoder during training to maintain the latent's semantics, doesn't the entire training process reduce to a representation trade-off? That is, adding a residual branch improves detail reconstruction, but at the cost of sacrificing the latent's semantics.
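To make my question concrete, here is a toy sketch of what I expected from the paper's diagram: a trainable task head applied to the latent only (not the residual), with an auxiliary loss that anchors the latent's semantics. All names, shapes, and the loss weight below are my own illustration, not from the released code.

```python
import numpy as np

# Illustrative linear stand-ins for the modules -- not the repo's architecture.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))          # input batch
W_enc = rng.normal(size=(32, 16))     # encoder -> semantic latent z
W_res = rng.normal(size=(32, 16))     # residual branch (detail path)
W_dec = rng.normal(size=(16, 32))     # reconstruction decoder
W_task = rng.normal(size=(16, 4))     # hypothetical trainable task decoder
y_sem = rng.normal(size=(8, 4))       # semantic targets (e.g. logits)

z = x @ W_enc                         # semantic latent
r = x @ W_res                         # residual detail features
x_hat = (z + r) @ W_dec               # reconstruction uses latent + residual

recon_loss = np.mean((x_hat - x) ** 2)
# Task loss on z alone is what would keep z semantic while r absorbs detail.
task_loss = np.mean((z @ W_task - y_sem) ** 2)
total_loss = recon_loss + 0.1 * task_loss   # 0.1 is an arbitrary weight
```

Without the `task_loss` term, nothing in this setup prevents the optimizer from shifting semantic information out of `z` into the residual path, which is exactly the trade-off I am asking about.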