Closed

Description
What's the reasoning behind limiting the Mix Visual Transformer encoder (#632) to 3 input channels?
I couldn't find anything in the paper or the original SegFormer implementation that requires this restriction.
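
For context, here is a rough sketch of why I'd expect other channel counts to work. It assumes the first stage is the overlapping patch embedding described in the SegFormer paper (a 7x7 convolution with stride 4 and padding 3); the function name `patch_embed_shapes` and the default `embed_dim=64` are just illustrative choices of mine, not anything from the library. It only does shape arithmetic and does not run a real model.

```python
def patch_embed_shapes(in_channels, embed_dim=64, kernel=7, stride=4,
                       padding=3, height=224, width=224):
    """Shape arithmetic for a first-stage overlapping patch embedding.

    Hypothetical helper: models the stage as a single 2D convolution
    (assumed 7x7, stride 4, padding 3) and returns the convolution's
    weight shape plus the shape of the feature map it produces.
    """
    # Conv weight layout: (out_channels, in_channels, kernel_h, kernel_w).
    weight_shape = (embed_dim, in_channels, kernel, kernel)
    # Standard convolution output-size formula.
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return weight_shape, (embed_dim, out_h, out_w)


rgb_weight, rgb_out = patch_embed_shapes(in_channels=3)
multi_weight, multi_out = patch_embed_shapes(in_channels=5)

# Only the weight of the first convolution depends on the channel count;
# the feature map handed to the rest of the encoder is unchanged.
print(rgb_weight, multi_weight)
print(rgb_out == multi_out)
```

If that assumption holds, the channel count only changes the second dimension of the first convolution's weight, so the restriction would mainly matter for loading pretrained weights into that one layer rather than for the architecture itself.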