Hi,
The output of the transformer before "forward_head" has shape [8, 197, 768]. However, if we want to use it for segmentation, other models such as TransUNet work with shape [8, 196, 768]. The 196 matters because it is a perfect square (14 × 14), so the tokens can be reshaped into a 2D height × width grid.
I noticed the sequence gets this shape after calling "x = self._pos_embed(x)".
How can we convert the tensor from shape [8, 197, 768] to [8, 196, 768]? Could we simply take the first 196 vectors, or the last 196?
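A minimal PyTorch sketch, assuming the extra token is a single class token that "_pos_embed" prepends at index 0. That is the usual layout for a ViT-Base/16 at 224 × 224 input, where (224 / 16)² = 196 patch tokens plus 1 class token gives 197. Under that assumption, the tokens to keep are the last 196, not the first 196. The helper name `tokens_to_feature_map` and its `num_prefix_tokens` argument are hypothetical; if your model also prepends distillation or register tokens, set the argument to the total number of prefix tokens.

```python
import torch


def tokens_to_feature_map(x: torch.Tensor, num_prefix_tokens: int = 1) -> torch.Tensor:
    """Drop prefix token(s) and reshape patch tokens to [B, C, H, W].

    Hypothetical helper. Assumes the prefix token(s) (e.g. a class token)
    sit at the START of the sequence and the patch tokens follow in
    row-major (row by row) order of the patch grid.
    """
    # Keep only the patch tokens: [B, 197, C] -> [B, 196, C]
    patches = x[:, num_prefix_tokens:, :]
    b, n, c = patches.shape

    # The patch count must form a square grid, e.g. 196 = 14 * 14
    side = int(round(n ** 0.5))
    if side * side != n:
        raise ValueError(f"{n} patch tokens do not form a square grid")

    # [B, N, C] -> [B, H, W, C] -> [B, C, H, W] for 2D conv decoders
    return patches.reshape(b, side, side, c).permute(0, 3, 1, 2)


x = torch.randn(8, 197, 768)       # stand-in for the transformer output
tokens = x[:, 1:, :]               # [8, 196, 768], class token removed
feat = tokens_to_feature_map(x)
print(tokens.shape)  # torch.Size([8, 196, 768])
print(feat.shape)    # torch.Size([8, 768, 14, 14])
```

The [8, 196, 768] slice is what a TransUNet-style pipeline would consume as a token sequence; the [8, 768, 14, 14] map is the form a convolutional decoder typically expects.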