
Query about the code. #2

Open
sterzhang opened this issue Aug 1, 2024 · 3 comments

Comments

@sterzhang

First, thanks for your great work; it really inspires me a lot!
However, when following your code, we found that your classification layer is based on [feature] instead of [att].
It seems that the CLIP-bolstered [att] is not actually used for the final classification, which confuses us.
Perhaps this is by design, but it would be great if you could explain the reasoning behind it. Really appreciated!

@tmtuan1307
Owner

Hi, yes, it is designed that way; let me explain. Given $[feature] \in R^d$ from any classifier model, we need to map it into the same dimension as CLIP's embedding ($512$) using $M \in R^{d\times 512}$:
$[att] = M([feature])$

  • If we used $[att]$ for the final classifier $h$ (as you mentioned):
    $\hat{y} = h([att]) = h(M([feature]))$
    We would have to modify the standard architecture of the model and increase its number of parameters, potentially making comparisons unfair.
  • Using $[feature]$ for the final classifier $h$ (our design):
    $\hat{y} = h([feature])$
    The standard architecture and the number of parameters are kept unchanged.

Furthermore, since $[att] = M([feature])$, our Bounding Loss on $[att]$ can still impact and bound $[feature]$ around CLIP's Label Text Embedding.
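For concreteness, here is a minimal NumPy sketch of the shapes involved in this design. All names and dimensions (`d`, `W_h`, the random stand-in for CLIP's text embeddings) are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

# Illustrative dimensions: d is the backbone feature size; 512 matches
# CLIP's text-embedding dimension.
d, num_classes, batch = 64, 10, 4
rng = np.random.default_rng(0)

feature = rng.standard_normal((batch, d))      # backbone output: [feature]
M = rng.standard_normal((d, 512))              # projection M in R^{d x 512}
W_h = rng.standard_normal((d, num_classes))    # standard classifier head h

att = feature @ M          # [att] = M([feature]), used only by the Bounding Loss
logits = feature @ W_h     # y_hat = h([feature]); h's architecture is unchanged

# A bounding-style loss pulls [att] toward CLIP's label text embeddings.
# Since att is a function of feature, in training its gradient would also
# shape feature, even though the classifier never sees att directly.
clip_text_emb = rng.standard_normal((batch, 512))  # stand-in for CLIP embeddings
bounding_loss = np.mean((att - clip_text_emb) ** 2)

print(logits.shape, att.shape)  # (4, 10) (4, 512)
```

The key point the sketch illustrates: `logits` depends only on `feature` and the unchanged head `W_h`, so the comparison baseline keeps its parameter count, while the loss on `att` still constrains `feature` through `M`.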

@sterzhang
Author

sterzhang commented Aug 1, 2024 via email

@sterzhang
Author

sterzhang commented Aug 1, 2024 via email
