
Is Cosine-Similarity of Embeddings Really About Similarity? #732

Open · 1 task
irthomasthomas opened this issue on Mar 16, 2024 · 1 comment
Labels
embeddings (vector embeddings and related tools) · Papers (research papers) · Research (personal research notes for a topic)

Comments

@irthomasthomas (Owner)

Is Cosine-Similarity of Embeddings Really About Similarity?

DESCRIPTION:
Is Cosine-Similarity of Embeddings Really About Similarity?

Harald Steck (hsteck@netflix.com), Netflix Inc., Los Gatos, CA, USA
Chaitanya Ekanadham (cekanadham@netflix.com), Netflix Inc., Los Angeles, CA, USA
Nathan Kallus (nkallus@netflix.com), Netflix Inc. & Cornell University, New York, NY, USA

Abstract

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. In practice, this can work better, but sometimes also worse, than the unnormalized dot-product between the embedded vectors. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless ‘similarities.’ For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering the results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
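For concreteness, the two quantities compared in the abstract can be written as a minimal NumPy sketch (the vectors here are illustrative, not taken from the paper):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: the dot product of the
    two vectors after each is normalized to unit length."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, twice the norm

print(a @ b)                    # 28.0 -- the dot product depends on the norms
print(cosine_similarity(a, b))  # 1.0  -- the cosine ignores the norms entirely
```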

1 Introduction

Discrete entities are often embedded via a learned mapping to dense real-valued vectors in a variety of domains. For instance, words are embedded based on their surrounding context in a large language model (LLM), while recommender systems often learn an embedding of items (and users) based on how they are consumed by users. The benefits of such embeddings are manifold. In particular, they can be used directly as (frozen or fine-tuned) inputs to other models, and/or they can provide a data-driven notion of (semantic) similarity between entities that were previously atomic and discrete.

While the ‘similarity’ in ‘cosine similarity’ refers to the fact that larger values indicate closer proximity (as opposed to smaller values in distance metrics), it has also become a very popular measure of semantic similarity between the entities of interest, the motivation being that the norm of the learned embedding vectors is not as important as the directional alignment between them. While countless papers report the successful use of cosine similarity in practical applications, it has also been found not to work as well as other approaches, such as the (unnormalized) dot product between the learned embeddings, e.g., see [3, 4, 8].

In this paper, we try to shed light on these inconsistent empirical observations. We show that cosine similarity of the learned embeddings can in fact yield arbitrary results. We find that the underlying reason is not cosine similarity itself, but the fact that the learned embeddings have a degree of freedom that can render the cosine-similarities arbitrary even though their (unnormalized) dot-products are well-defined and unique. To obtain insights that hold more generally, we derive analytical solutions, which is possible for linear Matrix Factorization (MF) models; this is outlined in detail in the next section. In Section 3, we propose possible remedies. The experiments in Section 4 illustrate the findings derived in this paper.
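To make this degree of freedom concrete, here is a minimal NumPy sketch, assuming the paper's matrix-factorization setting where the data are fit as a product of two factor matrices A and B (the random matrices below stand in for learned factors, and the diagonal rescaling D is chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # stand-in for learned user embeddings
B = rng.normal(size=(5, 3))  # stand-in for learned item embeddings

# Rescaling A -> A @ D and B -> B @ inv(D) for any invertible diagonal D
# leaves the fitted matrix A @ B.T, and hence every user-item dot product,
# exactly unchanged.
D = np.diag([0.1, 1.0, 10.0])
A2, B2 = A @ D, B @ np.linalg.inv(D)
print(np.allclose(A @ B.T, A2 @ B2.T))  # True

# The cosine similarities between embedding rows, however, do change.
def cos(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(B[0], B[1]))    # cosine under the original factors
print(cos(B2[0], B2[1]))  # a different value under the rescaled factors
```

If a training objective only constrains the product of the two factors, as the paper shows for one family of regularized MF solutions, then D is a free parameter, and the item-item or user-user cosine similarities it produces are correspondingly arbitrary.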

Link to the paper

Suggested labels

@irthomasthomas added the embeddings, Papers, and Research labels on Mar 16, 2024
@irthomasthomas (Owner, Author)

Related content

#728

Similarity score: 0.81
