Hi,
Thanks for the great tutorial on document clustering. I am pretty new to text analytics and wanted to ask whether there is a reason that distances are calculated twice for hierarchical document clustering.
First here, on `tfidf_matrix`, using cosine distance:
from sklearn.metrics.pairwise import cosine_similarity
dist = 1 - cosine_similarity(tfidf_matrix)
and a second time here, on `dist`, through the `ward` function, which computes Euclidean distances between the rows of `dist` before doing the Ward linkage:
linkage_matrix = ward(dist)
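To make the question concrete, here is a minimal sketch of the two paths as I understand them. The toy `docs` list and the names `linkage_explicit` and `linkage_condensed` are my own illustrative assumptions, not from the tutorial. As far as I can tell from the SciPy docs, a 2-D array passed to `ward` is treated as observation vectors, so each row of `dist` becomes a "point" and Euclidean distances are computed between those rows; passing a condensed (1-D) distance vector instead would skip that second step.

```python
import numpy as np
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus, made up purely for illustration.
docs = [
    "the cat sat on the mat",
    "a cat and a dog played",
    "stock markets fell sharply today",
    "investors sold stock as markets dropped",
    "the dog chased the cat",
]

tfidf_matrix = TfidfVectorizer().fit_transform(docs)
dist = 1 - cosine_similarity(tfidf_matrix)  # square (n x n) cosine distance matrix

# Path A (as in the tutorial): the square matrix is passed directly, so its
# rows are treated as observation vectors.
linkage_matrix = ward(dist)

# The same thing spelled out: Euclidean distances between the rows of `dist`.
linkage_explicit = ward(pdist(dist, metric="euclidean"))
print(np.allclose(linkage_matrix, linkage_explicit))

# Path B (distances computed once): a condensed cosine distance vector.
# Caveat: SciPy documents Ward linkage as correctly defined only for
# Euclidean distances, so this is a sketch for comparison, not a recommendation.
linkage_condensed = ward(pdist(tfidf_matrix.toarray(), metric="cosine"))
```

If Path A and the explicit version agree, that would confirm the second distance computation happens inside `ward`.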
Is this done specifically for text clustering?
Thanks again