Why are distances calculated twice? #15

Hi,

Thanks for the great tutorial on document clustering. I am pretty new to text analytics and wanted to ask whether there is a reason that distances are calculated twice for hierarchical document clustering.
The first time is here, on the `tfidf_matrix`, using cosine distance:

`from sklearn.metrics.pairwise import cosine_similarity`
`dist = 1 - cosine_similarity(tfidf_matrix)`

and the second time is here, where `dist` is passed to the `ward` function, which computes Euclidean distances before doing the Ward linkage:

`linkage_matrix = ward(dist)`
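
To make the behaviour concrete, here is a minimal sketch, assuming SciPy's `ward` treats a 2-D input as a set of observation vectors (which is how its documentation describes it). The `toy_matrix` below is made-up random data standing in for `tfidf_matrix`, not the tutorial's corpus, and the variable names are only illustrative.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for tfidf_matrix: 5 "documents", 4 "terms".
rng = np.random.default_rng(0)
toy_matrix = rng.random((5, 4))

# First distance computation: cosine distance between documents.
dist = 1 - cosine_similarity(toy_matrix)

with warnings.catch_warnings():
    # SciPy may warn that a square input looks like a distance matrix.
    warnings.simplefilter("ignore")
    # 2-D input: each row of `dist` is treated as an observation vector,
    # so Euclidean distances between rows are computed before linkage.
    linkage_from_square = ward(dist)

# The same thing written out explicitly (second distance computation).
linkage_from_rows = ward(pdist(dist, metric="euclidean"))
same = np.allclose(linkage_from_square, linkage_from_rows)

# Alternative: hand the cosine distances to ward directly as a condensed
# (1-D) distance vector, so no further distances are computed.
dist_clean = dist.copy()
np.fill_diagonal(dist_clean, 0.0)  # remove floating-point noise on the diagonal
condensed = squareform(dist_clean, checks=False)
linkage_from_condensed = ward(condensed)

print("ward(dist) equals ward(pdist(dist)):", same)
```

If the assumption holds, the first two linkage matrices match, which is the "second" distance calculation in question. Passing the condensed vector avoids it, although Ward linkage is formally defined for Euclidean distances, so whether cosine distances are appropriate there is a separate question.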

Is this something done specifically for text clustering?

Thanks again
