As I understand it, passing multiple documents into KeyBERT at once is faster than iterating over them and passing one document at a time, because in the batched approach each word is embedded only once. Does that mean one should also see a performance benefit from splitting a single huge document into multiple documents?
For example, my intuition says the second option below should be faster, but in my tests it isn't. Shouldn't it be?
from keybert import KeyBERT

# first: extract keywords from the whole document in one call
kw_model = KeyBERT()
kw_model.extract_keywords(doc)

# second: split the document and pass the pieces as separate documents
kw_model = KeyBERT()
docs = split_the_doc_somehow(doc)
kw_model.extract_keywords(docs)
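For reference, here is a minimal sketch of the kind of comparison I ran, assuming doc already holds the text. split_the_doc_somehow, its chunk_size parameter, and the timing code are just illustrative placeholders I wrote, not anything from KeyBERT itself:

import time
from keybert import KeyBERT

def split_the_doc_somehow(doc, chunk_size=200):
    # Naive placeholder: split the document into chunks of roughly chunk_size words.
    words = doc.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

kw_model = KeyBERT()

# Option 1: the whole document in a single call.
start = time.perf_counter()
kw_model.extract_keywords(doc)
print("whole document:", time.perf_counter() - start, "s")

# Option 2: the same text split into chunks and passed as a list of documents.
start = time.perf_counter()
kw_model.extract_keywords(split_the_doc_somehow(doc))
print("split into chunks:", time.perf_counter() - start, "s")

On my machine the two options take roughly the same time, which is what prompted the question above.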