Hi!
I'm using CLD2 to detect the language of downloaded files, and I was wondering whether it's possible to avoid running CLD2 multiple times for a CrawlURI instance whose "via" URI has already been processed with CLD2. For this purpose, I thought of using the "data" map that CrawlURI provides for storing data, but I'm not sure whether it's thread-safe. If it isn't, I can't use it the way I had in mind, because the "data" map might be overwritten or corrupted by other threads when crawling with multiple threads.
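To make the idea concrete, here is a minimal sketch of the caching pattern I have in mind. It does not use Heritrix at all: `PageRecord`, `LanguageCache`, and the key `detected-language` are hypothetical stand-ins, and the detector is passed in as a plain function instead of calling CLD2. The sketch assumes the map is a `ConcurrentHashMap`, which is what makes the once-only behaviour safe across threads; whether CrawlURI's own "data" map gives a similar guarantee is exactly what I don't know.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a crawl URI. This is NOT Heritrix's CrawlURI.
class PageRecord {
    final String uri;
    final PageRecord via; // record this one was discovered from; null for a seed
    // Assumption: a concurrent map, so several threads can share one record.
    final Map<String, Object> data = new ConcurrentHashMap<>();

    PageRecord(String uri, PageRecord via) {
        this.uri = uri;
        this.via = via;
    }
}

class LanguageCache {
    // Hypothetical key under which the detection result is stored.
    static final String LANG_KEY = "detected-language";

    // Returns the cached language for `page`; runs `detector` only when
    // no value has been stored yet. computeIfAbsent on a ConcurrentHashMap
    // is atomic, so concurrent callers trigger at most one detection.
    static String languageOf(PageRecord page, Function<PageRecord, String> detector) {
        return (String) page.data.computeIfAbsent(LANG_KEY, key -> detector.apply(page));
    }

    // Language of the page that linked to `page`, or null for a seed.
    static String languageOfVia(PageRecord page, Function<PageRecord, String> detector) {
        return page.via == null ? null : languageOf(page.via, detector);
    }
}
```

With this, two records discovered from the same "via" record would both call `LanguageCache.languageOfVia(child, detector)`, and the detector would run only for the first call; the second would read the stored value.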
The methods I had thought of using are (from http://builds.archive.org/javadoc/heritrix-3.2.0/org/archive/modules/CrawlURI.html):
Something similar to:
Thank you!