Ensure loading of recent public suffix list (effective_tld_names.dat) #17

Closed
@sebastian-nagel

Description

The public suffix list (using the old file name "effective_tld_names.dat") is shipped twice in the Nutch job file in the dependency jar files of

The latter one ships with a heavily outdated version of the public suffix list. Crawler-commons EffectiveTldFinder loads "effective_tld_names.dat" from the class path. When running in distributed mode, there is no control over which dependency jar comes first on the class path, so the outdated version may be the one that is loaded.
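To check which copies are visible in a given environment, one option is to ask the class loader for every resource with that name. The following is a minimal diagnostic sketch, not part of Nutch or crawler-commons: the class name `SuffixListLocator` and the method `findCopies` are hypothetical, and it relies only on the standard `ClassLoader.getResources` call.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Nutch or crawler-commons.
public class SuffixListLocator {

    /** File name under which the public suffix list is looked up on the class path. */
    public static final String RESOURCE = "effective_tld_names.dat";

    /** Returns every copy of the named resource, in the order the loader reports them. */
    public static List<URL> findCopies(ClassLoader loader, String name) throws IOException {
        return Collections.list(loader.getResources(name));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = SuffixListLocator.class.getClassLoader();
        List<URL> copies = findCopies(loader, RESOURCE);

        // More than one entry means the list is shipped more than once.
        System.out.println("Copies found: " + copies.size());
        for (URL url : copies) {
            System.out.println("  " + url);
        }

        // getResource returns the single copy a plain class path lookup would pick.
        System.out.println("Copy that wins the lookup: " + loader.getResource(RESOURCE));
    }
}
```

Running this with the same class path as the job should list each jar that contains the file, which makes it easier to see whether the outdated copy could be picked up.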

Ideally, the most recent version of the public suffix list should be used. This could be achieved by downloading the list during build and placing it in the "conf/" folder which is always first in the class path.
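The reason placing the file in "conf/" would help is that a class path lookup returns the first match in class path order. The sketch below illustrates that behavior with a plain `URLClassLoader` and two temporary directories standing in for "conf/" and a dependency jar. The class name `ClassPathOrderDemo`, the method `loadFirst`, and the file contents are made up for the demonstration; it does not reproduce the actual Nutch job class loading.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; directory names and file contents are placeholders.
public class ClassPathOrderDemo {

    public static final String RESOURCE = "effective_tld_names.dat";

    /** Loads the resource from a class path made of the given directories, in that order. */
    public static String loadFirst(Path... classPathDirs) throws IOException {
        URL[] urls = new URL[classPathDirs.length];
        for (int i = 0; i < classPathDirs.length; i++) {
            urls[i] = classPathDirs[i].toUri().toURL();
        }
        // Parent is null so that only the given directories are searched.
        try (URLClassLoader loader = new URLClassLoader(urls, null);
             InputStream in = loader.getResourceAsStream(RESOURCE)) {
            if (in == null) {
                throw new IOException(RESOURCE + " not found on the class path");
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the "conf/" folder and for a jar with an outdated list.
        Path conf = Files.createTempDirectory("conf");
        Path lib = Files.createTempDirectory("lib");
        Files.write(conf.resolve(RESOURCE), "fresh list".getBytes(StandardCharsets.UTF_8));
        Files.write(lib.resolve(RESOURCE), "outdated list".getBytes(StandardCharsets.UTF_8));

        // The directory listed first decides which copy is loaded.
        System.out.println("conf first: " + loadFirst(conf, lib));
        System.out.println("lib first:  " + loadFirst(lib, conf));
    }
}
```

With the stand-in for "conf/" listed first, the fresh copy is returned regardless of what later class path entries contain, which is the effect the proposal relies on.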
