Added configuration parameter processBinaryContentInCrawling #54

Merged (1 commit) on Jul 19, 2015

Conversation

EgbertW (Contributor) commented May 20, 2015

Note: I recently discovered that you moved from Google Code to GitHub. I've been using and modifying crawler4j for a while, and I can now submit my changes as pull requests for you to merge. I rebased all my relevant patches on the new master, so they should be easy to merge. I hope you find some or all of them useful.

Description of this patch:

Added the configuration parameter processBinaryContentInCrawling, which determines whether binary content is processed by Tika after it has been retrieved. Whether binary content is retrieved at all is still controlled by includeBinaryContentInCrawling. This is useful if you want to retrieve binary content but do not care whether the links inside it are processed. Skipping the Tika step can considerably improve performance when handling binary documents.
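
To illustrate how the two flags interact, here is a minimal, self-contained sketch. It is a hypothetical model and not the code from this patch: apart from the two flag names, every class, enum, and method name below is an assumption made for the example, and no crawler4j classes are used.

```java
// Hypothetical model of the two flags described in this patch.
// Only the flag names come from the patch; everything else is illustrative.
final class BinaryContentPolicy {

    /** What happens to a fetched page whose content is binary. */
    enum Action { SKIP, FETCH_ONLY, FETCH_AND_PARSE }

    private final boolean includeBinaryContentInCrawling;
    private final boolean processBinaryContentInCrawling;

    BinaryContentPolicy(boolean includeBinaryContentInCrawling,
                        boolean processBinaryContentInCrawling) {
        this.includeBinaryContentInCrawling = includeBinaryContentInCrawling;
        this.processBinaryContentInCrawling = processBinaryContentInCrawling;
    }

    /** Decides how a binary page is handled under the current flags. */
    Action forBinaryPage() {
        if (!includeBinaryContentInCrawling) {
            // Binary content is not retrieved at all, so the second flag is irrelevant.
            return Action.SKIP;
        }
        // Content is retrieved; parsing (e.g. link extraction) is optional.
        return processBinaryContentInCrawling
                ? Action.FETCH_AND_PARSE
                : Action.FETCH_ONLY;
    }

    public static void main(String[] args) {
        // Retrieve binary documents but skip the expensive parsing step.
        BinaryContentPolicy policy = new BinaryContentPolicy(true, false);
        System.out.println(policy.forBinaryPage());
    }
}
```

In this model, the combination include = true and process = false corresponds to the use case described above: the binary data is still available to the caller, but no parsing time is spent on it.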

yasserg added a commit that referenced this pull request Jul 19, 2015
Added configuration parameter processBinaryContentInCrawling
@yasserg yasserg merged commit bdbdc3e into yasserg:master Jul 19, 2015
@EgbertW EgbertW deleted the binary-content branch July 21, 2015 10:37