Added configuration parameter processBinaryContentInCrawling #54
Note: I recently discovered that you moved from Google Code to GitHub. I have been using and modifying crawler4j for a while, and I can now submit pull requests for you to merge. I rebased all my (relevant) patches on the new master, so they should be easy to merge. I hope you find some or all of them useful.
Description of this patch:
The new configuration parameter processBinaryContentInCrawling determines whether binary content is processed by Tika after it has been retrieved. Whether it is retrieved at all is still controlled by includeBinaryContentInCrawling. This is useful if you want to retrieve binary content but do not care whether the links inside it are processed. Skipping that processing can considerably improve performance when handling binary documents.
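
To illustrate how the two flags are meant to interact, here is a minimal standalone sketch. It is not the code from this patch: the class `BinaryContentPolicy`, the enum `Action`, and the method `actionFor` are hypothetical names used only for illustration. Only the two flag names come from the description above.

```java
// Hypothetical illustration only; this is NOT crawler4j's CrawlConfig or parser code.
// It models the decision described above using the two flag names from this patch.
final class BinaryContentPolicy {

    /** What the crawler would do with a fetched resource (hypothetical enum). */
    enum Action { SKIP, FETCH_ONLY, FETCH_AND_PARSE }

    private final boolean includeBinaryContentInCrawling;
    private final boolean processBinaryContentInCrawling;

    BinaryContentPolicy(boolean includeBinaryContentInCrawling,
                        boolean processBinaryContentInCrawling) {
        this.includeBinaryContentInCrawling = includeBinaryContentInCrawling;
        this.processBinaryContentInCrawling = processBinaryContentInCrawling;
    }

    /** Decide how to handle a resource, given whether its content is binary. */
    Action actionFor(boolean isBinary) {
        if (!isBinary) {
            // Textual content is unaffected by either flag in this sketch.
            return Action.FETCH_AND_PARSE;
        }
        if (!includeBinaryContentInCrawling) {
            // Binary content is not retrieved at all.
            return Action.SKIP;
        }
        // Binary content is retrieved; parsing (e.g. by Tika) is optional.
        return processBinaryContentInCrawling ? Action.FETCH_AND_PARSE : Action.FETCH_ONLY;
    }

    public static void main(String[] args) {
        // Retrieve binary documents but skip the expensive parsing step.
        BinaryContentPolicy policy = new BinaryContentPolicy(true, false);
        System.out.println("binary:  " + policy.actionFor(true));
        System.out.println("textual: " + policy.actionFor(false));
    }
}
```

In this sketch the new flag only matters when includeBinaryContentInCrawling is enabled, which matches the "in addition to being retrieved" wording above.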