Description
status ACCEPTED severity enhancement in component analyzer for ---
Reported in version unspecified on platform ANY/Generic
Assigned to: Lubos Kosco
On 2011-01-20 10:12:05 +0000, Vladimir Kotal wrote:
A PDF analyzer would be beneficial to have, e.g. to search design documents together with source code (by selecting a project with the source code and a "project" with the design documents).
On 2011-02-15 13:54:44 +0000, Lubos Kosco wrote:
we could reuse http://pdfbox.apache.org/
after all, the old OpenGrok (arcs, still used for psarcs) had it like that ...
forward-port? :-D
On 2011-02-15 13:59:43 +0000, Lubos Kosco wrote:
an alternative is to use tika, with pdfbox underneath, and gain a myriad of supported formats for lucene:
http://tika.apache.org/0.8/formats.html
(pdf, (open)office, mbox, rtf, audio/video metadata, alt. a java class and jar parser; it also has a compressed files parser, which could be used to satisfy bug 343)
I have a feeling this might be one of the major features for the next version! :)
On 2011-03-15 07:29:16 +0000, Lubos Kosco wrote:
for odf formats we also have:
http://odftoolkit.org/