
#2543: Centralize the initialization of TikaConfig in TikaManager #2550

Open
aberenguel wants to merge 3 commits into master from #2543_tika_config


@aberenguel
Collaborator

Solves issue #2543

@lfcnassif
Member

Thanks @aberenguel! Just a question: does this fix an existing bug, or does it aim to prevent future bugs when Tika is called from additional places?

@aberenguel
Collaborator Author

There is no bug. It is just an optimization to avoid instantiating TikaConfig in several places and to ensure that it is instantiated only after it has been properly initialized in ParsingTask.
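A minimal sketch of the pattern described here, assuming a generic holder: `ConfigHolder` is a hypothetical name, not the code in this PR, and the type parameter `T` stands in for Tika's `TikaConfig` so the example runs without Tika on the classpath. It shows the config being created exactly once by an explicit call (for example from `ParsingTask`), with any earlier access failing fast instead of silently building a config too soon.

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical central holder in the spirit of TikaManager.
 * T stands in for org.apache.tika.config.TikaConfig.
 */
final class ConfigHolder<T> {
    // Stays null until initialize() has run.
    private volatile T config;

    /** Meant to be called exactly once, e.g. after ParsingTask has finished its setup. */
    synchronized void initialize(Supplier<T> factory) {
        if (config != null) {
            throw new IllegalStateException("config already initialized");
        }
        config = Objects.requireNonNull(factory.get(), "factory returned null");
    }

    /** Fails fast if someone asks for the config before initialize() was called. */
    T get() {
        T c = config;
        if (c == null) {
            throw new IllegalStateException("config requested before initialization");
        }
        return c;
    }
}
```

Failing fast on an early `get()` turns an ordering mistake into an immediate, visible error rather than a config built with the wrong settings.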

@aberenguel
Collaborator Author

I don't know whether ParsingTask is mandatory for IPED processing. If it is not, the TikaConfig initialization must be executed in another place. If we initialize it lazily inside TikaManager.getTikaConfig(), we don't have much control over when it is initialized, and it may be instantiated ahead of time.
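To make the concern about lazy initialization concrete, here is a hypothetical sketch (`LazyConfigHolder`, `setConfigPath` and the string "config" are illustrative stand-ins, not IPED or Tika APIs). The config is built on the first `getConfig()` call, so if any code touches it before the setup step runs, the default settings are cached and the later setup is silently ignored.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lazy holder: the config is built on first access, whenever that happens. */
final class LazyConfigHolder {
    // Stands in for whatever a setup step (e.g. ParsingTask) would configure beforehand.
    private static volatile String configPath = "default-config.xml";
    private static final AtomicReference<String> CONFIG = new AtomicReference<>();

    static void setConfigPath(String path) {
        configPath = path;
    }

    static String getConfig() {
        String c = CONFIG.get();
        if (c == null) {
            // Built from whatever configPath holds at the moment of the first call.
            CONFIG.compareAndSet(null, "config loaded from " + configPath);
            c = CONFIG.get();
        }
        return c;
    }
}
```

If some component calls `getConfig()` before `setConfigPath("custom-config.xml")`, every later caller still gets the config built from the default path.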

@aberenguel
Collaborator Author

The way this has been implemented in the PR may help avoid future programming errors.

@lfcnassif
Member

> I don't know whether ParsingTask is mandatory for IPED processing.

It's not needed: it's disabled in the fastmode profile, and I just checked the code; the process() method is skipped if the task is disabled.

> If we initialize it lazily inside TikaManager.getTikaConfig(), we don't have much control over when it is initialized, and it may be instantiated ahead of time.

I think we can add runtime checks to verify that some custom signatures and some custom parsers were loaded and are working fine. If not, abort processing.
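One possible shape for such a check, as a sketch under assumptions: `CustomParserCheck` is a hypothetical name, the registered types and parsers are passed in as a plain map (in real code they would be read from the loaded `TikaConfig`), and the MIME type and parser names are illustrative. It only covers the "were loaded" half; a "working fine" check could additionally parse a small known sample file and compare the detected type.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical startup check: abort if expected custom signatures or parsers are missing. */
final class CustomParserCheck {
    /**
     * @param registered MIME type mapped to the parser class names registered for it
     * @param expected   custom MIME type mapped to the custom parser expected to handle it
     */
    static void verify(Map<String, Set<String>> registered, Map<String, String> expected) {
        for (Map.Entry<String, String> e : expected.entrySet()) {
            Set<String> parsers = registered.get(e.getKey());
            if (parsers == null) {
                // The custom signature (MIME type) was never loaded.
                throw new IllegalStateException("custom signature not loaded: " + e.getKey());
            }
            if (!parsers.contains(e.getValue())) {
                // The type is known, but the custom parser is not registered for it.
                throw new IllegalStateException(
                        "custom parser " + e.getValue() + " not registered for " + e.getKey());
            }
        }
    }
}
```

Calling `verify` once right after initialization and letting the exception propagate is one way to abort processing early.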

@aberenguel
Collaborator Author

> It's not needed: it's disabled in the fastmode profile, and I just checked the code; the process() method is skipped if the task is disabled.

Excellent! But I can't remove it from ParserConfig.xml, can I?

> I think we can add runtime checks to verify that some custom signatures and some custom parsers were loaded and are working fine. If not, abort processing.

I'm going to do that.

@lfcnassif
Member

> Excellent! But I can't remove it from ParserConfig.xml, can I?

No, I think part of the code in the init() method must still run.
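The lifecycle discussed in this thread can be sketched as follows. The class names are hypothetical and simplified, not IPED's actual task API; the sketch only assumes what the thread states: `process()` is skipped for a disabled task while `init()` still runs, so setup that other components rely on (such as initializing the central TikaConfig) belongs in `init()`.

```java
/** Hypothetical task lifecycle: init() always runs, process() only when enabled. */
abstract class AbstractTaskSketch {
    private final boolean enabled;

    AbstractTaskSketch(boolean enabled) {
        this.enabled = enabled;
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Runs regardless of the enabled flag, so shared setup lives here. */
    abstract void init();

    abstract void doProcess(String item);

    final void process(String item) {
        if (!isEnabled()) {
            return; // disabled tasks skip per-item processing
        }
        doProcess(item);
    }
}

final class ParsingTaskSketch extends AbstractTaskSketch {
    static volatile boolean sharedConfigReady = false;
    int processed = 0;

    ParsingTaskSketch(boolean enabled) {
        super(enabled);
    }

    @Override
    void init() {
        // Shared setup happens even when the task is disabled (e.g. in fastmode).
        sharedConfigReady = true;
    }

    @Override
    void doProcess(String item) {
        processed++;
    }
}
```

With a disabled `ParsingTaskSketch`, calling `init()` still marks the shared config as ready, while `process()` does nothing.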
