Analysis of the Indexing and Document Search Functionalities of the Apache Solr Platform

This project analyzes the indexing and document search functionalities of Apache Solr, an open-source search platform. It involves configuring an Apache Solr prototype and developing several components: scripts, a Java program, and a Solritas UI. Together, these components carry out the following tasks:

Detect changes in a folder containing documents in real time
Identify non-indexable PDF files
Make non-indexable PDF files indexable using OCR to extract text content
Index the PDF files using Apache Solr
Perform searches on the indexed files through a graphical interface
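
As an illustration of the first task, change detection in a folder can be sketched in Java with the standard `java.nio.file.WatchService`. The project itself performs this step with a script, so the sketch below is not its implementation; the class name `FolderWatcher` and the method `actionFor` are hypothetical, and the mapping from events to actions is an assumption about what an indexer would want.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

/** Hypothetical watcher; class and method names are illustrative only. */
class FolderWatcher {

    /** Maps a file-system event kind to the action an indexer might take (assumed mapping). */
    static String actionFor(WatchEvent.Kind<?> kind) {
        if (kind == StandardWatchEventKinds.ENTRY_CREATE
                || kind == StandardWatchEventKinds.ENTRY_MODIFY) {
            return "index";   // new or changed file: (re)index it
        }
        if (kind == StandardWatchEventKinds.ENTRY_DELETE) {
            return "delete";  // removed file: drop it from the index
        }
        return "ignore";      // e.g. OVERFLOW: some events were lost
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.out.println("usage: java FolderWatcher <folder>");
            return;
        }
        Path folder = Paths.get(args[0]);
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            // Only the folder itself is watched, not its subfolders.
            folder.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY,
                    StandardWatchEventKinds.ENTRY_DELETE);
            while (true) {
                WatchKey key = watcher.take(); // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents()) {
                    System.out.println(actionFor(event.kind()) + " " + event.context());
                }
                if (!key.reset()) {
                    break; // the folder is no longer accessible
                }
            }
        }
    }
}
```

A file moved or renamed inside the watched folder is reported by `WatchService` as a delete followed by a create, so a move needs no separate case in this sketch.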


Prerequisites

Project Description

The project configures an Apache Solr prototype and develops a set of scripts, a Java program, and a Solritas UI to index and search documents. The main parts are:

Real-time detection of changes in a folder containing documents
- A script monitors the folder and detects any file creation, modification, deletion, or move
Identification of non-indexable PDF files
- A script identifies PDF files that cannot be indexed because they contain no text content
Making the PDF files indexable by content
- A script applies Optical Character Recognition (OCR) to extract text from the PDF files that cannot be indexed
Indexing the PDF files using Apache Solr
- A Java program indexes the PDF files on the Apache Solr platform
Performing searches on the indexed files with a graphical interface
- A Solritas UI lets users search the indexed files through a graphical interface
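
The identification and indexing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's Java program: it assumes that a PDF counts as non-indexable when text extraction yields only whitespace, that the extraction itself happens elsewhere (for example with a PDF library), and that the Solr core has the extracting request handler enabled at `/update/extract`. The class name `PdfIndexingSketch`, its method names, and the core name `docs` are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch; names and the "no text means OCR" rule are assumptions. */
class PdfIndexingSketch {

    /** Treats a PDF as non-indexable when its extracted text is missing or blank. */
    static boolean needsOcr(String extractedText) {
        return extractedText == null || extractedText.isBlank();
    }

    /** Builds the URL of Solr's extracting request handler for one document. */
    static URI extractUri(String solrBase, String core, String docId) {
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/solr/" + core
                + "/update/extract?literal.id=" + id + "&commit=true");
    }

    /** Posts the PDF bytes to Solr and returns the HTTP status code. */
    static int indexPdf(String solrBase, String core, Path pdf) throws Exception {
        URI uri = extractUri(solrBase, core, pdf.getFileName().toString());
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.out.println("usage: java PdfIndexingSketch <solr-base-url> <core> <file.pdf>");
            return;
        }
        // Requires a running Solr instance with the extraction module configured.
        int status = indexPdf(args[0], args[1], Paths.get(args[2]));
        System.out.println("Solr responded with HTTP " + status);
    }
}
```

With a local Solr on its default port, a call might look like `java PdfIndexingSketch http://localhost:8983 docs report.pdf`. Using the file name as `literal.id` keeps the example short; a real indexer would likely need an identifier that stays unique across folders.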

Screenshots

Below are some screenshots (in Italian) of the different components used in the project:

UI Solritas

Java Indexer

Solr Index Updater

Indexer

liviobisogni/solr-ocr-indexing
