GitHub - mattszeto/FindEX: Search Engine in java for text and json files on disk.

Search Engine

Search Engine:

Boolean Queries: Positional Inverted Index, K-gram Index, Porter2Stemmer Algorithm
- K-Grams
- Stemming
- NEAR/k Queries
- Wildcard Queries
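
To illustrate how a positional inverted index can answer a NEAR/k query, here is a minimal in-memory sketch. It is not taken from this repository: the class `NearSketch`, its method names, and the whitespace tokenizer are hypothetical, and it assumes that `first NEAR/k second` means `second` appears at most k positions after `first`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a positional inverted index with a NEAR/k lookup.
class NearSketch {
    // term -> (document id -> ascending list of token positions)
    private final Map<String, TreeMap<Integer, List<Integer>>> index = new TreeMap<>();

    // Tokenize on whitespace (an assumption) and record each token's position.
    void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) continue;
            index.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                 .computeIfAbsent(docId, d -> new ArrayList<>())
                 .add(pos);
        }
    }

    // Documents in which `second` occurs 1..k positions after `first`.
    List<Integer> near(String first, String second, int k) {
        List<Integer> result = new ArrayList<>();
        TreeMap<Integer, List<Integer>> p1 = index.get(first);
        TreeMap<Integer, List<Integer>> p2 = index.get(second);
        if (p1 == null || p2 == null) return result;
        for (Map.Entry<Integer, List<Integer>> e : p1.entrySet()) {
            List<Integer> pos2 = p2.get(e.getKey());
            if (pos2 != null && withinK(e.getValue(), pos2, k)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Two-pointer walk over two ascending position lists.
    private static boolean withinK(List<Integer> a, List<Integer> b, int k) {
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int gap = b.get(j) - a.get(i);
            if (gap <= 0) j++;               // second is not after first: advance second
            else if (gap <= k) return true;  // close enough: match
            else i++;                        // second is too far ahead: advance first
        }
        return false;
    }

    public static void main(String[] args) {
        NearSketch idx = new NearSketch();
        idx.addDocument(1, "the quick brown fox");
        idx.addDocument(2, "fox jumps over the quick dog");
        System.out.println(idx.near("quick", "fox", 2)); // prints [1]
    }
}
```

Storing positions in ascending order is what allows the linear two-pointer check; an on-disk index would read the same lists from a postings file rather than from a map.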
Ranked Queries: Disk Positional Index, K-gram Index on Disk
- Spelling Check (Suggested Queries): Jaccard Coefficient, Levenshtein Edit Distance
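
As a rough illustration of how the Jaccard coefficient and Levenshtein edit distance can work together for suggested queries, the sketch below filters vocabulary terms by k-gram overlap and then picks the candidate with the smallest edit distance. The class `SpellingSketch`, the `$` padding, and the threshold value are assumptions made for this example, not the repository's actual implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: k-gram overlap as a cheap filter, edit distance as the final ranking.
class SpellingSketch {
    // k-grams of a term, with '$' marking the start and end of the word.
    static Set<String> kGrams(String term, int k) {
        String padded = "$" + term + "$";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + k <= padded.length(); i++) {
            grams.add(padded.substring(i, i + k));
        }
        return grams;
    }

    // Jaccard coefficient: |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    // Levenshtein distance using two rolling rows of the dynamic-programming table.
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.length()];
    }

    // Returns the closest vocabulary term whose Jaccard score passes the threshold, or null.
    static String suggest(String query, List<String> vocabulary, int k, double threshold) {
        Set<String> queryGrams = kGrams(query, k);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String term : vocabulary) {
            if (jaccard(queryGrams, kGrams(term, k)) < threshold) continue;
            int distance = levenshtein(query, term);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = term;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("search", "engine", "stemming", "index");
        System.out.println(suggest("serch", vocabulary, 2, 0.3)); // prints search
    }
}
```

In a real index the candidate terms would come from the k-gram index postings rather than a scan of the whole vocabulary; the scan here only keeps the example short.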

Dependencies:

SparkJava w/ Thymeleaf-Template: com.sparkjava:spark-template-thymeleaf:2.7.1
Gson: gson-2.8.6.jar
libstemmer (Porter2 stemmer): tartarus snowball-stemmer
MapDB (B+ tree for the on-disk index): org.mapdb

How to run search engine: (2 Ways)

CMD text:

You can run the search engine from the command line by running indexer.java:

- Open the project in your preferred text editor or IDE (the project was developed in IntelliJ).
- Download and load all dependencies (listed above).
- Run indexer.java to start the fully functional command-line application.

Web UI: localhost

The other way is to run the web UI on your local machine at http://localhost:4567/:

- Open the project in your text editor or IDE (the project was developed in IntelliJ).
- Download and load all dependencies (listed above).
- Run WebUI.java.
- Open a browser window (for example, Chrome) and go to http://localhost:4567/

Building Index:

To build an index, provide a path to your corpus folder.

If you have already built the index and no files in the corpus have changed, you do not need to build it again. To check, see whether your corpus folder already contains an index folder with postings.bin and docWeights.bin inside it (there will be other files too, but these are the most important).
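
The check described above can also be done programmatically. This is a minimal sketch, assuming the index folder is named `index` and sits directly inside the corpus folder; the class and method names are hypothetical and not part of the project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a corpus folder appears to have a built index.
class IndexCheckSketch {
    // Assumes the layout <corpus>/index/postings.bin and <corpus>/index/docWeights.bin.
    static boolean indexLooksBuilt(Path corpusDir) {
        Path indexDir = corpusDir.resolve("index");
        return Files.isRegularFile(indexDir.resolve("postings.bin"))
            && Files.isRegularFile(indexDir.resolve("docWeights.bin"));
    }

    public static void main(String[] args) {
        // Corpus path comes from the first argument; defaults to the current directory.
        Path corpus = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(indexLooksBuilt(corpus)
            ? "Index found; rebuild only if corpus files have changed."
            : "No index found; build the index first.");
    }
}
```

This only confirms that the two files exist. It does not detect whether corpus files changed after the index was built.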

You must supply your own corpus to index and search; it must be a directory on your local file system.
