ContentIndexer

ContentIndexer is a small GenServer-based indexing and search service. I initially created it for my Markdown-based blog. When the total amount of data to be indexed is not huge, this small service handles it very quickly. It stores the index in a GenServer, so the index is held in memory and searching is very fast.
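As a rough illustration of what keeping an index in a GenServer looks like, here is a minimal sketch. The module name IndexServerSketch and its add/search functions are hypothetical and are not ContentIndexer's API; the sketch stores raw tokens per document and does a plain membership lookup rather than tf-idf matching.

```elixir
defmodule IndexServerSketch do
  # Hypothetical example module, not part of ContentIndexer.
  use GenServer

  # Client API
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)
  def add(server, name, tokens), do: GenServer.cast(server, {:add, name, tokens})
  def search(server, term), do: GenServer.call(server, {:search, term})

  # Server callbacks: the index is simply the process state (a map).
  @impl true
  def init(index), do: {:ok, index}

  @impl true
  def handle_cast({:add, name, tokens}, index) do
    {:noreply, Map.put(index, name, tokens)}
  end

  @impl true
  def handle_call({:search, term}, _from, index) do
    # Return the names of all documents whose tokens contain the term.
    hits = for {name, tokens} <- index, term in tokens, do: name
    {:reply, Enum.sort(hits), index}
  end
end

{:ok, pid} = IndexServerSketch.start_link()
IndexServerSketch.add(pid, "post-1.md", ~w(elixir genserver))
IndexServerSketch.add(pid, "post-2.md", ~w(markdown blog))
IO.inspect(IndexServerSketch.search(pid, "elixir"))
```

Because the index lives in process state, a search is a single message round trip with no disk or database access.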

It uses tf-idf matching and weighting for the actual index. Searching works the same way: the query is weighted like the indexed content and then compared against the index by similarity.
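The text above does not say which similarity measure is used. Assuming cosine similarity, a common choice for weighted term vectors, comparing a query against indexed documents could be sketched as follows. SimilaritySketch, the document names, and the weights are made up for illustration.

```elixir
defmodule SimilaritySketch do
  # Hypothetical example module, not part of ContentIndexer.

  # Cosine similarity between two sparse vectors stored as %{term => weight}.
  def cosine(a, b) do
    dot =
      a
      |> Enum.map(fn {term, w} -> w * Map.get(b, term, 0.0) end)
      |> Enum.sum()

    norm_a = norm(a)
    norm_b = norm(b)

    # An empty or all-zero vector has no direction, so treat it as no match.
    if norm_a == 0.0 or norm_b == 0.0, do: 0.0, else: dot / (norm_a * norm_b)
  end

  defp norm(vec) do
    vec |> Map.values() |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
  end

  # Score every document against the query and sort best match first.
  def rank(query, docs) do
    docs
    |> Enum.map(fn {name, weights} -> {name, cosine(query, weights)} end)
    |> Enum.sort_by(fn {_name, score} -> score end, &>=/2)
  end
end

docs = %{
  "elixir.md" => %{"elixir" => 0.5, "genserver" => 0.8},
  "cooking.md" => %{"pasta" => 0.9}
}

query = %{"genserver" => 1.0}
IO.inspect(SimilaritySketch.rank(query, docs))
```

Here "elixir.md" ranks first because it shares a term with the query, while "cooking.md" scores 0.0.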

What is tf-idf?

tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining.
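To make the definition concrete, here is a small sketch of one common tf-idf variant: tf is the term's count divided by the document length, and idf is log(N / n), where N is the number of documents and n is the number of documents containing the term. This is an illustration under those assumptions and may differ from the exact weighting ContentIndexer uses. TfIdfSketch and the sample corpus are made up.

```elixir
defmodule TfIdfSketch do
  # Hypothetical example module, not part of ContentIndexer.

  # Term frequency: occurrences of the term divided by the document's token count.
  def tf(term, tokens) do
    Enum.count(tokens, &(&1 == term)) / length(tokens)
  end

  # Inverse document frequency: log(N / number of documents containing the term).
  # Assumes the term appears in at least one document of the corpus.
  def idf(term, corpus) do
    containing = Enum.count(corpus, &(term in &1))
    :math.log(length(corpus) / containing)
  end

  def tf_idf(term, tokens, corpus), do: tf(term, tokens) * idf(term, corpus)
end

corpus = [
  ~w(elixir is fun),
  ~w(elixir genserver state),
  ~w(markdown blog post)
]

[doc | _] = corpus
IO.inspect(TfIdfSketch.tf_idf("fun", doc, corpus), label: "fun")
IO.inspect(TfIdfSketch.tf_idf("elixir", doc, corpus), label: "elixir")
```

In the first document, "fun" gets a higher weight than "elixir": both occur once, but "elixir" also appears in a second document, so it is less distinctive.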

Helpful blog post

tf-idf background info

Installation

The library is available on Hex. Install the package by adding content_indexer to your list of dependencies in mix.exs:

def deps do
  [{:content_indexer, "~> 0.2.0"}]
end

Usage

The easiest way to see how to use this in your project is to review the test ContentIndexer.TfIdf.IndexProcessTest. The module ContentIndexer.Services.PreProcess has several functions that pre-process both the content and the queries. Because these are passed in as functions, you can write your own versions and pass them into the content tokenisation and query building process.
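The general pattern of passing a pre-processing function into tokenisation can be sketched like this. PreProcessSketch, tokenise/2, and normalise/1 are hypothetical names invented for this example; they are not the signatures of ContentIndexer.Services.PreProcess, so check the test mentioned above and the Hex documentation for the real ones.

```elixir
defmodule PreProcessSketch do
  # Hypothetical stand-in for a tokeniser that accepts a pre-processing function.
  def tokenise(content, pre_process_fun) when is_function(pre_process_fun, 1) do
    content
    |> String.split(~r/\s+/, trim: true)
    |> Enum.map(pre_process_fun)
    |> Enum.reject(&(&1 == ""))
  end

  # A custom pre-processor: lower-case the token and strip every character
  # that is not a letter or a digit (for example Markdown markers).
  def normalise(token) do
    token
    |> String.downcase()
    |> String.replace(~r/[^\p{L}\p{N}]/u, "")
  end
end

"# Hello, GenServer world!"
|> PreProcessSketch.tokenise(&PreProcessSketch.normalise/1)
|> IO.inspect()
```

Swapping in a different function, for example one that also drops stop words, changes the tokens without touching the tokeniser itself.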

I currently use this to process the Markdown files for my blog, but it can be useful for any other text-based content.

The Hex documentation is at https://hexdocs.pm/content_indexer.

Running tests

Clone the repo, fetch its dependencies, and run the tests:

$ git clone https://github.com/netflakes/content_indexer.git
$ cd content_indexer
$ mix deps.get
$ mix test

License

The source code is licensed under the MIT License.
