hassanjawwad12/text-search-engine


text-search-engine

A text search engine built using Go.

Download the data dump

data-dump

Data structure

  • Title
  • URL
  • Abstract

Basic approaches

  • strings.Contains
  • regexp
  • regexp.MatchString
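A minimal sketch of what these basic approaches might look like, assuming the abstracts are already loaded into a slice of strings. The function names `searchContains` and `searchRegexp` are hypothetical and not taken from the repository. Both functions visit every document on every query.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// searchContains returns the positions of abstracts that contain term
// as a case-sensitive substring.
func searchContains(abstracts []string, term string) []int {
	var hits []int
	for i, text := range abstracts {
		if strings.Contains(text, term) {
			hits = append(hits, i)
		}
	}
	return hits
}

// searchRegexp returns the positions of abstracts where term appears
// as a whole word, ignoring case.
func searchRegexp(abstracts []string, term string) []int {
	re := regexp.MustCompile(`(?i)\b` + regexp.QuoteMeta(term) + `\b`)
	var hits []int
	for i, text := range abstracts {
		if re.MatchString(text) {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	abstracts := []string{
		"A small wild cat",
		"Concatenate two strings",
		"Dogs and Cat toys",
	}
	fmt.Println(searchContains(abstracts, "cat")) // prints [0 1]
	fmt.Println(searchRegexp(abstracts, "cat"))   // prints [0 2]
}
```

The substring search matches "Concatenate" and misses the capitalised "Cat", while the regular expression handles both, at the cost of running the pattern against every abstract.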

Problem with basic approaches

The problem is that these approaches don't scale, because each query scans every document. A search takes up to 2 seconds for 600k docs, but what if we have 10M docs? The time keeps increasing as the collection grows.

Solution

We will use an approach called an inverted index.

  • We pre-process the data and build an inverted index from the text.
  • The index keeps track of each word and the documents it appears in.
  • First we tokenize the text.
  • Then we filter the tokens (lowercasing, dropping common words, stemming).
  • Lastly we search.
  • Searching does not go through the docs; it only looks up the index.
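The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the names `analyze`, `Index`, `Add`, `Search`, and `intersect` are hypothetical, the stop-word list is a tiny made-up sample, and stemming is left out because it would need a separate stemmer. `Add` assumes documents are indexed in increasing ID order.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a small sample list of common words to drop (assumption).
var stopWords = map[string]struct{}{
	"a": {}, "and": {}, "the": {}, "of": {}, "in": {}, "to": {},
}

// analyze tokenizes text on non-alphanumeric characters, lowercases
// each token, and drops stop words. Stemming is omitted in this sketch.
func analyze(text string) []string {
	words := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
	tokens := make([]string, 0, len(words))
	for _, w := range words {
		w = strings.ToLower(w)
		if _, skip := stopWords[w]; skip {
			continue
		}
		tokens = append(tokens, w)
	}
	return tokens
}

// Index maps each token to the sorted IDs of the documents containing it.
type Index map[string][]int

// Add indexes one document. IDs must be added in increasing order.
func (idx Index) Add(id int, text string) {
	for _, tok := range analyze(text) {
		ids := idx[tok]
		if len(ids) > 0 && ids[len(ids)-1] == id {
			continue // token already recorded for this document
		}
		idx[tok] = append(ids, id)
	}
}

// Search returns the IDs of documents that contain every query token.
// It only reads the index and never touches the document text.
func (idx Index) Search(query string) []int {
	var result []int
	for i, tok := range analyze(query) {
		if i == 0 {
			result = append([]int(nil), idx[tok]...)
			continue
		}
		result = intersect(result, idx[tok])
	}
	return result
}

// intersect returns the values present in both sorted slices.
func intersect(a, b []int) []int {
	var out []int
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	docs := []string{
		"A donut on a glass plate",
		"The glass is empty",
		"Donut shop in the city",
	}
	idx := Index{}
	for id, text := range docs {
		idx.Add(id, text)
	}
	fmt.Println(idx.Search("glass donut")) // prints [0]
	fmt.Println(idx.Search("Donut"))       // prints [0 2]
}
```

Because the query passes through the same `analyze` step as the documents, "Donut" and "donut" resolve to the same index entry, and the cost of a search depends on the length of the matching ID lists instead of the total size of the text.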
