A mini Google. Custom web crawler & indexer written in Golang.
🚧 This is a work in progress, so some features may be missing or incomplete at this time.
- **Golang-powered**: leverages the performance and safety of one of the best languages on the market for backend development.
- **Search engine based on the Depth-first search (DFS) algorithm**: depth-first search is an algorithm for traversing or searching tree or graph data structures, as is the case with HTML documents. To avoid processing the same link more than once, a unique constraint is used when storing the URLs that will be crawled in subsequent cycles.
- **Full-text search indexing**: carried out using a parser/tokenizer based on the Snowball library and an inverted index, which is stored in the database and allows efficient querying of the searched terms.
- **SQL database integration**: crawled URLs and indexing results are stored in a Postgres DB, which allows greater scalability and efficiency in searches.
- **Caching of search responses (in JSON format)**: the Fiber framework provides middleware for easy caching of server responses.
- **Fiber framework with the A-H/Templ and Htmx libraries**: using Fiber, Templ and Htmx greatly speeds up the creation of a simple user interface for minimal search engine administration. Check out some of my other repositories for more explanations.
- **Interfaces in the `services` package**: the architecture follows a typical "onion model" where each layer doesn't know about the layer above it and each layer is responsible for a specific concern (in this case, the `services` package layer), which allows better separation of responsibilities and dependency injection.
- **Concurrency in the engine's crawling functions**: the crawler uses one of the features in which the Go language shines most, concurrency, to speed up the always heavy link-crawling tasks.

🚧 This is a work in progress!!
Before compiling the view templates, you'll need to regenerate the CSS. First, install the dependencies required by Tailwind CSS and daisyUI (you must have Node.js installed on your system), then regenerate the main.css file with the following commands:
$ cd tailwind && npm i
$ npm run build-css-prod # `npm run watch-css` regenerates the CSS in watch mode for development
Since the PostgreSQL database runs in a Docker container, you must also have Docker installed; then execute this command in the project folder:
$ docker compose up -d
These other commands will also be useful to manage the database from its container:
$ docker start search-engine # start container
$ docker stop search-engine # stop container
$ docker exec -it search-engine psql -U postgres # (user: postgres, without password)
Besides the obvious prerequisite of having Go on your machine, you must have Air installed for hot reloading when editing code.
Tip
To get autocompletion and syntax highlighting in VS Code for the Templ templating language, install the templ-vscode extension (for Vim/Neovim, install this plugin). To generate the Go code corresponding to these templates, download this executable binary from GitHub and place it in your system's PATH. The command:
$ templ generate # `templ generate --watch` to enable watch mode
Tip
This command regenerates the .templ templates and is therefore required before starting the application. With the --watch flag enabled, it also monitors the .templ files and recompiles them as you save changes. Review the Templ documentation on installation and support for your IDE.
Build for production:
$ go build -ldflags="-s -w" -o ./bin/search-engine ./cmd/search-engine/main.go # ./bin/search-engine to run the application / Ctrl + C to stop the application
Start the app in development mode:
$ air # This compiles the view templates automatically / Ctrl + C to stop the application