Simple scraper that searches for all href links in a given domain.
This project was developed using the technologies below:
- Programming language: Elixir
- Web Framework: Phoenix+Liveview+PubSub for realtime updates
- Database: Postgres
- Job processing: Oban
- Scrape/Crawler tool: Crawly + Floki
⚠️ You must have Elixir, Erlang, and Docker Compose installed in order to run this application.
- Running the DB: `docker compose up -d`
- Fetching dependencies: `mix deps.get`
- Running migrations and setting up the DB: `mix ecto.setup`
- Running tests: `mix test`
- Running the server: `mix phx.server`
Congratulations! Now you can access the page at http://localhost:4000
If it is your first time, you'll have to create an account. If it is not, simply click Sign In and start scraping!
- SignUp/SignIn
- When you either sign up or sign in, you'll be redirected to the `/pages` endpoint and a session for your `user_id` will be created.
- You cannot access any `/pages` endpoint without a session. If you try to do so, you'll be redirected to `/signin` (see the sketch after this list).
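The repository's actual auth modules aren't listed here, so the following is only a minimal sketch of how such a session gate could be implemented as a router plug; the `MyAppWeb.RequireAuth` module name and the `:user_id` session key are assumptions.

```elixir
defmodule MyAppWeb.RequireAuth do
  @moduledoc "Redirects to /signin when no user_id is present in the session (hypothetical module)."
  import Plug.Conn
  import Phoenix.Controller, only: [redirect: 2]

  def init(opts), do: opts

  def call(conn, _opts) do
    case get_session(conn, :user_id) do
      nil ->
        # No session: send the visitor to the sign-in page and stop the pipeline.
        conn
        |> redirect(to: "/signin")
        |> halt()

      _user_id ->
        conn
    end
  end
end
```

In the router, a plug like this would sit in the pipeline that guards the `/pages` routes, e.g. `plug MyAppWeb.RequireAuth`.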
- Scrape
- You can enter a web page to be scraped. An async job will be in charge of running the operation; when the operation is finished, the scraped page will show up in the menu (see the worker sketch after this list).
- The scrape will look for `href` links. All links found will be shown on the details page, which you can open just by clicking on the page card.
- PubSub is used for live updates, so if you have the same account open in two browsers, both windows will be synced automatically as soon as the scrape is finished (see the LiveView sketch below).
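The README only names Oban, Crawly, and Floki, so the following is a minimal sketch rather than the project's actual worker: it shows how an async job could extract `href` links from a single page. The module names, the `Req` HTTP client, and the job arguments are assumptions, and the real project crawls with Crawly instead of a single request.

```elixir
defmodule MyApp.Scraper.ScrapeWorker do
  @moduledoc "Hypothetical Oban worker: fetches a page and collects every href in its <a> tags."
  use Oban.Worker, queue: :scraper, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"url" => url, "user_id" => user_id}}) do
    # Req is assumed as the HTTP client; the project itself crawls with Crawly.
    %{body: body} = Req.get!(url)

    {:ok, document} = Floki.parse_document(body)

    # Floki.attribute/3 returns the href value of every matched <a> element.
    links = Floki.attribute(document, "a", "href")

    # Persisting the results is project-specific and omitted here.
    IO.inspect({user_id, length(links)}, label: "scraped links")

    :ok
  end
end
```

A job like this would typically be enqueued from the scrape form with `%{url: url, user_id: user_id} |> MyApp.Scraper.ScrapeWorker.new() |> Oban.insert()`.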
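Again as a hedged sketch rather than the project's real code, this is roughly how a LiveView could subscribe to a per-user PubSub topic so every open window refreshes when a scrape finishes; `MyAppWeb.PagesLive`, the topic format, and `MyApp.Scraper.list_pages/1` are assumptions.

```elixir
defmodule MyAppWeb.PagesLive do
  use MyAppWeb, :live_view

  @impl true
  def mount(_params, %{"user_id" => user_id}, socket) do
    # Subscribe only on the connected mount; the static HTTP render skips it.
    if connected?(socket), do: Phoenix.PubSub.subscribe(MyApp.PubSub, "scrapes:#{user_id}")

    {:ok, assign(socket, user_id: user_id, pages: MyApp.Scraper.list_pages(user_id))}
  end

  @impl true
  def handle_info({:scrape_finished, _page_id}, socket) do
    # Reload the page list so every browser window with this account updates at once.
    {:noreply, assign(socket, pages: MyApp.Scraper.list_pages(socket.assigns.user_id))}
  end

  @impl true
  def render(assigns) do
    ~H"""
    <ul>
      <%= for page <- @pages do %>
        <li><%= page.url %></li>
      <% end %>
    </ul>
    """
  end
end
```

On the publishing side, the worker would broadcast once the result is stored, e.g. `Phoenix.PubSub.broadcast(MyApp.PubSub, "scrapes:#{user_id}", {:scrape_finished, page_id})`.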