This project is a Go-based web scraper and article management system

sz2514/Data-Scraping

Project Overview

This program checks a specified website for updates every hour and downloads any new articles locally. A web server running on localhost:8080 manages and serves the downloaded articles.

The project runs its work in goroutines with a concurrency limit of 5, which can be changed in the config.json file.


Prerequisites

This project requires two databases:

  • Redis
  • PostgreSQL

Make sure both databases are installed and properly configured on your system before running the code.


Routes

1. Home Page

URLs:
localhost:8080/ and localhost:8080/index

Description:
Displays all articles stored in the local downloads folder.

2. Article Management

Endpoint:
/articles/:id

Supported HTTP Methods:

  • GET – Retrieve the article.
  • PUT – Update the article (requires JWT authentication).
  • DELETE – Delete the article (requires JWT authentication).

Note:

  • PUT and DELETE requests must include a valid JWT for verification.
  • To obtain a JWT, log in with your username and password on the login.html page. On successful login, the server returns your token.

Configuration

To customize project settings such as concurrency limits and API behavior, modify the config.json file.

  • Ensure the file is valid JSON.
  • The project title and other metadata are also set in this file.
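Loading such a file in Go typically means decoding it into a struct. The sketch below is hypothetical: the keys `title` and `concurrency` are illustrative and may not match the repository's actual config.json. It keeps 5 as the default limit and reports invalid JSON as an error.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Config mirrors a hypothetical config.json layout; the field names are
// illustrative and may differ from the repository's real keys.
type Config struct {
	Title       string `json:"title"`
	Concurrency int    `json:"concurrency"`
}

// parseConfig decodes config data. Fields absent from the JSON keep their
// pre-set values, so Concurrency defaults to 5.
func parseConfig(data []byte) (Config, error) {
	cfg := Config{Concurrency: 5}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("config.json is not valid JSON: %w", err)
	}
	if cfg.Concurrency < 1 {
		return Config{}, fmt.Errorf("concurrency must be at least 1, got %d", cfg.Concurrency)
	}
	return cfg, nil
}

func main() {
	data, err := os.ReadFile("config.json")
	if err != nil {
		fmt.Println("could not read config.json:", err)
		return
	}
	cfg, err := parseConfig(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q with concurrency %d\n", cfg.Title, cfg.Concurrency)
}
```

A file such as `{"title": "Data Scraping", "concurrency": 8}` would raise the limit to 8 under this assumed layout.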

JWT Authentication

This project uses JWT authentication to protect the endpoints that modify articles. For PUT and DELETE requests, include your JWT in the request headers to authenticate the action.


Warnings

Be cautious with the DELETE method: deleting an article is irreversible.
