A backend service for collecting posts from multiple RSS feeds. The application runs as a service and exposes a RESTful API for managing users and their desired RSS feeds. The service periodically scrapes the configured RSS feeds and collects the posts/articles published on them.
Disclaimer: This is a pet project as part of my learning journey to build scalable, extendable, performant applications in Go. Some parts of the application idea and implementation approach are inspired by this Go course that I have taken.
The salient features of the application are listed below and further elaborated through user stories and sample use cases for the service.
- Fetching RSS feeds: The service supports multiple users and allows configuring multiple RSS feeds per user. Posts from those RSS feeds are collected periodically and saved in the database, and users can fetch them via the API.
- User management: Users can be created, updated, and deleted via the corresponding API operations.
- Authorized access: RSS feeds and collected posts are linked with users and can only be accessed by the respective users. API keys are used to ensure authorization over applicable API endpoints and operations.
- Feeds management: RSS feeds can be configured by CRUD operations via the API.
- Relational database: The service uses a Postgres database to store users, feeds, and collected posts.
Personal Website: A 'single user' use case for this service is a personal website. Blogs, articles, and other news items from different sources (that expose RSS feeds) can be accumulated in a dedicated section of the user's personal website as desired.
Platform for Job Seekers: An online service providing a platform for job seekers can enable its users to aggregate their blogposts, articles, publications, etc. from different sources in one place, to boost their profiles for potential employers.
- A frontend service wants to register users with this backend service by creating respective user profiles (user objects), so that feeds can be configured with associated users.
- As an authorized user, I want to configure the list of desired RSS feeds, so that the service can periodically check them, fetch items from them, store them, and make them available on demand.
The software architecture pattern of the scraperss service resembles the Model-View-Controller pattern with the omission of the 'View' part, which can be thought of as an external component, e.g., a web UI generated by a frontend service.
Figure from this useful Medium Article
The overall system architecture is illustrated below, showcasing relevant modules and components that are internal and external to the scraperss service.
System Architecture
The API endpoints, the supported HTTP methods on those endpoints, and the main functionality and characteristics of each endpoint are listed in the following table.
Root API endpoint: `/v1`. Example full path: `http://{scraperss-container-address}:80/v1/{desired-endpoint}`
| Method | Endpoint | Authorization | Scope | Functionality |
|---|---|---|---|---|
| POST | /users | unauthorized | Admin | creates a new user and generates a unique private key for the user |
| GET | /users | unauthorized | Admin | returns the list of all users |
| GET | /users/{userID} | unauthorized | Admin | returns an individual user whose ID is provided, useful to get the IDs for deleting select users |
| DELETE | /users/{userID} | unauthorized | Admin | deletes a created user, along with their configured RSS feeds and posts from the database |
| POST | /feeds | authorized (using API Key) | Users | creates a new feed linked to the user's account; the user's API key must be supplied in the `Authorization: ApiKey <value>` header |
| GET | /feeds | authorized (using API Key) | Users | returns the list of all the feeds created by a user |
| DELETE | /feeds/{feedID} | authorized (using API Key) | Users | deletes a particular feed, along with the collected posts from that feed |
Below are the formats for POST requests used for creating users and feeds over their respective endpoints:
To create a user, send the following JSON in the POST request body:

```json
{
    "name": "Test User"
}
```
To create a feed for a user, use the following JSON format in the POST request body:

```json
{
    "name": "Feed Name",
    "url": "<complete-url-for-the-feed>"
}
```
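For illustration, here is a minimal Go client sketch for the two POST formats above. It assumes the service is reachable at `localhost:80` (as in the Docker setup below) and that the create-user response carries the generated key in an `api_key` field; the actual response shape may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

const baseURL = "http://localhost:80/v1" // adjust to your setup

func main() {
	// Create a user (unauthorized endpoint).
	resp, err := http.Post(baseURL+"/users", "application/json",
		bytes.NewReader([]byte(`{"name": "Test User"}`)))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Assumption: the create-user response includes the generated
	// API key in an "api_key" field.
	var user struct {
		APIKey string `json:"api_key"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&user); err != nil {
		panic(err)
	}

	// Create a feed for that user (authorized endpoint).
	feedBody := []byte(`{"name": "Feed Name", "url": "https://example.com/rss.xml"}`)
	req, err := http.NewRequest(http.MethodPost, baseURL+"/feeds", bytes.NewReader(feedBody))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "ApiKey "+user.APIKey)

	feedResp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer feedResp.Body.Close()
	fmt.Println("create feed status:", feedResp.Status)
}
```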
- Go v1.24.0: The scraperss service is built with Go v1.24.0 and requires the Go toolchain to build it from source.
- Docker
Set your database password in the appropriate file. The database name, user, password, data directory, etc. can also be configured in the Docker Compose manifest.
Run `docker compose up --build` in the root directory. This will spin up two containers, one each for the scraperss and postgres services. The scraperss service can be reached over HTTP at `localhost:80` or directly at `{container-address}:80`.
The main package contains the following key components:
- main.go: serves as the main entry point of the application; it reads the environment config, starts the concurrent scraper, routes HTTP requests to the appropriate handler functions, and runs the server.
- handler_users.go: contains handler functions for incoming HTTP requests on the /users endpoint (create user, get users, delete user, etc.).
- handler_feeds.go: contains handler functions for incoming HTTP requests on the /feeds endpoint (creating a feed, deleting a feed, etc.).
- middleware_authz.go: implements authorization logic for the authorized endpoints of the API. Ensures incoming requests are authorized by checking the API key in the Authorization header and verifying that a user exists for that API key, before forwarding the request to the appropriate handler function for further processing.
- json.go: contains functions for writing error and json responses on the HTTP response writer.
- models.go: translates DB objects into structs with appropriate JSON keys that can be sent in response messages.
- rss.go: defines structs for items received on an RSS feed and a function to fetch RSS feeds from their URLs.
- scrape.go: implements functions that get the feeds needing a fetch from the DB and then scrape each individual feed for its items concurrently using goroutines (a minimal sketch of this pattern follows this list).
- Dockerfile: to build and run the scraperss service in a Docker container.
- compose.yaml: Docker compose file containing two services, scraperss and db (Postgres).
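The concurrent scraping mentioned for scrape.go broadly follows Go's standard fan-out pattern. Below is a minimal sketch; the `Feed` type and `fetchFeed` helper are illustrative placeholders, not the repository's actual identifiers:

```go
package main

import (
	"log"
	"sync"
	"time"
)

// Feed is an illustrative stand-in for the DB feed model.
type Feed struct {
	Name string
	URL  string
}

// fetchFeed is a hypothetical placeholder for the fetch-and-decode
// logic in rss.go (HTTP GET on the feed URL, decode the RSS XML).
func fetchFeed(url string) ([]string, error) { return nil, nil }

// scrapeFeeds fans out one goroutine per feed on every tick and waits
// for the whole batch to finish before the next scraping cycle.
func scrapeFeeds(feeds []Feed, interval time.Duration) {
	ticker := time.NewTicker(interval)
	for ; ; <-ticker.C { // runs immediately, then once per interval
		var wg sync.WaitGroup
		for _, feed := range feeds {
			wg.Add(1)
			go func(f Feed) {
				defer wg.Done()
				items, err := fetchFeed(f.URL)
				if err != nil {
					log.Printf("scraping %s: %v", f.URL, err)
					return
				}
				log.Printf("collected %d items from %s", len(items), f.Name)
			}(feed)
		}
		wg.Wait()
	}
}

func main() {
	feeds := []Feed{{Name: "Example", URL: "https://example.com/rss.xml"}}
	scrapeFeeds(feeds, time.Minute)
}
```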
The internal auth package contains the following component:
- auth.go: extracts the API key from the Authorization header of incoming requests.
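A plausible sketch of that extraction, assuming the `Authorization: ApiKey <value>` format noted in the API table (the actual function name and error messages in auth.go may differ):

```go
package auth

import (
	"errors"
	"net/http"
	"strings"
)

// GetAPIKey extracts the API key from a header of the form
// "Authorization: ApiKey <value>".
func GetAPIKey(headers http.Header) (string, error) {
	val := headers.Get("Authorization")
	if val == "" {
		return "", errors.New("no Authorization header included")
	}
	parts := strings.Fields(val)
	if len(parts) != 2 || parts[0] != "ApiKey" {
		return "", errors.New("malformed Authorization header")
	}
	return parts[1], nil
}
```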
The internal database package has been generated using the sqlc tool for the queries in the queries folder, and contains the following components:
- db.go: boilerplate code for running SQL queries on the database tables.
- models.go: contains models for the database objects, e.g., user, feed, etc.
- users.sql.go: contains methods to run queries on the users table.
- feeds.sql.go: contains methods to run queries on the feeds table.
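As an illustration of what sqlc generates, an annotated query in the queries folder produces a typed method on the package's `Queries` struct. The query, column names, and signature below are hypothetical examples of that shape, not the repository's actual code:

```go
package database

import (
	"context"
	"database/sql"
)

// User mirrors a row of the users table (columns are illustrative).
type User struct {
	ID     int64
	Name   string
	ApiKey string
}

// Queries wraps a database handle, as in sqlc's generated boilerplate.
type Queries struct {
	db *sql.DB
}

// An annotated query such as
//
//	-- name: GetUserByAPIKey :one
//	SELECT id, name, api_key FROM users WHERE api_key = $1;
//
// would generate a method of roughly this shape in users.sql.go:
func (q *Queries) GetUserByAPIKey(ctx context.Context, apiKey string) (User, error) {
	row := q.db.QueryRowContext(ctx,
		"SELECT id, name, api_key FROM users WHERE api_key = $1", apiKey)
	var u User
	err := row.Scan(&u.ID, &u.Name, &u.ApiKey)
	return u, err
}
```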
Schema for the database tables used by this service can be seen in the schema folder.
The file main_test.go contains basic tests for some API endpoints of the scraperss service. To run the tests, run the following command in the root directory while the service is up and running:

```sh
go test -v
```
Update the URLs for the API endpoints, defined as consts in the file, to match your setup's configuration.
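For reference, a test in this style could look like the following sketch; the endpoint const and expected status are assumptions, and the actual tests in main_test.go may differ:

```go
package main

import (
	"net/http"
	"testing"
)

// usersURL is an assumed const; update it to match your setup.
const usersURL = "http://localhost:80/v1/users"

// TestGetUsers expects the running service to answer GET /v1/users
// with 200 OK; start the service before running `go test -v`.
func TestGetUsers(t *testing.T) {
	resp, err := http.Get(usersURL)
	if err != nil {
		t.Fatalf("GET %s: %v", usersURL, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Errorf("expected status 200, got %d", resp.StatusCode)
	}
}
```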