Personal Aggregator

Note: Make changes only on the master branch; gh-pages is the publishing branch.

personalAggregator.py is a Python script that aggregates content from various sources according to user-defined rules and generates Markdown files suitable for static site generators such as Jekyll.

Prerequisites

Python 3.x
pip (Python package installer)
socialModules

Installation

Clone the repository:

git clone https://github.com/your-username/personalAggregator.git
cd personalAggregator

Install dependencies:
```
pip install -r requirements.txt
```

Configuration (`rules.json`)

The script uses a JSON file (e.g., rules.json) to define the aggregation rules. This file should contain a list of rule objects, where each object specifies how to fetch and process content from a particular URL.

Each rule object can have the following properties:

url (string, required): The URL of the page to scrape.
selector (string, required): A CSS selector to identify the main content block for each item.
title_selector (string, required): A CSS selector to extract the title of each item within its content block.
date_selector (string, required): A CSS selector to extract the date of each item within its content block.
date_format (string, optional): The format of the date string (e.g., %Y-%m-%d). If omitted, dateparser will attempt to parse the date automatically.
tags (array of strings, optional): A list of tags to apply to the generated markdown post.
output_filename_prefix (string, optional): A prefix to use for the generated markdown filename.
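The date_format behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `parse_item_date` is hypothetical, and the fallback assumes the third-party dateparser package is installed.

```python
from datetime import datetime


def parse_item_date(text, date_format=None):
    """Parse a scraped date string (hypothetical helper, for illustration only)."""
    text = text.strip()
    if date_format:
        # An explicit format such as "%Y-%m-%d" is parsed strictly.
        return datetime.strptime(text, date_format)
    # No format given: defer to dateparser, imported lazily so the
    # strict path works even when the package is not installed.
    import dateparser
    return dateparser.parse(text)


parsed = parse_item_date(" 2024-03-15 ", "%Y-%m-%d")
print(parsed.isoformat())
```

Supplying date_format makes parsing predictable; omitting it trades that predictability for flexibility with loosely formatted dates.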

Example rules.json:

[
  {
    "url": "https://example.com/blog",
    "selector": ".post-item",
    "title_selector": ".post-title a",
    "date_selector": ".post-date",
    "date_format": "%Y-%m-%d",
    "tags": ["blog", "example"],
    "output_filename_prefix": "example-blog"
  },
  {
    "url": "https://anothersite.com/news",
    "selector": "article.news-entry",
    "title_selector": "h2.entry-title",
    "date_selector": ".entry-meta .date",
    "tags": ["news"],
    "output_filename_prefix": "another-news"
  }
]
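Before running the script, it can help to check that every rule carries the required properties. The following is a minimal sketch of such a check; `validate_rules` and `REQUIRED_KEYS` are hypothetical names introduced here, and the script itself may validate its configuration differently or not at all.

```python
import json

# Properties documented as required for every rule object.
REQUIRED_KEYS = ("url", "selector", "title_selector", "date_selector")


def validate_rules(rules):
    """Return a list of problems found; an empty list means the rules look valid."""
    if not isinstance(rules, list):
        return ["top-level JSON value must be a list of rule objects"]
    problems = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {index}: expected an object")
            continue
        for key in REQUIRED_KEYS:
            value = rule.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"rule {index}: missing or empty required string '{key}'")
        tags = rule.get("tags", [])
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            problems.append(f"rule {index}: 'tags' must be an array of strings")
    return problems


# The second rule is deliberately incomplete to show the reported problems.
sample = json.loads("""
[
  {"url": "https://example.com/blog", "selector": ".post-item",
   "title_selector": ".post-title a", "date_selector": ".post-date"},
  {"url": "https://anothersite.com/news"}
]
""")
for problem in validate_rules(sample):
    print(problem)
```

To check a real file, load it with `json.load(open("rules.json"))` and pass the result to `validate_rules`.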

Usage

Run the personalAggregator.py script from the root of the repository.

python _bin/personalAggregator.py [OPTIONS]

Options:

--config-file <path>: Path to your JSON configuration file (e.g., rules.json). Required.
--output-dir <path>: Directory where the generated markdown post files will be saved (e.g., _posts/). Required.
--num-posts <number>: The maximum number of posts to generate for each rule. Defaults to 1.
--log-level <level>: Set the logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). Defaults to INFO.
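For readers who want to see how these options fit together, here is a minimal sketch of an equivalent command-line interface. It assumes the standard-library argparse module; whether personalAggregator.py actually uses argparse is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="personalAggregator.py")
    parser.add_argument("--config-file", required=True,
                        help="Path to the JSON configuration file")
    parser.add_argument("--output-dir", required=True,
                        help="Directory for the generated markdown posts")
    parser.add_argument("--num-posts", type=int, default=1,
                        help="Maximum number of posts to generate per rule")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="Logging level")
    return parser


# Only the required options are given, so the documented defaults apply.
args = build_parser().parse_args(
    ["--config-file", "rules.json", "--output-dir", "_posts/"]
)
print(args.config_file, args.output_dir, args.num_posts, args.log_level)
```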

Example Command:

To generate 5 posts from your rules.json file and save them to the _posts/ directory with INFO level logging:

python _bin/personalAggregator.py --config-file rules.json --output-dir _posts/ --num-posts 5 --log-level INFO

