Parse Traveller books into other formats.
By books, we mean any piece of writing related to Traveller or a similar RPG system (e.g. rulebooks, content books, roll tables).
Why do we need to parse books?
Most books are copyrighted, so distributing their content without explicit permission is illegal. It's safe to assume that some publishers won't allow free distribution of their content. Parsing lets you extract the content from copies you already own instead.
NB: This project only contains descriptions of books, not the content within. To get the content, you need to purchase the original books.
As a show of goodwill, we want to explicitly ask publishers if they are okay with this script supporting parsing their content.
Here's a list of publishers who have said that this is fine:
- Stellagama Publishing
- Mongoose Publishing
Feel free to open an issue if you are a publisher and interested in this.
Some publishers might allow free distribution of their content. This is definitely an avenue to look into.
Stellagama Publishing specifically has shown interest in this.
We distinctly separate parsing of books from outputting content. This allows for greater flexibility:
- Parsing code doesn't need to know how the content will be used.
- Outputting code doesn't need to know how the content was parsed.
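The separation described above can be sketched roughly as follows. This is only an illustration and not the project's actual code: TravellerObject, Parser, Outputter, LineParser, JsonOutputter and convert are all hypothetical names, and the "kind: name" input format is invented for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass
class TravellerObject:
    """Hypothetical stand-in for a parsed object (the real formats are not documented yet)."""

    kind: str
    name: str


class Parser(Protocol):
    """Anything that turns raw book text into Traveller objects."""

    def parse(self, raw: str) -> list[TravellerObject]: ...


class Outputter(Protocol):
    """Anything that turns Traveller objects into an output format."""

    def write(self, objects: list[TravellerObject]) -> str: ...


class LineParser:
    """Toy parser: every 'kind: name' line becomes one object; other lines are skipped."""

    def parse(self, raw: str) -> list[TravellerObject]:
        objects = []
        for line in raw.splitlines():
            if ":" not in line:
                continue
            kind, name = line.split(":", 1)
            objects.append(TravellerObject(kind.strip(), name.strip()))
        return objects


class JsonOutputter:
    """Toy outputter: serialises the objects as a JSON array."""

    def write(self, objects: list[TravellerObject]) -> str:
        return json.dumps([asdict(obj) for obj in objects])


def convert(raw: str, parser: Parser, outputter: Outputter) -> str:
    # The two halves only meet through the list of TravellerObject instances.
    return outputter.write(parser.parse(raw))
```

Because the parser and the outputter only share the intermediate objects, either side can be swapped without touching the other, for example `convert(text, LineParser(), JsonOutputter())`.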
First, we convert the content within books into a machine-readable format, in the form of "Traveller objects".
TODO: Document the Traveller object formats.
The code that runs is identical for all books. This makes it easier to add new books.
To account for differences between books, there are 'book description' files.
These are JSON files describing the book (see the book_descriptions folder for examples).
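As a rough sketch of how a folder of book description files could be read, assuming each file holds a single JSON object at the top level (the real schema is not documented here, so no field names are assumed). The function name load_book_descriptions is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def load_book_descriptions(folder: Path) -> dict[str, dict]:
    """Load every *.json file in `folder`, keyed by file name without extension.

    Assumption: each description is a JSON object at the top level.
    """
    descriptions = {}
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, dict):
            raise ValueError(f"{path.name}: expected a JSON object at the top level")
        descriptions[path.stem] = data
    return descriptions
```

Since the parsing code is the same for every book, supporting a new book in this picture would mean adding one more JSON file to the folder.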
TODO: This is not implemented yet.
After parsing the books, we output the parsed objects into various formats.
- This is used by Tabula to extract tables from PDFs.
- pdftohtml (version 4.x) from XpdfReader
  This is used to convert PDFs to HTML, which is then parsed further.
  Installing pdftohtml:
  - It's available in package managers under the name xpdf-tools (e.g. in Scoop).
  - It is pre-packaged with some Linux distributions (e.g. Ubuntu).
  - You can download it from the XpdfReader website (under "Download the Xpdf command line tools").
  Note: If pdftohtml is not globally installed, you can set the PDF_TO_HTML_EXECUTABLE env var to the location of the executable.
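The lookup described in the note above might work along these lines. This is a minimal sketch, not the project's implementation: resolve_pdftohtml is a hypothetical helper, and falling back to a PATH search with shutil.which is an assumption about what "globally installed" means.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_pdftohtml(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the pdftohtml executable to use.

    Assumed order: the PDF_TO_HTML_EXECUTABLE env var wins if set,
    otherwise fall back to whatever `pdftohtml` is found on PATH.
    """
    env = os.environ if env is None else env
    override = env.get("PDF_TO_HTML_EXECUTABLE")
    if override:
        return override
    found = shutil.which("pdftohtml")
    if found is None:
        raise FileNotFoundError(
            "pdftohtml not found on PATH; set PDF_TO_HTML_EXECUTABLE to its location"
        )
    return found
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.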
Note: The code is tested on Windows 11, but it should work fine on Linux and possibly macOS.
- Clone this repository.
- Install dependencies using poetry:
poetry install
- Run the CLI to see available commands:
poetry run traveller-book-parser
- You can also run poetry shell to start a new sub-shell, and then run the CLI with traveller-book-parser.
There is a cli.ps1 PowerShell script that does everything above (passing any arguments to the CLI).
The script can be configured using environment variables.
(You can create a .env file in the root directory to set these as well.)
See traveller_book_parser/settings/settings.py for a list of all settings.
You can also run:
traveller-book-parser schema Settings
This will dump the JSON schema of the Settings model (by default to /data/output/schema/Settings.json).
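To illustrate how a .env file and environment variables can combine when configuring the script, here is a small standard-library sketch. It is not how the project loads its settings: parse_dotenv and effective_settings are hypothetical helpers, OTHER is a made-up variable name, and giving real environment variables priority over the .env file is a common convention that is assumed here rather than confirmed by the project.

```python
import os
from typing import Mapping, Optional


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def effective_settings(
    dotenv_text: str, env: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Merge .env values with the environment; the environment wins (assumed)."""
    env = os.environ if env is None else env
    return {**parse_dotenv(dotenv_text), **env}
```

For example, a .env line such as `PDF_TO_HTML_EXECUTABLE="C:/tools/pdftohtml.exe"` would be picked up unless the same variable is already set in the shell.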
This project is open to contributions. Feel free to open an issue or pull request.
Install just to run utility commands.
To run linters, run:
just lint
To run tests, run:
just test
To run tests and update snapshots, run:
just test_update