The Python Graphical Authorship Attribution Program (PyGAAP) is an experimental reimplementation of the Duquesne University Evaluating Variations in Language Lab's JGAAP. Currently, PyGAAP is in early development. Although participation in the development and testing of PyGAAP is encouraged, it is not ready for actual text analysis. For the latest updates to the code, please see the developing branch.
For users:
- Quickly create and customize a text data processing pipeline with pre-processing, feature extraction and filtering, text embedding, and classifier modules
- Save and load corpora
- Perform batch experiments on the command line with CSV files
- Search for modules, including those with alternative names
- Parallel processing with Python's multiprocessing module
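As a rough illustration of the parallel-processing idea (not PyGAAP's actual code), the sketch below uses the standard-library `multiprocessing.Pool` to apply a per-document function to several texts at once. The function names `count_tokens` and `count_tokens_parallel` and the sample documents are hypothetical stand-ins for a real pipeline step.

```python
from multiprocessing import Pool


def count_tokens(text):
    """Hypothetical per-document step: count whitespace-separated tokens."""
    return len(text.split())


def count_tokens_parallel(texts, processes=2):
    """Apply count_tokens to each text using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        # Pool.map returns results in the same order as the inputs.
        return pool.map(count_tokens, texts)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this demo on import.
    documents = ["the quick brown fox", "jumps over", "the lazy dog"]
    print(count_tokens_parallel(documents))
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.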
For developers:
- Modular backend: match input/output types and it'll work
- Utilities to test-run an experiment without invoking a front-end
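The "match input/output types" idea can be pictured with a small sketch. This is not PyGAAP's actual module interface: the `Module` dataclass, the type labels, and the `validate` and `run_pipeline` helpers are hypothetical names chosen for illustration. Each stage declares what it consumes and produces, and the chain is checked before it runs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Module:
    """Hypothetical pipeline stage with declared input and output types."""
    name: str
    input_type: type
    output_type: type
    run: Callable[[Any], Any]


def validate(pipeline: List[Module]) -> None:
    """Raise TypeError if two adjacent stages do not line up."""
    for prev, nxt in zip(pipeline, pipeline[1:]):
        if prev.output_type is not nxt.input_type:
            raise TypeError(
                f"{prev.name} outputs {prev.output_type.__name__}, "
                f"but {nxt.name} expects {nxt.input_type.__name__}"
            )


def run_pipeline(pipeline: List[Module], data: Any) -> Any:
    """Validate the chain, then feed each stage's output to the next."""
    validate(pipeline)
    for module in pipeline:
        data = module.run(data)
    return data


# Illustrative stages: lowercase the text, split into words, count them.
lowercase = Module("lowercase", str, str, str.lower)
tokenize = Module("tokenize", str, list, str.split)
count = Module("count", list, Counter, Counter)

if __name__ == "__main__":
    print(run_pipeline([lowercase, tokenize, count], "The cat saw the dog"))
```

Because the check only compares declared types, a sketch like this can be exercised directly from a script or test, with no front-end involved.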
Some features from JGAAP are yet to be implemented in PyGAAP, including:
- Many of the text and analysis modules
- Extensive logging
To contribute to PyGAAP, fork the repository, create a new branch, make your changes, and submit a pull request. When adding a new module, you may find the developer manual useful. Additionally, please consider opening an issue on this repository describing your planned contribution so that we can track who is working on what.
- Clone the PyGAAP Git repository.
- Install Python 3. Depending on your operating system, it may already be installed.
- Install the Python libraries required by PyGAAP. If you use pip, you can easily install the required libraries by executing one of the following commands from the root PyGAAP directory:
pip install -r requirements.txt
python -m pip install -r requirements.txt
pip3 install -r requirements.txt
python3 -m pip install -r requirements.txt
- Run
python PyGAAP.py
to launch the PyGAAP GUI. Alternatively, PyGAAP can be run from the command line. Run
python PyGAAP.py -h
to print the command-line help.
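After installing, it can be handy to confirm that the libraries listed in requirements.txt are actually present. The sketch below is a standalone helper written for illustration and is not part of PyGAAP: `parse_requirement_names` and `missing_packages` are hypothetical names, and the parsing assumes simple lines such as `name` or `name==version` (URL-based requirements are not handled).

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(lines):
    """Extract bare package names from simple requirements.txt lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep the leading name; stop at extras, version specifiers, or markers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        required = parse_requirement_names(path.read_text(encoding="utf-8").splitlines())
        absent = missing_packages(required)
        print("Missing:", ", ".join(absent) if absent else "none")
    else:
        print("requirements.txt not found; run this from the root PyGAAP directory.")
```

Note that this only checks whether each package is installed, not whether the installed version satisfies the pinned constraint.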
If you are having issues with PyGAAP that require support, please open an issue on this repository. As a reminder, PyGAAP is in the early stages of development and should not be used for serious text analysis. If you require stable text analysis software, please use JGAAP instead.