A text prediction API using Prediction by Partial Matching (PPM). Train on any text to create personalized predictions, or use the default English training text.
- Train on any text of your choice
- Session-based models for individual customization
- Predict at letter, word, or sentence level
- Falls back to the default English training text if no custom training is provided
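
For a rough sense of what PPM does under the hood, the sketch below illustrates the core idea: predict the next symbol from the longest context seen during training, backing off to shorter contexts when needed. This is only an illustration, not the jslm implementation the API actually uses (credited at the end of this README), and it skips PPM's escape-probability blending.

```js
// Illustration only: `counts` is assumed to be a Map from a context string
// (up to maxOrder symbols) to an object of { nextSymbol: count } gathered
// during training. Real PPM also blends orders via escape probabilities.
function predictNext(counts, context, maxOrder = 5) {
  for (let order = Math.min(maxOrder, context.length); order >= 0; order--) {
    const key = context.slice(context.length - order); // last `order` characters
    const nextCounts = counts.get(key);
    if (!nextCounts) continue; // back off to a shorter context
    const total = Object.values(nextCounts).reduce((sum, c) => sum + c, 0);
    return Object.entries(nextCounts)
      .map(([symbol, count]) => ({ symbol, probability: count / total }))
      .sort((a, b) => b.probability - a.probability);
  }
  return []; // model has seen nothing yet
}
```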
You will need:

- Node.js (>=20.0.0)
- npm (>=9.0.0)
- Clone the repository:

```bash
git clone https://github.com/willwade/PPM-API.git
cd PPM-API
```

- Install dependencies:

```bash
npm install
```

- (Optional) Install Python dependencies for generating training text:

```bash
pip install datasets
```
Run the following command to start the API:

```bash
npm start
```

The API will be available at http://localhost:8080.
Once running, view the full API documentation at:
http://localhost:8080/api-docs
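
If you want to confirm the server is up from code, here is a minimal check for Node.js 20+ (which ships a global `fetch`); it simply requests the documentation route shown above:

```js
// check.mjs: confirm the running API answers on the default port
const res = await fetch('http://localhost:8080/api-docs');
console.log(res.ok ? 'API is up' : `Unexpected status: ${res.status}`);
```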
- Train a Model (Optional)

```bash
curl -X POST http://localhost:8080/train \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.gutenberg.org/cache/epub/19778/pg19778.txt",
    "maxOrder": 5
  }'
```

Response:

```json
{
  "success": true,
  "sessionId": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Training complete",
  "trainingTimeMs": 1234,
  "vocabularySizes": {
    "letter": 52,
    "word": 2000,
    "sentence": 500
  }
}
```
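
The same training call from Node.js 20+ using the built-in `fetch`; the URL and `maxOrder` mirror the curl example above, and `text` can be sent instead of `url` (see Training Options below):

```js
// train.mjs: train a session-specific model from a URL
const trainRes = await fetch('http://localhost:8080/train', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://www.gutenberg.org/cache/epub/19778/pg19778.txt',
    maxOrder: 5,
  }),
});
const { sessionId } = await trainRes.json();
console.log('sessionId:', sessionId); // keep this for the x-session-id header
```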
- Make Predictions

```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -H "x-session-id: 550e8400-e29b-41d4-a716-446655440000" \
  -d '{
    "input": "The quick brown",
    "level": "word",
    "numPredictions": 5
  }'
```

Response:
```json
{
  "input": "The quick brown",
  "level": "word",
  "predictions": [
    {
      "symbol": "fox",
      "probability": 0.4,
      "logProbability": -0.916
    }
    // ... more predictions
  ],
  "contextOrder": 3,
  "perplexity": 2.5
}
```
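
The equivalent call from Node.js 20+, reusing the `sessionId` returned by `/train` (the values mirror the curl example above):

```js
// predict.mjs: request word-level predictions for a context string
const sessionId = '550e8400-e29b-41d4-a716-446655440000'; // from your /train response
const res = await fetch('http://localhost:8080/predict', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'x-session-id': sessionId },
  body: JSON.stringify({ input: 'The quick brown', level: 'word', numPredictions: 5 }),
});
const { predictions } = await res.json();
console.log(predictions.map((p) => `${p.symbol} (${p.probability})`));
```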
You can train the model in two ways:

- Using a URL:

```json
{
  "url": "https://www.gutenberg.org/cache/epub/19778/pg19778.txt",
  "maxOrder": 5
}
```

- Using Direct Text:

```json
{
  "text": "Your training text here",
  "maxOrder": 5
}
```

Note: Provide either `url` OR `text`, but not both.
The API supports three prediction levels:

- `letter`: Character-by-character prediction
- `word`: Word-by-word prediction
- `sentence`: Full sentence prediction
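
For example, the same input can be sent at each level by varying the `level` field of the `/predict` request shown earlier (a small Node.js 20+ sketch; the session id is assumed to come from a previous `/train` call):

```js
// levels.mjs: compare predictions for one input across all three levels
const sessionId = '550e8400-e29b-41d4-a716-446655440000'; // from a previous /train call
for (const level of ['letter', 'word', 'sentence']) {
  const res = await fetch('http://localhost:8080/predict', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'x-session-id': sessionId },
    body: JSON.stringify({ input: 'The quick brown', level, numPredictions: 3 }),
  });
  const { predictions } = await res.json();
  console.log(level, predictions.map((p) => p.symbol));
}
```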
- When you train a model, you receive a `sessionId`
- Use this `sessionId` in the `x-session-id` header for subsequent predictions
- If no `sessionId` is provided, the API uses the default English training text
To deploy on DigitalOcean:

- Fork this repository
- Connect your DigitalOcean account
- Create a new App from your forked repository
- Deploy using Node.js settings:
  - Environment: Node.js
  - Build Command: `npm install`
  - Run Command: `npm start`
To generate training text from datasets (Alice in Wonderland, AAC-like phrases, filtered dialogue):

- Run the Python script:

```bash
python generate_training_text.py
```

- The generated text will be saved to `training_text.txt`.
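
If you want to feed the generated file straight into the API, one option (using the `text` form of `/train` described above) looks like this Node.js 20+ sketch:

```js
// train-from-file.mjs: send the generated training text to /train as direct text
import { readFile } from 'node:fs/promises';

const text = await readFile('training_text.txt', 'utf8');
const res = await fetch('http://localhost:8080/train', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text, maxOrder: 5 }),
});
console.log(await res.json()); // includes the sessionId for later predictions
```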
If you encounter any problems, please open an issue.
This project is licensed under the GPL v3.0 License - see the LICENSE file for details.
The PPM JavaScript library was developed by Google Research: https://github.com/google-research/google-research/tree/master/jslm