A Wordfeud solver that uses a fine-tuned Tesseract model to parse the board and rack from a screenshot, then solves for the highest-scoring move.
- Python 3.12
- Tesseract OCR (brew install tesseract)
Install dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Set the TESSDATA_PREFIX environment variable to the dataset directory:
export TESSDATA_PREFIX=$(pwd)/dataset
Take a screenshot of your Wordfeud board (only works on iOS) and place it in its own directory inside the screenshots directory, e.g.:
├── screenshots
│ ├── IMG_0083
│ │ └── screenshot.png
Run the main script to parse and solve all the screenshots in the screenshots directory. Pass the model you copied during installation with -m; otherwise it defaults to eng.
python main.py -m words-with-cheaters --solve
main.py will set up the JSON files for you, but you will need to validate that they are accurate.
├── screenshots
│ ├── IMG_0083
│ │ ├── board.json
│ │ ├── rack.json
│ │ └── screenshot.png
The algorithm could be a lot faster, but it generally finds all possible words in under 10 seconds for a 15x15 board with 7 tiles, including wild cards, on an M2 in a single thread.
It works by checking every valid series of squares on the board (a series is valid if it touches an existing tile), for every word length up to and including the rack length, as a pattern against the dictionary. It then checks whether the rack can supply the letters for each matching word before checking the whole board for validity and scoring the placement.
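The pattern and rack steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names, the `.` notation for an empty square, and the `*` symbol for a wild card are all assumptions, and the board-wide validity check and scoring are left out.

```python
from collections import Counter

WILDCARD = "*"  # assumed symbol for a wild card tile in the rack


def matches_pattern(word, pattern):
    """Return True if word fits the pattern.

    The pattern uses '.' for an empty square and a letter for a tile
    that is already on the board (notation assumed for this sketch).
    """
    return len(word) == len(pattern) and all(
        p == "." or p == c for p, c in zip(pattern, word)
    )


def rack_can_fill(word, pattern, rack):
    """Return True if the rack can supply every letter that lands on an empty square."""
    needed = Counter(c for p, c in zip(pattern, word) if p == ".")
    available = Counter(rack)
    wildcards = available.pop(WILDCARD, 0)
    # Letters the rack lacks must be covered by wild cards.
    missing = sum(max(0, n - available[c]) for c, n in needed.items())
    # At least one tile has to come from the rack for this to be a move.
    return bool(needed) and missing <= wildcards


def candidates(pattern, rack, dictionary):
    """Dictionary words that fit the pattern and can be built from the rack."""
    return [
        w for w in dictionary
        if matches_pattern(w, pattern) and rack_can_fill(w, pattern, rack)
    ]


words = ["cat", "cot", "cut", "dog", "cart"]
print(candidates("c.t", "aox*", words))
```

Each surviving candidate would still need the whole-board validity check (cross words) and scoring described above.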
To improve the OCR training, first prepare a dataset for the OCR trainer:
Set up enough screenshot.png files with accurate board.json and rack.json files in the screenshots directory. Then run the prepare_dataset.py script to generate the training data:
python prepare_dataset.py
Clone tesstrain next to this project:
cd .. && git clone git@github.com:tesseract-ocr/tesstrain.git
Then, from inside the tesstrain directory, run the following command to train the model:
make training MODEL_NAME=words-with-cheaters \
START_MODEL=words-with-cheaters \
TESSDATA=../words-with-cheaters/dataset \
GROUND_TRUTH_DIR=../words-with-cheaters/dataset/training
And update the model in the repository:
cp data/words-with-cheaters.traineddata ../words-with-cheaters/dataset
Finally, use your trained model to OCR the screenshots.
python main.py -m words-with-cheaters --solve
Note: The OCR will not run if there are already board.json and rack.json files in the screenshot directory.
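To see which screenshot directories would still be OCR'd under the rule in the note above, a small check along these lines may help. It is a sketch under assumptions: the helper names are hypothetical, and it treats a directory as already parsed only when both JSON files exist, which may differ from what main.py actually does.

```python
from pathlib import Path


def needs_ocr(screenshot_dir):
    """Assumed rule: OCR is skipped only when both JSON files are present."""
    d = Path(screenshot_dir)
    return not ((d / "board.json").exists() and (d / "rack.json").exists())


def dirs_to_ocr(root="screenshots"):
    """List screenshot directories that have a screenshot.png but no parsed JSON yet."""
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and (p / "screenshot.png").exists() and needs_ocr(p)
    )


if __name__ == "__main__":
    if Path("screenshots").is_dir():
        for d in dirs_to_ocr():
            print(d)
```

Presumably, removing board.json and rack.json from a directory makes the next run parse that screenshot again.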
- Serve the solver as an API. Running this on a smaller machine might show that an optimized algorithm is necessary.
- Implement a strategy algorithm to consider:
  - Word length (as there is a significant bonus for finishing as fast as possible).
  - Availability of multipliers produced by the move.
  - Holding high-value tiles if their value isn't being maximized by multipliers.
- The dictionary is not complete.
- Previously used wild cards should not count for points in any future moves. This will need to be encoded in the board state.
- Parse a screenshot of the board and rack to get the board state and rack. This could also solve the above.
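For the wild card item in the list above, one possible way to encode it in the board state is to store a flag next to each placed letter. The sketch below is only an illustration: the `Tile` class, the `face_value` helper, and the letter values are hypothetical placeholders, not the repository's data model or the real Wordfeud scoring table.

```python
from dataclasses import dataclass

# Placeholder letter values for illustration only.
LETTER_VALUES = {"a": 1, "c": 4, "t": 1}


@dataclass(frozen=True)
class Tile:
    letter: str
    is_wild: bool = False  # True if the letter was played with a wild card

    @property
    def value(self):
        # A wild card keeps scoring zero in every later word that reuses it.
        return 0 if self.is_wild else LETTER_VALUES[self.letter]


def face_value(tiles):
    """Sum of tile values before any board multipliers are applied."""
    return sum(t.value for t in tiles)


word = [Tile("c"), Tile("a", is_wild=True), Tile("t")]
print(face_value(word))
```

With a flag like this stored in board.json, the scorer could tell a wild "a" from a regular one without any other change to the board layout.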