The makemore-tr project aims to generate authentic-sounding Turkish names using a deep learning model. The model architecture is based on the paper "A Neural Probabilistic Language Model" (Bengio et al., 2003).
This project was inspired by, and builds on what I learned from, Andrej Karpathy's tutorial "Building makemore Part 2: MLP".
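Following Bengio et al. (2003), the model predicts each character from a fixed-size window of the characters that precede it. The sketch below shows one way to build the vocabulary and the (context, next character) training pairs. It assumes `.` as the start/end token (as in Karpathy's makemore) and a `block_size` of 3; the helper names `build_vocab` and `build_dataset` are hypothetical, not taken from the notebooks.

```python
# Minimal sketch (assumptions: '.' is the start/end token, block_size=3).
# Helper names are hypothetical, not the notebook's own code.

def build_vocab(names):
    """Map each character to an integer index; index 0 is reserved for '.'."""
    chars = sorted(set("".join(names)))
    stoi = {ch: i + 1 for i, ch in enumerate(chars)}
    stoi["."] = 0
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def build_dataset(names, stoi, block_size=3):
    """Return (X, Y): each X row holds block_size previous character indices,
    and the matching Y entry is the index of the character that follows."""
    X, Y = [], []
    for name in names:
        context = [0] * block_size          # pad the start with '.' tokens
        for ch in name + ".":               # '.' also marks the end of a name
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]    # slide the window one character
    return X, Y


if __name__ == "__main__":
    names = ["can", "nur"]
    stoi, itos = build_vocab(names)
    X, Y = build_dataset(names, stoi, block_size=3)
    for x, y in zip(X, Y):
        print("".join(itos[i] for i in x), "->", itos[y])
```

For "can" this yields `... -> c`, `..c -> a`, `.ca -> n` and `can -> .`. In PyTorch the lists would then be converted with `torch.tensor(X)` and `torch.tensor(Y)`.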
The project consists of three main notebooks:
- Data Cleaning (`data-cleaning.ipynb`)
  - Cleans the Turkish names dataset
  - Removes duplicates and unwanted characters
  - Prepares a standardized list of names
- Model Training (`makemore-tr.ipynb`)
  - Implements a character-level language model using PyTorch
  - Sets up the vocabulary and creates the datasets
  - Trains a neural network for name generation
- Manual Backpropagation Implementation (`manual_backprop_tr.ipynb`)
  - Implements backpropagation from scratch without using `loss.backward()`
  - Provides deeper insight into gradient flow and neural network training
  - Includes batch normalization and optimization techniques
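The cleaning steps could look something like the sketch below. The raw file format and the exact rules used in `data-cleaning.ipynb` are not documented here, so the allowed alphabet and the helper names (`turkish_lower`, `clean_names`) are assumptions. One Turkish-specific detail is worth handling explicitly: Python's `str.lower()` maps `I` to `i` instead of `ı`, and maps `İ` to `i` followed by a combining dot (U+0307).

```python
# Hypothetical cleaning sketch; the real notebook's rules may differ.

# The 29 letters of the Turkish alphabet (no q, w, x).
TURKISH_ALPHABET = set("abcçdefgğhıijklmnoöprsştuüvyz")


def turkish_lower(text):
    """Lowercase using Turkish dotted/dotless I rules before str.lower()."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def clean_names(raw_names):
    """Lowercase, drop characters outside the alphabet, and remove
    empty entries and duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for raw in raw_names:
        lowered = turkish_lower(raw.strip())
        name = "".join(ch for ch in lowered if ch in TURKISH_ALPHABET)
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


if __name__ == "__main__":
    raw = ["IŞIK", "Işık", "İlker", "ilker ", "Ayşe-Nur", "123", "Gökmen"]
    print(clean_names(raw))  # ['ışık', 'ilker', 'ayşenur', 'gökmen']
```

Stripping characters (rather than discarding the whole entry) merges hyphenated names such as "Ayşe-Nur" into "ayşenur"; whether that is desirable depends on the dataset.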
The project is built with:
- PyTorch: primary deep learning framework
  - Tensor operations
  - Neural network model implementation
  - Loss calculation and optimization
- Python Standard Library
  - File handling and text processing
  - Data structure manipulation
  - Random number generation
- Matplotlib
  - Training progress visualization
  - Loss curve plotting
  - Model performance analysis
- Neural Network Components
  - Embedding layer for character encoding
  - Multi-layer perceptron (MLP)
  - Batch normalization
  - Tanh activation function
  - Cross-entropy loss
- Jupyter Notebook
  - Interactive development
  - Code execution and experimentation
  - Documentation and visualization
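The neural network components fit together roughly as in the sketch below: embedding lookup, concatenation of the context, a linear layer, batch normalization, tanh, and cross-entropy loss. The hyperparameters, the random stand-in batch and the `1e-5` epsilon are assumptions, not values from `makemore-tr.ipynb`. In the spirit of `manual_backprop_tr.ipynb`, it also checks a hand-derived cross-entropy gradient, (softmax(logits) - one_hot(Y)) / batch, against autograd.

```python
import torch
import torch.nn.functional as F

# Sketch only: hyperparameters are assumptions, not the notebook's values.
torch.manual_seed(42)
vocab_size, block_size, n_embd, n_hidden, batch = 30, 3, 10, 64, 32

C = torch.randn(vocab_size, n_embd)                    # character embedding table
W1 = torch.randn(block_size * n_embd, n_hidden) * 0.2  # hidden layer; no bias, BN supplies the shift
bngain = torch.ones(1, n_hidden)                       # batch-norm scale
bnbias = torch.zeros(1, n_hidden)                      # batch-norm shift
W2 = torch.randn(n_hidden, vocab_size) * 0.01          # small init keeps the first loss near log(vocab_size)
b2 = torch.zeros(vocab_size)
params = [C, W1, bngain, bnbias, W2, b2]
for p in params:
    p.requires_grad_(True)

# Random stand-in batch; real inputs come from the (context -> next char) dataset.
X = torch.randint(0, vocab_size, (batch, block_size))
Y = torch.randint(0, vocab_size, (batch,))

# Forward pass
emb = C[X]                                         # (batch, block_size, n_embd)
hpre = emb.view(batch, -1) @ W1                    # concatenate the context embeddings
mean = hpre.mean(0, keepdim=True)
var = ((hpre - mean) ** 2).mean(0, keepdim=True)   # biased batch variance
hpre_bn = bngain * (hpre - mean) / torch.sqrt(var + 1e-5) + bnbias
h = torch.tanh(hpre_bn)
logits = h @ W2 + b2
logits.retain_grad()                               # keep .grad on this non-leaf tensor
loss = F.cross_entropy(logits, Y)
loss.backward()

# Manual gradient of the mean cross-entropy with respect to the logits
with torch.no_grad():
    dlogits = F.softmax(logits, dim=1)
    dlogits[torch.arange(batch), Y] -= 1
    dlogits /= batch
    max_diff = (dlogits - logits.grad).abs().max().item()
print(f"loss = {loss.item():.4f}, max |manual - autograd| = {max_diff:.2e}")

# One plain SGD update, then clear the gradients
lr = 0.1
with torch.no_grad():
    for p in params:
        p -= lr * p.grad
        p.grad = None
```

Omitting the bias on the layer before batch normalization is a common choice, since the normalization subtracts the batch mean and `bnbias` adds its own learned shift.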
To run the notebooks, ensure you have the following installed:
torch>=1.0.0
matplotlib>=3.0.0
jupyter>=1.0.0
Here are some sample Turkish names generated by the trained model:
cant
süze
ergin
topvar
erk
can
say
ker
yıldıralp
evi
kara
dorulhan
gökmeter
ağatarakan
aslan
serkoç
nur
tapdsel
salkuşa
yurdu
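Names like these are produced by sampling one character at a time and feeding it back into the context until the end token `.` appears. The sketch below uses random, untrained weights, so its output will not resemble real names; the hyperparameters, the seed, the `sample_name` helper and the omission of batch normalization are assumptions for brevity. In a model with batch normalization, inference would use running (or calibrated) statistics rather than per-batch ones.

```python
import torch
import torch.nn.functional as F

# Hypothetical sampling sketch with random stand-in weights.
chars = sorted("abcçdefgğhıijklmnoöprsştuüvyz")
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0                                      # start/end token
itos = {i: ch for ch, i in stoi.items()}
vocab_size, block_size, n_embd, n_hidden = len(stoi), 3, 10, 64

g = torch.Generator().manual_seed(2147483647)
C = torch.randn(vocab_size, n_embd, generator=g)
W1 = torch.randn(block_size * n_embd, n_hidden, generator=g) * 0.2
b1 = torch.zeros(n_hidden)
W2 = torch.randn(n_hidden, vocab_size, generator=g) * 0.1
b2 = torch.zeros(vocab_size)


@torch.no_grad()
def sample_name(max_len=20):
    context = [0] * block_size                     # start from an all-'.' context
    out = []
    for _ in range(max_len):
        emb = C[torch.tensor([context])]           # (1, block_size, n_embd)
        h = torch.tanh(emb.view(1, -1) @ W1 + b1)
        probs = F.softmax(h @ W2 + b2, dim=1)
        ix = torch.multinomial(probs, num_samples=1, generator=g).item()
        if ix == 0:                                # '.' ends the name
            break
        out.append(itos[ix])
        context = context[1:] + [ix]               # slide the window
    return "".join(out)


for _ in range(5):
    print(sample_name())
```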
Special thanks to:
- Kamil Toraman for providing the raw dataset
- Andrej Karpathy for the educational content and inspiration