Sudoku Classification Project

Introduction

This project aims to explore the learning capabilities of neural network-based models, specifically in understanding complex positional relationships. While CNNs can analyze individual images, we want to test whether a neural network can infer relationships when images are presented together, as in a Sudoku puzzle. Sudoku requires the understanding that each number must appear only once in every row and column. Instead of relying on separate algorithms, we aim to determine if the neural network can learn this concept on its own.

Dataset

The dataset for this project is available on Kaggle: Sudoku Images Based on MNIST. It contains the following files:

sudoku_testing_images.h5 (10k images)
sudoku_testing_set.h5 (30k images)
sudoku_training_images_final1.h5 (100k images)
sudoku_training_set.h5 (400k images)
sudoku_validation_images_20k.h5 (10k images)
sudoku_validation_set.h5 (30k images)

Dataset Details

The images were generated using the code provided in the Dataset directory of the associated repository. The dataset was created using MNIST digits, carefully partitioned into training, validation, and testing sets to ensure that the validation and test sets contain unseen data. The images adhere to Sudoku rules, ensuring unique digits in each row, column, and 3x3 grid. Each image is unique, with no repeated samples.
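The rule stated above can be written as a small check on digit labels. This is a minimal sketch, not the repository's generation code: it assumes each image corresponds to a single 3x3 grid of digit labels, and the function name `is_valid_grid` is hypothetical.

```python
from typing import Sequence


def is_valid_grid(grid: Sequence[Sequence[int]]) -> bool:
    """Return True if a 3x3 grid of digit labels repeats no digit
    in any row, any column, or the grid as a whole.

    Assumption: one image maps to one 3x3 grid of labels.
    """
    if len(grid) != 3 or any(len(row) != 3 for row in grid):
        raise ValueError("expected a 3x3 grid")
    rows_ok = all(len(set(row)) == 3 for row in grid)
    cols_ok = all(len(set(col)) == 3 for col in zip(*grid))
    flat = [digit for row in grid for digit in row]
    block_ok = len(set(flat)) == 9
    return rows_ok and cols_ok and block_ok
```

For a single 3x3 grid the whole-grid check already implies the row and column checks; they are kept separate here only to mirror the three conditions named in the text.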

Experiments and Evaluation

This project explored three neural network architectures to classify Sudoku and non-Sudoku images using a dataset of 84x84 pixel images. Below are the details of each architecture:

A. Auto-Encoder (AE)

The AE uses an encoder-decoder structure to learn a compact representation of the input image. The latent space representation was later used with a classifier head for binary classification.

Architecture:

Encoder: Four convolutional layers for downsampling, followed by fully connected layers for encoding.
Decoder: Transposed convolution layers to reconstruct the image.
Classifier: A fully connected layer applied on the latent representation.
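To see how four downsampling convolutions shrink an 84x84 input, the standard output-size formula can be applied layer by layer. The kernel size, stride, and padding below are illustrative assumptions; the README does not state the values used in the notebooks.

```python
from typing import List


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(size: int = 84, layers: int = 4, kernel: int = 3,
                  stride: int = 2, padding: int = 1) -> List[int]:
    """Feature-map side length after each of `layers` identical convolutions.

    The default hyperparameters are hypothetical, chosen only for illustration.
    """
    sizes = [size]
    for _ in range(layers):
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


print(encoder_sizes())  # [84, 42, 21, 11, 6]
```

With these assumed settings, the fully connected encoding layers would receive a 6x6 feature map per channel.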

B. Variational Auto-Encoder (VAE)

The VAE builds on the AE by introducing probabilistic components, enabling better generalization through a distributional latent space.

Architecture:

Encoder: Outputs the mean and log-variance of the latent space for reparameterization.
Decoder: Reconstructs the input from the sampled latent vector.
Classifier: A fully connected layer applied on the latent vector.
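The reparameterization step mentioned for the encoder can be sketched in plain Python. This shows the general technique (sampling z = mu + sigma * eps and the KL term against a standard normal prior), not the notebook's actual code; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps, with sigma = exp(0.5 * log_var), eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)  # one sampled latent vector
```

Predicting the log-variance rather than the variance keeps sigma positive without constraining the encoder output.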

C. Convolutional Neural Network (CNN)

The CNN architecture uses convolutional layers to extract hierarchical spatial features, with fully connected layers for classification.

Architecture:

Convolutional Layers: Four layers with ReLU activations and max-pooling.
Fully Connected Layers: Used after feature extraction for classification.
Output Layer: Sigmoid activation function for binary classification.
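The sigmoid output layer turns the final logit into a probability that an image is a Sudoku. The sketch below illustrates that step together with binary cross-entropy, which is a common loss for this setup; the README does not name the loss actually used, and the 0.5 threshold is an assumption.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(p: float, label: int, eps: float = 1e-12) -> float:
    """Loss for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def predict(logit: float, threshold: float = 0.5) -> int:
    """Map a logit to a class label (1 = Sudoku, 0 = non-Sudoku)."""
    return int(sigmoid(logit) >= threshold)
```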

Breaking Through Performance Plateaus

Challenges:

Initially, all models performed poorly:

CNN Accuracy: 50-65%
AE and VAE Accuracy: Even lower than CNN.

Solutions:

Dataset Expansion: Increased the dataset size from 100k to 400k training samples, ensuring balanced classes and greater diversity.
Hyperparameter Tuning: Extensively tuned hyperparameters to improve model performance.
Model Refinement: Enhanced the CNN architecture to better capture intricate patterns.

These changes helped the CNN achieve a breakthrough in performance, reaching a testing accuracy of 87%, while AE and VAE showed limited improvements.

Final Results

Model	Accuracy
Auto-Encoder (AE)	51.01%
Variational Auto-Encoder (VAE)	68.56%
Convolutional Neural Network (CNN)	87.00%

Usage

1. Use the Pre-trained Models

You can find the trained models at the following drive link: Trained Models.

For testing:

A separate testing notebook was created to evaluate the performance of the three models. This notebook is designed to:

Load the testing dataset.
Load the trained models: Auto-Encoder (AE), Variational Auto-Encoder (VAE), and Convolutional Neural Network (CNN).
Run the testing dataset through each model to obtain predictions.
Present a selection of test examples along with insights into the model performances.

You can use testing_models.ipynb located in the models directory of this repository.
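The evaluation steps listed above amount to running every model over the same test set and comparing accuracies. The following is a framework-agnostic sketch of that loop, not the contents of testing_models.ipynb: `evaluate_models` is a hypothetical helper, and each model is represented by a plain callable that returns a 0/1 label.

```python
from typing import Any, Callable, Dict, Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels differ in length")
    if not labels:
        raise ValueError("empty evaluation set")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def evaluate_models(models: Dict[str, Callable[[Any], int]],
                    samples: Sequence[Any],
                    labels: Sequence[int]) -> Dict[str, float]:
    """Run every model over the same samples and report accuracy per model."""
    return {name: accuracy([model(x) for x in samples], labels)
            for name, model in models.items()}


# Stand-in "models" for illustration only; real ones would wrap the trained networks.
toy_models = {"parity": lambda x: x % 2, "always_one": lambda x: 1}
print(evaluate_models(toy_models, [0, 1, 2, 3], [0, 1, 0, 1]))
# {'parity': 1.0, 'always_one': 0.5}
```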

2. Train Your Own Models

To generate your own dataset, use the notebook Sudoku_Dataset_Generation.ipynb, which creates the Sudoku and non-Sudoku dataset from MNIST digits.
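One way such a dataset could be assembled is sketched below. It is not the notebook's code and rests on several assumptions: each 84x84 image is a 3x3 arrangement of 28x28 MNIST tiles, the digits 1-9 are used, and a non-Sudoku sample is produced by duplicating one digit. All function names are hypothetical.

```python
import random
from typing import List

Grid = List[List[int]]
Image = List[List[int]]


def sample_valid_grid(rng: random.Random) -> Grid:
    """A 3x3 grid holding each of the digits 1-9 exactly once."""
    digits = rng.sample(range(1, 10), 9)
    return [digits[i:i + 3] for i in range(0, 9, 3)]


def sample_invalid_grid(rng: random.Random) -> Grid:
    """Start from a valid grid and copy one cell's digit into another cell."""
    grid = sample_valid_grid(rng)
    src, dst = rng.sample(range(9), 2)
    grid[dst // 3][dst % 3] = grid[src // 3][src % 3]
    return grid


def tile_images(tiles: List[List[Image]], tile_size: int = 28) -> Image:
    """Stitch a 3x3 arrangement of tile_size x tile_size tiles into one image."""
    image: Image = []
    for tile_row in tiles:
        for y in range(tile_size):
            image.append([px for tile in tile_row for px in tile[y]])
    return image
```

In a full pipeline, each label in the grid would be replaced by an MNIST image of that digit drawn from the appropriate split before tiling, so that validation and test images use digits the model has not seen.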

For training your own models, the following notebooks are available in the code directory:

CNN.ipynb: This notebook contains the code to train the Convolutional Neural Network (CNN).
autoencoder.ipynb: This notebook contains the code to train the Auto-Encoder (AE).
variational-autoencoder.ipynb: This notebook contains the code to train the Variational Auto-Encoder (VAE).

Conclusion

This project demonstrated the effectiveness of CNNs in classifying Sudoku and non-Sudoku images, particularly when trained on a larger and more diverse dataset. Despite initial struggles, the CNN reached a testing accuracy of 87% and clearly outperformed the AE (51.01%) and VAE (68.56%) architectures on this task.
