This project serves as an introduction to PyTorch and Weights & Biases (wandb) by implementing and experimenting with deep learning models for image and text classification. The primary objectives include understanding PyTorch’s computation graphs, implementing a basic classifier, logging experiments with wandb, and modifying the baseline model to enhance performance.
├── data/
│   ├── img_train.csv
│   ├── img_val.csv
│   ├── img_test.csv
│   ├── txt_train.csv
│   ├── txt_val.csv
│   └── txt_test.csv
├── img_classifier.py
├── txt_classifier.py
└── README.md
To set up the environment locally, follow these steps:
- Install Miniconda and create a Python environment:
conda create -n py312 python=3.12
conda activate py312
- Install PyTorch (latest stable version):
pip install torch torchvision torchaudio
- Install Weights & Biases for experiment tracking:
pip install wandb
- Install additional dependencies:
pip install pandas
For Google Colab, set the runtime type to GPU and mount Google Drive before running the code.
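For reference, a minimal Colab cell for mounting Drive (the mount path is the Colab default; run it after setting the runtime type to GPU):

```python
# Run inside a Colab notebook; project files can then be read from /content/drive.
from google.colab import drive

drive.mount('/content/drive')
```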
This task involves classifying images into three categories: Parrot, Narwhal, and Axolotl. The dataset consists of synthetically generated images, and the classifier is implemented as a simple feedforward neural network.
The initial image classification model is a fully connected feedforward neural network (MLP) with the following architecture:
- Input Layer: 256x256x3 flattened image vector.
- Hidden Layer 1: Fully connected layer with 512 neurons and ReLU activation.
- Hidden Layer 2: Fully connected layer with 512 neurons and ReLU activation.
- Output Layer: Fully connected layer with 3 neurons producing the class scores; softmax is handled by the loss rather than applied explicitly.
The loss function is CrossEntropyLoss (which combines log-softmax with negative log-likelihood), and the optimizer is stochastic gradient descent (SGD).
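A minimal sketch of this baseline (layer sizes taken from the description above; variable names and the learning rate are illustrative, not the course starter code):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),                   # 256x256x3 image -> 196,608-dim vector
    nn.Linear(256 * 256 * 3, 512),  # hidden layer 1
    nn.ReLU(),
    nn.Linear(512, 512),            # hidden layer 2
    nn.ReLU(),
    nn.Linear(512, 3),              # raw logits for the 3 classes
)

# CrossEntropyLoss applies log-softmax internally, so the model outputs raw logits.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is an assumed value
```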
- Implemented a baseline classifier with two hidden layers of 512 units each.
- Integrated Weights & Biases for logging.
- Modified the model to support grayscale images.
- Experimented with different optimizers (e.g., Adadelta vs. SGD); a sketch of both modifications follows this list.
- Implemented a convolutional neural network (CNN) to compare performance with the feedforward model.
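A sketch of the grayscale and optimizer modifications, assuming the input pipeline uses torchvision transforms (the exact course settings may differ):

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Grayscale variant: one input channel instead of three, so the first layer shrinks.
to_gray = transforms.Grayscale(num_output_channels=1)
gray_input_layer = nn.Linear(256 * 256 * 1, 512)

# Optimizer swap: Adadelta adapts per-parameter step sizes, unlike plain SGD.
sgd = torch.optim.SGD(gray_input_layer.parameters(), lr=0.01)  # assumed lr
adadelta = torch.optim.Adadelta(gray_input_layer.parameters())
```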
| Model | Train Accuracy | Test Accuracy |
|---|---|---|
| Baseline (MLP) | 98.22% | 73.19% |
| CNN Model | 95.78% | 74.22% |
This task involves classifying news articles as real or fake, using an LSTM-based model and a simple bag-of-words model.
The baseline model consists of a Bag-of-Words (BoW) representation using nn.EmbeddingBag.
The improved model replaces this with an LSTM-based classifier, structured as follows:
- Embedding Layer: Converts words to dense vectors.
- LSTM Layer: Captures sequential dependencies in text.
- Pooling Layer: Adaptive max pooling to aggregate features.
- Fully Connected Layer: Maps the pooled LSTM outputs to class scores.
The loss function is CrossEntropyLoss, and the optimizer is Adam.
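A minimal sketch of this architecture (the embedding, hidden, and vocabulary sizes are assumptions, not the assignment's exact values):

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):              # (batch, seq_len)
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)       # (batch, seq_len, hidden_dim)
        # Max over the time dimension, equivalent to nn.AdaptiveMaxPool1d(1).
        pooled = outputs.amax(dim=1)           # (batch, hidden_dim)
        return self.fc(pooled)                 # logits for CrossEntropyLoss

model = LSTMClassifier(vocab_size=20_000)      # assumed vocabulary size
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```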
- Implemented a baseline classifier using nn.EmbeddingBag.
- Replaced it with an LSTM-based classifier with max-pooling.
- Switched from SGD to Adam optimizer.
- Analyzed performance differences and how padding affects LSTM models (see the padding sketch after this list).
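One common way to handle padding with LSTMs, shown here as an assumed illustration rather than the course code, is to pack padded batches so the recurrence skips pad tokens:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Two hypothetical tokenized articles of different lengths.
seqs = [torch.tensor([4, 9, 2]), torch.tensor([7, 1])]
lengths = torch.tensor([len(s) for s in seqs])

padded = pad_sequence(seqs, batch_first=True, padding_value=0)  # shape (2, 3)
# Packing lets the LSTM skip padded positions instead of folding them into its state.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
```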
| Model | Validation Accuracy | Test Accuracy |
|---|---|---|
| Baseline (Bag-of-Words) | ~92% | ~91% |
| LSTM Model | ~89% | ~88% |
Training was conducted over multiple runs with different hyperparameters and architectures. Each run was logged with Weights & Biases (wandb) to track training and validation performance; a representative logging loop follows the list below.
- Loss curves and accuracy plots were analyzed to identify underfitting/overfitting trends.
- Hyperparameters such as learning rate, batch size, and optimizer choice were adjusted.
- Model architectures were iteratively refined to improve generalization.
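A representative logging loop (the project name, config values, and the train_one_epoch/evaluate helpers are hypothetical, not the course code):

```python
import wandb

wandb.init(project="intro-pytorch-wandb",  # hypothetical project name
           config={"lr": 0.01, "batch_size": 64, "optimizer": "sgd"})

for epoch in range(10):
    train_loss, train_acc = train_one_epoch(model, train_loader)  # hypothetical helper
    val_loss, val_acc = evaluate(model, val_loader)               # hypothetical helper
    wandb.log({"epoch": epoch,
               "train/loss": train_loss, "train/acc": train_acc,
               "val/loss": val_loss, "val/acc": val_acc})

wandb.finish()
```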
- PyTorch’s autograd system enables easy computation of gradients (a tiny example follows this list).
- Weights & Biases helps in tracking experiment results effectively.
- CNNs generalized better than MLPs on this image classification task: the CNN reached higher test accuracy despite lower training accuracy, indicating less overfitting.
- LSTMs are sensitive to padding, and choosing the right architecture is crucial.
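As a tiny illustration of the autograd point above:

```python
import torch

# d/dx (x^2 + 3x) = 2x + 3, so the gradient at x = 2 is 7.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)  # tensor(7.)
```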
- Train the image classifier:
python img_classifier.py
- Train the text classifier:
python txt_classifier.py
- View experiment logs on Weights & Biases by navigating to wandb.ai.
- The dataset is synthetically generated for classification tasks.
- Some experiments, like grayscale image classification, show the impact of color information on predictions.
This project is part of 10-623 Generative AI at Carnegie Mellon University, with datasets and starter code provided by the course instructors.


