Professor: Ivanovitch Medeiros Dantas da Silva
Student: Luiz Eduardo Nunes Cho-Luck - 20241012311
This project was developed as the final assessment for the course PPGEEC2318 - Machine Learning. The primary objective of this initial phase was to design five versions of a convolutional neural network (CNN) for the classification of cloud types based on ground-level imagery. Before proceeding, a brief explanation of clouds and their classifications is provided.
Clouds are visible collections of water droplets, ice particles, or a combination of both, suspended in the atmosphere. They often also contain particles such as dust, smoke, and industrial residues. Clouds are continuously evolving, frequently changing their shape, size, and appearance, which are mainly determined by two properties:
- Luminance: the amount of light reflected, transmitted, or scattered by cloud particles;
- Color: influenced by incident light from natural or artificial sources (e.g., city lights).
Clouds are classified into three main categories: Upper, Middle, and Lower. The classification is based on the altitude at which the cloud base is found. Table 1 presents the most frequent vertical distribution of clouds in the three main regions of the Earth:
Table 1 - Most Frequent Vertical Distribution of Clouds
| Layer | Polar Regions | Temperate Regions | Tropical Region |
|----------|---------------|-------------------|-----------------|
| Upper | 3 to 8 km | 5 to 13 km | 6 to 18 km |
| Middle | 2 to 4 km | 2 to 7 km | 2 to 8 km |
| Lower | up to 2 km | up to 2 km | up to 2 km |
Source: WMO (1956).
Along with their altitude, clouds are also classified based on their shape, which can be seen in Figure 1.
Source: UCAR Center for Science Education.
As can be seen in Figure 1, only two types of clouds can produce precipitation: cumulonimbus and nimbostratus. Cumulonimbus clouds are characterized by their tall, vertical shape and are associated with thunderstorms, while nimbostratus clouds are characterized by their flat, horizontal shape and are associated with persistent, widespread precipitation.
- Original sources: Thitinan Kliangsuwan, Cloud Type Classification 3 (2022), and Howard-Cloud-X.
The two original datasets showed several inconsistencies in the previously assigned cloud-type labels. Therefore, a manual selection of images was necessary to build a reliable dataset for training and validation. Additionally, due to the similarity between some cloud classes, five cloud types were selected along with a "Clear Sky" class. The chosen classes were: Cirrus, Altocumulus, Cumulonimbus, Cumulus, Nimbostratus, and Clear Sky.
Figure 2 shows a randomly chosen sample from each class, selected from the training data.
After this process, the final dataset contained a total of 657 images for training and 165 for validation.
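The images can be loaded with standard torchvision utilities. The snippet below is a minimal sketch of such a loading pipeline, assuming an `ImageFolder`-style directory layout (`data/train/<class>/...`, `data/val/<class>/...`); the actual paths and preprocessing used in the notebooks may differ.

```python
import torch
from torchvision import datasets, transforms

# Assumed layout: data/train/<class_name>/*.jpg and data/val/<class_name>/*.jpg
transform = transforms.Compose([
    transforms.Resize((128, 128)),  # 28x28 for the base models
    transforms.ToTensor(),
])

train_ds = datasets.ImageFolder("data/train", transform=transform)
val_ds = datasets.ImageFolder("data/val", transform=transform)

# Batch size 16, as used in all experiments
train_loader = torch.utils.data.DataLoader(train_ds, batch_size=16, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_ds, batch_size=16)

print(train_ds.classes)  # the six class names, inferred from the folder names
```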
We tested five versions of a CNN for an image classification task with 6 output classes.
Some characteristics are shared across all five versions (a minimal model sketch follows the list):
- Input: RGB images (3 channels)
- Dropout: optional dropout layers (30%) can be applied for regularization
- Fully Connected Layers: one hidden layer (50 units) and one output layer (6 units for class scores)
- Forward Pass: data flows sequentially through the featurizer (convolutions and pooling) and the classifier (fully connected layers)
- Loss Function: Cross-Entropy Loss (`nn.CrossEntropyLoss`)
- Batch Size: 16
- Learning Rate: 3e-4
- The Personal Model 2 used a different learning rate (LR), because we applied a function to find the best LR for that model (a sketch of such an LR range test follows Table 2).
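Putting those shared pieces together, a minimal sketch of the base model could look like the following. Kernel sizes, padding, and pooling choices are assumptions, since only the items listed above are fixed by this README.

```python
import torch
import torch.nn as nn

class CloudCNN(nn.Module):
    """Sketch of the base model: two conv blocks with 5 feature maps,
    a 50-unit hidden layer, and 6 output units. Kernel size, padding,
    and pooling are assumptions, not confirmed by the README."""
    def __init__(self, n_features=5, p_dropout=0.3, input_size=28):
        super().__init__()
        self.featurizer = nn.Sequential(
            nn.Conv2d(3, n_features, kernel_size=3, padding=1),  # RGB input
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(n_features, n_features, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = n_features * (input_size // 4) ** 2  # after two 2x2 poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_dropout),  # optional 30% dropout
            nn.Linear(flat, 50),
            nn.ReLU(),
            nn.Linear(50, 6),       # class scores
        )

    def forward(self, x):
        # Featurizer (convolutions and pooling) then classifier (FC layers)
        return self.classifier(self.featurizer(x))

model = CloudCNN()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```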
Table 2 presents the main differences between the proposed models.
Table 2 - Description of the differences in model configurations
| Model Name       | Optimizer | Features | Conv. Layers | Activation Func. | Epochs | Input Size |
|------------------|-----------|---------|--------------|-----------------|--------|----------|
| Base Model (BM) | Adam | 5 | 2 | ReLU | 10 | 28x28 |
| BM + n_feature | Adam | 15 | 2 | ReLU | 10 | 28x28 |
| BM + conv blocks | Adam | 5 | 4 | ReLU | 10 | 28x28 |
| Personal Model 1 | AdamW | 5 | 2 | ELU | 154 | 128x128 |
| Personal Model 2 | AdamW | 5 | 2 | ELU | 56 | 128x128 |
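The "best LR" function used for the Personal Model 2 lives in its notebook; a minimal learning-rate range test in the same spirit (growing the LR exponentially over a few mini-batches and watching the loss) could be sketched as follows. The function name and defaults here are illustrative, not the notebook's exact code.

```python
import torch

def lr_range_test(model, loss_fn, loader, start_lr=1e-5, end_lr=1e-1,
                  num_iters=100, device="cpu"):
    """Grow the LR exponentially each mini-batch and record the loss;
    the region where the loss drops fastest suggests a good LR."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=start_lr)
    gamma = (end_lr / start_lr) ** (1 / num_iters)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma)
    lrs, losses = [], []
    batches = iter(loader)
    model.train()
    for _ in range(num_iters):
        try:
            x, y = next(batches)
        except StopIteration:  # restart the loader if it runs out
            batches = iter(loader)
            x, y = next(batches)
        optimizer.zero_grad()
        loss = loss_fn(model(x.to(device)), y.to(device))
        loss.backward()
        optimizer.step()
        lrs.append(scheduler.get_last_lr()[0])  # LR used for this step
        scheduler.step()
        losses.append(loss.item())
    return lrs, losses  # plot losses vs. lrs (log scale) to pick the LR
```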
To avoid a long README file, we highlight only the main results. For more details, please access the notebooks of each experiment in the `notebooks` folder.
Across all five CNN configurations tested, the confusion matrices (Figures 03 to 07) revealed that overall classification performance was strongly influenced by architectural choices such as the number of convolutional layers, the number of feature maps, the activation function, and the optimizer used.
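The confusion matrices in the notebooks can be reproduced with a short helper like the one below; `model` and `val_loader` are stand-ins for the objects sketched earlier, and the class order is assumed to match the dataset's.

```python
import numpy as np
import torch
from sklearn.metrics import ConfusionMatrixDisplay

@torch.no_grad()
def collect_predictions(model, loader, device="cpu"):
    """Run the model over a loader and gather true/predicted labels."""
    model.eval()
    y_true, y_pred = [], []
    for images, labels in loader:
        logits = model(images.to(device))
        y_true.extend(labels.numpy())
        y_pred.extend(logits.argmax(dim=1).cpu().numpy())
    return np.array(y_true), np.array(y_pred)

# `model` and `val_loader` are the hypothetical objects defined above
y_true, y_pred = collect_predictions(model, val_loader)
ConfusionMatrixDisplay.from_predictions(y_true, y_pred,
                                        display_labels=train_ds.classes)
```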
The BM (Figure 03) and BM + n_feature increase (Figure 04) generally maintained solid diagonal patterns, showing robust performance despite the increase in parameters when using 15 feature maps. Both models classified Clear Sky and Cumulus clouds with high accuracy in all experiments, while Cirrus performed worst in the BM but improved in BM + n_feature increase.
Figure 03 - Base Model and Base Model with no Dropout (ND)
Figure 04 - BM + n_feature increase and BM + n_feature increase with no Dropout
However, BM + conv blocks (Figure 05), with its deeper 4-layer convolutional stack, exhibited clear signs of degradation when dropout was applied: its confusion matrices showed lighter diagonals and more off-diagonal errors, suggesting that excessive regularization in this deeper design led to underfitting or loss of discriminative power. The model only managed to classify Cumulus and Nimbostratus clouds reliably, while the other classes were confused with one another. The confusion matrix without dropout shows a slight improvement, but still not enough to achieve good performance.
Figure 05 - BM + conv blocks and BM + conv blocks with no Dropout
In contrast, the Personal Model 1 (Figure 06), which combined the AdamW optimizer with ELU activations, consistently delivered the strongest results across all classes, with highly concentrated diagonal entries even with dropout. This suggests that the choice of optimizer and activation function played a key role in improving generalization without sacrificing accuracy.
Figure 06 - Personal Model 1 and Personal Model 1 with no Dropout
For the Personal Model 2, we added a function to find the best learning rate, which resulted in slightly worse performance than the Personal Model 1, but still better than the other models. The confusion matrix (Figure 07) shows that the model classified all classes with good accuracy, though with some confusion between Cumulus, Cirrus, and Nimbostratus.
Figure 07 - Personal Model 2 and Personal Model 2 with no Dropout
Notably, dropout generally helped reduce minor misclassifications in simpler models but was detrimental in the deeper variant, underscoring that regularization needs to be carefully balanced with model capacity. These findings highlight that even subtle changes in optimizer, nonlinearity, and network depth can have meaningful impacts on CNN performance for multi-class image classification.
The accuracy results reinforce the trends observed in the confusion matrices (Table 3). The Personal Model 1 achieved the highest accuracy (75%), showing the benefit of combining the AdamW optimizer and ELU activation to improve generalization and class separation. The Personal Model 2 also performed well (70%), indicating that the choice of learning rate and architecture can significantly influence results, even if it was slightly less effective than the Personal Model 1. BM + n_feature slightly outperformed the BM (60% vs. 57%), suggesting that increasing the number of feature maps can modestly improve representational power. By contrast, BM + conv blocks had the lowest accuracy (32%), indicating that simply deepening the network without careful tuning of regularization and capacity led to underfitting or unstable learning. Overall, these results highlight the importance of architectural and optimization choices in achieving robust performance in multi-class image classification tasks.
Table 3 - Accuracy Results for the Tested Models
| Model Name       | Acc  | Acc (No Dropout) |
|------------------|------|------------------|
| Base Model (BM)  | 0.57 | 0.61 |
| BM + n_feature   | 0.60 | 0.65 |
| BM + conv blocks | 0.32 | 0.56 |
| Personal Model 1 | 0.75 | 0.80 |
| Personal Model 2 | 0.70 | 0.77 |
We chose to analyze BM + conv blocks and the Personal Model 1 because they represent the extremes in performance: BM + conv blocks had the lowest accuracy and showed clear signs of underfitting or poor feature learning, while the Personal Model 1 achieved the highest accuracy. By comparing their learned filters and hook-captured activations, we can better understand what differentiates well-learned representations from poor ones and gain insights into how architectural choices and regularization affect feature extraction. In particular, by examining the activation maps from both models, we see that the Personal Model 1's two-layer design preserves clear, well-defined features at each convolutional stage, with evident textural and edge patterns (Figure 08); a minimal sketch of the hook mechanism follows Figure 09.
Figure 08 - Feature Activations Across Layers - Personal Model 1
In contrast, BM + conv blocks, with its deeper four-layer architecture, exhibits increasingly blurred and diffuse activations in the later layers, suggesting over-compression and loss of discriminative information. This difference aligns directly with their classification performance: the Personal Model 1 achieves significantly higher accuracy by maintaining better hierarchical feature representations.
Figure 09 - Feature Activations Across Layers - BM + conv blocks
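The activation maps in Figures 08 and 09 come from forward hooks registered on the convolutional layers. A minimal sketch of that mechanism, assuming the `CloudCNN` sketch above and a single preprocessed `image` tensor:

```python
import torch

activations = {}

def save_activation(name):
    """Build a hook that stores a layer's output under `name`."""
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register one hook per convolutional layer of the featurizer
for idx, layer in enumerate(model.featurizer):
    if isinstance(layer, torch.nn.Conv2d):
        layer.register_forward_hook(save_activation(f"conv{idx}"))

with torch.no_grad():
    model(image.unsqueeze(0))  # `image` is a (3, H, W) tensor

for name, act in activations.items():
    print(name, act.shape)  # one feature map per channel, ready to plot
```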
The experiments demonstrate that even small architectural choices and hyperparameter adjustments can have a substantial impact on multiclass cloud classification performance. Models with deeper convolutional stacks require careful tuning of regularization to avoid underfitting or information loss, as seen in BM + conv_blocks' reduced accuracy and blurred activation maps. In contrast, the Personal Model 1, combining the AdamW optimizer and ELU activation, achieved the highest accuracy by preserving clear, interpretable features across layers and demonstrating strong generalization. These results highlight the importance of balancing model capacity, activation functions, and optimizers to build effective CNNs for challenging visual classification tasks.
- The dataset is relatively small, which may limit generalization to new cloud images.
- Class imbalance was only partially addressed through manual selection.
- The models were trained on low-resolution inputs (e.g., 28x28 or 128x128), potentially missing finer textural details.
- No systematic hyperparameter tuning (e.g., grid search) was conducted.
- Increase dataset size with more labeled images.
- Apply data augmentation systematically.
- Explore advanced architectures like ResNet or EfficientNet.
- Use automated hyperparameter search to optimize dropout and filter count.
- Evaluate model robustness under varying lighting conditions.
- Howard-Cloud-X Dataset (Kaggle).
- PyTorch Documentation: https://pytorch.org
- Thitinan Kliangsuwan. Cloud Type Classification 3 (Kaggle, 2022).
- UCAR Center for Science Education. Cloud Types.
- World Meteorological Organization. International Cloud Atlas, 1956.