The original paper uses [Caltech-UCSD Birds][2], [MIT Scenes][3] and [Oxford Flowers][4].

The [Tiny-Imagenet][6] dataset was used, and its 200-odd classes were split into 4 tasks, with 50 classes assigned to each task at random. This division can also be arbitrary; no special consideration has been given to the decision to split the dataset evenly. Each of these tasks has a "train" and a "test" folder to validate the performance on these wide-ranging tasks.
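
A minimal sketch of this kind of random class-to-task split (the function `split_into_tasks` and the fixed seed are illustrative; the repository's data preparation scripts may do this differently):

```python
import random

def split_into_tasks(class_names, num_tasks=4, seed=0):
    """Randomly partition the class list into `num_tasks` equally sized tasks."""
    rng = random.Random(seed)
    shuffled = list(class_names)
    rng.shuffle(shuffled)
    per_task = len(shuffled) // num_tasks
    return [shuffled[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]

# 200 Tiny-ImageNet classes -> 4 tasks of 50 classes each
tasks = split_into_tasks([f"class_{i:03d}" for i in range(200)])
assert all(len(t) == 50 for t in tasks)
```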

The purpose behind using the MNIST dataset was to introduce some tasks that are significantly different from the ones in the Tiny-Imagenet dataset. This is an attempt to recreate the setting of the original paper at a smaller scale.

Training
------------------------------
Download the first model from this [link][11] and place it in the `models` folder. This is because the paper assumes that the first expert is an AlexNet model pretrained on ImageNet, and the rest of this implementation is built on this assumption.

Training a model on a given task takes place using the **`generate_models.py`** file. Simply execute the following lines to begin the training process.

Execute the following lines of code (along with the necessary arguments) to generate the expert models for the 4 tasks

```sh
python3 generate_models.py
```

The file takes the following arguments:

***init_lr***: Initial learning rate for the model. The learning rate is decayed every 5 epochs. **Default**: 0.1

***num_epochs_encoder***: Number of epochs you want to train the encoder model for. **Default**: 5

***num_epochs_model***: Number of epochs you want to train the model for. **Default**: 15

***batch_size***: Batch size. **Default**: 16

***use_gpu***: Set the GPU flag to ``True`` to use the GPU. **Default**: ``False``
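
As an illustration of the schedule implied by these defaults, here is a minimal sketch assuming a `StepLR`-style decay with an assumed factor of `gamma=0.1` (the actual decay factor used by `generate_models.py` is not stated here):

```python
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(256, 50)                             # stand-in for an expert model
optimizer = optim.SGD(model.parameters(), lr=0.1)      # init_lr = 0.1
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)  # decay every 5 epochs

for epoch in range(15):                                # num_epochs_model = 15
    # ... one training pass over the task's "train" folder would go here ...
    scheduler.step()
```
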
Refer to the docstrings and the inline comments in `encoder_train.py` and `model_train.py` for a more detailed view.

### MAKE SURE THAT YOU TRAIN THE MODEL FOR AT LEAST 10 EPOCHS, BUT KEEP IT BELOW 25 EPOCHS

The training procedure is really volatile, and these were the boundaries that I could find. I did not carry out an extensive search for the optimum number of epochs; these boundaries were obtained from initial tests. Within this range the loss function **at least returned a numerical value**; even so, if the model gets stuck in a bad optimum, the loss starts producing NaN values, and this snowballs into the model not learning at all.
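
One possible guard against this snowballing, shown purely as an illustration (the helper below is hypothetical and not part of the current code), is to skip any update whose loss is already non-finite and to clip gradients:

```python
import torch

def safe_step(model, optimizer, loss, max_norm=5.0):
    """Apply one optimizer step, skipping batches whose loss is already NaN/inf."""
    if not torch.isfinite(loss):
        return False                     # leave the weights untouched
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return True
```
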
Evaluating the model
-------------------------------
To recreate the experiments performed, first execute the following lines of code
```sh
cd data_utils
python3 data_prep_tin.py
python3 data_prep_mninst.py
cd ../
```

This will download the Tiny-Imagenet (TIN) and MNIST datasets to the Data folder and split them into 4 + 5 tasks, with each TIN task consisting of 50 classes and each MNIST task of 2 classes. The directory structure of the downloaded datasets would be:

```
Data
...
```

Next, to assess how well the model adapts to a particular task at hand, execute the following lines of code

```sh
python3 test_models.py
```

***task_number***: Select the task you want to test the ensemble with; choose from 1-4. **Default**: 1

***use_gpu***: Set the GPU flag to ``True`` to use the GPU. **Default**: ``False``

Results
--------------------------------------

My system could not handle all the tasks in this sequence (9 in all), and it frequently froze before completion. The `test_models` module is `O(number_of_tasks X number_of_tasks X sizeof(task))`. This is necessary since, for each task, we need to search over all the autoencoders for the best-performing model and then activate the corresponding `trained_model`, over which the final `epoch_accuracy` is calculated. Because of this, I manually cut the number of classes in each TIN task to 25 and used only one of the tasks from the MNIST dataset. It is quite clear from the architecture proposed in this paper that this is not optimal.
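
A rough sketch of the routing step that makes the evaluation quadratic in the number of tasks (names such as `autoencoders`, `experts` and `pick_expert` are illustrative, not the actual API of `test_models.py`):

```python
import torch

def pick_expert(autoencoders, experts, batch):
    """Route a batch to the expert whose autoencoder reconstructs it best."""
    errors = []
    with torch.no_grad():
        for ae in autoencoders:                 # one pass per trained autoencoder
            recon = ae(batch)
            errors.append(torch.mean((recon - batch) ** 2).item())
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, experts[best]                  # activate only the chosen expert
```

Running this routing for every task in the sequence is what produces the `O(number_of_tasks X number_of_tasks X sizeof(task))` cost mentioned above.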

**Another key caveat** is that in all of these trained models, which are derived from the AlexNet architecture, only the last two convolutional layers and the classification layers are trained. The remaining layers are frozen, and the results are reported for this setting.
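
In PyTorch terms, that setup looks roughly like the sketch below; the layer indices refer to torchvision's AlexNet definition and are an assumption about how the repository configures its experts:

```python
from torchvision import models

alexnet = models.alexnet(pretrained=True)

# Freeze everything first ...
for p in alexnet.parameters():
    p.requires_grad = False

# ... then unfreeze the last two convolutional layers (features[8] and
# features[10] in torchvision's AlexNet) and the classification layers.
for layer in (alexnet.features[8], alexnet.features[10], alexnet.classifier):
    for p in layer.parameters():
        p.requires_grad = True
```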

The present `test_models.py` is written assuming that your system can handle all the tasks in the full sequence. Please make the necessary changes to make the testing procedure compatible with your computational resources.

The results reported are for this particular setting [Number of epochs used for training: 15]:

**Input_Task_Number**: The task that was fed to the model\
**Model_activated**: The model that was identified for this task. The correct model was identified in these cases\
**Accuracy**: The percentage of correctly identified labels, rounded to two decimal places

| Input Task_number | Model activated | Accuracy (in %) |
| :------------: | :----------: | -----------: |
| 3 | 3 | 63 |
| 1 | 1 | 64 |
| 5 | 5 | 59 |
| 2 | 2 | 54 |
| 4 | 4 | 69 |

Final Takeaways
--------------------------------
The approach proposed in this paper loads only the required model into memory at inference. However, searching over all the autoencoders to identify the correct model is a really expensive procedure, and this situation will only get worse with an increasing number of tasks. Clearly this would not scale to much longer task sequences. It is also not clear how the authors stabilized the training procedure for the **"Learning without Forgetting"** approach.

To-Do's for this Project
---------------------------------
- [ ] Figure out ways to stabilize the training procedure; the problem has been isolated to the distillation loss calculation
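
For reference, one numerically safer way to write a distillation term is to stay in log-space. This is only a sketch of a possible direction, not the loss currently used in the repository, and the temperature `T` is a hypothetical hyperparameter:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Knowledge-distillation term computed from log-probabilities.

    Working with log_softmax avoids the log(0) = -inf values that appear when
    softmax outputs underflow, which is one common source of NaN losses.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```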