Commit eac5d71

Adds the results to the README

1 parent 454d439, commit eac5d71

1 file changed: +49 -8 lines changed

README.md

Lines changed: 49 additions & 8 deletions
@@ -53,12 +53,12 @@ The original paper uses [Caltech-UCSD Birds][2], [MIT Scenes][3] and [Oxford Flo
 
 The [Tiny-Imagenet][6] dataset was used and the 200-odd classes were split into 4 tasks, with 50 classes assigned to each task at random. This division can also be arbitrary; no special consideration has been given to the decision to split the dataset evenly. Each of these tasks has a "train" and a "test" folder to validate the performance on these wide-ranging tasks.
 
+The purpose behind using the MNIST dataset was to introduce some tasks that are significantly different from the ones in the Tiny-Imagenet dataset. This is an attempt to recreate the setting of the original paper at a smaller scale.
+
 
 Training
 ------------------------------
-Download the first model from this [link][11] and place it in the `models` folder. This is because the paper assumes that the first expert is an Alexnet model pretrained on the ImageNet and the rest of this implementation is built on this assumption.
-
-Training a model on a given task takes place using the **`main.py`** file. Simply execute the following lines to begin the training process
+Training a model on a given task takes place using the **`generate_models.py`** file. Simply execute the following lines to begin the training process.
 
 Execute the following lines of code (along with the necessary arguments) to generate the expert models for the 4 tasks
 
@@ -68,8 +68,8 @@ python3 generate_models.py
 The file takes the following arguments (a sample invocation is sketched after the list):
 
 * ***init_lr***: Initial learning rate for the model. The learning rate is decayed every 5 epochs. **Default**: 0.1
-* ***num_epochs_encoder***: Number of epochs you want to train the encoder model for. **Default**: 15
-* ***num_epochs_model***: Number of epochs you want to train the model for. **Default**: 40
+* ***num_epochs_encoder***: Number of epochs you want to train the encoder model for. **Default**: 5
+* ***num_epochs_model***: Number of epochs you want to train the model for. **Default**: 15
 * ***batch_size***: Batch Size. **Default**: 16
 * ***use_gpu***: Set the GPU flag to ``True`` to use the GPU. **Default**: ``False``
 
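Purely for illustration, a full invocation might look like the sketch below. The flag syntax assumes the arguments above are exposed as argparse-style flags with the same names, which this diff does not confirm:

```sh
# Hypothetical flags, assumed to mirror the documented argument names
python3 generate_models.py --init_lr 0.1 --num_epochs_encoder 5 --num_epochs_model 15 --batch_size 16 --use_gpu True
```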
@@ -89,17 +89,24 @@ Once you invoke the **`generate_models.py`** file with the appropriate arguments
 
 Refer to the docstrings and the inline comments in `encoder_train.py` and `model_train.py` for a more detailed view.
 
+### MAKE SURE THAT YOU TRAIN THE MODEL FOR AT LEAST 10 EPOCHS, BUT ALSO KEEP IT BELOW 25 EPOCHS
+
+The training procedure is really volatile, and these were the boundaries I could find. I did not carry out an extensive search for the optimum number of epochs; these boundaries were obtained from initial tests. Within this range the loss function **at least returned a numerical value**; even so, if the model gets stuck in a bad optimum, the loss starts returning NaN values and this snowballs into the model not learning at all.
+
 
 Evaluating the model
 -------------------------------
 
 To recreate the experiments performed, first execute the following lines of code
 
 ```sh
-python3 data_prep.py
+cd data_utils
+python3 data_prep_tin.py
+python3 data_prep_mninst.py
+cd ../
 ```
 
-This will download the tiny-imagenet dataset to the Data folder and split it into 4 tasks with each task consisting of 50 classes each. The directory structure of the downloaded datasets would be:
+This will download the Tiny-Imagenet (TIN) and MNIST datasets to the Data folder and split them into 4 + 5 tasks, with each TIN task consisting of 50 classes and each MNIST task of 2 classes. The directory structure of the downloaded datasets would be:
 
 ```
 Data
@@ -120,9 +127,43 @@ Next to assess how well the model adapts to a particular task at hand, execute t
 python3 test_models.py
 ```
 
-* ***task_number***: Select the task you want to test out the ensemble with; choose from 1-4 **Default**: 1
 * ***use_gpu***: Set the GPU flag to ``True`` to use the GPU. **Default**: ``False``
 
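Similarly, a hypothetical test invocation (again assuming the documented argument is exposed as a flag of the same name):

```sh
# Hypothetical flag name, not confirmed by this diff
python3 test_models.py --use_gpu True
```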
+Results
+--------------------------------------
+
+My system could not handle all the tasks in this sequence (9 in all) and frequently froze before completion. The test_models module is `O(number_of_tasks X number_of_tasks X sizeof(task))`: for each input task we need to search over all the autoencoders for the best-performing one and then activate the corresponding trained_model, over which the final epoch_accuracy is calculated. Because of this, I manually cut the number of classes in each TIN task to 25 and used only one of the MNIST tasks. From the architecture proposed in the paper, it is quite clear that this is not optimal.
+
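To make the quadratic cost above concrete, here is a rough PyTorch sketch of the selection step; the function and variable names are invented for illustration and do not come from `test_models.py`:

```python
import torch
import torch.nn.functional as F

def pick_expert(task_loader, autoencoders, experts, device="cpu"):
    """Illustrative sketch (hypothetical names, not the actual test_models.py):
    push a task's data through every autoencoder, pick the one with the lowest
    reconstruction error, and return the matching expert for evaluation."""
    recon_errors = []
    for ae in autoencoders:                           # one full sweep per autoencoder
        ae.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for inputs, _ in task_loader:
                inputs = inputs.to(device).flatten(1)  # assumes the autoencoders take flattened features
                total += F.mse_loss(ae(inputs), inputs, reduction="sum").item()
                count += inputs.numel()
        recon_errors.append(total / count)
    best = min(range(len(recon_errors)), key=recon_errors.__getitem__)
    return best, experts[best]                        # accuracy is then computed with this expert
```

Because every input task triggers one pass over every autoencoder, the overall cost scales with `number_of_tasks X number_of_tasks X sizeof(task)`.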
+**Another key caveat** is that in all of these trained models, which are derived from the AlexNet architecture, only the last two convolutional layers and the classification layers are trained; the rest of the layers are frozen. The results are reported for this setting.
+
+The present `test_models.py` is written assuming that your system can handle all the tasks in the full sequence. Please make the necessary changes to make the testing procedure compatible with your computational requirements.
+
+The results reported are for this particular setting [number of epochs used for training: 15]:
+
+**Input_Task_Number**: The task that was fed to the model\
+**Model_activated**: The model that was identified for this task. The correct model was identified in all of these cases\
+**Accuracy**: Rounded off to the nearest percent [fraction of labels identified correctly]
+
+
+| Input Task Number | Model Activated | Accuracy (in %) |
+| :------------: | :----------: | -----------: |
+| 3 | 3 | 63 |
+| 1 | 1 | 64 |
+| 5 | 5 | 59 |
+| 2 | 2 | 54 |
+| 4 | 4 | 69 |
+
+
+Final Takeaways
+--------------------------------
+The approach proposed in this paper loads only the required expert model into memory at inference time. However, searching over all the autoencoders to identify the correct model is a really expensive procedure, and it only gets worse as the number of tasks grows; clearly this would not scale to much longer sequences. It is also not clear how the authors stabilized the training procedure for the **"Learning without Forgetting"** approach.
+
+
+To-Do's for this Project
+---------------------------------
+- [ ] Figure out ways to stabilize the training procedure; the problem has been isolated to the distillation loss calculation
+
+
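As a purely illustrative starting point for that to-do item (not code from this repository), a guard around a generic temperature-scaled distillation loss could skip updates once the loss goes non-finite:

```python
import torch
import torch.nn.functional as F

def safe_distillation_loss(student_logits, teacher_logits, T=2.0):
    """Generic temperature-scaled distillation loss with a NaN/Inf guard.
    This is a hypothetical sketch, not the loss used in this repository."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    loss = F.kl_div(log_p, q, reduction="batchmean") * (T * T)
    # Returning None signals the training loop to skip this batch instead of
    # letting a NaN loss poison the weights.
    return None if not torch.isfinite(loss) else loss
```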
 
 References
 ----------
