# Manipulating embeddings with obfuscations

## Description

This codebase provides techniques for creating robust embeddings under
various obfuscations. The code provided here robustifies the embeddings
themselves, without fine-tuning the rest of the model. The intent is to
train models that are robust to obfuscations, without the need to retrain
a very large architecture from scratch.

The approach taken in this repository is to generate extra obfuscated
embeddings. These embeddings are trained to mimic the real obfuscated
embeddings of each image, for any given obfuscation type. The generated
embeddings are then used as additional training data for a downstream
classifier; modeling them is intended to help the classifier later handle
images under unseen obfuscations.
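The augmentation step can be sketched as follows. This is a minimal NumPy illustration with hypothetical sizes and a noisy stand-in for the trained generator; the repository's actual code is TensorFlow-based and the generator is a trained model, not additive noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 100 images, 8-dim embeddings, 3 obfuscation types.
n_images, embed_dim, n_obf = 100, 8, 3

clean_embeds = rng.normal(size=(n_images, embed_dim))
labels = rng.integers(0, 10, size=n_images)

def generate_obfuscated(embeds):
    """Stand-in for a trained generator: one synthetic obfuscated
    embedding per obfuscation type for every clean embedding."""
    noise = rng.normal(scale=0.1, size=(embeds.shape[0], n_obf, embeds.shape[1]))
    return embeds[:, None, :] + noise  # (n_images, n_obf, embed_dim)

obf_embeds = generate_obfuscated(clean_embeds)

# Use the generated embeddings as extra rows in the classifier's training
# set; each synthetic embedding keeps the label of its source image.
train_x = np.concatenate([clean_embeds, obf_embeds.reshape(-1, embed_dim)])
train_y = np.concatenate([labels, np.repeat(labels, n_obf)])
```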

## Methods

The files in this repository cover two basic methods:

- ```multiple_decoders.py```: Trains a model with an autoencoder-style
architecture, with one decoder per obfuscation type. The model receives a
clean embedding as input and generates a corresponding obfuscated embedding
for each obfuscation type. This makes the model more flexible, since a
separate portion of it is dedicated to each obfuscation type.
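A minimal sketch of the shared-encoder, per-obfuscation-decoder layout, using plain NumPy linear maps with hypothetical dimensions; the real model is more elaborate than single linear layers:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, latent_dim, n_obf = 8, 4, 3

# Shared encoder plus one decoder weight matrix per obfuscation type.
encoder_w = rng.normal(size=(embed_dim, latent_dim))
decoder_ws = [rng.normal(size=(latent_dim, embed_dim)) for _ in range(n_obf)]

def forward(clean_embed):
    latent = clean_embed @ encoder_w  # shared encoding of the clean embedding
    # One predicted obfuscated embedding per decoder / obfuscation type.
    return [latent @ w for w in decoder_ws]

preds = forward(rng.normal(size=(embed_dim,)))
```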

- ```parameter_generator.py```: Trains a model with an autoencoder-style
architecture in which the decoder itself is not trained; instead, its
parameters are produced by a separate architecture, which is trained. This
parameter generator receives the obfuscation type as input and outputs the
parameters of the decoder corresponding to each seen obfuscation.
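The same idea can be sketched with a linear parameter generator. All names and dimensions here are hypothetical, and the real architecture is richer than single linear maps:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, latent_dim, n_obf = 8, 4, 3

encoder_w = rng.normal(size=(embed_dim, latent_dim))
# Parameter generator: maps a one-hot obfuscation id to the flattened
# weights of a linear decoder. Only this matrix (and the encoder) would
# be trained; the decoder holds no trained parameters of its own.
param_gen_w = rng.normal(size=(n_obf, latent_dim * embed_dim))

def forward(clean_embed, obf_id):
    latent = clean_embed @ encoder_w
    one_hot = np.eye(n_obf)[obf_id]
    decoder_w = (one_hot @ param_gen_w).reshape(latent_dim, embed_dim)
    return latent @ decoder_w  # predicted embedding under obfuscation obf_id

pred = forward(rng.normal(size=(embed_dim,)), obf_id=1)
```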

Finally, ```linear_finetuning.py``` is provided, which trains only a linear
classifier on top of frozen embeddings.
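A minimal sketch of linear fine-tuning on frozen embeddings, assuming a plain softmax classifier trained by gradient descent in NumPy; sizes are hypothetical and this is not the repository's actual training loop:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_classes, lr = 200, 8, 5, 0.5

embeds = rng.normal(size=(n, d))            # frozen embeddings: never updated
labels = rng.integers(0, n_classes, size=n)
one_hot = np.eye(n_classes)[labels]

# Only the linear classifier's weights and bias are trained.
w = np.zeros((d, n_classes))
b = np.zeros(n_classes)

for _ in range(100):
    logits = embeds @ w + b
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = probs - one_hot                  # softmax cross-entropy gradient
    w -= lr * embeds.T @ grad / n
    b -= lr * grad.mean(axis=0)
```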

## Auxiliary files

- ```configs.py```: File containing metadata for the dataset and the models
used.

- ```extended_model.py```: File containing architecture definitions for our
models.

- ```losses.py```: File containing the losses for our models.

- ```obfuscations.py```: File containing definitions for the datasets that
we use.

## Data required

The provided code can receive data in two formats for the parameter
```data_dir_train``` (directory of data to be used during training):

- In the case of ```input_feature_name==pixel```, the data is assumed to be
in the format of ```tf.train.Example``` protos, where each example has a key
named ```label``` and one key of the form ```image_{obf}``` for each
obfuscation ```obf``` in the set of valid obfuscations.

- In the case of ```input_feature_name==embed```, the data is assumed to be
in the format of ```tf.train.Example``` protos, with a key named ```label```
containing the label of the image and a key named ```embed``` containing a
matrix of size $N \times d$, where $N$ is the number of obfuscations and
$d$ is the dimension of the embedding.
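For illustration, a record in the ```embed``` format can be pictured as follows; this is a hypothetical in-memory stand-in, not actual ```tf.train.Example``` parsing code:

```python
import numpy as np

n_obf, embed_dim = 4, 16  # hypothetical N and d

# One record: a scalar label plus an N x d matrix with one embedding
# row per obfuscation type.
record = {
    "label": 7,
    "embed": np.zeros((n_obf, embed_dim), dtype=np.float32),
}

def embedding_for(record, obf_index):
    # Row i of the matrix is the embedding of this image under obfuscation i.
    return record["embed"][obf_index]
```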

Contributor: Georgios Smyrnis