
Commit c0cfd23

Neural-Link Team authored and tensorflow-copybara committed

internal change

PiperOrigin-RevId: 500861198
1 parent 0bc8c9c commit c0cfd23

13 files changed: +3737 -0 lines changed


research/meo/README.md

# Manipulating embeddings with obfuscations

## Description

This codebase provides techniques for creating embeddings that remain robust
under various obfuscations. The code robustifies the embeddings themselves,
without fine-tuning the rest of the model. The intent is to obtain models that
are robust to obfuscations without retraining a very large architecture from
scratch.

The approach taken in this repository is to generate extra obfuscated
embeddings. A model is trained so that, given the clean embedding of an image,
it mimics the image's real obfuscated embedding for any given obfuscation type.
The generated obfuscated embeddings are then used as additional training data
for a downstream classifier, with the intent of helping the classifier later
handle images under unseen obfuscations.
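As a rough sketch of this idea (hypothetical names throughout; `generator` is a stand-in for the trained models described below), the generated embeddings are simply appended to the real ones before classifier training:

```python
import numpy as np

def augment_with_generated(clean_embeds, labels, generator, obfuscation_ids):
    """Appends generated obfuscated embeddings as extra training data.

    `generator(embed, obf_id)` is a hypothetical stand-in for a trained model
    that maps a clean embedding to its predicted obfuscated counterpart.
    """
    extra_embeds, extra_labels = [], []
    for obf_id in obfuscation_ids:
        for embed, label in zip(clean_embeds, labels):
            extra_embeds.append(generator(embed, obf_id))
            extra_labels.append(label)  # an obfuscated view keeps the image's label
    all_embeds = np.concatenate([clean_embeds, np.stack(extra_embeds)], axis=0)
    all_labels = np.concatenate([labels, np.array(extra_labels)], axis=0)
    return all_embeds, all_labels
```

The downstream classifier is then trained on `all_embeds`/`all_labels` exactly as it would be on real data.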

## Methods

The files in this repository cover two basic methods:

- ```multiple_decoders.py```: Trains a model with an autoencoder-style
  architecture that has one decoder per obfuscation type. The model receives a
  clean embedding as input and generates a corresponding obfuscated embedding
  for each obfuscation type. Dedicating a separate decoder to each obfuscation
  type makes the model more flexible.

- ```parameter_generator.py```: Trains a model with an autoencoder-style
  architecture in which the decoder itself is not trained; instead, its
  parameters are produced by a separate architecture, which is trained. This
  parameter generator receives the obfuscation type as input and outputs the
  parameters of the decoder corresponding to each seen obfuscation.
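The one-decoder-per-obfuscation idea can be sketched as follows (a minimal illustration with hypothetical layer sizes, not the actual architecture in ```multiple_decoders.py```):

```python
import tensorflow as tf

def build_multi_decoder_model(embed_dim, obfuscation_names, latent_dim=64):
    """Shared encoder with one decoder head per obfuscation type."""
    clean_embed = tf.keras.Input(shape=(embed_dim,), name="clean_embed")
    # Shared encoder: maps the clean embedding to a latent code.
    latent = tf.keras.layers.Dense(latent_dim, activation="relu")(clean_embed)
    # One decoder head per obfuscation type, each predicting that
    # obfuscation's embedding from the shared latent code.
    outputs = {
        name: tf.keras.layers.Dense(embed_dim, name=f"decoder_{name}")(latent)
        for name in obfuscation_names
    }
    return tf.keras.Model(inputs=clean_embed, outputs=outputs)
```

The parameter-generator variant replaces these fixed decoder heads with decoder weights emitted by a separate trained network that conditions on the obfuscation type.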

Finally, ```linear_finetuning.py``` is provided, which trains only a linear
classifier on top of frozen embeddings.
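The linear probe idea can be sketched as follows (illustrative only, with hypothetical function and layer names; the actual flags and training loop live in ```linear_finetuning.py```):

```python
import tensorflow as tf

def build_linear_classifier(embed_dim, num_classes):
    """Linear classifier over frozen embeddings.

    The embeddings are treated as fixed inputs, so the only trainable
    parameters are the single Dense layer's kernel and bias.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(embed_dim,)),
        tf.keras.layers.Dense(num_classes),  # outputs logits; no hidden layers
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

Such a probe would be fit directly on the (possibly augmented) embedding matrix via `model.fit(embeds, labels)`.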

## Auxiliary files

- ```configs.py```: File containing metadata for the dataset and the models
  used.

- ```extended_model.py```: File containing architecture definitions for our
  models.

- ```losses.py```: File containing the losses for our models.

- ```obfuscations.py```: File containing definitions for the datasets that we
  use.

## Data required

The provided code can receive data in two formats for the parameter
```data_dir_train``` (the directory of data to be used during training):

- In the case of ```input_feature_name==pixel```, the data is assumed to be in
  the format of ```tf.train.Example``` protos, where each example has a key
  named ```label``` and one key of the form ```image_{obf}``` for each
  obfuscation ```obf``` in the set of valid obfuscations.

- In the case of ```input_feature_name==embed```, the data is assumed to be in
  the format of ```tf.train.Example``` protos, with a key named ```label```
  containing the label of the image and a key named ```embed``` containing a
  matrix of size $N \times d$, where $N$ is the number of obfuscations and $d$
  is the dimension of the embedding.
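A sketch of how the ```embed``` format might be serialized and parsed (the exact feature encoding in the codebase may differ; this assumes the N x d matrix is stored as a flat float list):

```python
import numpy as np
import tensorflow as tf

def make_embed_example(label, embed_matrix):
    """Serializes an N x d embedding matrix and its label into a tf.train.Example."""
    return tf.train.Example(features=tf.train.Features(feature={
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        "embed": tf.train.Feature(float_list=tf.train.FloatList(
            value=embed_matrix.reshape(-1).tolist())),
    }))

def parse_embed_example(serialized, num_obfuscations, embed_dim):
    """Parses a serialized example back into a (label, N x d matrix) pair."""
    parsed = tf.io.parse_single_example(serialized, {
        "label": tf.io.FixedLenFeature([], tf.int64),
        "embed": tf.io.FixedLenFeature([num_obfuscations * embed_dim], tf.float32),
    })
    embed = tf.reshape(parsed["embed"], [num_obfuscations, embed_dim])
    return parsed["label"], embed
```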
Contributor: Georgios Smyrnis

research/meo/linear_finetuning/BUILD

# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

load("//devtools/python/blaze:pytype.bzl", "pytype_strict_binary", "pytype_strict_library")

package(
    default_visibility = ["//research/meo:__subpackages__"],
    licenses = ["notice"],  # Apache 2.0
)

pytype_strict_library(
    name = "linear_finetuning_lib",
    srcs = [
        "linear_finetuning.py",
    ],
    srcs_version = "PY3",
    deps = [
        # package absl:app
        # package absl/flags
        # package absl/logging
        "//research/meo/mlp_baseline:mlp_baseline_lib",
        "//research/meo/mlp_baseline:multiple_decoders_lib",
        # package numpy
        # package tensorflow:tensorflow_no_contrib
    ],
)

pytype_strict_binary(
    name = "linear_finetuning",
    srcs = ["linear_finetuning.py"],
    python_version = "PY3",
    deps = [
        ":linear_finetuning_lib",
        # package absl:app
        # package absl/flags
        # package absl/logging
    ],
)
