-
model_name: The model name.
-
version: The model version.
-
train_path: The path to the pre-processed training questions.
-
train_object_path: The path to the GQA HDF5 file containing the object (Faster-RCNN) featurizations.
-
train_object_info_path: The path to the GQA objects JSON file containing the meta-data for the HDF5 file.
-
validation_path: The path to the pre-processed validation questions.
-
test_path: The path to the test path.
-
image_path: The path to the original GQA images (only required for visualization).
-
model_path: The output path for the model's checkpoints as well as the prediction output JSON.
-
attribute_file: The absolute path to gqa_all_attribute.json.
-
class_file: The absolute path to gqa_all_class.json.
-
relation_file: The absolute path to gqa_relation.json.
-
word_embedding_file: The path to the Glove word-embedding text file.
-
vocabulary_file: The absolute path to gqa_vocab.json.
-
h5_prefix: The prefix for the HDF5 object features chunk files (default: "gqa_objects").
-
h5_chunk_num: The number of chunks for the HDF5 object features (default: 16).
-
repetition_num: The number of runs for the entire training loop.
-
epoch_num: The number of epochs for each training run.
-
error_dim: The number of test metrics reported by the model on the validation/test splits (default: 1).
-
metric_index: The index of the metric on the validation split based on which the best model checkpoint is selected across the training steps.
-
train_batch_size: The number of questions in a single train batch.
-
test_batch_size: The number of questions in a single test/validation batch.
-
learning_rate: The learning rate.
-
weight_decay: The weight decay.
-
dropout: The dropout rate.
-
clip_norm: The clip norm rate used for gradient clipping.
-
verbose: The flag indicating whether to show the execution logs (true/false).
-
max_cache_size: The maximum cache size (default: 100000).
-
box_features_dim: The dimension of the GQA objects feature vectors (default: 2048).
-
oracle_input_dim: The input (output) dimension of the visual oracle (the initial featurizer network).
-
oracle_output_dim: The output dimension of the visual oracle (default: 1).
-
word_embedding_dim: The word embedding dimension (default: 300 for Glove)
-
classifier_oracle: The flag indicating whether to use the classifier-based architecture for the visual oracle (default: true)
-
featurizer_layers_config: A list containing the dimensions of the hidden layers for the MLP representing the featurizer network.
-
attribute_network_layers_config: A list containing the dimensions of the hidden layers for the MLP representing the attribute classifier in the visual oracle.
-
relation_network_layers_config: A list containing the dimensions of the hidden layers for the MLP representing the relation classifier in the visual oracle.
-
operator_layers_config: N/A (default: [])
-
normalize_oracle: The flag indicating whether to normalize the output probabilities of the visual oracle based on their categories (default: true).
-
freeze_featurizer: The flag indicating whether to freeze the parameters of the featurizer network.
-
freeze_attribute_network: The flag indicating whether to freeze the parameters of the attribute network within the visual oracle.
-
freeze_relation_network: The flag indicating whether to freeze the parameters of the relation network within the visual oracle.
-
freeze_embedding_network: The flag indicating whether to freeze the parameters of the last layer of the visual oracle (aka the embedding layer).
-
activate_attention_transfer: The flag indicating whether to activate the attention calibration mechanism.
-
attention_transfer_state_dim: The hidden dimention of the LSTM cell used in the attention calibration network.
-
freeze_attention_network: he flag indicating whether to freeze the parameters of the attention calibration network.
-
trainable_gate: N/A (default: false)
-
likelihood_threshold: The minimum likelihood an answer must have to be considered as a potential option for an open question.
-
hard_mode: The flag indicating whether to use Min/Max for logical conjunction/disjunction at the test time (default: false).
-
cpu_cores_num: Number of CPU cores used for fetching data (automatically set to the maximum cores available if not specified).
-
in_memory: N/A (default: true)
-
gpu_num: The maximum number of GPUs the framework is allowed to use for parallelization.
-
ckeckpointing_frequency: The number of training steps (batches) after which checkpointing happens.
-
first_answer: The flag indicating whether to return only the first answer for an open question in the case of tied likelihoods for mutiple options at the test time (default: false).