README indents
adithya8 committed Apr 26, 2021
1 parent 9c83a9d commit 6e23ebb
Showing 2 changed files with 18 additions and 6 deletions.

README.md (16 additions, 5 deletions)

DLATK first requires the messages to be tokenized before generating the embeddings.
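
A minimal sketch of that tokenization step, assuming DLATK's `--add_tokenized` flag (the db and table_name here mirror the commands below):

    # Assumed prerequisite: creates a tokenized copy of the message table
    python3 dlatkInterface.py -d db -t table_name -c user_id --add_tokenized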

The embedding generation command:

    CUDA_VISIBLE_DEVICES=0 python3 dlatkInterface.py -d db -t table_name -c user_id --add_emb \
        --emb_model roberta-base --emb_layers 11 --emb_msg_aggregation mean --batch_size 30

table_name = {D_20, T_20}
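
For instance, to generate embeddings for the D_20 table, substitute it for table_name:

    # Same command as above, with the table name filled in
    CUDA_VISIBLE_DEVICES=0 python3 dlatkInterface.py -d db -t D_20 -c user_id --add_emb \
        --emb_model roberta-base --emb_layers 11 --emb_msg_aggregation mean --batch_size 30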


The dimension reduction in DLATK is done in two steps, as explained in the report. The first step involves learning the reduction on the domain data and storing the learnt model in a pickle file:

    python3 dlatkInterface.py -d db -t table_name -c user_id --group_freq_thresh 1000 \
        -f 'feat$roberta_ba_meL11con$table_name$user_id$16to16' --model {dimred_model} \
        --fit_reducer --k 128 --save_model --picklefile dimred_model_128.pickle

The number of dimensions to reduce to (the number of components) can be changed by altering the argument of `--k`.

The `dimred_model` here can be one of `pca`, `nmf`, `fa`, or `ae` (non-linear auto-encoders). For `ae`, prefix the command with the `CUDA_VISIBLE_DEVICES` environment variable to specify the GPU.
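
For example, combining the two options just described: fitting a 64-dimensional auto-encoder reduction on GPU 0 (the fit command above with only `--model`, `--k`, and the illustrative pickle file name changed):

    # Non-linear auto-encoder reducer on GPU 0, reducing to 64 components
    CUDA_VISIBLE_DEVICES=0 python3 dlatkInterface.py -d db -t table_name -c user_id --group_freq_thresh 1000 \
        -f 'feat$roberta_ba_meL11con$table_name$user_id$16to16' --model ae \
        --fit_reducer --k 64 --save_model --picklefile dimred_model_64.pickle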

The second step applies this learnt reduction model to the task data to generate the reduced representations:

    python3 dlatkInterface.py -d db -t table_name -c user_id --group_freq_thresh 1000 \
        -f 'feat$roberta_ba_meL11con$table_name$user_id$16to16' \
        --transform_to_feats {dimred_table_name} --load --picklefile dimred_model_128.pickle

The name of the table in which to store the dimension-reduced representations is given in place of dimred_table_name.
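
For instance, writing the reduced features to a table named pca_128 (a hypothetical name standing in for dimred_table_name):

    # Applies the saved reducer and stores the output in the pca_128 feature table
    python3 dlatkInterface.py -d db -t table_name -c user_id --group_freq_thresh 1000 \
        -f 'feat$roberta_ba_meL11con$table_name$user_id$16to16' \
        --transform_to_feats pca_128 --load --picklefile dimred_model_128.pickle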


The commands to perform bootstrapped training followed by evaluation for the regression tasks are given below:

    python3 dlatkInterface.py -d db -t task_table_name -c user_id --group_freq_thresh 1000 \
        -f '{feat_table_name}' --outcome_table 20_outcomes --outcomes age ext ope --train_reg \
        --model ridgehighcv --train_bootstraps 10 --where 'r10pct_test_fold is NOT NULL' \
        --train_bootstraps_ns 50 100 200 500 1000 --no_standardize \
        --save_models --picklefile reg_model_{feat_table_name}.pickle

The feat table name is either the raw embeddings table name or the dimension-reduced feature table name. The regression outcomes are listed in the `--outcomes` flag. The number of bootstrap repetitions is specified in the `--train_bootstraps` flag, and the sample sizes for bootstrapping are specified in the `--train_bootstraps_ns` flag. Finally, the ridge model is saved as a pickle file.

The saved ridge model is then loaded and applied for evaluation:

    python3 dlatkInterface.py -d db -t task_table_name -c user_id --group_freq_thresh 1000 \
        -f '{feat_table_name}' --outcome_table 20_outcomes --outcomes age ext ope --predict_reg \
        --where 'facet_fold = 1' --load --picklefile reg_model_{feat_table_name}.pickle > output.txt

This command stores the evaluation results for the ten runs in output.txt.

models/README.md (2 additions, 1 deletion)

Here is an example showing how:

If the user embeddings have been generated using [DLATK](https://github.com/DLATK/DLATK/) by following the commands [here](https://github.com/adithya8/ContextualEmbeddingDR#commands-to-extract-dimension-reduced-tables-using-a-specific-method), then you can use these pickle files directly by using the following command:

    python dlatkInterface.py -d {database-name} -t {table-name} -g {group-name} -f {user-embeddings-table-name} \
        --transform_to_feats {dimred-table-name} --load --picklefile {path-to-pickle-file}
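
A hypothetical invocation with the placeholders filled in (the database, table, and pickle file names here are illustrative only):

    # Illustrative names: database dla_tutorial, message table msgs, grouping by user_id
    python dlatkInterface.py -d dla_tutorial -t msgs -g user_id -f 'feat$roberta_ba_meL11con$msgs$user_id$16to16' \
        --transform_to_feats roberta_pca_64 --load --picklefile models/dimred_model_64.pickle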


### **Using CSVs through Python**