2 changes: 2 additions & 0 deletions mapping_for_corenlp.txt
@@ -0,0 +1,2 @@
./raw_data_covid/small_test_tgt/covid.raw_src
./raw_data_covid/small_test_tgt/covid.raw_tgt
2 changes: 0 additions & 2 deletions raw_data/.gitignore

This file was deleted.

2 changes: 2 additions & 0 deletions raw_data/temp.raw_src
@@ -0,0 +1,2 @@
this Terry Jones had a love of the absurd that contributed much to the anarchic humour of Monty Python's Flying Circus. His style of visual comedy, leavened with a touch of the surreal, inspired many comedians who followed him. It was on Python that he honed his directing skills, notably on Life of Brian and The Meaning of Life. A keen historian, he wrote a number of books and fronted TV documentaries on ancient and medieval history. Terence Graham Parry Jones was born in Colwyn Bay in north Wales on 1 February 1942. His grandparents ran the local amateur operatic society and staged Gilbert and Sullivan concerts on the town's pier each year His family moved to Surrey when he was four but he always felt nostalgic about his native land. "I couldn't bear it and for the longest time I wanted Wales back," he once said. "I still feel very Welsh and feel it's where I should be really." After leaving the Royal Grammar School in Guildford, where he captained the school, he went on to read English at St Edmund Hall, Oxford. However, as he put it, he "strayed into history", the subject in which he graduated. While at Oxford he wrote sketches for the Oxford Revue and performed alongside a fellow student, Michael Palin.
(CNN) An Iranian chess referee says she is frightened to return home after she was criticized online for not wearing the appropriate headscarf during an international tournament. Currently the chief adjudicator at the Women's World Chess Championship held in Russia and China, Shohreh Bayat says she fears arrest after a photograph of her was taken during the event and was then circulated online in Iran. "They are very sensitive about the hijab when we are representing Iran in international events and even sometimes they send a person with the team to control our hijab," Bayat told CNN Sport in a phone interview Tuesday. The headscarf, or the hijab, has been a mandatory part of women's dress in Iran since the 1979 Islamic revolution but, in recent years, some women have mounted opposition and staged protests about headwear rules. Bayat said she had been wearing a headscarf at the tournament but that certain camera angles had made it look like she was not. "If I come back to Iran, I think there are a few possibilities. It is highly possible that they arrest me [...] or it is possible that they invalidate my passport," added Bayat. "I think they want to make an example of me." The photographs were taken at the first stage of the chess championship in Shanghai, China, but Bayat has since flown to Vladivostok, Russia, for the second leg between Ju Wenjun and Aleksandra Goryachkina. She was left "panicked and shocked" when she became aware of the reaction in Iran after checking her phone in the hotel room. The 32-year-old said she felt helpless as websites reportedly condemned her for what some described as protesting the country's compulsory law. Subsequently, Bayat has decided to no longer wear the headscarf. "I'm not wearing it anymore because what is the point? I was just tolerating it, I don't believe in the hijab," she added. "People must be free to choose to wear what they want, and I was only wearing the hijab because I live in Iran and I had to wear it. I had no other choice." Bayat says she sought help from the country's chess federation. She says the federation told her to post an apology on her social media channels. She agreed under the condition that the federation would guarantee her safety but she said they refused. "My husband is in Iran, my parents are in Iran, all my family members are in Iran. I don't have anyone else outside of Iran. I don't know what to say, this is a very hard situation," she said. CNN contacted the Iranian Chess Federation on Tuesday but has yet to receive a response.
114 changes: 114 additions & 0 deletions runs/cnn-abs.sh
@@ -0,0 +1,114 @@
# STEP 2
# There's a bug in the databuilder where the file referenced is not created and populated.
# Here's the workaround:
# Python command you're _supposed_ to be able to run:
# python src/preprocess.py \
# --mode tokenize \
# --raw_path ../raw_data_1 \
# --save_path ../results \
# --log_file ../logs/cnndm.log

# Java command you can _actually_ run:
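# (The bare class name only resolves if the Stanford CoreNLP jars are on your
# Java classpath. A minimal sketch, assuming CoreNLP was unpacked to
# /path/to/stanford-corenlp; adjust the path to your install:)
# export CLASSPATH="/path/to/stanford-corenlp/*"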
java edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit \
-ssplit.newlineIsSentenceBreak always \
-filelist mapping_for_corenlp.txt \
-outputFormat json \
-outputDirectory ./results
# note: mapping_for_corenlp.txt is a file you need to create yourself; it should contain one input file path per line
# output goes to the results directory
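# One way to create the mapping file, a sketch using the two covid inputs added in this PR:
# printf '%s\n' \
#     ./raw_data_covid/small_test_tgt/covid.raw_src \
#     ./raw_data_covid/small_test_tgt/covid.raw_tgt \
#     > mapping_for_corenlp.txt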


# STEP 3
python src/preprocess.py \
--mode format_to_lines \
--raw_path results \
--save_path json_data \
--n_cpus 1 \
--use_bert_basic_tokenizer false \
--map_path urls \
--log_file logs/format_to_lines.log

# Output files will now be in the json_data directory


# STEP 4
python src/preprocess.py \
--mode format_to_bert \
--raw_path ./json_data \
--save_path ./bert_data \
--lower \
--n_cpus 1 \
--log_file ./logs/preprocess.log

# Output in bert_data
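# (With upstream PreSumm defaults these are PyTorch shards named like
# <dataset>.<split>.<n>.bert.pt; the exact naming is an assumption based on
# the upstream repo.)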


# STEP 5. Model Training
# --visible_gpus 0,1,2 \ # for multiple gpus
# --visible_gpus 0 \ # for a single gpu
python src/train.py \
--task abs \
--mode train \
--ext_dropout 0.1 \
--lr .002 \
--report_every 50 \
--save_checkpoint_steps 4 \
--batch_size 3000 \
--train_steps 5 \
--accum_count 2 \
--log_file ./logs/abs_bert_cnndm \
--use_interval true \
--warmup_steps 1 \
--max_pos 512 \
--model_path ./models \
--bert_data_path ./bert_data/cnndm_sample

# outputs to models (example): model_step_4.pt
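# (Checkpoints are written every --save_checkpoint_steps steps, so with
# save_checkpoint_steps 4 and train_steps 5 the only checkpoint is model_step_4.pt.)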

# All-in-one attempt mentioned in the Jan 22 update
# --test_from PreSumm/models/model_step_49.pt \

python src/train.py \
--task abs \
--mode test_text \
--ext_dropout 0.1 \
--lr .002 \
--report_every 50 \
--save_checkpoint_steps 99 \
--batch_size 3000 \
--accum_count 2 \
--log_file logs/ext_bert \
--use_interval true \
--warmup_steps 100 \
--max_pos 512 \
--train_steps 100 \
--visible_gpus 0 \
--model_path models/ \
--result_path results \
--bert_data_path bert_data_covid \
--text_src raw_data_covid/small_test_tgt/covid.raw_src \
--text_tgt raw_data_covid/small_test_tgt/covid.raw_tgt \
--test_from models/model_step_148000.pt
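# (model_step_148000.pt is assumed to be a previously trained abstractive
# checkpoint already sitting in models/; it is not produced by the 100-step
# training run in this file.)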

python src/train.py \
--task abs \
--mode train \
--ext_dropout 0.1 \
--lr .002 \
--report_every 50 \
--save_checkpoint_steps 99 \
--batch_size 3000 \
--accum_count 2 \
--log_file logs/ext_bert \
--use_interval true \
--warmup_steps 100 \
--max_pos 512 \
--train_steps 100 \
--visible_gpus 0 \
--model_path models/ \
--result_path results \
--bert_data_path bert_data_covid/ \
--text_src raw_data_covid/small_test_tgt/covid.raw_src \
--text_tgt raw_data_covid/small_test_tgt/covid.raw_tgt \
--test_from models/model_step_148000.pt
115 changes: 115 additions & 0 deletions runs/cnn-ext.sh
@@ -0,0 +1,115 @@
# STEP 2
# There's a bug in the databuilder where the file referenced is not created and populated.
# Here's the workaround:
# Python command you're _supposed_ to be able to run:
# python src/preprocess.py \
# --mode tokenize \
# --raw_path ../raw_data_1 \
# --save_path ../results \
# --log_file ../logs/cnndm.log

# Java command you can _actually_ run:
java edu.stanford.nlp.pipeline.StanfordCoreNLP \
-annotators tokenize,ssplit \
-ssplit.newlineIsSentenceBreak always \
-filelist mapping_for_corenlp.txt \
-outputFormat json \
-outputDirectory ./results
# note: mapping_for_corenlp.txt is a file you need to create yourself; it should contain one input file path per line
# output in results directory
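# (See runs/cnn-abs.sh above for a sketch of generating this mapping file and
# putting the CoreNLP jars on the classpath.)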


# STEP 3
python src/preprocess.py \
--mode format_to_lines \
--raw_path results \
--save_path json_data \
--n_cpus 1 \
--use_bert_basic_tokenizer false \
--map_path urls \
--log_file logs/cnndm.log

# Output files will now be in the json_data directory


# STEP 4
python src/preprocess.py \
--mode format_to_bert \
--raw_path ./json_data \
--save_path ./bert_data \
--lower \
--n_cpus 1 \
--log_file ./logs/preprocess.log

# Output in bert_data


# STEP 5. Model Training
# --visible_gpus 0,1,2 \ # for multiple gpus
# --visible_gpus 0 \ # for a single gpu
python src/train.py \
--task ext \
--mode train \
--ext_dropout 0.1 \
--lr .002 \
--report_every 50 \
--save_checkpoint_steps 4 \
--batch_size 3000 \
--train_steps 5 \
--accum_count 2 \
--log_file ./logs/ext_bert_cnndm \
--use_interval true \
--warmup_steps 1 \
--max_pos 512 \
--model_path ./models \
--bert_data_path ./bert_data/cnndm_sample
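# (As in cnn-abs.sh, checkpoints land in ./models; with save_checkpoint_steps 4
# and train_steps 5 the only checkpoint is model_step_4.pt.)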



# All-in-one attempt mentioned in the Jan 22 update
# --test_from PreSumm/models/model_step_49.pt \

python src/train.py \
--task ext \
--mode test_text \
--ext_dropout 0.1 \
--lr .002 \
--report_every 50 \
--save_checkpoint_steps 99 \
--batch_size 3000 \
--accum_count 2 \
--log_file logs/ext_bert_cnndm \
--use_interval true \
--warmup_steps 100 \
--max_pos 512 \
--train_steps 100 \
--visible_gpus 0 \
--model_path models/ \
--result_path results \
--bert_data_path bert_data/cnndm_sample \
--text_src raw_data/temp.raw_src \
--text_tgt raw_data/temp.raw_tgt \
--test_from models/bertext_cnndm_transformer.pt
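# (bertext_cnndm_transformer.pt is the pretrained extractive CNN/DM checkpoint
# released with the upstream PreSumm repo; download it into models/ first, as
# it is not produced by the commands in this file.)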


python src/train.py \
--task abs \
--mode test_text \
--ext_dropout 0.1 \
--lr .002 \
--report_every 50 \
--save_checkpoint_steps 99 \
--batch_size 3000 \
--accum_count 2 \
--log_file logs/ext_bert \
--use_interval true \
--warmup_steps 100 \
--max_pos 512 \
--train_steps 100 \
--visible_gpus 0 \
--model_path models/ \
--result_path results \
--bert_data_path bert_data_covid \
--text_src raw_data_covid/covid.raw_src \
--text_tgt raw_data_covid/covid.raw_tgt \
--test_from models/model_step_148000.pt