This repository was archived by the owner on Nov 22, 2022. It is now read-only.

Add additional fields in batch reader, run_model output file & knowledge distillation data files #783

Closed
wants to merge 1 commit

Conversation


@haowu666 commented on Jul 16, 2019

[The updated version] generalizes the previous version (which added only post_id). The main workflow can now read a list of additional fields such as post_id, page_id, and page_url into the context, then pass them through the classification metric reporter. PyText users can thus include these fields in the saved-results file, which makes it easier to inspect the details of the model output, compute other metrics associated with the team target, or look up related information in a Hive table by searching on these fields. Also, in knowledge distillation, each gen_kd_data[i], where i is in {0, 1, 2} (training, validation, test), can handle a label list for multi-label tasks and includes the additional fields as a dictionary in the generated data files, which is helpful for building teacher and student networks for multi-label experiments.
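
For illustration only, here is a minimal sketch of the kind of plumbing this implies; it is not PyText's actual API, and the names ADDITIONAL_FIELDS, attach_context, and the sample data are assumptions:

```python
from typing import Dict, Iterable, List

# Hypothetical list of extra field names; in this PR the list is configurable
# (e.g. post_id, page_id, page_url).
ADDITIONAL_FIELDS: List[str] = ["post_id", "post_url"]


def attach_context(raw_examples: Iterable[Dict], additional_fields: List[str]) -> List[Dict]:
    """Copy the requested additional fields from each raw example into a
    'context' dict so they can travel alongside the model inputs and be
    surfaced later by the metric reporter."""
    examples_with_context = []
    for example in raw_examples:
        context = {field: example.get(field) for field in additional_fields}
        examples_with_context.append({"text": example["text"], "context": context})
    return examples_with_context


if __name__ == "__main__":
    raw = [{"text": "hello world", "post_id": 123456, "post_url": "http://post_url.com"}]
    print(attach_context(raw, ADDITIONAL_FIELDS))
```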

In the debug file for the test dataset, one header example is
#predicted, actual, scores_str, text, post_id, post_url
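
A hedged sketch of how such a row could be produced, with the additional fields appended as trailing columns; the helper write_debug_row, the output file name, and the sample values are hypothetical, not the reporter code in this diff:

```python
import csv
from typing import Dict, List

ADDITIONAL_FIELDS: List[str] = ["post_id", "post_url"]  # hypothetical extra columns


def write_debug_row(writer, predicted: str, actual: str, scores_str: str,
                    text: str, context: Dict, additional_fields: List[str]) -> None:
    """Write one debug-file row, appending the additional fields after the
    standard columns in the same order as the header."""
    row = [predicted, actual, scores_str, text]
    row += [context.get(field, "") for field in additional_fields]
    writer.writerow(row)


with open("test_debug.tsv", "w", newline="") as f:
    tsv = csv.writer(f, delimiter="\t")
    tsv.writerow(["predicted", "actual", "scores_str", "text"] + ADDITIONAL_FIELDS)
    write_debug_row(tsv, "spam", "spam", "0.91,0.09", "hello world",
                    {"post_id": 123456, "post_url": "http://post_url.com"},
                    ADDITIONAL_FIELDS)
```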

In generating knowledge distillation data for training dataset, one header example is
#label_list, score, logit, label_names(in order), text, {"post_id": 123456, "post_url": http://post_url.com}
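
And a comparable sketch of emitting one KD line with the additional fields serialized as a JSON dictionary in the last column; again, format_kd_line and the sample values are hypothetical:

```python
import json
from typing import Dict, List


def format_kd_line(label_list: List[str], scores: List[float], logits: List[float],
                   label_names: List[str], text: str, context: Dict) -> str:
    """Build one tab-separated KD line: label_list, score, logit,
    label_names (in order), text, and the extra fields as a JSON dict."""
    return "\t".join([
        json.dumps(label_list),
        json.dumps(scores),
        json.dumps(logits),
        json.dumps(label_names),
        text,
        json.dumps(context),  # e.g. {"post_id": 123456, "post_url": "http://post_url.com"}
    ])


print(format_kd_line(["sports", "news"], [0.8, 0.2], [1.5, -0.3],
                     ["sports", "news"], "hello world",
                     {"post_id": 123456, "post_url": "http://post_url.com"}))
```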

[The previous version] aimed to read post_id and write it into the output file, so that the file could be used for our metric calculation and for building the KD teacher/student models in later steps of our workflow. Other users with the same goal can use this diff as a potential alternative solution.

Differential Revision: D16271134

@facebook-github-bot added the CLA Signed label on Jul 16, 2019
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Jul 16, 2019
Summary:
Pull Request resolved: facebookresearch#783

Goal:
Read the post_id field and write it into the output file
Make each gen_kd_data[i], where i is in {0, 1, 2}, include post_id as well
With post_id inside the output file, PyText users can look into the details of the model output and explore additional metrics associated with the team target

Differential Revision: D16271134

fbshipit-source-id: 0438830dba8f366627048338b1a84357f3fa3f93
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Jul 16, 2019
fbshipit-source-id: 999f08228945f537893deb130a437ee89e87443f
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 1, 2019
fbshipit-source-id: c6b2a4ba128b5a1d2141840777a8b33d80895c29
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 1, 2019
fbshipit-source-id: bf4544e20f60a6936a71b2a98bbd4715941085ca
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 1, 2019
fbshipit-source-id: 6310550f52de6eb8c32af5d0e40137c0f54b717f
@haowu666 changed the title from "PyText post_id implementation & gen KD data" to "PyText add additional field in output file of run_model & generated KD data files in knowledge distillation" on Aug 2, 2019
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 2, 2019
fbshipit-source-id: 3c32543d3178d19c7f0ca746116f6f41e6d9f166
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 2, 2019
fbshipit-source-id: d1805fd59c426e09eb5dfd37b28ce6895ae5b7f2
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 2, 2019
fbshipit-source-id: d00a36aaaec29ab51f3cb774bac793039c7f0140
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 5, 2019
fbshipit-source-id: ed368ec5d2e4859b7c2e162c3afb01c6748599aa
@haowu666 changed the title from "PyText add additional field in output file of run_model & generated KD data files in knowledge distillation" to "PyText add list of additional fields in output file of run_model & generated KD data files in knowledge distillation" on Aug 5, 2019
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 5, 2019
fbshipit-source-id: ae70ae5526031158afacb8edf8dc0faabdab15bc
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 5, 2019
fbshipit-source-id: abf2ca517f67009b8ad3631306e56ff5c99d8859
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 6, 2019
fbshipit-source-id: 3b01cafd7999d80522fb29ba4221735bbba7bfdb
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 6, 2019
fbshipit-source-id: 0031c517f5e59b557bfaabd88e38b7d500612c73
@haowu666 changed the title from "PyText add list of additional fields in output file of run_model & generated KD data files in knowledge distillation" to "PyText add list of additional fields in batch reader, run_model output file & knowledge distillation data files" on Aug 7, 2019
@facebook-github-bot

This pull request has been merged in e14ea15.

@haowu666 changed the title from "PyText add list of additional fields in batch reader, run_model output file & knowledge distillation data files" to "Add additional fields in batch reader, run_model output file & knowledge distillation data files" on Aug 13, 2019