This repository was archived by the owner on Nov 22, 2022. It is now read-only.

Add additional fields in batch reader, run_model output file & knowledge distillation data files #783

Closed
wants to merge 1 commit

Conversation


@haowu666 commented on Jul 16, 2019

[The updated version] generalizes the previous version (which added only post_id). The main workflow can now read a list of additional fields such as post_id, page_id, and page_url into the context, then pass them through the classification metric reporter. PyText users can thus include these fields in the saved-results file, which makes it easier to inspect the details of the model output, compute other metrics associated with the team target, or look up related information in a Hive table by searching on these fields. Also, in knowledge distillation, each gen_kd_data[i], where i is in {0, 1, 2} (training, validation, test), can handle a label list for multi-label tasks and includes the additional fields as a dictionary in the generated data files, which is helpful for building teacher and student networks for multi-label experiments.
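
For illustration only, here is a minimal sketch of the kind of plumbing this implies; it is not PyText's actual API, and the names ADDITIONAL_FIELDS, attach_context, and the sample data are assumptions:

```python
from typing import Dict, Iterable, List

# Hypothetical list of extra field names; in this PR the list is configurable
# (e.g. post_id, page_id, page_url).
ADDITIONAL_FIELDS: List[str] = ["post_id", "post_url"]


def attach_context(raw_examples: Iterable[Dict], additional_fields: List[str]) -> List[Dict]:
    """Copy the requested additional fields from each raw example into a
    'context' dict so they can travel alongside the model inputs and be
    surfaced later by the metric reporter."""
    examples_with_context = []
    for example in raw_examples:
        context = {field: example.get(field) for field in additional_fields}
        examples_with_context.append({"text": example["text"], "context": context})
    return examples_with_context


if __name__ == "__main__":
    raw = [{"text": "hello world", "post_id": 123456, "post_url": "http://post_url.com"}]
    print(attach_context(raw, ADDITIONAL_FIELDS))
```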

In the debug file for the test dataset, one header example is
#predicted, actual, scores_str, text, post_id, post_url
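
A hedged sketch of how such a row could be produced, with the additional fields appended as trailing columns; the helper write_debug_row, the output file name, and the sample values are hypothetical, not the reporter code in this diff:

```python
import csv
from typing import Dict, List

ADDITIONAL_FIELDS: List[str] = ["post_id", "post_url"]  # hypothetical extra columns


def write_debug_row(writer, predicted: str, actual: str, scores_str: str,
                    text: str, context: Dict, additional_fields: List[str]) -> None:
    """Write one debug-file row, appending the additional fields after the
    standard columns in the same order as the header."""
    row = [predicted, actual, scores_str, text]
    row += [context.get(field, "") for field in additional_fields]
    writer.writerow(row)


with open("test_debug.tsv", "w", newline="") as f:
    tsv = csv.writer(f, delimiter="\t")
    tsv.writerow(["predicted", "actual", "scores_str", "text"] + ADDITIONAL_FIELDS)
    write_debug_row(tsv, "spam", "spam", "0.91,0.09", "hello world",
                    {"post_id": 123456, "post_url": "http://post_url.com"},
                    ADDITIONAL_FIELDS)
```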

In generating knowledge distillation data for training dataset, one header example is
#label_list, score, logit, label_names(in order), text, {"post_id": 123456, "post_url": http://post_url.com}
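
And a comparable sketch of emitting one KD line with the additional fields serialized as a JSON dictionary in the last column; again, format_kd_line and the sample values are hypothetical:

```python
import json
from typing import Dict, List


def format_kd_line(label_list: List[str], scores: List[float], logits: List[float],
                   label_names: List[str], text: str, context: Dict) -> str:
    """Build one tab-separated KD line: label_list, score, logit,
    label_names (in order), text, and the extra fields as a JSON dict."""
    return "\t".join([
        json.dumps(label_list),
        json.dumps(scores),
        json.dumps(logits),
        json.dumps(label_names),
        text,
        json.dumps(context),  # e.g. {"post_id": 123456, "post_url": "http://post_url.com"}
    ])


print(format_kd_line(["sports", "news"], [0.8, 0.2], [1.5, -0.3],
                     ["sports", "news"], "hello world",
                     {"post_id": 123456, "post_url": "http://post_url.com"}))
```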

[The previous version] aimed to read post_id and write it into the output file, so that the file could be used for our metric calculation and for building the KD teacher/student models in later steps of our workflow. Other users with the same goal can use this diff as a potential alternative solution.

Differential Revision: D16271134

@facebook-github-bot added the CLA Signed label on Jul 16, 2019
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Jul 16, 2019
Summary:
Pull Request resolved: facebookresearch#783

Goal:
Read the post_id field and write it into the output file
Make each gen_kd_data[i], where i is in {0, 1, 2}, include post_id as well
With post_id inside the output file, PyText users can look into the details of the model output and explore additional metrics associated with the team target

Differential Revision: D16271134

fbshipit-source-id: 0438830dba8f366627048338b1a84357f3fa3f93
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Jul 16, 2019
fbshipit-source-id: 999f08228945f537893deb130a437ee89e87443f
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 1, 2019
fbshipit-source-id: c6b2a4ba128b5a1d2141840777a8b33d80895c29
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 1, 2019
fbshipit-source-id: bf4544e20f60a6936a71b2a98bbd4715941085ca
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 1, 2019
fbshipit-source-id: 6310550f52de6eb8c32af5d0e40137c0f54b717f
@haowu666 changed the title from "PyText post_id implementation & gen KD data" to "PyText add additional field in output file of run_model & generated KD data files in knowledge distillation" on Aug 2, 2019
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 2, 2019
fbshipit-source-id: 3c32543d3178d19c7f0ca746116f6f41e6d9f166
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 2, 2019
fbshipit-source-id: d1805fd59c426e09eb5dfd37b28ce6895ae5b7f2
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 2, 2019
fbshipit-source-id: d00a36aaaec29ab51f3cb774bac793039c7f0140
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 5, 2019
fbshipit-source-id: ed368ec5d2e4859b7c2e162c3afb01c6748599aa
@haowu666 changed the title from "PyText add additional field in output file of run_model & generated KD data files in knowledge distillation" to "PyText add list of additional fields in output file of run_model & generated KD data files in knowledge distillation" on Aug 5, 2019
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 5, 2019
fbshipit-source-id: ae70ae5526031158afacb8edf8dc0faabdab15bc
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 5, 2019
fbshipit-source-id: abf2ca517f67009b8ad3631306e56ff5c99d8859
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 6, 2019
fbshipit-source-id: 3b01cafd7999d80522fb29ba4221735bbba7bfdb
haowu666 pushed a commit to haowu666/pytext that referenced this pull request Aug 6, 2019
fbshipit-source-id: 0031c517f5e59b557bfaabd88e38b7d500612c73
@haowu666 changed the title from "PyText add list of additional fields in output file of run_model & generated KD data files in knowledge distillation" to "PyText add list of additional fields in batch reader, run_model output file & knowledge distillation data files" on Aug 7, 2019
@facebook-github-bot

This pull request has been merged in e14ea15.

@haowu666 changed the title from "PyText add list of additional fields in batch reader, run_model output file & knowledge distillation data files" to "Add additional fields in batch reader, run_model output file & knowledge distillation data files" on Aug 13, 2019