sample data, pre-trained word embedding #3
Comments
zhihu-title-desc-multiple-label-v6.txt.zip, it contains three files: https://pan.baidu.com/s/1mHgELJUHewQZ9zHDo_uhmA
|
Could you please upload it to a different service? I am having a hard time downloading the data and word2vec model. Thanks! |
When I run the TextRNN model, the terminal reports: |
You may use zhihu-word2vec-title-desc.bin-100 or a file of your own. |
Where is the file: /test-zhihu-forpredict-title-desc-v6.txt |
And also: train-zhihu6-title-desc.txt |
There are two data_util_zhihu.py files, in folders aa1_data_util and a07_Transformer. |
What is the data_type? train, test, _ = load_data(vocabulary_word2index, vocabulary_word2index_label,data_type='train') |
I can't see the files at the links mentioned. They give the error below in Chinese: "Oh, the page you visit does not exist. Possible reason:"
Could you please upload them to https://github.com/brightmart/text_classification in the sample data folder? |
Never mind. After a few tries the above links worked. But bin100 is not downloading for some reason. |
@pmahend1 Same. The bin100 is not downloading even in China. |
@pmahend1 @deatherving |
@brightmart Thanks. The file is accessible. |
@brightmart |
Hi,
You can download some test data; find it in the closed issues.
Bright
…________________________________
From: sky_shine <notifications@github.com>
Sent: December 6, 2017 13:57
To: brightmart/text_classification
Cc: brightmart; Mention
Subject: Re: [brightmart/text_classification] sample data, pre-trained word embedding (#3)
When I run p5_fastTextB_predict.py, I get the error below:
FileNotFoundError: [Errno 2] No such file or directory: 'test-zhihu-forpredict-v4only-title.txt'
In addition, where is zhihu-word2vec-multilabel.bin-100?
|
@brightmart I could download the file now. Thanks 👍 |
@pmahend1
How can I solve this issue? |
The above links to download 'zhihu-word2vec-title-desc.bin-100' are not working. Thanks |
@brightmart Please share links to download the dataset. |
@brightmart Do I need an account on pan.baidu.com to download the dataset? Can you please upload the data to the repo? |
No account is needed. |
Which directory should I put the file 'zhihu-word2vec-title-desc.bin-100' in? |
thank you so much |
@parahaoer I think you should put it in the same directory as the "model_train.py". For example, when you use TextCNN in the directory "a02_TextCNN" and run p7_TextCNN_train.py to train the model, you should put the file 'zhihu-word2vec-title-desc.bin-100' in that same directory. |
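To check the placement described above before starting a training run, something like the following could help. This is only a minimal sketch: the helper name `find_embedding` is hypothetical and not part of the repository, and it assumes the script is run from the model directory (for example a02_TextCNN).

```python
from pathlib import Path

# File name taken from this thread; the helper below is hypothetical.
EMBEDDING_FILE = "zhihu-word2vec-title-desc.bin-100"


def find_embedding(script_dir, filename=EMBEDDING_FILE):
    """Return the path to the embedding file if it sits in script_dir, else None."""
    candidate = Path(script_dir) / filename
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Assumption: the current working directory is the model directory.
    found = find_embedding(Path.cwd())
    if found is None:
        print(f"{EMBEDDING_FILE} not found in {Path.cwd()}")
    else:
        print(f"Using embedding file: {found}")
```

Running it from the directory that holds the training script should report whether the file is where the script will look for it.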
Could you please upload the data file 'zhihu-word2vec-title-desc.bin-100' as well? The links do not work. I would appreciate a quick response. |
@brightmart I load zhihu-word2vec-title-desc.bin-100 as the word vector file and train-zhihu4-only-title-all.txt as the training file, and set multi_label_flag=false, use_embedding=true, |
Hi. Thanks for your feedback. As long as you can see that the training and validation loss is decreasing during training, it will be fine. The previously reported F1 score is not a reliable indicator of accuracy. I am updating the way the F1 score is computed today. It is good to see that you can make it work for these several models. Can you commit your version to this repository as a new branch? |
Respect Sir: It is a good project! [NEW, try use this, update 2018-08-12] |
Set the hyperparameter for using the pre-trained word embedding to false; then you will not need 'zhihu-word2vec-title-desc.bin-100'.
…________________________________
From: hlshao <notifications@github.com>
Sent: August 24, 2018 14:26
To: brightmart/text_classification
Cc: brightmart; State change
Subject: Re: [brightmart/text_classification] sample data, pre-trained word embedding (#3)
Respect Sir:
It is a good project!
Could you please provide the file "zhihu-word2vec-title-desc.bin-100" somewhere?
The link below is out of date too...
Many thanks if you can help.
[NEW, try use this, update 2018-08-12]
zhihu-title-desc-multiple-label-v6.txt.zip, it contains three files:
https://pan.baidu.com/s/1mHgELJUHewQZ9zHDo_uhmA
train-zhihu-title-desc-multiple-label-v6.txt (around 3 million training data, multiple labels)
test-zhihu-title-desc-multiple-label-v6.txt (around 70k validation/test data, multiple labels)
train-zhihu-title-desc-multiple-label-200k-v6.txt (200k training data, multiple labels, a subset of file one)
|
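The advice above (turn off the pre-trained embedding when the file is unavailable) could be automated with a small guard. This is a rough sketch under stated assumptions: `resolve_use_embedding` is a hypothetical helper that does not exist in the repository, and how its result would be wired into each training script's own flag is left open.

```python
import os


def resolve_use_embedding(requested, embedding_path):
    """Honor the requested setting only when the embedding file exists.

    Hypothetical helper: returns False (with a warning) when pre-trained
    embeddings were requested but the file is missing.
    """
    if requested and not os.path.isfile(embedding_path):
        print(f"Warning: {embedding_path} not found; "
              "training without pre-trained embeddings.")
        return False
    return requested
```

For example, `resolve_use_embedding(True, "zhihu-word2vec-title-desc.bin-100")` would fall back to False when the file has not been downloaded, so training starts without it instead of failing on a missing file.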
I am using TextRNN (a03) and cannot find this flag. The downloads are not working either. I have changed the following in p8_TextRNN_train.py: but the error is still the same: I have also changed this flag in a02_TextCNN since the TextRNN uses code ( Can you please share the pretrained embeddings or point me to the right place? Edit: This one seems to be up to date: https://pan.baidu.com/s/1jIP9e6q. I am using your instructions to download @brightmart. After step 1 I get a window that tells me to download the netdisk client from Baidu: The installer is in Chinese, which I don't speak, and there is no English version. |
@brightmart Can't download the file 'test-zhihu-forpredict-title-desc-v6.txt'; please share it on another platform. |
I re-generated the data and saved it as a cached file, which is available to download. Check this section in README.md: #Sample data: cached file
|
Hi, I'm facing the same problem, how did you solve it? Thanks in advance |
Hi, where is the file: /test-zhihu-forpredict-title-desc-v6.txt
I'm getting this issue when I run training on a08 entity network and a06 seq2seq models.
Can I get or train this file?
zhihu-word2vec-title-desc.bin-100
Also, do you have sample datasets compatible with these models?