This repository was archived by the owner on Jul 7, 2023. It is now read-only.
cannot download babi data ("HTTP Error 403: Forbidden") #1206
Open
Description
I consistently get an "HTTP Error 403: Forbidden" error when trying to download the "babi" dataset. It happens with both "t2t-datagen" and "t2t-trainer". A workaround is provided at the end of this issue.
Environment information
OS: Windows 10, version 1809
$ pip freeze | grep tensor
mesh-tensorflow==0.0.3
tensor2tensor==1.10.0
tensorboard==1.11.0
tensorflow==1.10.0
$ python -V
Python 3.5.5 :: Anaconda custom (64-bit)
For bugs: reproduction and error logs
# Steps to reproduce:
1. cd to where your tensor2tensor scripts are installed (in my case: C:\anaconda3\envs\tensorflow\Scripts)
2. python t2t-datagen --problem=babi_qa_concat_task10_10k
Error logs:
(tensorflow) C:\anaconda3\envs\tensorflow\Scripts>python t2t-datagen --problem=babi_qa_concat_task10_10k
c:\anaconda3\envs\tensorflow\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:It is strongly recommended to specify --data_dir. Data will be written to default data_dir=C:\Users\rfernand\AppData\Local\Temp.
INFO:tensorflow:Generating problems:
babi:
* babi_qa_concat_task10_10k
INFO:tensorflow:Generating data for babi_qa_concat_task10_10k.
INFO:tensorflow:Downloading http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz to /tmp/t2t_datagen\tasks_1-20_v1-2.tar.gz
Traceback (most recent call last):
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\data_generators\generator_utils.py", line 215, in maybe_download
tf.gfile.Copy(uri, filepath)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 397, in copy
compat.as_bytes(oldpath), compat.as_bytes(newpath), overwrite, status)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'http' not implemented (file: 'http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "t2t-datagen", line 28, in <module>
tf.app.run()
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "t2t-datagen", line 23, in main
t2t_datagen.main(argv)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\bin\t2t_datagen.py", line 198, in main
generate_data_for_registered_problem(problem)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\bin\t2t_datagen.py", line 260, in generate_data_for_registered_problem
problem.generate_data(data_dir, tmp_dir, task_id)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\data_generators\text_problems.py", line 296, in generate_data
self.generate_encoded_samples(data_dir, tmp_dir, split)), paths)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\data_generators\generator_utils.py", line 155, in generate_files
for case in generator:
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\data_generators\babi_qa.py", line 383, in generate_encoded_samples
generator = self.generate_samples(data_dir, tmp_dir, dataset_split)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\data_generators\babi_qa.py", line 347, in generate_samples
tmp_dir = _prepare_babi_data(tmp_dir, data_dir)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\data_generators\babi_qa.py", line 126, in _prepare_babi_data
file_path = generator_utils.maybe_download(tmp_dir, _TAR, _URL)
File "c:\anaconda3\envs\tensorflow\lib\site-packages\tensor2tensor\data_generators\generator_utils.py", line 220, in maybe_download
uri, inprogress_filepath, reporthook=download_report_hook)
File "c:\anaconda3\envs\tensorflow\lib\urllib\request.py", line 188, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "c:\anaconda3\envs\tensorflow\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "c:\anaconda3\envs\tensorflow\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "c:\anaconda3\envs\tensorflow\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "c:\anaconda3\envs\tensorflow\lib\urllib\request.py", line 510, in error
return self._call_chain(*args)
File "c:\anaconda3\envs\tensorflow\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "c:\anaconda3\envs\tensorflow\lib\urllib\request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Issue Workaround
I found that the following code, replacing line 113 in the file "lib\site-packages\tensor2tensor\data_generators\babi_qa.py", fixed the problem:
# Send a Chrome User-Agent header to avoid "HTTP Error 403: Forbidden" when downloading datasets such as "babi".
use_workaround = True
if use_workaround:
    file_path = os.path.join(tmp_dir, _TAR)
    if not os.path.exists(file_path):
        # "import urllib" alone does not guarantee that the urllib.request submodule is loaded.
        import urllib.request
        opener = urllib.request.build_opener()
        opener.addheaders = [(
            "User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36")]
        # urlretrieve uses the globally installed opener, so the header applies to the download below.
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(_URL, file_path)
else:
    file_path = generator_utils.maybe_download(tmp_dir, _TAR, _URL)
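
The workaround above installs a global opener, which changes the User-Agent for every later urllib request in the process. A narrower variant might look like the sketch below, which sets the header on a single request only. The helper names `build_request` and `download_with_user_agent`, the `.incomplete` suffix, and the shortened default User-Agent string are my own assumptions for illustration; none of them are part of tensor2tensor, and I have not checked which User-Agent values the server accepts.

```python
import os
import shutil
import urllib.request

# Hypothetical default; substitute the full browser string if the server rejects this one.
DEFAULT_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request for url that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_with_user_agent(url, file_path, user_agent=DEFAULT_USER_AGENT):
    """Download url to file_path, skipping the download if the file already exists.

    The data is written to a temporary sibling file first, so an interrupted
    download does not leave a partial file at file_path.
    """
    if os.path.exists(file_path):
        return file_path
    tmp_path = file_path + ".incomplete"
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        with open(tmp_path, "wb") as out:
            shutil.copyfileobj(response, out)
    os.replace(tmp_path, file_path)
    return file_path
```

Inside `_prepare_babi_data`, the call would then be something like `file_path = download_with_user_agent(_URL, os.path.join(tmp_dir, _TAR))`, leaving the rest of the function unchanged.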