Untar processor cannot handle members within subfolders #223

drammock opened this issue Dec 31, 2020 · 1 comment · Fixed by #224

drammock commented Dec 31, 2020

Description of the problem

Individual member files of an archive cannot be selected if they reside within a subfolder inside the archive.

For demonstration, I'm using an archive called MNE-kiloword-data.tar.gz (URL for it is in the code below).
Inside that archive is one folder with one file in it: MNE-kiloword-data/kword_metadata-epo.fif

I want to end up with the file here: /home/username/Desktop/MNE-kiloword-data/kword_metadata-epo.fif
In other words, I want the whole archive's contents to end up on my Desktop (this is just for illustration purposes; in reality we will put it somewhere more sensible).

I have tried several combinations of fname, path, and members, and always end up with a FileNotFoundError. I've also tried explicitly passing the subdirectory inside the archive as one of the members, like this:

processor = pooch.Untar(members=['MNE-kiloword-data/', 'MNE-kiloword-data/kword_metadata-epo.fif'])

...which also didn't work.

Full code that generated the error

import pooch
fname = 'kword_metadata-epo.fif'
urls = {fname: ''}
registry = {fname: 'md5:3a124170795abbd2e48aae8727e719a8'}
path = '/home/username/Desktop/MNE-kiloword-data/'  # <--- update with your real username
processor = pooch.Untar(members=['MNE-kiloword-data/kword_metadata-epo.fif'])

foo = pooch.retrieve(url=urls[fname], known_hash=registry[fname],
                     fname=fname, path=path, processor=processor)

Full error message

Downloading data from '' to file '/home/username/Desktop/MNE-kiloword-data/kword_metadata-epo.fif'.
Extracting 'MNE-kiloword-data/kword_metadata-epo.fif' from '/home/username/Desktop/MNE-kiloword-data/kword_metadata-epo.fif' to '/home/username/Desktop/MNE-kiloword-data/kword_metadata-epo.fif.untar'
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-f2eb5fdc640e> in <module>
      6 processor = pooch.Untar(members=['MNE-kiloword-data/kword_metadata-epo.fif'])
----> 8 foo = pooch.retrieve(url=urls[fname], known_hash=registry[fname],
      9                      fname=fname,
     10                      path=path, processor=processor)

/opt/miniconda3/envs/mnedev/lib/python3.8/site-packages/pooch/ in retrieve(url, known_hash, fname, path, processor, downloader)
    238     if processor is not None:
--> 239         return processor(str(full_path), action, None)
    241     return str(full_path)

/opt/miniconda3/envs/mnedev/lib/python3.8/site-packages/pooch/ in __call__(self, fname, action, pooch)
     78             if not os.path.exists(extract_dir):
     79                 os.makedirs(extract_dir)
---> 80             self._extract_file(fname, extract_dir)
     81         # Get a list of all file names (including subdirectories) in our folder
     82         # of unzipped files.

/opt/miniconda3/envs/mnedev/lib/python3.8/site-packages/pooch/ in _extract_file(self, fname, extract_dir)
    186                     try:
    187                         # Save it to our desired file name
--> 188                         with open(os.path.join(extract_dir, member), "wb") as output:
    189                             output.write(
    190                     finally:

FileNotFoundError: [Errno 2] No such file or directory: '/home/username/Desktop/MNE-kiloword-data/kword_metadata-epo.fif.untar/MNE-kiloword-data/kword_metadata-epo.fif'
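
The traceback ends at the open() call in _extract_file, where the output path is extract_dir joined with the full member name. A likely reading is that the subfolder named in the member path was never created below extract_dir. The following is a minimal standard-library sketch of that failure mode and one possible fix. It is not pooch's actual code, and extract_member, make_parents, and the archive and file names are made up for illustration.

```python
import os
import tarfile
import tempfile


def extract_member(archive, member, extract_dir, make_parents):
    """Write one archive member below extract_dir, mimicking the failing open() call."""
    target = os.path.join(extract_dir, member)
    if make_parents:
        # Create any subfolders that are part of the member name.
        os.makedirs(os.path.dirname(target), exist_ok=True)
    with tarfile.open(archive, "r") as tar:
        source = tar.extractfile(member)
        try:
            with open(target, "wb") as output:
                output.write(source.read())
        finally:
            source.close()
    return target


# Build a throwaway archive holding one file inside one folder (hypothetical names).
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "data-folder"))
with open(os.path.join(workdir, "data-folder", "file.txt"), "w") as f:
    f.write("hello")
archive = os.path.join(workdir, "archive.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(os.path.join(workdir, "data-folder"), arcname="data-folder")

extract_dir = os.path.join(workdir, "archive.tar.gz.untar")
os.makedirs(extract_dir)

# Without creating the parent folder, open() fails as in the traceback.
try:
    extract_member(archive, "data-folder/file.txt", extract_dir, make_parents=False)
    failed = False
except FileNotFoundError:
    failed = True

# Creating the parent folder first lets the same call succeed.
fixed_path = extract_member(archive, "data-folder/file.txt", extract_dir, make_parents=True)
```

The only difference between the two calls is the os.makedirs() on the member's parent directory.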

System information

  • Operating system: Linux (Xubuntu 20.04)
  • Python installation (Anaconda, system, ETS): miniconda
  • Version of Python: 3.8.6
  • Version of this package: 1.3.0

Additional notes

FWIW, I will ultimately want to use the Pooch class, but I had similar difficulties getting this to work with pooch.Pooch.fetch(). My basic problem seems to be that a processor like Unzip() or Untar() offers no option to put the extracted output in an arbitrary location (there is only a suffix attribute). I think that if it were possible to do this:

processor = pooch.Untar(folder_to_put_untarred_content_into='/some/arbitrary/path')

...then my difficulties would go away: if I could specify where to untar the archive, I wouldn't need the members parameter at all, so I wouldn't hit the problem of members nested within subfolders. I suspect that implementing a target directory for Untar() is probably easier anyway, though I can imagine cases where both capabilities would be useful.
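
In the meantime, a custom processor might serve as a workaround. The traceback shows processors being called as processor(fname, action, pooch), so any callable with that signature could untar into a folder of its choosing. The sketch below assumes that calling convention and that a list of extracted file paths is an acceptable return value. UntarTo and target_dir are hypothetical names, not part of pooch, and the demo archive is made up so the example runs without a download.

```python
import os
import tarfile
import tempfile


class UntarTo:
    """Hypothetical processor: untar the whole archive into target_dir."""

    def __init__(self, target_dir):
        self.target_dir = target_dir

    def __call__(self, fname, action, pooch):
        # fname is the downloaded archive; action and pooch are unused here,
        # so the archive is re-extracted on every call.
        os.makedirs(self.target_dir, exist_ok=True)
        with tarfile.open(fname, "r") as tar:
            # Only use this with archives you trust: extractall writes
            # whatever paths the archive contains.
            tar.extractall(self.target_dir)
            names = [m.name for m in tar.getmembers() if m.isfile()]
        return [os.path.join(self.target_dir, name) for name in names]


# Demo on a throwaway archive with one file inside one folder.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "data-folder")
os.makedirs(src)
with open(os.path.join(src, "file.txt"), "w") as f:
    f.write("hello")
archive = os.path.join(workdir, "archive.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname="data-folder")

processor = UntarTo(os.path.join(workdir, "somewhere-else"))
extracted = processor(archive, "download", None)
```

With pooch, such an object would be passed as processor=UntarTo('/some/arbitrary/path') to pooch.retrieve(), in place of pooch.Untar(...).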

