Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX, ENH: Handle nested archive members; allow arbitrary unpack locations #224

Merged
merged 9 commits into from
May 5, 2021

Conversation

drammock
Copy link
Contributor

@drammock drammock commented Jan 4, 2021

closes #223

This PR does two things:

  1. handle cases where the members passed to a processor reside within subfolders inside the archive, by creating the necessary destination subfolders before attempting to write the unpacked file. Previously, using a processor like this Untar(members=['archive_internal_subfolder/desired_file.txt']) would fail with a FileNotFoundError.
  2. allow processors to unpack to arbitrary locations, instead of being restricted to a folder with the same name as the archive with a suffix appended. This is done with a new optional parameter to the processors called extract_dir. If extract_dir is None (the default) then the current suffix-based approach is used as a fallback.

Reminders:

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package. N/A
  • Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
  • If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
  • Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

@welcome
Copy link

welcome bot commented Jan 4, 2021

💖 Thanks for opening your first pull request! 💖

Please make sure you read the following:

  • Authorship Guidelines: Our rules for giving you credit for your contributions, including authorship on publications and Zenodo archives.
  • Contributing Guide: What the review process is like and our infrastructure for testing and documentation.
  • Code of Conduct: How we expect people to interact in our projects.

A few things to keep in mind:

  • Remember to run make format to make sure your code follows our style guide.
  • If you need help writing tests, take a look at the existing ones for inspiration. If you don't know where to start, let us know and we'll walk you through it.
  • All new features should be documented. It helps to write the docstrings for your functions/classes before writing the code. This will help you think about your code design and results in better code.
  • No matter what, we are really grateful that you put in the effort to do this! ⭐

@drammock drammock marked this pull request as draft January 4, 2021 18:30
@drammock drammock force-pushed the allow-arbitrary-unzip-locations branch from 992c1dd to f9f8dc7 Compare January 4, 2021 20:42
@drammock
Copy link
Contributor Author

drammock commented Jan 4, 2021

This is ready for review. I'll need a little advice on testing/coverage; I thought that I could just change this line to use 'subdir/tiny-data.txt' and change this line accordingly, and then I'd hit the missed lines. But when I tried that locally I got KeyError: "filename 'subdir/tiny-data.txt' not found" when the test attempts to unpack the file... I thought that there was a copy of tiny-data.txt inside subdir in the testing data archive, but probably I am just not understanding something.

On a side note, make test does not seem to run all tests (?). Specifically the additional parametrize cases I added are not picked up (they are run by pytest pooch/tests/test_processors.py but not by make test), but I don't know why.

@drammock drammock marked this pull request as ready for review January 4, 2021 20:52
@leouieda
Copy link
Member

leouieda commented Jan 6, 2021

@drammock thanks for the issue and PR! Life's a bit hectic at the moment so I'll be a little while before I can review this. If anyone else wants to have a go, they are more than welcome 🙂

On a side note, make test does not seem to run all tests (?).

Maybe something with the Python environment that is running each? This is the first I'm hearing of something like this which is very odd.

But when I tried that locally I got KeyError: "filename 'subdir/tiny-data.txt' not found" when the test attempts to unpack the file.

This is a bug in our code that when we go to save the file to the filesystem, we don't create missing parent directories like subdir (hence the file not found error). What you'd need to do is check if the member file has one or more preceding directories and create them in the destination path before creating the file. Or something equivalent to that but more clever.

@drammock
Copy link
Contributor Author

drammock commented Jan 6, 2021

This is a bug in our code that when we go to save the file to the filesystem, we don't create missing parent directories like subdir (hence the file not found error).

That is exactly the problem that I thought I fixed in this PR (point number 1 in the PR description). And it seems to be working in my local testing with my own data, but it's not working with the sample testing data included with Pooch.

What you'd need to do is check if the member file has one or more preceding directories and create them in the destination path before creating the file.

I thought that's what these lines do:

https://github.com/fatiando/pooch/pull/224/files#diff-0951777c6ece4ad11ab3ee09041b96c0e1befd3c52c98a8bc96f86601f222b1fR150-R153

@drammock
Copy link
Contributor Author

drammock commented Jan 9, 2021

OK, I figured out how to write the correct test so that the new code has 100% coverage. But now all the windows CIs are failing because (I think) os.makedirs is not doing its job? I very rarely use windows, so any windows users feel free to jump in with advice.

Also, the various tests in test_processors.py are looking a little redundant, there might be a way to combine a coupel tests by making the pytest.parametrize more complex (i.e., parametrize over which members to extract). But that seems like a matter of taste, so I won't do it unless one of your core devs asks for it.

@drammock drammock force-pushed the allow-arbitrary-unzip-locations branch from fc7f5af to ef96a24 Compare January 11, 2021 20:54
@drammock
Copy link
Contributor Author

OK @leouieda I've managed to get all the tests to pass now, and also removed some of the redundancy in the processors test by using multiple parametrize decorators on the same test. I think I've done all I can do to make it an easy review at this point, so the ball is in your (or one of the other devs) court. Let me know if I'm mistaken and there's something more you'd like to see here.

@drammock
Copy link
Contributor Author

bump. Over at MNE-Python we want to switch to using pooch, but need the functionality in this PR to make it possible. All tests are passing... any idea how long we're likely to have to wait?

Copy link
Member

@leouieda leouieda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drammock sorry for the very long delay on this. It's been a tough semester.

Just found some time to review this and it looks great! Thanks for the refactoring of the processor tests as well (and the typo fixes). Merging as soon as CIs are green again and I'll cut a new release in the following weeks.

@leouieda leouieda merged commit 858d8f1 into fatiando:master May 5, 2021
@welcome
Copy link

welcome bot commented May 5, 2021

🎉🎉🎉 Congrats on merging your first pull request and welcome to the team! 🎉🎉🎉

If you would like to be added as a author on the Zenodo archive of the next release, make sure that you have added your full name, affiliation, and ORCID (optional) to the AUTHORS.md file of this repository.

We hope that this was a good experience for you. Let us know if there is any way that the contributing process could be improved.

@drammock drammock deleted the allow-arbitrary-unzip-locations branch May 5, 2021 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Untar processor cannot handle members within subfolders
2 participants