Description
Conda-forge very recently added minio, which means we could now use it, at least in the conda(-forge) jobs. I'd be interested in doing that because I see a bunch of skips of the following kind in the test suite (not exhaustive):
SKIPPED [2] pyarrow/tests/test_dataset.py:2623: `minio` command cannot be located
SKIPPED [1] pyarrow/tests/test_dataset.py:2638: `minio` command cannot be located
SKIPPED [1] pyarrow/tests/test_dataset.py:2664: `minio` command cannot be located
SKIPPED [1] pyarrow/tests/test_dataset.py:4754: `minio` command cannot be located
SKIPPED [1] pyarrow/tests/test_dataset.py:4821: `minio` command cannot be located
SKIPPED [1] pyarrow/tests/test_fs.py:471: `minio` command cannot be located
SKIPPED [2] pyarrow/tests/test_fs.py:562: `minio` command cannot be located
So if all it takes is adding the test dependency, why not?
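For context, those skips presumably come from a guard that probes the PATH for the binary. A minimal sketch of that kind of guard (fixture name and exact logic are my assumption, not pyarrow's actual conftest code):

from shutil import which

import pytest

@pytest.fixture(scope="session")
def minio_binary():
    # Skip S3 integration tests when the minio executable is not on
    # PATH; the skip message mirrors the output above.
    path = which("minio")
    if path is None:
        pytest.skip("`minio` command cannot be located")
    return path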
As it turns out, this blows up pretty badly on main (through a proxy PR on our infra), at least partly because Arrow currently pins minio to a relatively outdated version:
arrow/ci/scripts/install_minio.sh (lines 54 to 56 at 4fac528):

# Use specific versions for minio server and client to avoid CI failures on new releases.
minio_version="minio.RELEASE.2022-05-26T05-48-41Z"
mc_version="mc.RELEASE.2022-05-09T04-08-26Z"
while the oldest available version in conda-forge is 2023.08.23.10.07.06.
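For a sense of the gap: minio release tags embed a timestamp, so the pin and the conda-forge build can be compared directly. A throwaway sketch (the helper is hypothetical, not part of any existing tooling):

from datetime import datetime

def minio_release_date(tag):
    # "minio.RELEASE.2022-05-26T05-48-41Z" -> datetime(2022, 5, 26, 5, 48, 41)
    stamp = tag.split("RELEASE.")[1].rstrip("Z")
    return datetime.strptime(stamp, "%Y-%m-%dT%H-%M-%S")

# The pinned server predates the oldest conda-forge build by roughly 15 months.
assert minio_release_date("minio.RELEASE.2022-05-26T05-48-41Z") < datetime(2023, 8, 23)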
Aside from 2-3 exceptions, the failures all come from the teardown of the s3fs and py_fsspec_s3fs fixtures, where the code unconditionally does fs.delete_dir(bucket), even though the test has presumably written data into the bucket. This leads to errors of the kind:
_ ERROR at teardown of test_filesystem_is_functional_after_pickling[builtin_pickle-S3FileSystem] _
request = <SubRequest 's3fs' for <Function test_filesystem_is_functional_after_pickling[builtin_pickle-S3FileSystem]>>
s3_server = {'connection': ('localhost', 41707, 'arrow', 'apachearrow'), 'process': <Popen: returncode: None args: ['minio', '--compat', 'server', '--quiet', '-...>, 'tempdir': local('/tmp/pytest-of-conda/pytest-0')}
@pytest.fixture
def s3fs(request, s3_server):
[...]
> fs.delete_dir(bucket)
pyarrow/tests/test_fs.py:258:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/_fs.pyx:616: in pyarrow._fs.FileSystem.delete_dir
check_status(self.fs.DeleteDir(directory))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> raise convert_status(status)
E OSError: When deleting bucket 'pyarrow-filesystem': AWS Error UNKNOWN (HTTP status 409) during DeleteBucket operation: Unable to parse ExceptionName: BucketNotEmpty Message: The bucket you tried to delete is not empty
pyarrow/error.pxi:91: OSError
More specifically, there are close to 60 of them:
= 4 failed, 7276 passed, 223 skipped, 19 deselected, 28 xfailed, 2 xpassed, 8 warnings, 57 errors in 177.84s (0:02:57) =
My first guess would be that:
- either minio got stricter (less buggy) about when it allows deletion of non-empty buckets,
- or something in the conda-forge setup does not yet accurately reproduce what the CI is doing here.
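Either way, a teardown that explicitly empties the bucket before deleting it should sidestep the BucketNotEmpty error. A minimal sketch using pyarrow's public filesystem API (the helper is illustrative, not the actual fixture code):

from pyarrow.fs import S3FileSystem

def teardown_bucket(fs: S3FileSystem, bucket: str):
    # Remove leftover test objects first so DeleteBucket cannot hit
    # BucketNotEmpty, then remove the (now empty) bucket itself.
    try:
        fs.delete_dir_contents(bucket)
    except OSError:
        pass  # bucket already empty (or already gone)
    fs.delete_dir(bucket)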
I'm quite out of my depth here, but I think an alternative would be to somehow pipe through ForceBucketDelete or --force. A cheap alternative is the following patch (which also doesn't cover 100%); it simply stops caring about failed bucket deletes:
--- a/python/pyarrow/tests/test_fs.py
+++ b/python/pyarrow/tests/test_fs.py
@@ -256,7 +256,10 @@ def s3fs(request, s3_server):
allow_move_dir=False,
allow_append_to_file=False,
)
- fs.delete_dir(bucket)
+ try:
+ fs.delete_dir(bucket)
+ except OSError:
+ pass
@pytest.fixture
@@ -358,7 +361,10 @@ def py_fsspec_s3fs(request, s3_server):
allow_move_dir=False,
allow_append_to_file=True,
)
- fs.delete_dir(bucket)
+ try:
+ fs.delete_dir(bucket)
+ except OSError:
+ pass
@pytest.fixture(params=[
After that patch, the only remaining errors are:
FAILED pyarrow/tests/test_fs.py::test_get_file_info[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - AssertionError: assert <FileType.Directory: 3> == <FileType.NotFound: 0>
+ where <FileType.Directory: 3> = <FileInfo for 'pyarrow-filesystem/a/aa/aaa/': type=FileType.Directory>.type
+ and <FileType.NotFound: 0> = FileType.NotFound
FAILED pyarrow/tests/test_fs.py::test_delete_dir[S3FileSystem] - Failed: DID NOT RAISE <class 'OSError'>
FAILED pyarrow/tests/test_fs.py::test_delete_dir_contents[S3FileSystem] - Failed: DID NOT RAISE <class 'OSError'>
FAILED pyarrow/tests/test_fs.py::test_move_directory[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - Failed: DID NOT RAISE <class 'OSError'>
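Those remaining failures all have the same shape: the test wraps an S3 operation in pytest.raises(OSError), and under the newer minio the operation now succeeds. Schematically (not the actual test bodies):

import pytest

def expect_os_error(fs, path):
    # The suite asserts that certain S3 operations fail; with the newer
    # minio they apparently complete instead, so pytest reports
    # "DID NOT RAISE <class 'OSError'>".
    with pytest.raises(OSError):
        fs.delete_dir(path)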
This issue does not happen on 13.0.0?!
The final kicker is that all of this passes with Arrow 13 - I even checked that the tests didn't get skipped. So there appear to be at least two things at work here: a change in minio behaviour and a change in pyarrow somewhere.
Component(s)
Continuous Integration, Python