Prevent returning cached entry if the entry is degenerate #1873
When `DatabricksFileSystem.ls` is called with a directory, it queries the API, builds a directory listing, and caches it. If `ls` is subsequently called on a child of that directory, the results returned describe the child directory itself, as though it were just a file. The correct behavior is to return the contents of the child directory. This PR ignores such responses so that `ls` calls the API again with the child directory to get its contents. These results are then cached as usual and are available for future `ls` calls on the child directory.
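To make the symptom concrete, here is a minimal sketch of a parent-based cache lookup that produces the degenerate result. It is a simplified model, not the fsspec or PR code: the function name `naive_ls_from_cache`, the plain-dict `dircache`, and the sample paths are all assumptions chosen for illustration.

```python
def naive_ls_from_cache(dircache, path):
    """Toy lookup: fall back to the parent's cached listing (hypothetical helper)."""
    path = path.rstrip("/")
    # Exact hit: this directory was listed before.
    if path in dircache:
        return dircache[path]
    # Otherwise, look for `path` inside the parent's cached listing.
    parent = path.rsplit("/", 1)[0]
    matches = [e for e in dircache.get(parent, []) if e["name"] == path]
    return matches or None


# Assumed cache state after a single `ls("/data")` call.
dircache = {
    "/data": [
        {"name": "/data/sub", "type": "directory"},
        {"name": "/data/f.txt", "type": "file"},
    ]
}

# The lookup returns the entry for "/data/sub" itself, not its contents.
result = naive_ls_from_cache(dircache, "/data/sub")
```

In this model, `result` holds a single entry describing `/data/sub` as a directory, which is the "as though it were just a file" behavior described above.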
I think I've fixed the 3.9 test issues (annotation issues). Not sure if there's anything I can do about the 3.12 ones (rate-limiting issues).
OK, let's give this a try and see how it goes for others.
The dbfs tests only run on py3.9 currently, and you can see they are now failing. This might just mean having to re-record the vcr data, but I am not sure.
Sorry for the delay, my annual leave happened =) The VCR cassettes are re-recorded and the tests should pass now (they do locally).
This addresses issue #1865 by overriding `AbstractFileSystem._ls_from_cache` with a new implementation that only uses the parent's cache entry if it indicates that `path` is not a "directory". If it is a "directory", nothing is returned, which allows `DatabricksFileSystem.ls` to make a new API call to DBFS to get the contents of that path.
Also, if `path` is not found in the cached listing of its parent (if it exists), this strongly suggests that `path` was created since the parent was cached. Using this information, `DatabricksFileSystem.ls` will invalidate the parent's cache entry (by deleting it) before continuing.
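The lookup rules described above can be sketched roughly as follows. This is a standalone illustration rather than the PR's actual implementation: the function `ls_from_cache`, the plain-dict `dircache`, and the sample paths are assumptions, and for brevity the sketch performs the parent invalidation inside the helper, whereas the description attributes that step to `DatabricksFileSystem.ls`.

```python
from typing import Optional


def ls_from_cache(dircache: dict, path: str) -> Optional[list]:
    """Return cached entries for `path`, or None to signal "call the API".

    Hypothetical stand-in for an overridden `_ls_from_cache`.
    """
    path = path.rstrip("/") or "/"
    # Exact hit: the directory itself was listed and cached earlier.
    if path in dircache:
        return dircache[path]
    parent = path.rsplit("/", 1)[0]
    if parent not in dircache:
        return None  # nothing cached for the parent either
    for entry in dircache[parent]:
        if entry["name"] == path:
            if entry["type"] == "directory":
                # Degenerate case: the parent only knows that `path` is a
                # directory, not what it contains, so force a fresh API call.
                return None
            return [entry]  # a plain file: the parent's entry is sufficient
    # `path` is missing from the parent's listing, so the listing is
    # probably stale (path created after caching): drop it.
    del dircache[parent]
    return None


# Assumed cache state after a single `ls("/data")` call.
cache = {
    "/data": [
        {"name": "/data/sub", "type": "directory"},
        {"name": "/data/f.txt", "type": "file"},
    ]
}

file_hit = ls_from_cache(cache, "/data/f.txt")   # served from the parent entry
dir_miss = ls_from_cache(cache, "/data/sub")     # None: caller must query the API
parent_kept = "/data" in cache                   # parent entry still cached here
new_miss = ls_from_cache(cache, "/data/new")     # None, and the parent is evicted
parent_evicted = "/data" not in cache
```

Returning `None` for both the directory case and the stale-parent case keeps the caller's logic simple: any `None` means "fetch from the API and cache the result".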