-
-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improves performance of VirtualFolder.getAllFiles() #915
improves performance of VirtualFolder.getAllFiles() #915
Conversation
for (SpoonFile f : getFiles()) { | ||
// we take care not to add a file that was already found in a folder | ||
if (!result.contains(f)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are your sure hashCode/equals of SpoonFile are correct? I'm not sure they are well tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VirtualFile and ZipFile really miss hashCode, so they would not work correctly with the HashSet. Do you agree to add hashCode method to them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to have equals/hashcode in those classes, in order to not to maintain them in the future. And consequently, it seems betetr to avoid a hashset here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The not using HashSet means a performance problems with bigger number of files. The MS Windows file.equals() method is much faster then File.getCannonicalFile, but it is still quite slow. So recursive calling of it in List.contains needs time too. I will measure it tomorrow and give you exact number.
for (SpoonFile f : getAllFiles()) { | ||
SpoonFolder folder = f.getParent(); | ||
if (!result.contains(folder)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are your sure hashCode/equals of SpoonFolder are correct? I'm not sure they are well tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VirtualFolder really miss hashCode, so it would not work correctly with the HashSet. Do you agree to add hashCode method to it?
//Measure how long it needs to return all files | ||
long startTime = System.currentTimeMillis(); | ||
folder.getAllFiles(); | ||
assertTrue("The VirtualFolder.getAllFiles() is too slow", (System.currentTimeMillis() - startTime)<1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Spoon, we try not to make assertions based on currentTimeMillis, they are too dependent on the platform. To me, you can remove the tests. Everybody agrees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you check the times needed by execution of each test? I do not know your testing environment, but some tools reports a warning/error even when test passes, but needs much longer time then previous run of the same test. If you have such execution time comparator, then it would make sense to keep the test there - I would just remove assert which checks absolute time.
58b23aa
to
9af3fc5
Compare
I have measured the performance of solution with and without hashSet and the results are similar. So I have rolled back changes in VirtualFolder so the hashSet is not used, so the hashCode methods does not have to be added. |
What if we simply remove all usages of getCanonicalFile? Do we get a performance improvement as well? |
Yes, it improves performance too, but then two files, which are same will may be return false on File.equals() when they are using different path ... |
we can keep the canonical version. However, I'd remove the isCanonical boolean, if not canonicalizable, we throw a SpoonException, for sake of least astonishment. |
9af3fc5
to
cd7ef5e
Compare
Contains
The performance improvement consists of these changes:
Note: Before the test needed 80s on 900 files. After performance fix it needs 0.2s
Note: The usage of helper HashSet would be enough to fix performance of VirtualFolder getAllFiles() and getSubFolders(), but FileSystemFile.toFile() and FileSystemFolder.toFile() and getPath() is probably called on other places in Spoon too, therefore I suggest to take both changes.