Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improves performance of VirtualFolder.getAllFiles() #915

Merged
merged 1 commit into from
Oct 31, 2016

Conversation

pvojtechovsky
Copy link
Collaborator

@pvojtechovsky pvojtechovsky commented Oct 28, 2016

Contains

  1. test which measures performance of VirtualFolder.getAllFiles()
  2. test which measures performance of VirtualFolder.getSubFolders()
  3. code which improves that performance.

The performance improvement consists of these changes:

  1. FileSystemFile calls slow File.getCanonicalFile() only once in constructor. It is not called later. I hope it breaks nothing. Please check it.
  2. FileSystemFolder calls slow File.getCanonicalFile() only once in constructor. It is not called later. I hope it breaks nothing. Please check it.
  3. VirtualFolder getAllFiles() and getSubFolders() uses helper HashSet to collect distinct set of files

Note: Before the test needed 80s on 900 files. After performance fix it needs 0.2s
Note: The usage of helper HashSet would be enough to fix performance of VirtualFolder getAllFiles() and getSubFolders(), but FileSystemFile.toFile() and FileSystemFolder.toFile() and getPath() is probably called on other places in Spoon too, therefore I suggest to take both changes.

for (SpoonFile f : getFiles()) {
// we take care not to add a file that was already found in a folder
if (!result.contains(f)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are your sure hashCode/equals of SpoonFile are correct? I'm not sure they are well tested.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VirtualFile and ZipFile really miss hashCode, so they would not work correctly with the HashSet. Do you agree to add hashCode method to them?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to have equals/hashcode in those classes, in order to not to maintain them in the future. And consequently, it seems betetr to avoid a hashset here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The not using HashSet means a performance problems with bigger number of files. The MS Windows file.equals() method is much faster then File.getCannonicalFile, but it is still quite slow. So recursive calling of it in List.contains needs time too. I will measure it tomorrow and give you exact number.

for (SpoonFile f : getAllFiles()) {
SpoonFolder folder = f.getParent();
if (!result.contains(folder)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are your sure hashCode/equals of SpoonFolder are correct? I'm not sure they are well tested.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VirtualFolder really miss hashCode, so it would not work correctly with the HashSet. Do you agree to add hashCode method to it?

//Measure how long it needs to return all files
long startTime = System.currentTimeMillis();
folder.getAllFiles();
assertTrue("The VirtualFolder.getAllFiles() is too slow", (System.currentTimeMillis() - startTime)<1000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Spoon, we try not to make assertions based on currentTimeMillis, they are too dependent on the platform. To me, you can remove the tests. Everybody agrees?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you check the times needed by execution of each test? I do not know your testing environment, but some tools reports a warning/error even when test passes, but needs much longer time then previous run of the same test. If you have such execution time comparator, then it would make sense to keep the test there - I would just remove assert which checks absolute time.

@pvojtechovsky pvojtechovsky force-pushed the virtualFolderPerformance branch 2 times, most recently from 58b23aa to 9af3fc5 Compare October 30, 2016 14:10
@pvojtechovsky
Copy link
Collaborator Author

I have measured the performance of solution with and without hashSet and the results are similar. So I have rolled back changes in VirtualFolder so the hashSet is not used, so the hashCode methods does not have to be added.
The remaining changes makes sense. They really improves performance 60s->0.2s on 900 files on MS Windows 10. So it makes sense to accept it.
I have removed platform specific test, which failed on bad performance.
I have squashed all commits into one.
Is there anything else needed to accept this PR?

@monperrus
Copy link
Collaborator

What if we simply remove all usages of getCanonicalFile? Do we get a performance improvement as well?

@pvojtechovsky
Copy link
Collaborator Author

pvojtechovsky commented Oct 30, 2016

Yes, it improves performance too, but then two files, which are same will may be return false on File.equals() when they are using different path ...
/root/aDir/../backInRoot/x.txt
/root/backInRoot/x.txt
But I am not sure, it has to be tested

@monperrus
Copy link
Collaborator

we can keep the canonical version. However, I'd remove the isCanonical boolean, if not canonicalizable, we throw a SpoonException, for sake of least astonishment.

@monperrus monperrus merged commit b4591f5 into INRIA:master Oct 31, 2016
@pvojtechovsky pvojtechovsky deleted the virtualFolderPerformance branch November 6, 2016 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants