
parallelize creation of file history cache for individual files #3636

Merged: 19 commits merged on Jun 17, 2021

Conversation

vladak
Member

@vladak vladak commented Jun 15, 2021

This change parallelizes the creation of file history cache for individual files, mimicking what is already done for renamed files.

On my laptop (i7-8665U, 4 cores, built-in SSD) I am seeing a decent speedup when creating the history cache for the OpenSSL Git repository with renamed files and merge commits enabled: 6 minutes with this change versus 9 minutes without.

There is potential for further speedup: the directories are still created sequentially, and the process should avoid duplication.
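
As a rough illustration of the approach described above (not the code in this PR), per-file cache creation can be fanned out over a fixed thread pool and awaited with a latch. The class name, `storeFileHistory`, and the in-memory `cache` map are hypothetical stand-ins for the real cache-writing logic; the thread count is an arbitrary assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelHistoryCacheSketch {
    /** Hypothetical stand-in for writing the history cache of a single file. */
    static void storeFileHistory(String file, List<String> revisions,
                                 Map<String, Integer> cache) {
        cache.put(file, revisions.size());
    }

    /**
     * Stores the history of every file in parallel.
     * The cache map must be thread-safe (e.g. a ConcurrentHashMap).
     * Returns the number of files whose task failed.
     */
    static int storeAll(Map<String, List<String>> historyByFile,
                        Map<String, Integer> cache, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        CountDownLatch latch = new CountDownLatch(historyByFile.size());
        AtomicInteger failures = new AtomicInteger();
        try {
            for (Map.Entry<String, List<String>> e : historyByFile.entrySet()) {
                executor.submit(() -> {
                    try {
                        storeFileHistory(e.getKey(), e.getValue(), cache);
                    } catch (RuntimeException ex) {
                        // Count the failure; one bad file should not stop the rest.
                        failures.incrementAndGet();
                    } finally {
                        latch.countDown();
                    }
                });
            }
            // Wait until every per-file task has finished.
            latch.await();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while storing history", ex);
        } finally {
            executor.shutdown();
        }
        return failures.get();
    }
}
```

A caller would pass a `ConcurrentHashMap` as the cache and a thread count near the number of cores. Any state shared between tasks has to be thread-safe or read-only for this to be correct.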

@vladak vladak added the indexer label Jun 15, 2021
@ahornace
Contributor

Nice work! Looks good besides the failing test.

@vladak vladak marked this pull request as draft June 15, 2021 15:23
@vladak
Member Author

vladak commented Jun 15, 2021

> Nice work! Looks good besides the failing test.

The failing test is actually a real problem. History objects share the HistoryEntry objects from the map constructed in FileHistoryCache#store(), and the tags are assigned to those HistoryEntry objects, so processing the files in parallel breaks the tag assignment. I think the per-entry tags need to be moved to the History object itself.
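
A minimal sketch of the idea, under the assumption that tags can be keyed by revision: the shared entries stay immutable and each per-file history owns its own tag map, so parallel tasks never write to a shared object. `Entry`, `FileHistory`, `setTags`, and `getTags` are hypothetical names for illustration, not the actual OpenGrok classes or methods.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HistoryTagsSketch {
    /** Immutable entry that several per-file histories may share. */
    static final class Entry {
        final String revision;

        Entry(String revision) {
            this.revision = revision;
        }
    }

    /** Per-file history that keeps tags itself instead of on shared entries. */
    static final class FileHistory {
        final List<Entry> entries;
        // Tags keyed by revision; private to this history object.
        private final Map<String, String> tagsByRevision = new HashMap<>();

        FileHistory(List<Entry> entries) {
            this.entries = entries;
        }

        void setTags(String revision, String tags) {
            tagsByRevision.put(revision, tags);
        }

        /** Returns the tags for the revision, or null if none were assigned. */
        String getTags(String revision) {
            return tagsByRevision.get(revision);
        }
    }
}
```

With this layout, assigning tags in one file's history leaves every other history that shares the same entry untouched.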

@vladak vladak marked this pull request as ready for review June 16, 2021 16:26
Contributor

@ahornace ahornace left a comment

Nice job! All comments are just nitpicks, feel free to ignore them :)

@vladak
Member Author

vladak commented Jun 17, 2021

After measuring the history cache runs, I can no longer see the dramatic speedup (or rather, master now runs much faster than before for some reason). The increased efficiency is still visible, though:

  • before:
    histcache-before
  • after:
    histcache-after

@vladak
Member Author

vladak commented Jun 17, 2021

Checked that the tags are displayed normally in the UI after recreating the history cache from scratch.

@vladak vladak merged commit 383c498 into oracle:master Jun 17, 2021
2 participants