
FileHistoryCache#store is one big memory hog #3243

Closed
@vladak

Description


While observing the RSS and CPU usage of an indexer process that indexes multiple repositories with heavy history (Linux kernel, FreeBSD, ...) from scratch, running with 16 threads and a 48 GiB heap (the machine has 32 CPUs and 256 GB RAM), it is clear that something bad is going on. The indexing is in the phase of generating history caches for all projects. The indexer process has ~50 GB RSS and is busy on the CPU (say 60%). The usage grows, stays at the maximum (70%) for a while and then quickly drops (to the low 60s %). This cycle repeats every couple of seconds (presumably the GC is busy collecting and then either finishes or gets stopped because it spent too much time on the CPU) while Mercurial/Git log processes are running, retrieving the history of the whole repository. This happens with 1.4.15.

I have not done any heap analysis yet; however, looking at FileHistoryCache#store, it is just asking for trouble. First, the whole repository history is stored in memory (as History/HistoryEntry objects), which can be quite sizeable on its own (the sample Linux kernel repository has 500k+ changesets and 50k+ files on disk), and it is then converted into an inverted map:

HashMap<String, List<HistoryEntry>> map = new HashMap<>();
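To make the memory shape of this inversion concrete, here is a minimal sketch. It assumes a simplified changeset type: `Entry`, `invert` and the sample paths are hypothetical names for illustration, not OpenGrok's `HistoryEntry` API. Note that the sketch needs the complete history list as input, so the list and the map it builds are both live at the same time, and every entry stays reachable until the map is dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InvertHistory {
    // Hypothetical, simplified stand-in for a changeset: a revision id plus
    // the paths of the files it touched. Not OpenGrok's HistoryEntry class.
    record Entry(String revision, List<String> files) {}

    // Inverts repository-wide history (a list of changesets) into a map
    // from file path to the changesets that touched that file.
    // Here the map only holds references to the original entries.
    static Map<String, List<Entry>> invert(List<Entry> history) {
        Map<String, List<Entry>> map = new HashMap<>();
        for (Entry e : history) {
            for (String file : e.files()) {
                map.computeIfAbsent(file, k -> new ArrayList<>()).add(e);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<Entry> history = List.of(
                new Entry("r1", List.of("a.c", "b.c")),
                new Entry("r2", List.of("a.c")),
                new Entry("r3", List.of("b.c", "c.c")));
        Map<String, List<Entry>> map = invert(history);
        System.out.println(map.get("a.c").size()); // 2
        System.out.println(map.get("b.c").size()); // 2
        System.out.println(map.get("c.c").size()); // 1
    }
}
```

With shared references, the map's overhead is mostly the per-file lists; the entries themselves are not duplicated.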

When tagged history is enabled (as is the case for the indexer run I am observing), it gets even worse:

/*
 * We need to do deep copy in order to have different tags
 * per each commit.
 */
if (env.isTagsEnabled() && repository.hasFileBasedTags()) {
    list.add(new HistoryEntry(e));
} else {
    list.add(e);
}

In that case there will be a distinct HistoryEntry object for each changeset that touched a given file, i.e. one object per (file, changeset) pair rather than one per changeset. Overall, this leads to explosive growth in the number of HistoryEntry objects.
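The difference can be sketched by counting distinct objects with and without the deep copy. This is a minimal illustration under stated assumptions: `Entry`, its copy constructor, `invert` and `distinctEntries` are hypothetical names, and the copy constructor only mimics the idea of `new HistoryEntry(e)`, not its actual field-by-field behavior. With sharing, the number of distinct entries equals the number of changesets; with deep copies, it equals the sum of files touched over all changesets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CopyCount {
    // Hypothetical mutable changeset type; the tags field stands for the
    // per-file state that motivates the deep copy. Not OpenGrok's HistoryEntry.
    static final class Entry {
        final String revision;
        final List<String> files;
        String tags;

        Entry(String revision, List<String> files) {
            this.revision = revision;
            this.files = files;
        }

        // Copy constructor, analogous in spirit to new HistoryEntry(e).
        Entry(Entry other) {
            this(other.revision, other.files);
            this.tags = other.tags;
        }
    }

    // Inverts the history; when deepCopy is true, every (file, changeset)
    // pair gets its own Entry object instead of sharing the original one.
    static Map<String, List<Entry>> invert(List<Entry> history, boolean deepCopy) {
        Map<String, List<Entry>> map = new HashMap<>();
        for (Entry e : history) {
            for (String file : e.files) {
                map.computeIfAbsent(file, k -> new ArrayList<>())
                        .add(deepCopy ? new Entry(e) : e);
            }
        }
        return map;
    }

    // Counts distinct Entry objects (by identity) reachable from the map.
    static int distinctEntries(Map<String, List<Entry>> map) {
        Set<Entry> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (List<Entry> list : map.values()) {
            seen.addAll(list);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Entry> history = List.of(
                new Entry("r1", List.of("a.c", "b.c", "c.c")),
                new Entry("r2", List.of("a.c", "b.c")),
                new Entry("r3", List.of("c.c")));
        System.out.println(distinctEntries(invert(history, false))); // 3
        System.out.println(distinctEntries(invert(history, true)));  // 6
    }
}
```

For a repository where a typical changeset touches several files, the deep-copy count grows with the total number of file changes across the history rather than with the number of changesets.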
