Skip to content

The InfoStream progress lines from concurrent HNSW merging report duplicate times #15967

@mikemccand

Description

@mikemccand

I'm digging into nightly benchy and concurrent HNSW merging, trying to extend segment traces to record details of HNSW concurrency.

InfoStream has these useful lines:

HNSW 0 [2026-04-15T14:58:00.633261871Z; hnsw-merge-1-thread-91]: built 950000 in 11447/11447 ms

But the in X/X ms is always the same X. Here's the source:

  private long printGraphBuildStatus(int node, long start, long t) {
    long now = System.nanoTime();
    infoStream.message(
        HNSW_COMPONENT,
        String.format(
            Locale.ROOT,
            "built %d in %d/%d ms",
            node,
            TimeUnit.NANOSECONDS.toMillis(now - t),
            TimeUnit.NANOSECONDS.toMillis(now - start)));
    return now;
  }

And it's called from addVectors in the same class:

  /** add vectors in range [minOrd, maxOrd) */
  protected void addVectors(int minOrd, int maxOrd) throws IOException {
    if (frozen) {
      throw new IllegalStateException("This HnswGraphBuilder is frozen and cannot be updated");
    }
    long start = System.nanoTime(), t = start;
    if (infoStream.isEnabled(HNSW_COMPONENT)) {
      infoStream.message(HNSW_COMPONENT, "addVectors [" + minOrd + " " + maxOrd + ")");
    }
    for (int node = minOrd; node < maxOrd; node++) {
      addGraphNode(node);
      if ((node % 10000 == 0) && infoStream.isEnabled(HNSW_COMPONENT)) {
        t = printGraphBuildStatus(node, start, t);
      }
    }
  }

Since the chunk size is always 2048 (I think?), a worker working on a chunk could print 0 or 1 times ... which means the incoming t is always the same as start.

I suspect this was factored out from a place where start was the true start time (to create the whole graph), and this InfoStream message is trying to show "time since we started" and "time since the last progress print" or so. It's useful to fix this so segment traces can help debug HNSW merge concurrency...

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions