Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAK-11131 - indexing-job: AOT Blob downloader may download blobs that are not needed for the indexes #1738

Merged
merged 8 commits into from
Sep 26, 2024

Conversation

nfsantos
Copy link
Contributor

@nfsantos nfsantos commented Sep 25, 2024

The AOT Blob downloader makes several checks on each line/node in the FFS before downloading an non-inlined blob. One of these checks is if the line/node is needed by the indexers for which the feature is enabled. The current implementation is checking if any of the indexers requires the line. This is incorrect, because it may download blobs for lines that are not required by the indexers with AOT blob download enabled. In the worst case, this will slow down significantly the downloader, which may make it fall behind the indexer, thereby preventing it from downloading the blobs for the lines that are actually needed by the indexers.

This PR changes the logic so that the AOT blob downloader to only check against the indexes for which AOT blob download is enabled when deciding if a line/node is to be considered for blob download or not.

This PR also adds additional tests for the AOT blob downloader.

…nsidered for download by matching only against the indexes for which the feature is enabled. Previously, it was checking against all indexes, which could lead to downloading blobs for nodes that are not indexed by an index that needs the blob.

Add tests for AOT blob downloader.
@@ -86,6 +87,13 @@ public Set<String> getRelativeIndexedNodeNames() {
return result;
}

@Override
public String getIndexName() {
// This is susceptible to infinite recursion if a composite indexer contains itself, directly or indirectly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This is susceptible to infinite recursion if a composite indexer contains itself, directly or indirectly.
// Many methods are susceptible to infinite recursion if a composite indexer contains itself, directly or indirectly.
// In this case, the methods would throw a StackOverflowException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, it should be a warning in the public documentation of the class. The users of this class have to be aware to avoid this issue. I updated the comment.

@@ -126,6 +126,11 @@ public Set<String> getRelativeIndexedNodeNames() {
return definition.getRelativeNodeNames();
}

@Override
public String getIndexName() {
return definition.getIndexName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usage of getIndexName() is fine. However, just FYI:

For on-prem and managed services, we support the same node name in multiple locations. For example it's possible to have

  • /oak:index/damAssetLucene
  • /content/dam/oak:index/damAssetLucene

both in the same repository.

It is unlikely that this causes issues for this case here however (in my view).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definition.getIndexName() returns the full path of the index, so there is no risk of collisions for indexes that are outside /oak:index.

@nfsantos nfsantos merged commit 9d00dc8 into apache:trunk Sep 26, 2024
1 of 2 checks passed
@nfsantos nfsantos deleted the OAK-11131 branch September 26, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants