-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Make GeoIp downloader multi-project aware #128282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…oip/GeoIpDownloader.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Updated the title, and will for sure squash the commits before merging
The GeoIP yaml tests requires more work in MP, which is loading the databases from system index to ingest node. To limit the size of this PR, I will include that part in another PR |
if (clusterService.state().nodes().isLocalNodeElectedMaster() == false) { | ||
// we should only start/stop task from single node, master is the best as it will go through it anyway | ||
return; | ||
} | ||
if (enabled) { | ||
startTask(() -> {}); | ||
startTask(projectResolver.getProjectId(), () -> {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works because the project resolver currently still returns a fallback project ID (which will be removed in the future). This will need to be updated in the future anyway as well (when these settings are made project-aware), but I'd still prefer that we just hard-code ProjectId.DEFAULT
instead. That way we at least don't rely on temporary behavior. The same goes for the methods/settings below.
GeoIpDownloader currentDownloader = getCurrentTask(); | ||
if (currentDownloader != null) { | ||
currentDownloader.requestReschedule(); | ||
for (var projectMetadataEntry : event.state().metadata().projects().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer that we put the annotation here as well. That way we're sure it won't be forgotten.
...s/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpDownloaderTaskExecutor.java
Outdated
Show resolved
Hide resolved
boolean taskIsBootstrapped = taskIsBootstrappedByProject.computeIfAbsent(projectId, k -> false); | ||
if (taskIsBootstrapped != true) { | ||
taskIsBootstrappedByProject.put(projectId, true); | ||
this.taskIsBootstrappedByProject.computeIfAbsent(projectId, k -> hasAtLeastOneGeoipProcessor(projectMetadata)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that taskIsBootstrappedByProject
is used async in startTask
and stopTask
. The current code isn't 100% airtight. I was checking on ConcurrentHashMap
if there's anything we can use, but I don't think so. ConcurrentHashMap#replace
is close, but it doesn't account for unset/null
values... So, I think we have to go back to the AtomicBoolean
inside the map again, sorry about that. I think the other maps are OK like this.
var atLeastOneGeoipProcessor = atLeastOneGeoipProcessorByProject.computeIfAbsent(projectId, k -> false); | ||
boolean newAtLeastOneGeoipProcessor = hasAtLeastOneGeoipProcessor(projectMetadata); | ||
|
||
if (newAtLeastOneGeoipProcessor && atLeastOneGeoipProcessor == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var atLeastOneGeoipProcessor = atLeastOneGeoipProcessorByProject.computeIfAbsent(projectId, k -> false); | |
boolean newAtLeastOneGeoipProcessor = hasAtLeastOneGeoipProcessor(projectMetadata); | |
if (newAtLeastOneGeoipProcessor && atLeastOneGeoipProcessor == false) { | |
var atLeastOneGeoipProcessor = atLeastOneGeoipProcessorByProject.getOrDefault(projectId, false); | |
if (atLeastOneGeoipProcessor == false && hasAtLeastOneGeoipProcessor(projectMetadata)) { |
and move the put
below to inside the if-block.
- doing
getOrDefault
instead ofcomputIfAbsent
avoids the redundant new value - performing the (relatively expensive check) as the second condition allows us to skip it if we already set this value to
true
once before (as it'll never get set tofalse
, we only remove the entry) - we don't need to
put
false
on every run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can move the put
inside the if block or avoid hasAtLeastOneGeoipProcessor
. It's possible to remove a Geoip processor, in which case we need to set the value from true
to false
. It's also easier to reason about the logic this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah you're right; I misread the old code. I think it's still worth doing getOrDefault
instead of computeIfAbsent
and I'd wrap the put
inside a if (atLeastOneGeoipProcessor != newAtLeastOneGeoipProcessor)
, to save some unnecessary updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer still using computeIfAbsent
, there's a value always exists in the map for each project, it's just one put for every project, I don't think it's a huge problem. I added the conditional put though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what the value is of doing the compute? It's an extra put on the map without any value.
...s/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpDownloaderTaskExecutor.java
Outdated
Show resolved
Hide resolved
...est-geoip/src/test/java/org/elasticsearch/ingest/geoip/GeoIpDownloaderTaskExecutorTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/ClusterChangedEvent.java
Outdated
Show resolved
Hide resolved
.putProjectMetadata(ProjectMetadata.builder(project2).build()) | ||
.build(); | ||
ClusterChangedEvent event = new ClusterChangedEvent("_na_", newState, originalState); | ||
assertTrue(event.customMetadataChanged(project2.id(), IndexGraveyard.TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this IndexGraveyard
line a leftover, or did you put it there intentionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intentional, IndexGraveyard
is always added when a new project is created, this checks that IndexGraveyard
is "changed".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, could you put a comment here and below to mention that?
this.taskIsBootstrappedByProject.computeIfAbsent( | ||
projectId, | ||
k -> new AtomicBoolean(hasAtLeastOneGeoipProcessor(projectMetadata)) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these lines are correct. We're already setting taskIsBootstrapped
above, right? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
var atLeastOneGeoipProcessor = atLeastOneGeoipProcessorByProject.computeIfAbsent(projectId, k -> false); | ||
boolean newAtLeastOneGeoipProcessor = hasAtLeastOneGeoipProcessor(projectMetadata); | ||
|
||
if (newAtLeastOneGeoipProcessor && atLeastOneGeoipProcessor == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah you're right; I misread the old code. I think it's still worth doing getOrDefault
instead of computeIfAbsent
and I'd wrap the put
inside a if (atLeastOneGeoipProcessor != newAtLeastOneGeoipProcessor)
, to save some unnecessary updates.
.putProjectMetadata(ProjectMetadata.builder(project2).build()) | ||
.build(); | ||
ClusterChangedEvent event = new ClusterChangedEvent("_na_", newState, originalState); | ||
assertTrue(event.customMetadataChanged(project2.id(), IndexGraveyard.TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, could you put a comment here and below to mention that?
This change makes the GeoIp persistent task executor/downloader multi-project aware.
<project-id>/geoip-downloader
for cluster in MP modeTo keep the size of PR review friendly, this PR only focus on the downloading part of GeoIP database, there will be more changes coming in separate PRs to make GeoIP multi-project aware in general.