-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk Indexing Optimization and Refresh on Async Strategies #467
Conversation
…_while_resetting` is active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, great idea! Did you try it on production?
Another optimization option is to start the index without any replicas, and only later adding them, but that really depends on the use case
Does it make sense to split this 2 options (the refresh interval and the replicas number) to be able to enable them separately?
lib/chewy/index/actions.rb
Outdated
result = | ||
if Chewy.use_enhance_index_settings_while_resetting | ||
name = build_index_name(suffix: suffix) | ||
client.indices.put_settings index: name, body: { index: { number_of_replicas: 0, refresh_interval: -1 } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refactor this? Make number_of_replicas
a separate method with the block or so. I mean, this method became way huge, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll refactor.
lib/chewy/index/actions.rb
Outdated
|
||
def extract_index_settings(*args) | ||
return {} unless settings_hash.key?(:settings) || settings_hash[:settings].key?(:index) | ||
settings_hash[:settings][:index].select { |k, _| args.include?(k) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
settings_hash[:settings][:index].slice(*args)
is slightly simpler.
rubocop is now 0.47.0 but I locked it to 0.46.0 since new offenses were flagged |
Well, i fixed 0.47.0 offenses. Is that ok? |
Sure it is ok. Did you take a look on my comment about separate options? What do you think? |
.rubocop.yml
Outdated
@@ -32,3 +32,8 @@ Style/MultilineMethodCallIndentation: | |||
|
|||
Style/MultilineOperationIndentation: | |||
EnforcedStyle: indented | |||
|
|||
Metrics/BlockLength: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Now we need this in every project :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Honestly, I don't know about separating options. When would put back the original number_of_replicas if it was separated ? I would say the doc is confusing about that. What do you think about the force_merge after ?
|
I mean, setting refresh interval should be separate from setting the number of replicas. Somebody can decide to use either one, but not both. |
I would change the config settings to reflect user's choice :only_refresh
:only_replicas
:both
:none |
Hm, what about 2 separate boolean options instead? |
No prob |
Well, looks like elastic/elasticsearch-ruby@cbfd6eb It was removed in 2.0 (became a plugin). 5.0, it's back again but with a new definition |
:reset_no_replicas, | ||
|
||
# Refresh or not when import async (sidekiq, resque, activejob) | ||
:disable_refresh_async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, why did you add this?
I'm thinking about some options passed to the block: Chewy.strategy(:sidekiq, refresh: false) { do_stuff }
Instead of the global option. However it would not affect anything, so I'm ok to accept it if you don't want to change your mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both ways could work. We could set the strategy globally for the async strategies
or we could simply pass some options to the block to have a different behavior for some cases. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes sense. I'm just thinking of it being slightly excess. But yeah, we can keep both ways. Tell me when you think this PR is ready, I'll review it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ready to review sir 👍
Ok, let's try it out. Thanks for this functionality! |
Hello Pyro,
First of all, have a great year 👏👏
After some reading on ElasticSearch documentation, it seems that dynamically disabling
refresh_interval
and decreasingnumber_of_replicas
to 0 should enhance bulk indexing on indices. If thought it could be a nice configuration setting when zero downtime index resetting. I called ituse_enhance_index_settings_while_resetting
Also, I added another configuration setting to disable
refresh
on async strategies likesidekiq
,resque
andactive_jobs
. If you choose an async strategy, probably that you don't need a near-real time searchable document. Well in our case, we don't so I thought it would be nice addition.Thanks