
[FEATURE] Predictive Primary shard count by ISM and Index Management #1225

Open
aswath86 opened this issue Aug 8, 2024 · 14 comments
Labels
enhancement New request

Comments

@aswath86

aswath86 commented Aug 8, 2024

Is your feature request related to a problem?

  • Users who are new to OpenSearch or any Lucene-based search engine often learn sharding best practices the hard way, when they end up with oversized shards (500-900GiB) or too many small shards (usually under 1gb each).
  • As an OpenSearch consultant who has reviewed countless OpenSearch clusters, I see that the majority of OpenSearch indices use 5 primaries (the default in Amazon OpenSearch), which suggests that users do not pay attention to sharding strategy, especially when they are new to OpenSearch.
  • To give a concrete example, in a multi-tenant cluster setup with varying ingestion rates, relying solely on a static primary count in the index template isn't sufficient. Consider a log observability workload with over 200 different types of applications, each represented by its own index pattern. These rolling indices vary in size, ranging from extremely large to extremely small. Additionally, the ingestion rate fluctuates over time, causing low-volume indices to temporarily become medium-sized before returning to their original state. In such a dynamic environment, a "set it and forget it" approach for primary shard counts isn't ideal. Manual adjustments are impractical, and shard-size-based retention doesn't work well for short retention periods (e.g., 30 hours). Using a static primary count at the index template level can lead to skewed metrics and storage inefficiencies.

What solution would you like?

  • Long term solution — predictive primary shard sizing for indices, adaptively applied, considering resource utilization, workload, data stored, and other metrics from the fleet, with ML for classification. More on this later.

  • First step towards the long term solution (solution for time-series workloads) — an ISM-policy-driven Action, let's call it "Mutating Rollover", which looks at the past x hours of ingestion rate (where x is the HOT retention period), takes into account the number of data nodes, and predicts a suitable primary shard count for the next rollover index. The prediction may not be precise, but it is at least better than a static primary shard count for an index with a fluctuating ingestion rate.

  • Additional solution (solution for search workloads) — Index Management allows users to create an index. This page should ask for additional details such as,
    1. Type of workload (search/timeseries/vector)
    2. Expected primary storage size
    3. Number of data nodes (can probably be auto-populated)
    4. Whether or not the domain is multi-tenant
    5. etc., etc.,
    and suggest a primary shard count that the user can either accept or ignore and go with a manual value.

What alternatives have you considered?
Developed a Python script that looks at the past ingestion rate for an index to determine a suitable primary shard count for the index rollover. The script runs every hour to update the index templates. This could be available out of the box as a "Mutating Rollover" Action in ISM. The Python script does the following:

* GET _cat/indices/_hot?v&h=index,creation.date.string,pri.store.size,pri&s=index:desc,creation.date.string:desc #used to calculate the ingestion volume for the last 30 hours
* 0gb to 30gb would be the general shard size
* size-based rollover at shard size 30gb (30gb is used in this example as the optimal shard size, but the optimal shard size depends on the workload, usually ranging from 10gb to 50gb)
* For each index family, if the total ingested size (gb) over the last 30 hours is,
  - less than 10gb —> 1p1r sharding strategy —> 0gb to 10gb would be the shard size range
  - 10gb to 30gb —> 2p1r sharding strategy —> 5gb to 15gb would be the shard size range
  - 30gb to 60gb —> 3p1r sharding strategy —> 10gb to 20gb would be the shard size range
  - 60gb+ to 30*NO_OF_DATA_NODES gb —> 1p for every 30gb —> approx. 30gb would be the shard size
  - 30*NO_OF_DATA_NODES+ gb —> (NO_OF_DATA_NODES)p —> approx. 30gb would be the shard size
  - (optional) 30*(NO_OF_DATA_NODES/2) gb —> (NO_OF_DATA_NODES/2)p —> approx. 30gb would be the shard size

Do you have any additional context?
I can provide the Python script (for the above-mentioned alternative solution) if it helps, but it's fairly straightforward.
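
A minimal sketch of the script's core logic follows, assuming a 30gb target shard size. The endpoint, constants, and function names here are illustrative, not the exact script; filtering to the _hot tier is omitted for simplicity (see the Challenges section below).

# Illustrative sketch only (hypothetical names), not the exact script mentioned above.
import math
import requests

OPENSEARCH_URL = "https://localhost:9200"  # assumption: cluster endpoint
TARGET_SHARD_GB = 30                       # optimal shard size chosen for this workload
NO_OF_DATA_NODES = 6                       # assumption: could be read from _cat/nodes

def ingested_volume_gb(index_pattern: str) -> float:
    """Sum pri.store.size (in gb) across the indices matching the pattern."""
    resp = requests.get(
        f"{OPENSEARCH_URL}/_cat/indices/{index_pattern}",
        params={"h": "index,pri.store.size", "bytes": "gb", "format": "json"},
    )
    resp.raise_for_status()
    return sum(float(row["pri.store.size"] or 0) for row in resp.json())

def suggest_primary_count(ingested_gb: float) -> int:
    """Map the volume ingested over the HOT retention window to a primary shard count."""
    if ingested_gb < 10:
        return 1  # 1p1r
    if ingested_gb < 30:
        return 2  # 2p1r
    if ingested_gb < 60:
        return 3  # 3p1r
    # one primary per ~30gb, capped at the number of data nodes
    return min(math.ceil(ingested_gb / TARGET_SHARD_GB), NO_OF_DATA_NODES)

The suggested count would then be written to the index template so the next rollover picks it up.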


Below added on Sep 3rd, 2024

Goals

  1. Nudge users to adopt sharding strategy best practices — not many users seem to read the best-practices documentation, and the sharding strategy best practices appear only in the Amazon OpenSearch documentation, not in the open-source documentation. This matters especially for new OpenSearch users.
  2. Reduce storage & shard skew — storage skew introduces skew in other hardware resource usage, starting with CPU and memory, disk throttling, etc.
  3. Avoid too small/large shards
  4. Reduce node hot spots

??? means undecided/unsure

ISM — Mutating Rollover

This can either be an Action separate from the existing Rollover Action that only determines the primary count and sets it at the index template, OR an extension to the Rollover Action. Properties for the Action would be (a hypothetical policy sketch follows this list),

  1. Min Index age
  2. Min doc count — not necessary if this is a separate Action???
  3. Min Index size — not necessary if this is a separate Action???
  4. Min Primary shard size
  5. Default primary shard count — could be the last value OR 5 OR a value set by the user???
  6. HOT retention period — the same value used in the warm migration
  7. Alert Channel — notified when the ingestion rate deviates from the estimation (a deviation is when the primary count estimation changes between the _hot estimation vs. the estimation based on the current index vs. the estimation based on the last 1 hour, 2 hours, 3 hours, etc.)
  8. Ingestion rate deviation threshold — ???
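
To make the shape of such an Action concrete, here is a hedged sketch of what a policy using the proposed action could look like, registered through the existing ISM policy API. The "mutating_rollover" action and all of its fields are hypothetical; they simply mirror the property list above.

# Hypothetical sketch: "mutating_rollover" does not exist yet; only the ISM policy API endpoint is real.
import requests

policy = {
    "policy": {
        "description": "Rollover that also re-estimates the primary shard count",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "mutating_rollover": {                 # proposed action (hypothetical)
                            "min_index_age": "1d",
                            "min_primary_shard_size": "30gb",
                            "default_primary_shard_count": 5,  # fallback value
                            "hot_retention_period": "30h",     # the 'x' hours window
                            "alert_channel": "ops-alerts",     # notified on deviation
                            "ingestion_rate_deviation_threshold": 0.25,
                        }
                    }
                ],
                "transitions": [],
            }
        ],
    }
}

resp = requests.put("https://localhost:9200/_plugins/_ism/policies/logs_mutating_rollover", json=policy)
resp.raise_for_status()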

Sample implementation

To calculate the ingestion volume for the last 30 hours, consider all the indices of an index pattern in _hot.

GET _cat/indices/_hot?v&h=index,creation.date.string,pri.store.size,pri&
s=index:desc,creation.date.string:desc 
  • 30gb was found to be an optimal shard size for the given workload (but the optimal shard size depends on the workload, usually ranging from 10gb to 50gb)
  • Size-based rollover at shard size 30gb
  • 0gb to 30gb would be the general shard size
  • For each index pattern, if the total ingested size (gb) over the last 30 hours is,
    • less than 10gb —> 1p1r sharding strategy —> 0gb to 10gb would be the shard size range
    • 10gb to 30gb —> 2p1r sharding strategy —> 5gb to 15gb would be the shard size range
    • 30gb to 60gb —> 3p1r sharding strategy —> 10gb to 20gb would be the shard size range
    • 60gb+ to 30*NO_OF_DATA_NODES gb —> 1p for every 30gb —> approx. 30gb would be the shard size
    • 30*NO_OF_DATA_NODES+ gb —> (NO_OF_DATA_NODES)p —> approx. 30gb would be the shard size
    • (optional) 30*(NO_OF_DATA_NODES/2) gb —> (NO_OF_DATA_NODES/2)p —> approx. 30gb would be the shard size

Estimating the ingestion rate

Get the sum of the primary store size of all the indices of an index pattern. The HOT retention period is already known (let's assume the HOT retention period is 'x' hours). Estimate the ingestion volume for the last x hours. See the sample implementation above for estimating the primary shard count. A sketch follows the call below.

GET _cat/indices/_hot?v&h=index,creation.date.string,pri.store.size,pri&
s=index:desc,creation.date.string:desc
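
A sketch of this estimation, reusing the illustrative constants from the script sketch earlier. Because there is currently no direct way to list only the _hot indices of a pattern (see Challenges below), this version approximates the window by the index creation date.

# Sketch: approximate the last x hours of ingestion by summing the primary store size
# of the pattern's indices created within the HOT retention window.
import time
import requests

HOT_RETENTION_HOURS = 30  # 'x' in the text

def ingestion_volume_last_x_hours_gb(index_pattern: str) -> float:
    cutoff_ms = (time.time() - HOT_RETENTION_HOURS * 3600) * 1000
    resp = requests.get(
        f"{OPENSEARCH_URL}/_cat/indices/{index_pattern}",  # OPENSEARCH_URL as in the earlier sketch
        params={"h": "index,creation.date,pri.store.size", "bytes": "gb", "format": "json"},
    )
    resp.raise_for_status()
    return sum(
        float(row["pri.store.size"] or 0)
        for row in resp.json()
        if float(row["creation.date"]) >= cutoff_ms
    )

# Hourly ingestion rate, if needed for reporting:
# rate_gb_per_hour = ingestion_volume_last_x_hours_gb("app-logs-*") / HOT_RETENTION_HOURS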

Validate the accuracy of the estimation

  1. Get the _count of documents from the current index for the last 1 hour, 2 hours, 3 hours, etc. — this type of validation may be misleading if there is off-hour traffic. Instead of the last 1, 2, or 3 hours, random one-hour windows can be used, say now-7h, now-5h, now-1h, etc.
GET opensearch_dashboards_sample_data_logs_00002/_count
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": "now-1h",
              "lte": "now"
            }
          }
        }
      ]
    }
  }
}
  2. Get the docs.count and pri.store.size of the current index
GET _cat/indices/opensearch_dashboards_sample_data_logs_00002?
h=index,creation.date.string,pri.store.size,pri,docs.count&
format=json
  3. Get the current timestamp
  4. Find the ingestion rate deviation from the original estimate. A deviation is when the primary count estimation changes between the _hot estimation vs. the estimation based on the current index vs. the estimation based on the last 1 hour, 2 hours, 3 hours, etc.
  5. Optionally, get the pri.store.size of the last rolled-over index and/or the one before it for further analysis of the ingestion rate deviation. A validation sketch follows this list.
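
A sketch of the deviation check, reusing the helpers and constants sketched earlier (illustrative names; the deviation rule here is simply "the two estimates disagree").

# Extrapolate the last hour of ingestion on the current index to a full HOT retention
# window and compare the resulting primary-count suggestion with the hot-window estimate.
# OPENSEARCH_URL, HOT_RETENTION_HOURS, suggest_primary_count: from the earlier sketches.
import requests

def docs_last_hour(index: str) -> int:
    body = {"query": {"bool": {"filter": [{"range": {"timestamp": {"gte": "now-1h", "lte": "now"}}}]}}}
    resp = requests.post(f"{OPENSEARCH_URL}/{index}/_count", json=body)
    resp.raise_for_status()
    return resp.json()["count"]

def current_index_stats(index: str):
    resp = requests.get(
        f"{OPENSEARCH_URL}/_cat/indices/{index}",
        params={"h": "docs.count,pri.store.size", "bytes": "gb", "format": "json"},
    )
    resp.raise_for_status()
    row = resp.json()[0]
    return int(row["docs.count"]), float(row["pri.store.size"])

def deviates(index: str, hot_window_primaries: int) -> bool:
    docs, size_gb = current_index_stats(index)
    gb_per_doc = size_gb / max(docs, 1)
    projected_gb = docs_last_hour(index) * gb_per_doc * HOT_RETENTION_HOURS
    return suggest_primary_count(projected_gb) != hot_window_primaries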

The worst that can happen

  1. Unpredictable issues???

Incomplete section

Challenges

  1. Finding all indices of an index pattern in _hot is currently not a straightforward API call, as GET _cat/indices/_hot/<<index_name>> doesn’t work. This needs to be fixed first.
  2. The assumption here is that 'timestamp', or whatever field is used as the Time Field, reflects the current time. Otherwise this won't work.
  3. How do we handle data streams??? Maybe handle this at Index Management Rollover — https://opensearch.org/docs/latest/dashboards/im-dashboards/rollover/
  4. Not suitable for longer HOT retentions???

Index Management — Create Index

The Create Index page in OSD can suggest the primary shard count. Ask the user to input the following information so a primary shard count can be suggested to them. They can accept the suggested sharding strategy or override it manually. The goal is to guide the user to adopt sharding strategy best practices.

Side question: do people really use this page to create indices? :-) What can be done for indices created via PUT <<index_name>>? That should be the ultimate goal.

  • Type of Index — Search, Timeseries, or Vector
  • Number of Data nodes (NO_OF_DATA_NODES) — auto-populated so the user can verify it
  • Expected storage volume
  • Index growth rate
  • Expected Search SLA — ???
  • Expected type of searches — intensive aggregation query usage can influence the optimal shard size
  • Desired shard size — this should be auto-populated based on the above input, but the user can override it
  • When Timeseries is chosen as the type of index, ISM should be suggested to the user here

Quoting the AOS best practices documentation — https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html#bp-storage (a worked example follows the quote):

  • The total size of the source data plus the index is often 110% of the source, with the index up to 10% of the source data.
  • (Source data + room to grow) * (1 + indexing overhead) / desired shard size = approximate number of primary shards
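
As a worked example of the quoted formula (numbers are illustrative only):

# 1 TiB of source data, 20% room to grow, 10% indexing overhead, 30gb desired shard size.
import math

source_gb = 1024
room_to_grow = 0.20
indexing_overhead = 0.10
desired_shard_gb = 30

primary_shards = math.ceil(source_gb * (1 + room_to_grow) * (1 + indexing_overhead) / desired_shard_gb)
print(primary_shards)  # 46
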
@dblock
Member

dblock commented Aug 26, 2024

[Catch All Triage - 1, 2, 3, 4, 5]

@AmiStrn

AmiStrn commented Aug 26, 2024

@aswath86 making any assumptions on the shard count can cause unpredictable issues.
In logging there are sudden spikes, and/or night/day activity patterns of different types (spiking high into a decreasing saw-tooth pattern is a thing, not just nice wavy patterns). Making predictions would require analyzing the indexing pattern over a longer period and hoping it represents the future. This only works if there is a pattern.

Additionally, if the cluster changes the shard count every time it sees a spike, we may end up in a state of high shard count (which may cause instability and can't be undone).

On the flip side - we may reduce shards due to low shipping volume, but when the next spike arrives we would get a hotspot.

I don't think this is a bad idea, but the approach you described is similar to one I tried as well, and there are many manual tweaks that need to be made. Some are done on a daily basis.

A great way to avoid any of the issues you stated is to spread the shards across the total number of nodes. While this solves all those issues it creates another one - too many shards. If we were OK with that, then there would be no problem having a full spread across all the nodes.

@aswath86
Author

Making predictions would require analyzing the indexing pattern over a longer period and hoping it represents the future. This only works if there is a pattern.

Additionally, if the cluster changes the shards' count every time it sees a spike we may cause a state of high shard count (which may cause instability and can't be undone).

I'm with you on "This only works if there is a pattern". And I'm not in favour of analyzing an index pattern over a longer period of time, for various reasons (sales season, newly onboarded features producing more logging, etc.), so analyzing over a longer period would be misleading. We should only be considering the ingestion volume for the last x hours of data, where x is the HOT retention period. We could also look at the last few hours' (say 2 hours) ingestion volume as a means to validate the outcome that is based on the x hours.

In logging there are sudden spikes, and/or night/day activity patterns of different types (spiking high to a decreasing-saw-tooth pattern is a thing, not just nice wavey patterns).

This FR is primarily for an ISM Action which is going to be completely optional, suitable for index patterns with shorter retention periods (e.g., 24 hours, 30 hours) that see a surge in ingestion for a period of time before the surge disappears. For example, a sales season where the application starts producing more logs for a few days, in which case it is not practical to change the sharding strategy manually each time.

Generally, setting the shard count to a static value is not the ideal solution for a fluctuating ingestion rate. Changing the primary count each time there is a change in workload pattern is also not ideal.

A great way to avoid any of the issues you stated is to spread the shards across the total number of nodes. While this solves all those issues it creates another one - too many shards. If we were OK with that, then there would be no problem having a full spread across all the nodes.

It's usually straightforward if a cluster has very few tenants (a handful of index patterns) where all of them are either large volume or have a longer retention period. But this FR is for cases where there are many tenants in the same cluster, each with varying index volume/size and fluctuating ingestion rate. For instance, if you classify the index patterns by volume and you end up with high-, medium-, and low-volume indices, then during a surge a low-volume index becomes a medium-volume index, yet the sharding strategy is unchanged until manually updated. This FR aims to automate the sharding strategy according to best practices.

making any assumptions on the shard count can cause unpredictable issues.

This FR is primarily geared toward multi-tenant clusters with shorter retention periods, and this Action is to be used in combination with shard-size-based rollover to prevent shards from getting bigger than the ideal size. We could also provide a parameter to override the primary count in case a particular index pattern is experiencing a significant surge that the OpenSearch admin is aware of (say the application is undergoing stress testing or performance benchmarking and hence is going to produce significantly more logs), in which case they could override the primary count, or better yet, not use this Action.

This feature was successfully implemented via a Python script that runs every hour and updates the primary count of the index for the subsequent rollover, on a large cluster with 200 tenants of varying index volume and fluctuating ingestion rate. This FR is to offer this feature natively in OpenSearch.

@Jon-AtAWS
Member

Making predictions would require analyzing the indexing pattern over a longer period and hoping it represents the future. This only works if there is a pattern.

I'm with you on "This only works if there is a pattern". And I'm not in favour of analyzing an index pattern over a longer period of time, for various reasons (sales season, newly onboarded features producing more logging, etc.), so analyzing over a longer period would be misleading. We should only be considering the ingestion volume for the last x hours of data, where x is the HOT retention period. We could also look at the last few hours' (say 2 hours) ingestion volume as a means to validate the outcome that is based on the x hours.

We can/should also include guard rails that prevent large changes in primary count. Some change in primary count, with a limited scope, will be better than not adjusting the shard count at all. Again, it's an optional feature that users will choose to use by adding it to the policy. A rough guard-rail sketch follows the parameter list below.

Some potential, user-settable parameters

  • time window to evaluate or maybe a weight on the sizing based on time
  • limit on the change in shard count
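
A rough sketch of what the second parameter could look like; all names and defaults are hypothetical:

# Hypothetical guard rail: limit how far the primary count may move per rollover.
def apply_guard_rail(current_primaries: int, suggested_primaries: int,
                     max_step: int = 2, max_primaries: int = 12) -> int:
    """Clamp the suggestion to within max_step of the current value and never above
    max_primaries (e.g. the data node count)."""
    low = max(1, current_primaries - max_step)
    high = min(max_primaries, current_primaries + max_step)
    return min(max(suggested_primaries, low), high)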

In logging there are sudden spikes, and/or night/day activity patterns of different types (spiking high to a decreasing-saw-tooth pattern is a thing, not just nice wavey patterns).

This FR is primarily for an ISM Action which is going to be completely optional, suitable for index patterns with shorter retention periods (e.g., 24 hours, 30 hours) that see a surge in ingestion for a period of time before the surge disappears. For example, a sales season where the application starts producing more logs for a few days, in which case it is not practical to change the sharding strategy manually each time.

+1 optional, and aimed at weekly/seasonal pattern changes. Without this feature, users have to change everything manually.

A great way to avoid any of the issues you stated is to spread the shards on the total number of nodes. While this solves all those issues it creates another one - too many shards. if we were ok with that then there would be no problem to have a full spread on all the nodes.

I wanted to echo @aswath86's comments here. Users with multi-tenant logging clusters have the most problems with these kinds of index size changes. Especially for people managing tens or hundreds of end-user groups in a centralized IT department, this FR can bring a ton of value: they don't have to manually configure every change or reach out to their end users to figure out what's changing, whether it's permanent, and what the ideal primary count is.

@AmiStrn

AmiStrn commented Aug 28, 2024

This feature was successfully implemented via a Python script that runs every hour and updates the primary count of the index for the subsequent rollover, on a large cluster with 200 tenants of varying index volume and fluctuating ingestion rate. This FR is to offer this feature natively in OpenSearch.

I don't doubt that. I manage many clusters, each with thousands of tenants sending logging data (with varying sharding), and I have implemented features similar to what you are describing in the FR as external services. The problem that arises over time in many cases is that there start to be too many shards per node after increasing and decreasing several times; 24-hour retention is rare in logging. This leads to guard rails that make the feature ineffective in order to avoid manual tampering.

I think my main point here is that the approach taken is external and does not make use of information that you can get your hands on when writing code in the project - such as some metric for the indexing pressure of a specific tenant.

Another approach for ISM could be a set of indices with various shard counts; when pressure goes up, you shift the write alias. This way the transition is smooth and you end up with fewer shards, since it isn't an "all or nothing" shard increase (it can be one size up, creating only a standby index for the next available size). You can later reindex and merge small indices if needed.
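
A minimal sketch of that alias shift; the index and alias names are made up, but the _aliases API and the is_write_index flag are existing OpenSearch features:

# Shift the write alias to a pre-created index with a higher primary count.
import requests

def shift_write_alias(alias: str, old_index: str, new_index: str) -> None:
    actions = {
        "actions": [
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }
    resp = requests.post("https://localhost:9200/_aliases", json=actions)
    resp.raise_for_status()

# e.g. shift_write_alias("app-logs-write", "app-logs-2p-000007", "app-logs-4p-000001")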

Bottom line is - I agree with the requirement for adjustment due to load changes, but in my experience the algorithm provided doesn't fit well enough to avoid many manual on-the-fly adjustments on high-tenant clusters, and it doesn't use the OpenSearch internals that are available to us when running within the node.

@AmiStrn

AmiStrn commented Aug 28, 2024

@aswath86 I have a different approach to suggest:
Rather than adding the algorithm as a feature, how about enriching the policy with a way to define whatever shard adjustment rule the user wants? This way people can add the algorithm that works best for them, and perhaps other interesting policies can be generated (yours would be the default?).
Wdyt?

@AmiStrn

AmiStrn commented Aug 28, 2024

@aswath86 and I will meet in the next week or so and add a summary of our discussion over here. Just so we can expedite the conversation - does anyone else want to join the call? @Jon-AtAWS?

@Jon-AtAWS
Member

Thanks @AmiStrn I'd love to join!

BTW,

enriching the policy with a way to make whatever shard adjustment rule the user wants

Sounds like what we were thinking about, but maybe didn't say so clearly? Let's figure out the details.

J-

@AmiStrn

AmiStrn commented Aug 30, 2024

Thanks @AmiStrn I'd love to join!

BTW,

enriching the policy with a way to make whatever shard adjustment rule the user wants

Sounds like what we were thinking about, but maybe didn't say so clearly? Let's figure out the details.

J-

Great, the invite is still open if anyone else wants in :) It's probably going to happen this upcoming week.

@aswath86
Author

aswath86 commented Sep 3, 2024

Added on Sep 3rd, 2024 (Goals, ISM — Mutating Rollover, Sample implementation, Estimating the ingestion rate, Validate the accuracy of the estimation, Challenges, Index Management — Create Index). This content is appended to the feature description above.


@AmiStrn

AmiStrn commented Sep 3, 2024

@aswath86 Thanks! Can you add that to the feature description above?
If you want to clarify that it was added later, you can append it after adding a line and writing [Edit Sep 3rd:] before the new part, but it is up to you. I'm not sure we have guidelines for editing the RFC as the conversation continues.

@AmiStrn

AmiStrn commented Sep 3, 2024

Thanks for the meeting @aswath86 @Jon-AtAWS and Robert (what is your github handle?)

Summary:

  • I am not opposed to the RFC, as we agree that the project should have a way for users to get a better out-of-the-box experience (adding the ease-of-use label is a good idea)
  • The solution should provide users with controls over an automated policy, with the flexibility to tailor it to their use cases
  • The solution does not completely solve hotspots, but it provides a better starting point and a better experience for first-time users; if they do suffer from hotspots, it will provide some basic instrumentation to solve some or all of them per their use case.

@Jon-AtAWS
Member

Thanks all!

One more quick thought from me... the feature should factor the data node count into the guardrails and attempt to make changes that are congruent with that node count. This will also help with hot spots.

@486

486 commented Sep 12, 2024

Thanks for the meeting @aswath86 @Jon-AtAWS and Robert (what is your github handle?)

I am @486 - thanks again for the discussion @AmiStrn , looking forward to the next steps!
