-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add index level setting to unmap fields beyond total fields limit #14939
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #14939 +/- ##
============================================
- Coverage 71.95% 71.95% -0.01%
+ Complexity 63128 63092 -36
============================================
Files 5196 5196
Lines 295311 295357 +46
Branches 42677 42683 +6
============================================
+ Hits 212496 212528 +32
+ Misses 65436 65380 -56
- Partials 17379 17449 +70 ☔ View full report in Codecov by Sentry. |
server/src/main/java/org/opensearch/index/mapper/MappingLookup.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
553f304
to
814f715
Compare
❌ Gradle check result for 553f304: ABORTED Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
❕ Gradle check result for 456e8ba: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
❌ Gradle check result for 9495dae: ABORTED Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
❕ Gradle check result for 9f15262: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
This PR is stalled because it has been open for 30 days with no activity. |
@vikasvb90 @andrross @reta ,could you help to review this PR? Thank you! |
Thanks @gaobinlong , I would strongly advocate to use the existing means (primarily processors or pre-processing before sending for ingestion) instead of adding a new setting. I think this setting is confusing and also defeats the purpose of these limits in first place:
In my opinion, we should be very explicit of what to drop or keep, and it should be always driven by user / client intent. We have knobs / instruments for that. |
Thanks @reta , we cannot achieve that in the client side or by ingest processor, because what the users want is to keep the fields in the document but not index them, rather than dropping any fields.
We already have this feature, when set
We don't drop any fields, only |
Thanks @gaobinlong
Oh, let me look at it closely, but to double check: we store new detected fields in
It does, depending which document shape is going to come first to form the mapping. For example, let us assume we have a 1000 limit and the first document have this order:
But what if the order it reversed?
The problem I am trying to convey is that this setting introduces unpredictable behaviour (in some important cases) where the decision what to take in or out is not controllable. |
Yes, if
The document comes first will form the mapping, that is the existing behavior, if the total fields limit is 1000, then for the document with fields |
👍 got it, thanks!
This is exactly the problem. The issue is not about succeeding or not, the issue is what forms the mapping. The current behaviour is clear - either all fields are in the mapping or document is rejected. With this change - what ever comes first is in the mapping, everything beyond the limit is just ignored. @andrross @msfroh wondering if you folks have an opinion on this change. |
@reta I agree this is a concern. Rolling over indexes is a super common pattern in log analytics. With this behavior, then a user may see different field mappings on every index rollover because it may be more or less random which 1000 unique fields are first observed. |
Description
This PR introduces a new index level setting
index.mapping.total_fields.unmap_fields_beyond_limit
, the default value is false, if it's set to true, then all the new detected fields beyond theindex.mapping.total_fields.limit
will not be added to the mapping, not indexed, not searchable but still be stored in the_source
field.Example usage:
, we can see that the fields
field3
,field4
andfield5
are not in the mapping because the total fields limit is 3.Related Issues
#14031
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.