Skip to content

Conversation

@ajleong623
Copy link

Description

This fixes the bug where a null pointer exception occurs when a sort is run on geo point fields.

Issues Resolved

#5503

Check List

  • New functionality includes testing
  • New functionality has been documented
  • New Roles/Permissions have a corresponding security dashboards plugin PR
  • API changes companion pull request created
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ajleong623 and others added 5 commits July 22, 2025 19:33
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
…5494)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
…rch-project#5495)

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Craig Perkins <cwperx@amazon.com>
Co-authored-by: Shikhar Jain <8859327+shikharj05@users.noreply.github.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
@cwperks
Copy link
Member

cwperks commented Jul 23, 2025

Thank you for the deep dive and PR @ajleong623 ! It would be great to have an automated test to validate the fix.

@codecov
Copy link

codecov bot commented Jul 23, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 72.73%. Comparing base (a581bc6) to head (0319f4f).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...earch/security/privileges/dlsfls/FieldMasking.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5504      +/-   ##
==========================================
- Coverage   72.76%   72.73%   -0.04%     
==========================================
  Files         397      397              
  Lines       24591    24591              
  Branches     3741     3741              
==========================================
- Hits        17894    17886       -8     
- Misses       4867     4873       +6     
- Partials     1830     1832       +2     
Files with missing lines Coverage Δ
...earch/security/privileges/dlsfls/FieldMasking.java 86.47% <0.00%> (-0.59%) ⬇️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@nibix
Copy link
Collaborator

nibix commented Jul 23, 2025

Thank you for the quick fix! An automated test would be cool, indeed. Let me know if you need any pointers.

Also, this PR seems to contain some unrelated commits:

Bildschirmfoto 2025-07-23 um 05 50 11

Could you double check that?

@ajleong623
Copy link
Author

@nibix I think those unrelated commits might be from trying to merge the changes from the main into my own repository since my repository was a bit out of date. I can change those back so that only the null pointer catch is included.

Also, I would have to create an integration test with the security plugin configured to reproduce the error.

@cwperks
Copy link
Member

cwperks commented Jul 23, 2025

Also, I would have to create an integration test with the security plugin configured to reproduce the error.

This repo is already equipped for that. Take a look at FlsFlmIntegrationTest as an example.

You can run the whole suite with ./gradlew :integrationTest --tests FlsFmIntegrationTests

@cwperks
Copy link
Member

cwperks commented Jul 23, 2025

@ajleong623 can you check the failing tests? There may need to be some additional null handling from the callers of stripKeywordSuffix

Copy link
Member

@DarshitChanpura DarshitChanpura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajleong623 The change looks good. Mind adding automated tests to validate the fix?

@ajleong623
Copy link
Author

@cwperks Thank you for the example. It looks like it has all the methods I would need to reproduce the error.

@ajleong623
Copy link
Author

@cwperks On second look, it seems like there is already a TestData.java with a predefined mapping, but I wanted to make my own mapping for the index. Do I have to use that framework? All I need is to insert one document, one mapping, and perform one search on that.

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
@ajleong623
Copy link
Author

ajleong623 commented Jul 24, 2025

@cwperks It seems like the tests that failed worked on my end, so I reran them.

@cwperks
Copy link
Member

cwperks commented Jul 24, 2025

@cwperks It seems like the tests that failed worked on my end.

They may have been flaky (though the tests in this repo are quite stable). I just approved the checks to run again.

Yes you can create a separate test suite with its own test data. Use the suite above as an example which shows setup and teardown of an inmemory cluster (LocalCluster), creating indices, creating users with roles mappings, ingesting data and querying.

@cwperks
Copy link
Member

cwperks commented Jul 24, 2025

@ajleong623 Can we also add a null check in here?

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
@ajleong623
Copy link
Author

ajleong623 commented Jul 24, 2025

For some reason, after my new test, I am not sure if the point is hit. There was an additional step when I reproduced the issue locally of logging into the node and going through the ssl configurations. I will look into replicating that process later on.

@nibix
Copy link
Collaborator

nibix commented Jul 24, 2025

@ajleong623

Thank you for creating the test!

For some reason, after my new test, I am not sure if the point is hit. There was an additional step when I reproduced the issue locally of logging into the node and going through the ssl configurations. I will look into replicating that process later on.

Could you ever get ahold of a full stack trace of the NullPointerException? To be honest, for me it is surprising that field could be null in any situation. I believe it would be helpful to understand when it will be null.

@ajleong623
Copy link
Author

@nibix

Caused by: java.lang.NullPointerException: Cannot invoke "String.endsWith(String)" because "field" is null
        at org.opensearch.security.privileges.dlsfls.FieldPrivileges$FlsRule.stripKeywordSuffix(FieldPrivileges.java:317) ~[main/:?]
        at org.opensearch.security.privileges.dlsfls.FieldPrivileges$FlsRule.isAllowedRecursive(FieldPrivileges.java:234) ~[main/:?]
        at org.opensearch.security.configuration.DlsFlsFilterLeafReader.isAllowed(DlsFlsFilterLeafReader.java:408) ~[main/:?]
        at org.opensearch.security.configuration.DlsFlsFilterLeafReader.getPointValues(DlsFlsFilterLeafReader.java:689) ~[main/:?]
        at org.opensearch.search.internal.ExitableDirectoryReader$ExitableLeafReader.getPointValues(ExitableDirectoryReader.java:111) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator.<init>(NumericComparator.java:123) ~[lucene-core-10.2.2.jar:10.2.2 279eb7aaafe985e5d0552b7f2a10b63185a3f893 - 2025-06-17 09:30:59]
        at org.apache.lucene.search.comparators.DoubleComparator$DoubleLeafComparator.<init>(DoubleComparator.java:76) ~[lucene-core-10.2.2.jar:10.2.2 279eb7aaafe985e5d0552b7f2a10b63185a3f893 - 2025-06-17 09:30:59]
        at org.opensearch.search.sort.GeoDistanceSortBuilder$1$1$1.<init>(GeoDistanceSortBuilder.java:742) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.sort.GeoDistanceSortBuilder$1$1.getLeafComparator(GeoDistanceSortBuilder.java:742) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:181) ~[lucene-core-10.2.2.jar:10.2.2 279eb7aaafe985e5d0552b7f2a10b63185a3f893 - 2025-06-17 09:30:59]
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.<init>(TopFieldCollector.java:62) ~[lucene-core-10.2.2.jar:10.2.2 279eb7aaafe985e5d0552b7f2a10b63185a3f893 - 2025-06-17 09:30:59]
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.<init>(TopFieldCollector.java:196) ~[lucene-core-10.2.2.jar:10.2.2 279eb7aaafe985e5d0552b7f2a10b63185a3f893 - 2025-06-17 09:30:59]
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:195) ~[lucene-core-10.2.2.jar:10.2.2 279eb7aaafe985e5d0552b7f2a10b63185a3f893 - 2025-06-17 09:30:59]
        at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:344) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:309) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:273) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:356) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:491) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:433) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:283) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:156) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:669) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:733) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:702) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) [opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:975) [opensearch-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]

The was from following the steps in this video https://www.youtube.com/watch?v=G8ADRRF4IPQ, then creating an index with a mapping of

{
    "mappings": {
        "properties": {
            "admin1": {
                "type": "keyword"
            },
            "admin2": {
                "type": "keyword"
            },
            "admin3": {
                "type": "keyword"
            },
            "admin4": {
                "type": "keyword"
            },
            "coordinates": {
                "type": "geo_point"
            },
            "countryCode": {
                "type": "keyword"
            },
            "elevation": {
                "type": "long",
                "index": false
            },
            "featureClass": {
                "type": "keyword"
            },
            "featureCode": {
                "type": "keyword"
            },
            "id": {
                "type": "long"
            },
            "population": {
                "type": "long"
            },
            "timezone": {
                "type": "text",
                "index": false
            }
        }
    }
}

Then create a document with

{
    "admin1": "11",
    "admin2": "75",
    "admin3": "751",
    "admin4": "75056",
    "coordinates": {
        "lat": 48.8331,
        "lon": 2.3264
    },
    "countryCode": "FR",
    "elevation": 0,
    "featureClass": "A",
    "featureCode": "ADM5",
    "id": 6618620,
    "population": 137105,
    "timezone": "Europe/Paris"
}

Finally search with a post request with body

{
    "query": {
        "bool": {
            "filter": {
                "exists": {
                    "field": "coordinates"
                }
            }
        }
    },
    "sort": [
        {
            "_geo_distance": {
                "coordinates": {
                    "lat": 40.7128,
                    "lon": -74.0060
                },
                "ignore_unmapped": true,
                "order": "desc",
                "unit": "km"
            }
        }
    ],
    "size": 4
}

@nibix
Copy link
Collaborator

nibix commented Jul 25, 2025

Thank you @ajleong623 , that stacktrace is very helpful!

I am wondering whether this is actually a bug in the GeoDistanceSortBuilder, which is doing the calls towards the leaf reader with a field being null:

See here: https://github.com/opensearch-project/OpenSearch/blob/cdbb76e5f65ab51ca28a075f2d95c518c74747ac/server/src/main/java/org/opensearch/search/sort/GeoDistanceSortBuilder.java#L738-L750

            public FieldComparator<?> newComparator(String fieldname, int numHits, Pruning pruning, boolean reversed) {
                return new DoubleComparator(numHits, null, null, reversed, filterPruning(pruning)) {
                    @Override
                    public LeafFieldComparator getLeafComparator(LeafReaderContext context) throws IOException {
                        return new DoubleLeafComparator(context) {
                            @Override
                            protected NumericDocValues getNumericDocValues(LeafReaderContext context, String field) throws IOException {
                                return getNumericDoubleValues(context).getRawDoubleValues();
                            }
                        };
                    }
                };
            }

This code passes null as value for the field param of DoubleComparator. The JavaDoc of NumericComparator (the base class of DoubleComparator) says about the field param:

Parameter {code field} provided in the constructor is used as a field name in the default implementations of the methods {code getNumericDocValues} and {code getPointValues} to retrieve doc values and points. You can pass a dummy value for a field name (e.g. when sorting by script), but in this case you must override both of these methods.

I would presume null to be a dummy value.

However, looking at the code above, we see that only getNumericDocValues() is overridden. The method getPointValues() is not overriden. Thus, the getPointValues() calls are forwarded to the leaf reader implementations. It seems that the Lucene implementations are always returning null for null field attributes.

This would mean that getPointValues() is always null in this behavior.

I have not tried so far to deeply understand the code of NumericComparator, but this seems to render the functionality of the pruning parameter ineffective, as the pointValues would be needed for that.

So, is this possibly a bug in core?

@ajleong623
Copy link
Author

@nibix Yes. That makes a lot of sense. I can make the request and try stepping through the geo point sort builder to see when it calls getLeafComparator. It only showed up in security because of the way the leaf reader was wrapped. I will make an issue on it tomorrow after I confirm. Thank you so much for looking into that.

@ajleong623
Copy link
Author

ajleong623 commented Jul 26, 2025

@nibix I have linked a pull request in core opensearch-project/OpenSearch#18843. I did a brief trial run by setting up the OpenSearch request as described when reducing the stack trace, and the outputs seem valid.

@cwperks
Copy link
Member

cwperks commented Aug 4, 2025

I don't think this issue is limited to geopoint. See: opensearch-project/OpenSearch-Dashboards#9873

@cwperks
Copy link
Member

cwperks commented Aug 4, 2025

@nibix FYI when Show Missing Values is enabled on a dashboards visualization, it does a bucket agg with "missing": "__missing__" so I don't think this is just geopoint related

"aggs": {
  "by_status": {
    "terms": {
      "field": "status.keyword",
      "size": 5,
      "order": { "_count": "desc" },
      "missing": "__missing__"
    }
  }
}

@ajleong623
Copy link
Author

ajleong623 commented Aug 4, 2025

@cwperks it looks like the request flow also went through ExitableLeafReader which is an entry point into the security plugin. I think that’s based there should be a null checker whenever a field ends up being validated in the plugin. I will make a separate issue for those cases. Or would it be easier to add the check in security?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants