Profile the fetch phase #77064

nik9000 · 2021-08-31T13:25:15Z

This adds profiling to the fetch phase so we can tell when fetching is
slower than we'd like and we can tell which portion of the fetch is
slow. The output includes which stored fields were loaded, how long it
took to load stored fields, which fetch sub-phases were run, and how
long those fetch sub-phases took.

Closes #75892

nik9000

I implemented this by having the fetch phase optionally profile itself and attach the profile results to the per-shard fetch results. I then merge those fetch results into the "search phase" profiling that we have been collecting for years. The rendered JSON of the profile result will only include a fetch section on shards that performed a fetch.
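A rough sketch of that merge step, for readers who want the shape of it. The class and field names here (`ShardProfileMergeSketch`, `ShardProfile`) are made up for illustration and are not the classes in this PR; profiles are plain strings to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in types; these are NOT the real Elasticsearch classes.
final class ShardProfileMergeSketch {
    /** Per-shard profile: the query part is always present, the fetch part only if the shard fetched. */
    static final class ShardProfile {
        final String queryProfile;
        final String fetchProfile; // null when the shard did not run a fetch

        ShardProfile(String queryProfile, String fetchProfile) {
            this.queryProfile = queryProfile;
            this.fetchProfile = fetchProfile;
        }

        boolean hasFetch() {
            return fetchProfile != null;
        }
    }

    /** Attaches each shard's fetch profile, if any, to its query phase profile. */
    static Map<String, ShardProfile> merge(Map<String, String> queryProfiles, Map<String, String> fetchProfiles) {
        Map<String, ShardProfile> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : queryProfiles.entrySet()) {
            // get() returns null for shards that never fetched anything.
            merged.put(e.getKey(), new ShardProfile(e.getValue(), fetchProfiles.get(e.getKey())));
        }
        return merged;
    }
}
```

A renderer built on this would skip the fetch section whenever `hasFetch()` is false, which matches the "only on shards that performed a fetch" behavior described above.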

...gin/src/main/java/org/elasticsearch/plugin/noop/action/search/TransportNoopSearchAction.java

nik9000 · 2021-08-31T13:26:50Z

docs/reference/search/profile.asciidoc

+        "aggregations": [],
+        "fetch": {
+          "type": "fetch",
+          "description": "fetch",


This type and description come from my using a raw ProfileResult for the entire fetch phase. I'm not happy about them, but I wasn't sure if it was worth writing a whole top level class for the result or working around them some other way.

nik9000 · 2021-08-31T13:27:49Z

docs/reference/search/profile.asciidoc

+        "fetch": {
+          "type": "fetch",
+          "description": "fetch",
+          "time_in_nanos": 660555,


This is the actual nanoTime measured around the entire fetch phase rather than the sum of the breakdown, because fetch lets us measure that directly. We couldn't really do that for aggs or queries.

nik9000 · 2021-08-31T13:32:22Z

docs/reference/search/profile.asciidoc

+            "load_stored_fields_count": 5
+          },
+          "debug": {
+            "stored_fields": ["_id", "_routing", "_source"]


This bit right here is important to me. Some folks want to only fetch doc values and avoid stored fields entirely. With this PR you can profile the query and check that this array is empty. If it is then we skipped loading stored fields. Unless something tricky is happening in the nested case that I didn't dig into.
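A minimal sketch of that check, assuming the fetch profile has already been parsed from JSON into nested `Map` and `List` objects. The helper name `skippedStoredFields` is hypothetical, and the nested case mentioned above is not handled.

```java
import java.util.List;
import java.util.Map;

final class StoredFieldsCheckSketch {
    /**
     * Returns true when the fetch profile's debug section lists no stored fields.
     * The map is assumed to be one shard's "fetch" object, already parsed from JSON.
     */
    static boolean skippedStoredFields(Map<String, Object> fetchProfile) {
        Object debug = fetchProfile.get("debug");
        if (!(debug instanceof Map)) {
            return false; // no debug section, so nothing can be concluded
        }
        Object storedFields = ((Map<?, ?>) debug).get("stored_fields");
        return storedFields instanceof List && ((List<?>) storedFields).isEmpty();
    }
}
```

For the sample in the docs, where `stored_fields` is `["_id", "_routing", "_source"]`, this returns false.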

I like the debug section too!

nik9000 · 2021-08-31T13:32:43Z

docs/reference/search/profile.asciidoc

+          "children": [
+            {
+              "type": "source",
+              "description": "load _source",


We get one of these per fetch sub phase we run.

nik9000 · 2021-08-31T13:37:08Z

.../percolator/src/main/java/org/elasticsearch/percolator/PercolatorHighlightSubFetchPhase.java

+    @Override
+    public String name() {
+        return "percolator_highlight";
+    }


These power the type and description fields on the children field in the fetch profile.

nik9000 · 2021-08-31T13:39:53Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

            return;
        }

+        Profiler profiler = context.getProfilers() == null ? Profiler.NOOP : context.getProfilers().getFetchProfiler();


I wasn't super happy with how invasive the change was here. I'm happy that the sub-phases for the most part don't have to change, but this bit is bigger than I'd like. I'd love suggestions if anyone has any.

nik9000 · 2021-08-31T13:40:48Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

@@ -428,4 +475,57 @@ private static void fillDocAndMetaFields(SearchContext context, FieldsVisitor fi
    static boolean hasSequentialDocs(DocIdToIndex[] docs) {
        return docs.length > 0 && docs[docs.length-1].docId - docs[0].docId == docs.length - 1;
    }
+
+    interface Profiler {


I went with the internal interface and noop implementation to save us from a bunch of null checks that would have made the whole thing even more invasive. I'd be more than happy to get rid of them if anyone has any good ideas.
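For anyone unfamiliar with the pattern, here is a small sketch of an interface with a no-op instance. The method names (`startLoadingStoredFields`, `stopLoadingStoredFields`) and the `RecordingProfiler` class are invented for this example and may not match the `Profiler` interface in the diff.

```java
import java.util.ArrayList;
import java.util.List;

final class NoopProfilerSketch {
    /** Hypothetical hooks; the real interface in the PR may look different. */
    interface Profiler {
        void startLoadingStoredFields();

        void stopLoadingStoredFields();

        /** Does nothing, so callers never need a null check. */
        Profiler NOOP = new Profiler() {
            @Override
            public void startLoadingStoredFields() {}

            @Override
            public void stopLoadingStoredFields() {}
        };
    }

    /** Records every call so the example has something observable. */
    static final class RecordingProfiler implements Profiler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startLoadingStoredFields() {
            events.add("start");
        }

        @Override
        public void stopLoadingStoredFields() {
            events.add("stop");
        }
    }

    /** Mirrors the ternary in the diff: fall back to NOOP when profiling is off. */
    static Profiler choose(Profiler maybeProfiler) {
        return maybeProfiler == null ? Profiler.NOOP : maybeProfiler;
    }

    /** The hot loop calls the profiler unconditionally. */
    static int fetch(int docCount, Profiler profiler) {
        int loaded = 0;
        for (int doc = 0; doc < docCount; doc++) {
            profiler.startLoadingStoredFields();
            loaded++; // stand-in for actually loading the document
            profiler.stopLoadingStoredFields();
        }
        return loaded;
    }
}
```

The trade-off is an extra interface call per hook when profiling is off, in exchange for a loop body with no branches on "are we profiling?".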

nik9000 · 2021-08-31T13:41:25Z

server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchSourcePhase.java

-    @SuppressWarnings("unchecked")
-    private void hitExecute(FetchSourceContext fetchSourceContext, HitContext hitContext) {
+            @SuppressWarnings("unchecked")
+            private void hitExecute(FetchSourceContext fetchSourceContext, HitContext hitContext) {


Pulled into the phase object so we could collect debugging counters.

nik9000 · 2021-08-31T13:42:53Z

server/src/main/java/org/elasticsearch/search/fetch/subphase/ScriptFieldsPhase.java


    @Override
    public FetchSubPhaseProcessor getProcessor(FetchContext context) {
-        if (context.scriptFields() == null) {
+        if (context.scriptFields() == null || context.scriptFields().fields().isEmpty()) {


When you don't ask for any script fields we tend to get an empty list here, not null. That empty list was causing us to always return the phase and run it. That's not a big deal at all because it just iterates an empty list, but it was clogging up the profile output.
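A stripped-down sketch of the guard, assuming a simplified `Processor` interface and a plain list of script field names in place of the real `FetchContext`. Both are placeholders for illustration.

```java
import java.util.List;

final class ScriptFieldsGuardSketch {
    /** Simplified stand-in for a fetch sub-phase processor. */
    interface Processor {
        void process(int docId);
    }

    /**
     * Returns null when there is nothing to do, so the sub-phase is never run
     * and never shows up in the profile output.
     */
    static Processor getProcessor(List<String> scriptFields) {
        if (scriptFields == null || scriptFields.isEmpty()) {
            return null;
        }
        return docId -> {
            // would evaluate each script field for docId here
        };
    }
}
```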

nik9000 · 2021-08-31T16:33:35Z

** elasticsearch-ci/part-1 ** — Build finished.

I can't reproduce this failure. It's in the CCS dueling tests. Both look like they fetch the same 8 documents but they fetch them from different shards: https://gist.github.com/nik9000/809fa7865e3d5b1c7dd50e96b89e3452

nik9000 · 2021-08-31T16:36:50Z

I can't reproduce this failure. It's in the CCS dueling tests. Both look like they fetch the same 8 documents but they fetch them from different shards: https://gist.github.com/nik9000/809fa7865e3d5b1c7dd50e96b89e3452

Nope. I was on the wrong branch. I can reproduce. Checking.

nik9000 · 2021-08-31T17:27:23Z

Nope. I was on the wrong branch. I can reproduce. Checking.

OK! It's because the test assumed that the entire search result would be the same when searching in minimize round trip mode vs non-minimized mode. If you minimize the round trips you will perform more fetches so, of course, the fetch profiles differ.

nik9000 · 2021-08-31T18:49:09Z

So folks don't have to dig through the docs, this is what the fetch profile looks like:

        "fetch": {
          "type": "fetch",
          "description": "fetch",
          "time_in_nanos": 660555,
          "breakdown": {
            "next_reader": 7292,
            "next_reader_count": 1,
            "load_stored_fields": 299325,
            "load_stored_fields_count": 5
          },
          "debug": {
            "stored_fields": ["_id", "_routing", "_source"]
          },
          "children": [
            {
              "type": "source",
              "description": "load _source",
              "time_in_nanos": 20443,
              "breakdown": {
                "next_reader": 745,
                "next_reader_count": 1,
                "process": 19698,
                "process_count": 5
              },
              "debug": {
                "loaded_nested": 0,
                "fast_path": 5
              }
            }
          ]
        }
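Since `time_in_nanos` is measured around the whole phase, it can be larger than the timed pieces added together. A small sketch of that arithmetic follows; the helper name is made up, and it assumes the breakdown timings and the children's times do not overlap, which this thread does not confirm.

```java
final class FetchTimeAccountingSketch {
    /**
     * Time inside the fetch phase that none of the listed timings cover.
     * Pass only the timing entries, not the *_count entries.
     */
    static long unaccountedNanos(long totalNanos, long[] breakdownNanos, long[] childNanos) {
        long accounted = 0;
        for (long n : breakdownNanos) {
            accounted += n;
        }
        for (long n : childNanos) {
            accounted += n;
        }
        return totalNanos - accounted;
    }
}
```

With the sample above, `unaccountedNanos(660555L, new long[] {7292L, 299325L}, new long[] {20443L})` returns 333495, so under that assumption roughly half of the phase is spent outside the individually timed steps.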

nik9000 · 2021-08-31T19:07:29Z

I've found a few more unit tests I'd like to write. Incoming.

nik9000 · 2021-08-31T21:15:53Z

I've found a few more unit tests I'd like to write. Incoming.

500 lines of tests..... The profile stuff didn't have any round trip unit tests so I added a bunch. Those are useful to catch serialization issues super early.
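To show what "round trip" means here, a tiny sketch using plain `java.io` streams. The real tests use Elasticsearch's own serialization and test helpers, which this example does not attempt to reproduce; `Timing` is a made-up type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

final class RoundTripSketch {
    /** Made-up value type standing in for a profile result. */
    static final class Timing {
        final String type;
        final long timeInNanos;

        Timing(String type, long timeInNanos) {
            this.type = type;
            this.timeInNanos = timeInNanos;
        }

        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(type);
            out.writeLong(timeInNanos);
        }

        static Timing readFrom(DataInputStream in) throws IOException {
            // Fields must be read in the same order they were written.
            return new Timing(in.readUTF(), in.readLong());
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Timing)) {
                return false;
            }
            Timing other = (Timing) o;
            return type.equals(other.type) && timeInNanos == other.timeInNanos;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, timeInNanos);
        }
    }

    /** Serializes to bytes and reads the value back. */
    static Timing roundTrip(Timing original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.writeTo(out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return Timing.readFrom(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test then asserts that `roundTrip(x).equals(x)` for randomly generated values, which flags a forgotten or misordered field as soon as it is introduced.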

elasticmachine · 2021-08-31T21:20:55Z

Pinging @elastic/es-search (Team:Search)

jtibshirani

It is so close! I had one last comment.

jtibshirani · 2021-09-10T16:42:02Z

server/src/test/java/org/elasticsearch/search/profile/SearchProfileQueryPhaseResultsTests.java

+import static org.hamcrest.Matchers.matchesPattern;
+import static org.hamcrest.Matchers.nullValue;
+
+public class SearchProfileQueryPhaseResultsTests extends ESTestCase {


I think this should be named SearchProfileResultsBuilderTests now.

jtibshirani · 2021-09-10T16:53:07Z

server/src/main/java/org/elasticsearch/search/profile/SearchProfileResultsBuilder.java

+ * Profile results for the query phase run on all shards.
+ */
+public class SearchProfileResultsBuilder {
+    public static SearchProfileResultsBuilder build(Map<String, SearchProfileQueryPhaseResult> queryPhaseResults) {


I am a little confused about this part -- probably related to naming. Could we just have a constructor new SearchProfileResultsBuilder(Map<String, SearchProfileQueryPhaseResult>) instead of this static method? I'm not sure why we need the special NOT_PROFILING object instead of just passing null in the original place in SearchPhaseController.

I was trying to get the null check out of search phase controller so it was easy to unit test without all of SearchPhaseController. This way there isn't a null check. I liked that the old code didn't allocate anything if we didn't profile and wanted to keep that too, but have it here.

Another idea is to make the build method static and have it take a nullable builder param. Or just revert the whole change and keep the null check in SearchPhaseController. I just, like, didn't like it there. I'm happy anytime I can get things out of SearchPhaseController.

Since this change has so many "moving parts" already, I'd be in favor of keeping the old strategy to keep it well-scoped. I also find it easier to understand. If we want we could follow-up with a refactor to simplify SearchPhaseController a bit.

Yeah. I think it was clearer that way but I'm not the only one that has to read it. I've changed it the way you asked.

jtibshirani

I was wrong about that being the last comment. I have a few more small ones.

jtibshirani · 2021-09-10T17:15:16Z

server/src/main/java/org/elasticsearch/search/fetch/FetchSubPhaseProcessor.java

+    /**
+     * Profiles from child-phases.
+     */
+    default List<ProfileResult> childProfiles() {


No current subprocessor overrides this. Could we remove it until a follow-up where we implement profiling for inner_hits subphases?

jtibshirani · 2021-09-10T17:17:30Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

        try {
            List<FetchSubPhaseProcessor> processors = new ArrayList<>();
            for (FetchSubPhase fsp : fetchSubPhases) {
                FetchSubPhaseProcessor processor = fsp.getProcessor(context);
                if (processor != null) {
-                    processors.add(processor);
+                    String type = fsp.getClass().getSimpleName().replaceAll("^Fetch", "").replaceAll("(FetchSub)?Phase$", "");


Personally I think it's nice to simply include the class name without replacements. It's still very readable and makes it easier to find the corresponding code for the phase.
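To make the comparison concrete, this small sketch applies the replacements from the diff to a few class names that appear in this PR. Only the wrapper class and method name are invented.

```java
final class PhaseNameSketch {
    /** The same two replacements as the line under review. */
    static String mungedName(String simpleClassName) {
        return simpleClassName.replaceAll("^Fetch", "").replaceAll("(FetchSub)?Phase$", "");
    }
}
```

`mungedName("FetchSourcePhase")` gives `Source` and `mungedName("ScriptFieldsPhase")` gives `ScriptFields`, but `mungedName("PercolatorHighlightSubFetchPhase")` gives `PercolatorHighlightSubFetch` because the suffix is `SubFetchPhase` rather than `FetchSubPhase`. The unmodified class name avoids that kind of inconsistency.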

jtibshirani · 2021-09-10T17:29:34Z

server/src/main/java/org/elasticsearch/search/profile/SearchProfileResultsBuilder.java

+ * Profile results for the query phase run on all shards.
+ */
+public class SearchProfileResultsBuilder {
+    public static SearchProfileResultsBuilder build(Map<String, SearchProfileQueryPhaseResult> queryPhaseResults) {


Since this change has so many "moving parts" already, I'd be in favor of keeping the old strategy to keep it well-scoped. I also find it easier to understand. If we want we could follow-up with a refactor to simplify SearchPhaseController a bit.

nik9000 · 2021-09-10T18:04:56Z

Sure! That's cool. I'll go back to the null check method, remove the name munging, and blast the leftover for child phases. I just missed it last time.

nik9000 · 2021-09-10T19:48:40Z

OK! Back to you @jtibshirani.

jtibshirani

Looks good to me!

nik9000 · 2021-09-13T20:23:48Z

In and backported! Exciting!

javanna · 2022-04-19T12:20:45Z

Heya, shall we update the limitations from the profile docs page now that fetch can be profiled too? https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html#profile-limitations

nik9000 · 2022-06-06T19:02:23Z

Heya, shall we update the limitations from the profile docs page now that fetch can be profiled too? https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html#profile-limitations

I have no idea when, but someone's removed that "limitation".

nik9000 added the >feature, :Search/Search, v8.0.0, and v7.16.0 labels Aug 31, 2021

nik9000 commented Aug 31, 2021

nik9000 force-pushed the profile_fetch_simple branch from faf97a9 to 2c4803d August 31, 2021 14:00

nik9000 added 2 commits August 31, 2021 10:03

Skip bwc

2c4803d

nik9000 added 4 commits August 31, 2021 13:40

Don't compare fetch profiles

22f698f

Merge branch 'master' into profile_fetch_simple

97d4111

Use passed one

860764f

no npe

e764344

nik9000 added 2 commits August 31, 2021 14:50

Do last rename

ffbad75

Move method down

a4fea75

nik9000 added 4 commits August 31, 2021 16:45

serialization tests

ae9373a

Fix sneaky serialization

e38d269

Test for sneaky bug

680226c

license header

5ee43ca

nik9000 marked this pull request as ready for review August 31, 2021 21:20

elasticmachine added the Team:Search label Aug 31, 2021

nik9000 added 3 commits September 1, 2021 10:24

Merge branch 'master' into profile_fetch_simple

86ca46a

Document

ebe2417

Fix test

0c3b0a0

jtibshirani reviewed Sep 10, 2021

Rename

b452a29

jtibshirani reviewed Sep 10, 2021

nik9000 added 2 commits September 10, 2021 14:22

Remove funny builder

1235c88

Remove name munging

c9b8d62

jtibshirani approved these changes Sep 10, 2021

jtibshirani mentioned this pull request Sep 10, 2021

Add operations to cover search fetch performance elastic/rally-tracks#199

Open

nik9000 merged commit c2c0165 into elastic:master Sep 13, 2021

nik9000 added the backport pending label Sep 13, 2021

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 13, 2021

Fix profile test

9869f40

In elastic#77064 I added some more tests for profiling. It fails when a `rarely()` block hits - roughly 2% of the time. This should fix it.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 13, 2021

Fix profile test

4dce66b

In elastic#77064 I added some more tests for profiling. It fails when a `rarely()` block hits - roughly 2% of the time. This should fix it.

nik9000 mentioned this pull request Sep 13, 2021

Fix profile test #77643

Merged

nik9000 added a commit that referenced this pull request Sep 13, 2021

Fix profile test (#77643)

e522890

In #77064 I added some more tests for profiling. It fails when a `rarely()` block hits - roughly 2% of the time. This should fix it.

nik9000 mentioned this pull request Sep 13, 2021

Update versions after backporting #77064 #77658

Merged

elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2021

Update versions after backporting #77064 (#77658)

4e0e2f9

Now that #77064 is backported we can run the bwc tests against it. And we can speak to 7.16 nodes using the wire protocol added in #77064.

nik9000 removed the backport pending label Sep 13, 2021

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021

cjcenizal mentioned this pull request Nov 2, 2021

Add UI for Fetch phase in Search Profiler elastic/kibana#117216

Open

matsui-stb mentioned this pull request Mar 1, 2022

Bump elasticsearch from 7.9.3 to 7.16.3 codelibs/elasticsearch-dynarank#31

Merged
