Synthetic source does not still support copy_to #652

@salvatore-campagna salvatore-campagna commented Aug 30, 2024

This causes the elastic/security track to fail when index_mode is set to
logsdb. This happens because LogsDB uses synthetic source which, in turn, does
not support copy_to. Support for copy_to is expected to arrive in Elasticsearch 8.16.
In the meantime we just exclude the copy_to setting from the mapping to avoid
triggering the error.
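
The change itself appears to live in the Jinja-templated mapping, but its effect can be illustrated with a minimal Python sketch. The function names and the example mapping are hypothetical and only show what "excluding copy_to when index_mode is logsdb" means for a mapping body.

```python
from copy import deepcopy


def strip_copy_to(node):
    """Return a copy of a mapping fragment with every copy_to key removed.

    Limitation of this sketch: a field that is itself named "copy_to"
    would also be dropped.
    """
    if isinstance(node, dict):
        return {
            key: strip_copy_to(value)
            for key, value in node.items()
            if key != "copy_to"
        }
    if isinstance(node, list):
        return [strip_copy_to(item) for item in node]
    return node


def mapping_for(index_mode, mapping):
    # Only logsdb (synthetic source) needs copy_to removed; other modes keep it.
    if index_mode == "logsdb":
        return strip_copy_to(mapping)
    return deepcopy(mapping)


# Hypothetical mapping fragment, loosely modelled on the field discussed here.
example_mapping = {
    "properties": {
        "kubernetes": {
            "properties": {
                "event": {
                    "properties": {
                        "message": {"type": "text", "copy_to": "message"}
                    }
                }
            }
        },
        "message": {"type": "text"},
    }
}

logsdb_mapping = mapping_for("logsdb", example_mapping)
standard_mapping = mapping_for("standard", example_mapping)
```

With this sketch the logsdb variant no longer copies kubernetes.event.message into message, which is why the message field ends up empty under LogsDB.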

This needs backporting to 8.15.

@salvatore-campagna
Contributor Author

Since copy_to is not used under LogsDB, I need to go through all queries to make sure they use kubernetes.event.message instead of the empty message field. Otherwise queries might fail or, worse, run much faster just because the message field is empty.
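
One way to audit the queries is to scan each query body for references to the message field. The helper below is a hypothetical sketch, not part of the track: it only detects the field as a dictionary key or as an exact string value, so references embedded in query_string expressions would be missed.

```python
def references_field(node, field):
    """Return True if `field` appears as a key or exact string value in `node`."""
    if isinstance(node, dict):
        return any(
            key == field or references_field(value, field)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(references_field(item, field) for item in node)
    return node == field


# Hypothetical query bodies for illustration only.
uses_message = {"query": {"match": {"message": "error"}}}
uses_original = {"query": {"match": {"kubernetes.event.message": "error"}}}
lists_message = {"fields": [{"field": "message"}]}

needs_rewrite = [
    query
    for query in (uses_message, uses_original, lists_message)
    if references_field(query, "message")
]
```

Here `needs_rewrite` collects the bodies that still target message and would need to be pointed at kubernetes.event.message.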

When using the logsdb index mode, copy_to is disabled because it is
not supported by synthetic source. So we need to change queries
to reflect the fact that the message field would be empty.
@@ -582,7 +582,11 @@
"include_unmapped": true
},
{
{% if index_mode != "logsdb" %}
Contributor Author

It looks like our template engine does not process files in the workflows directory. As a result, we cannot conditionally run the query on message or kubernetes.event.message. I will just run it unconditionally against kubernetes.event.message, which is the original field name. Once support for copy_to is introduced, we can restore the original behaviour.

This is done since copy_to does not work in synthetic source and the
message field would be empty.
We also introduce a track parameter which we can use to select
the workflow directory from which workflow queries are read. The default
is `workflows`, the existing directory. Using `workflows-logsdb`
allows us to use workflow queries which do not rely on the
copy_to used by the metricbeat template.

This workaround is introduced to deal with synthetic source (and logsdb)
not supporting copy_to. Once support for copy_to is available we will
revert this change and rely on the existing and default workflows.
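
The directory selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the track's actual code: the parameter name `workflow-folder` is taken from the discussion in this thread, while the function names and the `<folder>/<workflow>/*.json` layout are hypothetical.

```python
from pathlib import Path

# Default directory, as stated in the commit message.
DEFAULT_WORKFLOW_FOLDER = "workflows"


def workflow_dir(track_root, track_params):
    """Resolve the directory that workflow queries are read from."""
    # "workflow-folder" is an assumed parameter name; fall back to the default.
    folder = track_params.get("workflow-folder", DEFAULT_WORKFLOW_FOLDER)
    return Path(track_root) / folder


def workflow_query_files(track_root, track_params, workflow):
    """List query files for one workflow (hosts, network or overview)."""
    # Assumed layout: <folder>/<workflow>/*.json
    return sorted((workflow_dir(track_root, track_params) / workflow).glob("*.json"))


default_dir = workflow_dir("/track", {})
logsdb_dir = workflow_dir("/track", {"workflow-folder": "workflows-logsdb"})
```

Keeping the default unchanged means existing runs are unaffected, and reverting later only requires dropping the extra parameter and the copied directory.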
@salvatore-campagna
Contributor Author

salvatore-campagna commented Sep 3, 2024

I tried adding a workflow-folder parameter which we can use to override the queries executed for logsdb. We would just need to pass an additional track parameter, workflow-folder, set to workflows-logsdb when running with logsdb as the index.mode.

@gareth-ellis @charlie-pichette does it sound reasonable?

We will revert this once synthetic source supports copy_to.

@salvatore-campagna
Contributor Author

BTW this would have been much easier if this track used the standard logs@settings component template. @charlie-pichette any chance to do that instead?

Member

@martijnvg martijnvg left a comment

I left a few questions. This looks good to me. I think it is OK to temporarily copy the workflow files and undo this once we have copy_to support.

@@ -0,0 +1,13 @@
This workflow represents a user using the Hosts dashboard from the Security application in Kibana.
Member

Maybe just refer to the other README and indicate this is just a temporary solution for logsdb?
No need to repeat READMEs, right?

Member

And maybe do this for the other READMEs too?

@@ -31,6 +31,9 @@
"operation-type": "composite",
"param-source": "workflow-selector",
"workflow": {{workflow | tojson }},
{% if p_index_mode == "logsdb" %}
Member

Shouldn't this be determined by the work_flow track param? Or is the idea that the work_flow param is determined by the index mode track param?

Member

The workflow parameter just says hosts, network or overview. The workflows-folder parameter is what selects the directory: currently workflows, but for logsdb we want to use the workflows-logsdb folder.

@salvatore-campagna
Contributor Author

I ran a test for the stateful case and double-checked that all data stream backing indices are actually using LogsDB.
I used a very small dataset to get a fast benchmark whose only purpose was to confirm that LogsDB was used.

esbench@elasticsearch-0:~$ curl -XGET -s -k -u esbench:super-secret-password "https://elasticsearch-0:9200/.ds-packetbeat-default-2024.09.04-000001/_settings?pretty" | grep "logsdb"
        "mode" : "logsdb",
esbench@elasticsearch-0:~$ curl -XGET -s -k -u esbench:super-secret-password "https://elasticsearch-0:9200/.ds-auditbeat-default-2024.09.04-000001/_settings?pretty" | grep "logsdb"
        "mode" : "logsdb",
esbench@elasticsearch-0:~$ curl -XGET -s -k -u esbench:super-secret-password "https://elasticsearch-0:9200/.ds-filebeat-default-2024.09.04-000001/_settings?pretty" | grep "logsdb"
        "mode" : "logsdb",
esbench@elasticsearch-0:~$ curl -XGET -s -k -u esbench:super-secret-password "https://elasticsearch-0:9200/.ds-metricbeat-default-2024.09.04-000001/_settings?pretty" | grep "logsdb"
        "mode" : "logsdb",
esbench@elasticsearch-0:~$ curl -XGET -s -k -u esbench:super-secret-password "https://elasticsearch-0:9200/.ds-winlogbeat-default-2024.09.04-000001/_settings?pretty" | grep "logsdb"
        "mode" : "logsdb",
esbench@elasticsearch-0:~$ curl -XGET -s -k -u esbench:super-secret-password "https://elasticsearch-0:9200/_cat/indices"
yellow open .ds-packetbeat-default-2024.09.04-000001 q1ZIl_THTFahWAA80Qqn_A 1 1   0 0    249b    249b    249b
yellow open .ds-auditbeat-default-2024.09.04-000001  kfKD56XeSEeO7qRk4gBsmA 1 1   0 0    249b    249b    249b
yellow open .ds-filebeat-default-2024.09.04-000001   3rlOpJ37T7W2woFGrJ_Cww 1 1 600 0 427.9kb 427.9kb 427.9kb
yellow open .ds-metricbeat-default-2024.09.04-000001 ELjRLFRXQc2RPQ5K1XvnlQ 1 1   0 0    249b    249b    249b
yellow open .ds-winlogbeat-default-2024.09.04-000001 dfDfSrA5RHadWk5x8jTyLg 1 1   0 0    249b    249b    249b
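
The same check can be done on the parsed JSON instead of grepping the pretty-printed output. The sketch below assumes the usual nested shape of the get-settings response (index name, then `settings`, then `index`, then `mode`); the helper names are hypothetical and the sample response is hand-written for illustration, not captured output.

```python
def index_modes(settings_response):
    """Map each index name to its index.mode (None when the setting is absent)."""
    return {
        name: body.get("settings", {}).get("index", {}).get("mode")
        for name, body in settings_response.items()
    }


def all_logsdb(settings_response):
    """Return True only if at least one index is present and all use logsdb."""
    modes = index_modes(settings_response)
    return bool(modes) and all(mode == "logsdb" for mode in modes.values())


# Hand-written sample in the assumed response shape (illustrative only).
sample_response = {
    ".ds-filebeat-default-2024.09.04-000001": {
        "settings": {"index": {"mode": "logsdb", "number_of_shards": "1"}}
    },
    ".ds-metricbeat-default-2024.09.04-000001": {
        "settings": {"index": {"mode": "logsdb", "number_of_shards": "1"}}
    },
}

modes = index_modes(sample_response)
```

An index without an explicit index.mode maps to None here, so it is reported as not using logsdb rather than being silently skipped.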

@salvatore-campagna
Contributor Author

The track completed successfully.

2024-09-04 10:27:15,223 ActorAddr-(T|:39015)/PID:5993 esrally.reporter INFO
|                                                         Metric |                    Task |          Value |   Unit |
|---------------------------------------------------------------:|------------------------:|---------------:|-------:|
|                     Cumulative indexing time of primary shards |                         |    0.00583333  |    min |
|             Min cumulative indexing time across primary shards |                         |    0           |    min |
|          Median cumulative indexing time across primary shards |                         |    0           |    min |
|             Max cumulative indexing time across primary shards |                         |    0.00583333  |    min |
|            Cumulative indexing throttle time of primary shards |                         |    0           |    min |
|    Min cumulative indexing throttle time across primary shards |                         |    0           |    min |
| Median cumulative indexing throttle time across primary shards |                         |    0           |    min |
|    Max cumulative indexing throttle time across primary shards |                         |    0           |    min |
|                        Cumulative merge time of primary shards |                         |    0           |    min |
|                       Cumulative merge count of primary shards |                         |    0           |        |
|                Min cumulative merge time across primary shards |                         |    0           |    min |
|             Median cumulative merge time across primary shards |                         |    0           |    min |
|                Max cumulative merge time across primary shards |                         |    0           |    min |
|               Cumulative merge throttle time of primary shards |                         |    0           |    min |
|       Min cumulative merge throttle time across primary shards |                         |    0           |    min |
|    Median cumulative merge throttle time across primary shards |                         |    0           |    min |
|       Max cumulative merge throttle time across primary shards |                         |    0           |    min |
|                      Cumulative refresh time of primary shards |                         |    0.00271667  |    min |
|                     Cumulative refresh count of primary shards |                         |   21           |        |
|              Min cumulative refresh time across primary shards |                         |    0           |    min |
|           Median cumulative refresh time across primary shards |                         |    0           |    min |
|              Max cumulative refresh time across primary shards |                         |    0.00271667  |    min |
|                        Cumulative flush time of primary shards |                         |    0.00025     |    min |
|                       Cumulative flush count of primary shards |                         |    5           |        |
|                Min cumulative flush time across primary shards |                         |    3.33333e-05 |    min |
|             Median cumulative flush time across primary shards |                         |    3.33333e-05 |    min |
|                Max cumulative flush time across primary shards |                         |    0.000116667 |    min |
|                                        Total Young Gen GC time |                         |    0.122       |      s |
|                                       Total Young Gen GC count |                         |    4           |        |
|                                          Total Old Gen GC time |                         |    0           |      s |
|                                         Total Old Gen GC count |                         |    0           |        |
|                                                   Dataset size |                         |    0.000409081 |     GB |
|                                                     Store size |                         |    0.000409081 |     GB |
|                                                  Translog size |                         |    2.56114e-07 |     GB |
|                                         Heap used for segments |                         |    0           |     MB |
|                                       Heap used for doc values |                         |    0           |     MB |
|                                            Heap used for terms |                         |    0           |     MB |
|                                            Heap used for norms |                         |    0           |     MB |
|                                           Heap used for points |                         |    0           |     MB |
|                                    Heap used for stored fields |                         |    0           |     MB |
|                                                  Segment count |                         |    4           |        |
|                                    Total Ingest Pipeline count |                         |  600           |        |
|                                     Total Ingest Pipeline time |                         |    0.369       |      s |
|                                   Total Ingest Pipeline failed |                         |    0           |        |
|                                                 Min Throughput |        insert-pipelines |   19.29        |  ops/s |
|                                                Mean Throughput |        insert-pipelines |   19.29        |  ops/s |
|                                              Median Throughput |        insert-pipelines |   19.29        |  ops/s |
|                                                 Max Throughput |        insert-pipelines |   19.29        |  ops/s |
|                                       100th percentile latency |        insert-pipelines |  649.265       |     ms |
|                                  100th percentile service time |        insert-pipelines |  649.265       |     ms |
|                                                     error rate |        insert-pipelines |    0           |      % |
|                                                 Min Throughput |              insert-ilm |   36.83        |  ops/s |
|                                                Mean Throughput |              insert-ilm |   36.83        |  ops/s |
|                                              Median Throughput |              insert-ilm |   36.83        |  ops/s |
|                                                 Max Throughput |              insert-ilm |   36.83        |  ops/s |
|                                       100th percentile latency |              insert-ilm |   80.0769      |     ms |
|                                  100th percentile service time |              insert-ilm |   80.0769      |     ms |
|                                                     error rate |              insert-ilm |    0           |      % |
|                                                 Min Throughput | bulk-index-initial-load |  198.98        | docs/s |
|                                                Mean Throughput | bulk-index-initial-load |  198.98        | docs/s |
|                                              Median Throughput | bulk-index-initial-load |  198.98        | docs/s |
|                                                 Max Throughput | bulk-index-initial-load |  198.98        | docs/s |
|                                        50th percentile latency | bulk-index-initial-load |   69.212       |     ms |
|                                        90th percentile latency | bulk-index-initial-load |  523.803       |     ms |
|                                       100th percentile latency | bulk-index-initial-load |  696.481       |     ms |
|                                   50th percentile service time | bulk-index-initial-load |   69.212       |     ms |
|                                   90th percentile service time | bulk-index-initial-load |  523.803       |     ms |
|                                  100th percentile service time | bulk-index-initial-load |  696.481       |     ms |
|                                                     error rate | bulk-index-initial-load |    0           |      % |
|                                                 Min Throughput |                   hosts |    0.1         |  ops/s |
|                                                Mean Throughput |                   hosts |    0.12        |  ops/s |
|                                              Median Throughput |                   hosts |    0.11        |  ops/s |
|                                                 Max Throughput |                   hosts |    0.16        |  ops/s |
|                                        50th percentile latency |                   hosts |  204.623       |     ms |
|                                        90th percentile latency |                   hosts | 1638.86        |     ms |
|                                       100th percentile latency |                   hosts | 2196.19        |     ms |
|                                   50th percentile service time |                   hosts |  201.545       |     ms |
|                                   90th percentile service time |                   hosts |  497.851       |     ms |
|                                  100th percentile service time |                   hosts | 1637.9         |     ms |
|                                                     error rate |                   hosts |    0           |      % |
|                                                 Min Throughput |                overview |    0.02        |  ops/s |
|                                                Mean Throughput |                overview |    0.04        |  ops/s |
|                                              Median Throughput |                overview |    0.04        |  ops/s |
|                                                 Max Throughput |                overview |    0.05        |  ops/s |
|                                        50th percentile latency |                overview |  341.835       |     ms |
|                                        90th percentile latency |                overview | 1151.38        |     ms |
|                                       100th percentile latency |                overview | 1346.12        |     ms |
|                                   50th percentile service time |                overview |  296.942       |     ms |
|                                   90th percentile service time |                overview | 1149.4         |     ms |
|                                  100th percentile service time |                overview | 1154.91        |     ms |
|                                                     error rate |                overview |    0           |      % |
|                                                 Min Throughput |                 network |    0.06        |  ops/s |
|                                                Mean Throughput |                 network |    0.09        |  ops/s |
|                                              Median Throughput |                 network |    0.09        |  ops/s |
|                                                 Max Throughput |                 network |    0.1         |  ops/s |
|                                        50th percentile latency |                 network |  241.289       |     ms |
|                                        90th percentile latency |                 network | 2146.94        |     ms |
|                                       100th percentile latency |                 network | 2827.87        |     ms |
|                                   50th percentile service time |                 network |  197.612       |     ms |
|                                   90th percentile service time |                 network |  888.805       |     ms |
|                                  100th percentile service time |                 network | 2366.76        |     ms |
|                                                     error rate |                 network |    0           |      % |
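
A report like the one above can also be checked mechanically, for example to confirm that every task reports an error rate of 0. The parser below is a minimal sketch that assumes the four-column pipe-delimited layout shown here; the function names are hypothetical and the sample rows are a shortened excerpt of the table.

```python
def parse_report(lines):
    """Parse pipe-delimited report rows into (metric, task, value, unit) tuples."""
    rows = []
    for line in lines:
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip anything that is not a four-column data row: log prefixes,
        # the header row, and the |---:| separator row.
        if len(cells) != 4 or cells[0] == "Metric" or set(cells[0]) <= set("-:"):
            continue
        metric, task, value, unit = cells
        rows.append((metric, task, float(value), unit))
    return rows


def failed_tasks(lines):
    """Return the tasks whose reported error rate is above zero."""
    return [
        task
        for metric, task, value, _unit in parse_report(lines)
        if metric == "error rate" and value > 0
    ]


sample_lines = [
    "|         Metric |     Task |   Value |  Unit |",
    "|---------------:|---------:|--------:|------:|",
    "| Min Throughput |    hosts |    0.1  | ops/s |",
    "|     error rate |    hosts |    0    |     % |",
    "|     error rate | overview |    0    |     % |",
    "|     error rate |  network |    0    |     % |",
]

parsed = parse_report(sample_lines)
failures = failed_tasks(sample_lines)
```

An empty `failures` list matches the report above, where every task shows an error rate of 0 %.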

@charlie-pichette

> BTW this would have been much easier if this track used the standard logs@settings component template. @charlie-pichette any chance to do that instead?

I have no information about logs@settings, so I do not know what the impact is or how this is used in production. I am happy to evaluate the implications of changing the track to use that if you can provide me information on it.

@achuguy
Contributor

achuguy commented Sep 4, 2024

@salvatore-campagna Regarding the logs@settings component template we can add it to the composable templates https://github.com/elastic/rally-tracks/tree/master/elastic/security/templates/composable. I can create a separate PR unless you want to add it in this PR.

@gareth-ellis
Member

@elasticmachine update branch

@salvatore-campagna salvatore-campagna merged commit 2b42a38 into elastic:master Sep 5, 2024
13 checks passed
salvatore-campagna added a commit to salvatore-campagna/rally-tracks that referenced this pull request Sep 17, 2024
salvatore-campagna added a commit to salvatore-campagna/rally-tracks that referenced this pull request Sep 17, 2024
6 participants