-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support regex tags search for Elasticseach backend #2049
Conversation
Signed-off-by: Annanay <annanayagarwal@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #2049 +/- ##
==========================================
- Coverage 97.39% 96.29% -1.10%
==========================================
Files 207 214 +7
Lines 10286 10535 +249
==========================================
+ Hits 10018 10145 +127
- Misses 223 332 +109
- Partials 45 58 +13
Continue to review full report at Codecov.
|
Signed-off-by: Annanay <annanayagarwal@gmail.com>
I've added the "search for presence of a tag" functionality to the cassandra reader as well, but I can submit that in a different PR |
Any ideas how we could support this in UI? Could you please add integration tests for regex and also escaped version e.g. find exactly |
It would be better to split the PR for regex and missing tags to two separate PRs. |
@pavolloffay - Please review. |
This reverts commit f94c220. Signed-off-by: Annanay <annanayagarwal@gmail.com>
Signed-off-by: Annanay <annanayagarwal@gmail.com>
What are the performance implications of always forcing a regex search? |
@yurishkuro From what I understand, Elasticsearch performance degrades with decrease in prefix length before the wildcard(operator) in the regex query. For ex: I could not find performance benchmarks for a Match query vs Regexp query for when no operator (*,?,etc) is used. |
I am also interested in performance results ad tag queries are common. Could you please run some perf tests to measure the query time? I have used this https://github.com/pavolloffay/jaeger-perf-tests before and also @jkandasa should have some perf framework, although I am not sure if that can be used without our infrastructure. |
If there is perf degradation maybe we could use regex type only if there is regex in the query. |
@annanay25 we have a tool to measure query timings. https://github.com/jkandasa/jaegerperf |
@pavolloffay - Thanks. Is there a guide on which metrics need to be monitored after running the perf-tests? @jkandasa - Could you please provide some more information on the tool? Does it perf-test the collector/query component, etc? The README doesn't say much. |
@pavolloffay - Also, in the integration test, the "Tag Name + Operation Name" query seems to match the I'm not sure I understand why that is happening, could you please take a look? |
"ExpectedFixtures": ["tags_regex_trace"] | ||
}, | ||
{ | ||
"Caption": "Tag regex + Operation name + max Duration - Multiple traces", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pavolloffay - Also, in the integration test, the "Tag Name + Operation Name" query seems to match the tags_regex_trace.json as well, even though they search on a different ServiceName and tag value.
@annanay25 do you mean this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the one above, "Tag regex + Operation name + max Duration"
It is supposed to match with just the tags_regex_trace
but matches with both tags_regex_trace
and tags_opname_trace
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one also looks wrong as it uses different service name https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/integration/fixtures/traces/tags_opname_trace.json
Could you please debug it and or open an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you fix the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to mangle the traceID further -
0955efa#diff-1824d3ce6ce534210f34269d2370e966
Signed-off-by: Annanay <annanayagarwal@gmail.com>
@annanay25 This tool can send spans to the collector component. and can be used to measure jaeger-query API execution time. Both actions are independent. |
@jkandasa - I was able to setup your tool to do the performance tests. I loved the simplicity, and the concept of creating jobs. I have a few suggestions from my experience of using it -
|
Perf results: From master build -
With this PR -
Bunch of the test results from the master build give an |
@annanay25 Thank you for the comments and feedback on the tool. I have added a metrics page to compare the results
Updated.
Metrics page includeded
In this case, you may see errors or no data for the specified timerange. Please refer JSON result.
Fixed
Thank you. I have modified and added in the source code. |
Seems like there is some performance degradation, how do you suggest this can be improved @pavolloffay? |
@annanay25 could you please create a summary of the test results? For instance what types of queries take how much time in % against master/head #1969 (comment) |
@pavolloffay - https://docs.google.com/spreadsheets/d/1mzJNQYv1xF0fiMAcpXk5TlNVE3GIqZOlZv2iDZqkraI/edit?usp=sharing On my system, after one run the perf tests start giving a 100% error for most queries in the master build (1.16) - can someone else test this? |
@annanay25 the results do not look good, the performance degradation is quite significant. I would suggest detecting the regex and changing the query type on the fly only when it is necessary. |
@pavolloffay - Tests on my machine are quite inconsistent (PTAL at round 2, numbers are in complete contrast with round 1) Can someone else run a benchmark on their machines? I can publish my image on dockerhub for this |
@annanay25 could you please build the query and collector image and publish it? @jkandasa would you be able to run the same tests as in #1969 ? |
@pavolloffay ok I will do. |
Thanks @jkandasa. |
Hello, |
@jkandasa - Did you get a chance to take a look at this? |
@annanay25 here is the result: Detailed file: https://docs.google.com/spreadsheets/d/1fIOYd4LJHHcmOIlEYuR9TQlAv4SXWdddy22zPmymvvw/edit?usp=sharing Image Details:
Test data:
Query test raw data: |
Thank you for the results @jkandasa. @pavolloffay - Could you please review these? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good, just minor comments in tests.
@jkandasa thanks for running the tests!
@annanay25 could you please provide more description in the first comment? E.g. what fields could be searched with regex. Is it only values or the keys? Does it work with our UI? |
Signed-off-by: Annanay <annanayagarwal@gmail.com>
Thanks for the review, @pavolloffay. I've addressed all comments. PTAL. |
what about logs? |
Signed-off-by: Annanay <annanayagarwal@gmail.com>
Is there any update on having regex in tag search? |
Signed-off-by: Annanay annanayagarwal@gmail.com
Which problem is this PR solving?
When using the ES backend, adds the ability to
foo=bar.*
should match all tags like{foo:bar}, {foo:barbaz}, {foo:bar1}
Related to #684
Short description of the changes
NewMatchQuery
method in ES reader forNewRegexpQuery
/cc @pavolloffay