Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add responseless dns requests to dns_data table. #613

Closed
wants to merge 8 commits into from
Closed

Add responseless dns requests to dns_data table. #613

wants to merge 8 commits into from

Conversation

noman-xg
Copy link

This pull request is related to #418 which relates to including those DNS requests into the dns_data table for which a matching response was not found.

Testing Done

  • Verified that tests in stitcher_test.cc are passed.

noman@px-dev-docker:/pl/src/px.dev/pixie$ bazel test src/stirling/source_connectors/socket_tracer/protocols/dns/...
INFO: Analyzed 7 targets (0 packages loaded, 0 targets configured).
INFO: Found 4 targets and 3 test targets...
INFO: Elapsed time: 3.078s, Critical Path: 2.69s
INFO: 11 processes: 1 internal, 10 processwrapper-sandbox.
INFO: Build completed successfully, 11 total actions
//src/stirling/source_connectors/socket_tracer/protocols/dns:parse_test  PASSED in 0.0s
//src/stirling/source_connectors/socket_tracer/protocols/dns:stitcher_test PASSED in 0.0s
//src/stirling/source_connectors/socket_tracer/protocols/dns:types_test  PASSED in 0.0s

INFO: Build completed successfully, 11 total actions
noman@px-dev-docker:/pl/src/px.dev/pixie$ cat   /home/noman/.cache/bazel/_bazel_noman/4c31fb537ca0f31ab15bbd6a8445d3b6/execroot/px/bazel-out/k8-fastbuild/testlogs/src/stirling/source_connectors/socket_tracer/protocols/dns/stitcher_test/test.log
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //src/stirling/source_connectors/socket_tracer/protocols/dns:stitcher_test
I20221010 11:30:13.268741  5618 env.cc:47] Started: /home/noman/.cache/bazel/_bazel_noman/4c31fb537ca0f31ab15bbd6a8445d3b6/sandbox/processwrapper-sandbox/21/execroot/px/bazel-out/k8-fastbuild/bin/src/stirling/source_connectors/socket_tracer/protocols/dns/stitcher_test.runfiles/px/src/stirling/source_connectors/socket_tracer/protocols/dns/stitcher_test
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from DnsStitcherTest
[ RUN      ] DnsStitcherTest.RecordOutput
[       OK ] DnsStitcherTest.RecordOutput (0 ms)
[ RUN      ] DnsStitcherTest.OutOfOrderMatching
[       OK ] DnsStitcherTest.OutOfOrderMatching (0 ms)
[----------] 2 tests from DnsStitcherTest (0 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (0 ms total)
[  PASSED  ] 2 tests.
I20221010 11:30:13.269136  5618 env.cc:51] Shutting down

@pixie-io-buildbot
Copy link
Member

Can one of the admins verify this patch?

Changed Stichframes() to include the DNS requests in dns_data table which could not be matched with a response.

Signed-off-by: Noman_ali_bajwa <nmnbajwa@gmail.com>
@noman-xg noman-xg changed the title [PREVIEW] Add responseless dns requests to dns_data table. #418 [PREVIEW] Add responseless dns requests to dns_data table. Oct 10, 2022
@zasgar
Copy link
Member

zasgar commented Oct 11, 2022

ok to test

noman-xg and others added 2 commits October 12, 2022 22:21
@noman-xg
Copy link
Author

noman-xg commented Oct 12, 2022

Hey folks, how can i see the jenkins build log. The details hyperlink takes me to google sigin to continue to Pixie labs which says "Access blocked: Pixie Labs can only be used within its organization"
@htroisi

@htroisi
Copy link
Contributor

htroisi commented Oct 12, 2022

We're working on improving our process so that the build logs are public. But for now, I DM'd you the logs.

@noman-xg
Copy link
Author

noman-xg commented Oct 12, 2022

BLOCKED

Hey folks, Jenkins build is failing with this error. Any pointer on this one ?

Apply this patch to src/stirling/source_connectors/socket_tracer/protocols/dns/stitcher.cc? [Y/n] EXCEPTION: (PhutilConsoleStdinNotInteractiveException) The program is attempting to read user input, but stdin is being piped from some other source (not a TTY). at [<arcanist>/src/console/format.php:196] arcanist(head=stable, ref.master=85c953ebe4a6, ref.stable=b6babd9a07f4), arcanist-addons() #0 phutil_console_require_tty() called at [<arcanist>/src/console/format.php:47] #1 phutil_console_prompt(string) called at [<arcanist>/src/console/format.php:15] #2 phutil_console_confirm(string, boolean) called at [<arcanist>/src/lint/renderer/ArcanistConsoleLintRenderer.php:47] #3 ArcanistConsoleLintRenderer::promptForPatch(ArcanistLintResult, string, TempFile) called at [<arcanist>/src/workflow/ArcanistLintWorkflow.php:312] #4 ArcanistLintWorkflow::run() called at [<arcanist>/scripts/arcanist.php:427] <<< [1] (+124,647) <exec> 124,647,614 us script returned exit code 255Unhandled hudson.AbortException: script returned exit code 255, assuming fatal error.

@oazizi000
Copy link
Contributor

@Noman-Ali-Bajwa I think this is because the Jenkins build expects the diff to be pre-linted. We should figure out how to get the linters to run on your local environment.

Signed-off-by: Noman Ali Bajwa <nmnbajwa@gmail.com>
@ghost
Copy link

ghost commented Oct 12, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: noman-xg
Once this PR has been reviewed and has the lgtm label, please assign zasgar for approval by writing /assign @zasgar in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@ghost
Copy link

ghost commented Oct 12, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: noman-xg
Once this PR has been reviewed and has the lgtm label, please assign zasgar for approval by writing /assign @zasgar in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@oazizi000 oazizi000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Noman-Ali-Bajwa,

Thanks so much for this preview PR! So glad to have your contribution.

This is a great start. Leaving a few comments for discussion below.

Comment on lines 246 to 250
// CustomStitchFrames is a re-implementation of the existing StitchFrames with an added
// ability to record those DNS requests as well for which we could not match any response.
// Flag include_respless_dns_requests must be set to include such DNS requests.
RecordsWithErrorCount<Record> CustomStitchFrames(std::deque<Frame>* req_frames,
std::deque<Frame>* resp_frames) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The high degree of similarity of this function with PxStitchFrames suggests we should merge them and pass in an boolean argument that selects the behavior. Otherwise there's too much duplication, which could be a problem if there's some bug fix we need to make in the future.

Also, I know you asked about this on Slack. Thanks for putting this preview out so I could get a better grasp on how much duplication there actually ends up being :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge the two and push the changes soon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in a3e9243

Comment on lines 312 to 327
// After the external loop's lifecycle comes to an end we end up with the request deque
// having only those request frames which have not been consumed yet i.e. consumed = false
// so essentially these are the requests which could not be matched with any response frame.
// Hence we iterate over this request deque, add a default response to it, make a record and
// append those records at the end of the entries vector.
auto it = req_frames->begin();
while (it != req_frames->end()) {
if (!(*it).consumed) {
Frame default_resp_frame;
default_resp_frame.timestamp_ns = 99;
StatusOr<Record> record_status = ProcessReqRespPair(*it, default_resp_frame);
entries.push_back(record_status.ConsumeValueOrDie());
}
it++;
}
return {entries, error_count};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new part. A couple cases that we'll probably have to handle:

  1. What if a request can't be matched with a response because the response is still in flight? The parser could run at any time, so we we have to be ready to handle that case. We might need to build in some assumptions about how delayed a response can be.

  2. What's the rationale on default_resp_frame.timestamp_ns = 99;? We'll have to be careful with that because it could affect latency calculations.

Copy link
Author

@noman-xg noman-xg Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What should we consider as a timeout for the dns request. I think we can handle that using timestamp_ns.

  2. I have a workaround in mind to remove this. Will change in next commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For (1), there's no set number, but nslookup uses a default of 2 seconds if I'm not mistaken. So that could be a starting point. Whatever we choose, it should definitely be a FLAG so we can change it later if required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout added in a3e9243

Made changes to set a timeout on DNS requests.

Signed-off-by: Noman Ali Bajwa <nmnbajwa@gmail.com>
Copy link
Contributor

@oazizi000 oazizi000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Noman-Ali-Bajwa for making the updates. Here are a few more on the latest version.

// append those records at the end of the entries vector.
auto it = req_frames->begin();
while (it != req_frames->end()) {
if (!(*it).consumed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: if (!it->consumed)

Comment on lines 257 to 259
auto current_time = std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::system_clock::now().time_since_epoch())
.count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the clock can become expensive when inside a loop. Can you get the current_time a single time outside the loop and use the fixed value inside the loop. While not strictly as accurate, it should be more than sufficient for this use case.

auto current_time = std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::system_clock::now().time_since_epoch())
.count();
auto elapsed_seconds = current_time - (*it).timestamp_ns;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: it->timestsamp_ns


if (elapsed_seconds > FLAGS_dns_request_timeout_threshold_milliseconds) {
Frame default_resp_frame;
default_resp_frame.timestamp_ns = 99;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still thinking about this one. At minimum, we could use the time of the timeout as the "response" time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get it out of the way today.

Changed dummy response_frame timestamp to be increment of request timestamp.

Signed-off-by: Noman Ali Bajwa <nmnbajwa@gmail.com>
…y timeout threshold.

Signed-off-by: Noman Ali Bajwa <nmnbajwa@gmail.com>
oazizi000
oazizi000 previously approved these changes Oct 20, 2022
Copy link
Contributor

@oazizi000 oazizi000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. One additional small comment. Thanks @Noman-Ali-Bajwa!

Comment on lines 36 to 37
DEFINE_uint64(dns_request_timeout_threshold_milliseconds, 2000,
"Number of seconds to wait for the in-flight response of a dns request.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add, in the description, that this flag depends on include_respless_dns_requests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in ef1516a

@oazizi000
Copy link
Contributor

Also, feel free to take off the [PREVIEW] prefix on the PR :)

Signed-off-by: Noman Ali Bajwa <nmnbajwa@gmail.com>
@noman-xg noman-xg changed the title [PREVIEW] Add responseless dns requests to dns_data table. Add responseless dns requests to dns_data table. Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants