Support for MongoDB query profiling. #640
Labels:
- area/datacollector: Issues related to Stirling (datacollector)
- kind/feature: New feature or request
- priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
- triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Comments
ghost added the needs-triage label on Nov 3, 2022
oazizi000 added the area/datacollector, kind/feature, priority/important-longterm, and triage/accepted labels on Nov 7, 2022
oazizi000 changed the title from "KIndly provide support for MongoDB query profiling." to "Support for MongoDB query profiling." on Nov 10, 2022
oazizi000 removed the needs-triage label on Nov 18, 2022
I think this would be a great addition to Pixie.
vihangm pushed a commit that referenced this issue on Sep 10, 2023:
… ExtractInt function to ExtractBEInt (#1698) Summary: This PR adds a function to the BinaryDecoder class to extract little-endian encoded integers from the buffer and renames the existing `ExtractInt` function to `ExtractBEInt` so that the name reflects the integer's byte order. Several protocol parsing implementations, MongoDB included, use little-endian encoding but have been directly calling the relevant functions in the utils namespace. Adding this to BinaryDecoder makes it consistent with the existing extract-integer functions that deal with big-endian encoded integers. Ran `git grep ExtractInt | cut -d : -f1 | uniq | xargs -I{} sed -i 's/ExtractInt(/ExtractBEInt(/g' {}` and `git grep ExtractInt | cut -d : -f1 | uniq | xargs -I{} sed -i 's/ExtractInt</ExtractBEInt</g' {}` to make the naming changes and reran AMQP's code generation to verify the changes were done correctly. Related issues: #640 Type of change: /kind feature Test Plan: `bazel build //...` and `bazel test //...` --------- Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
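The big- vs little-endian split that the rename captures can be sketched in Python (the actual BinaryDecoder is C++; these function names mirror the `ExtractBEInt`/little-endian pair described above but are illustrative):

```python
# Sketch of big- vs little-endian integer extraction from a byte buffer,
# mirroring the ExtractBEInt / little-endian split described above.
def extract_be_int(buf: bytes, size: int) -> tuple[int, bytes]:
    """Extract a big-endian integer of `size` bytes; return (value, remainder)."""
    if len(buf) < size:
        raise ValueError("not enough data in buffer")
    return int.from_bytes(buf[:size], "big"), buf[size:]

def extract_le_int(buf: bytes, size: int) -> tuple[int, bytes]:
    """Extract a little-endian integer of `size` bytes; return (value, remainder)."""
    if len(buf) < size:
        raise ValueError("not enough data in buffer")
    return int.from_bytes(buf[:size], "little"), buf[size:]
```

The same two bytes decode to different values depending on which variant is used, which is why MongoDB (little-endian on the wire) needs the second form.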
This was referenced Sep 11, 2023
vihangm pushed a commit that referenced this issue on Sep 14, 2023:
Summary: This PR adds the external Bazel repo for the mongo-c-driver. This driver is used to help parse BSON documents from a MongoDB frame into JSON strings. Related issues: #640 Type of change: /kind feature Test Plan: `bazel build @com_github_mongodb_mongo_c_driver//...` --------- Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
JamesMBartlett pushed a commit that referenced this issue on Sep 18, 2023:
…1703) Summary: This PR adds the header file the parsing logic uses (currently in #1704 ) to get the MongoDB wire protocol's frame specifications. The header file currently contains the frame spec for a standard message header, the spec of a payload of type `OP_MSG`, and the basic structure for a `Record`. Related issues: #640 Type of change: /kind feature Test Plan: Tested the integration of this and the parsing logic through unit tests. --------- Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
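For context, the MongoDB wire protocol's standard message header is 16 bytes: four little-endian int32 fields (messageLength, requestID, responseTo, opCode), where opcode 2013 denotes `OP_MSG`. A minimal Python sketch of parsing it (the actual spec lives in the C++ header this PR adds):

```python
import struct

OP_MSG = 2013  # opcode for OP_MSG payloads in the MongoDB wire protocol

def parse_header(buf: bytes) -> dict:
    """Parse the 16-byte MongoDB standard message header (four LE int32s)."""
    if len(buf) < 16:
        raise ValueError("incomplete header")
    length, request_id, response_to, op_code = struct.unpack_from("<iiii", buf)
    return {
        "messageLength": length,   # total frame length, including the header
        "requestID": request_id,   # client- or server-assigned id of this frame
        "responseTo": response_to, # requestID of the frame being responded to
        "opCode": op_code,
    }
```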
JamesMBartlett pushed a commit that referenced this issue on Oct 3, 2023:
Summary: This is the final PR for MongoDB protocol parsing. It adds the logic to parse frames and includes corresponding tests. The initial parsing functionality currently supports frames of type `OP_MSG`. Related issues: #640 Type of change: /kind feature Test Plan: Included tests --------- Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
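An `OP_MSG` payload (the bytes after the 16-byte header) consists of a little-endian uint32 of flag bits followed by one or more sections; a kind-0 section carries a single BSON document whose first 4 bytes are its own length. A hedged sketch of that layout, not Stirling's actual parser:

```python
import struct

def parse_op_msg_body(body: bytes) -> dict:
    """Sketch: parse an OP_MSG payload's flag bits and first kind-0 section."""
    flag_bits = struct.unpack_from("<I", body)[0]  # e.g. bit 1 = moreToCome
    kind = body[4]  # section kind byte
    if kind != 0:
        raise NotImplementedError("only kind-0 (body) sections sketched here")
    # A BSON document begins with a little-endian int32 of its total length.
    doc_len = struct.unpack_from("<i", body, 5)[0]
    bson_doc = body[5:5 + doc_len]
    return {"flagBits": flag_bits, "sectionKind": kind, "doc": bson_doc}
```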
This was referenced Oct 17, 2023
This was referenced Oct 31, 2023
vihangm pushed a commit that referenced this issue on Nov 1, 2023:
Summary: This PR adds the Mongo docker [image](https://hub.docker.com/layers/library/mongo/7.0.0/images/sha256-19b2e5c91f92c7b18113a1501c5a5fe52b71a6c6d2a5232eeebb4f2abacae04a?context=explore) to the image dependencies so that it can be mirrored to the various container registries for later use, specifically for the upcoming BPF test. Related issues: #640 Type of change: /kind test-infra Test Plan: Used this image through docker.io with the upcoming Mongo BPF test Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
JamesMBartlett pushed a commit that referenced this issue on Nov 2, 2023:
Summary: This PR adds functionality to track the order of transactions as they are parsed. It adds the streamID of each request to the state vector at parsing time, which is then iterated over at stitching time. This mainly helps the stitching process when stitching a request with N `moreToCome` responses.

**Motivation behind this change:** The stitching implementation relies on the new interface using `absl::flat_hash_map` to store the `streamID` and a deque of request/response frames. We then use a response-led matching algorithm: we loop through the response map and stitch the first response frame in a deque with its corresponding request frame. A response pairs with a request when both frames share the same `streamID`, i.e. the response frame's `streamID` is its `responseTo` and the request frame's `streamID` is its `requestID`. MongoDB's `OP_MSG` wire protocol has the concept of `more_to_come`, which means the server can send `N` responses back to a single request from the client. Each frame in the series of `N` responses is linked by its `responseTo` matching the `requestID` of the previous response frame, similar to a singly linked list. The head response frame's `responseTo` is the `requestID` of the request frame. Note: the `requestID` of each frame in the `N` `more_to_come` frames is random and unique. At stitching time, if we did not use state to track the order of transactions, we would iterate over the response map in a "random" order and could visit the `more_to_come` response frames out of order. We could lose context on how the `more_to_come` frames are linked, since we would not know the head response frame; and if we iterated past the end of a `more_to_come` message before looping through all prior `more_to_come` frames in the message, those frames would be dropped, since we would not know which request they respond to.

To solve this, tracking the order of the transactions' `streamIDs` to iterate over ensures that we can use the response-led stitching approach and find the complete `more_to_come` message for a given request.

**New test case:** The new test case checks that the state's `stream_order` vector is correctly populated with the order of `streamIDs` as new request frames (transactions) are parsed. The test parses 3 frames and expects the state's `stream_order` to contain `std::pair<917, false>` after parsing the first frame, since the first frame's `requestID` is 917. It expects `stream_order` to contain `std::pair<917, false>`, `std::pair<444, false>` after parsing the second request frame, since that frame's `requestID` is 444, and so on.

Related issues: #640 Type of change: /kind feature Test Plan: Modified the existing tests and added another test to make sure the vector is populated correctly. --------- Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
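The order-tracking state described above can be sketched in a few lines of Python (the real implementation is C++; the class and function names here are hypothetical, but the shape matches the `stream_order` vector of `(streamID, consumed)` pairs the test case describes):

```python
# Hypothetical sketch: at parse time, each request's streamID is appended
# to an ordered list with a "consumed" flag, so stitching can later walk
# transactions in arrival order rather than in hash-map iteration order.
class StateSketch:
    def __init__(self):
        self.stream_order: list[list] = []  # [streamID, consumed] pairs

def on_request_parsed(state: StateSketch, request_id: int) -> None:
    """Record a newly parsed request's streamID in arrival order."""
    state.stream_order.append([request_id, False])
```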
JamesMBartlett pushed a commit that referenced this issue on Nov 2, 2023:
Summary: Previously, the TShark command in the `dataset_generation` script was not able to decode Mongo pcap files and insert them into the dataset for evaluation. This PR adds a flag to the TShark command to decode traffic running through port 27017 as Mongo. The readme is also updated to provide information about the bidirectional connection-level dataset.

**Updates to the confusion matrix:** In the previous image, the connections per protocol in the dataset appear to have been duplicated, leading to a large number of connections per protocol. This may have been because the `dataset_generation` script appended data to the `.tsv` files each time it was run, even though the underlying pcap file contents/counts were not altered. Running the `dataset_generation` script with empty `.tsv` files and the same pcap files, followed by the `eval` script, produced a matrix with far fewer connections per protocol, suggesting there had been duplication in the dataset. The connection counts for each protocol in the older dataset appear to be 4x or 8x the counts in the new dataset, which explains why the inference accuracy remained constant between the old and new matrices. The TLS connection count dropped in the new matrix by the previous number of Mongo connections (432), because the new TShark command now decodes Mongo connections. The Mongo captures may have been taken in one of the early iterations of running the `dataset_generation` script and not updated since in the old dataset.

**New mongo additions:** In the old dataset, the Mongo pcap files were mainly of type `OP_QUERY`, an opcode that Stirling does not currently process. More Mongo pcap files of type `OP_MSG` were added to test the existing inference rule. This resulted in 0.9% being mislabeled as `unknown`, due to request-side data missing from the connection and the existing rule not supporting response-side inference for `OP_MSG` packets. Another 0.7% was mislabeled as `pgsql`, due to request-side data also missing from the connection and the packet's opcode not being recognizable by Stirling.

Related issues: #640 Type of change: /kind test-infra Test Plan: Ran the dataset generation and evaluation scripts with the new TShark flag and verified the `.tsv` files were created appropriately and the confusion matrix was as expected. Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
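The decode-as behavior described can be expressed with TShark's `-d` flag, which maps a selector to a dissector. A sketch of the flag in question (the capture filename is illustrative, not the script's actual path):

```shell
# Decode traffic on TCP port 27017 as the MONGO protocol when reading a capture.
tshark -r capture.pcap -d tcp.port==27017,mongo
```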
JamesMBartlett pushed a commit that referenced this issue on Nov 3, 2023:
Summary: This PR adds functionality to stitch MongoDB frames together. It relies on state to iterate over the order of streamIDs parsed and uses a response led matching approach. Related issues: #640 Type of change: /kind feature Test Plan: Added tests --------- Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
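The response-led matching this commit describes can be sketched in Python (a simplification under assumed frame shapes; the real stitcher is C++ and also consults the stream-order state):

```python
from collections import deque

# Hypothetical sketch of response-led stitching: frames are grouped into
# per-streamID deques; each response is matched to the request whose
# requestID (its streamID key) equals the response's responseTo.
def stitch(requests: dict, responses: dict) -> list:
    """requests/responses map streamID -> deque of frame dicts."""
    records = []
    for stream_id, resp_deque in responses.items():
        while resp_deque:
            resp = resp_deque.popleft()
            req_deque = requests.get(resp["responseTo"])
            if req_deque:
                # Pair the earliest pending request on this stream with the response.
                records.append((req_deque.popleft(), resp))
    return records
```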
aimichelle pushed a commit that referenced this issue on Nov 6, 2023:
Summary: This PR removes the frame's contents from the buffer when we return `kIgnored` because the frame is a type (opcode) we do not parse. When running the BPF test we noticed that the program would stall when the parser encountered a frame with an opcode it does not support. This was due to the parser returning `kIgnored` to `ParseFramesLoop` without moving the buffer forward before `ParseFrame` was called again. This change updates the buffer position before returning `kIgnored` to `ParseFramesLoop` so that the remaining frames in the buffer can be parsed. Related issues: #640 Type of change: /kind bug Test Plan: Modified the existing test checking for the unsupported opcode type. Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
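The stall-and-fix can be sketched in Python (names are illustrative, not Stirling's actual C++ API): because `messageLength` covers the whole frame, an unsupported frame can be skipped by advancing the position by its length before continuing, instead of re-parsing the same bytes forever:

```python
import struct

SUPPORTED_OPCODES = {2013}  # OP_MSG; illustrative set

def parse_frames(buf: bytes) -> list:
    """Sketch of a parse loop that skips (rather than stalls on) ignored frames."""
    frames = []
    pos = 0
    while pos + 16 <= len(buf):
        length, req_id, _resp_to, opcode = struct.unpack_from("<iiii", buf, pos)
        if pos + length > len(buf):
            break  # incomplete frame; wait for more data
        if opcode not in SUPPORTED_OPCODES:
            pos += length  # the fix: consume the ignored frame's bytes
            continue       # analogous to returning kIgnored and moving on
        frames.append({"requestID": req_id, "opCode": opcode})
        pos += length
    return frames
```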
This was referenced Nov 6, 2023
ChinmayaSharma-hue pushed a commit to ChinmayaSharma-hue/pixie that referenced this issue on Nov 9, 2023:
…ixie-io#1703) Summary: This PR adds the header file the parsing logic uses (currently in pixie-io#1704 ) to get the MongoDB wire protocol's frame specifications. The header file currently contains the frame spec for a standard message header, the spec of a payload of type `OP_MSG` and contains the basic structure for a `Record`. Related issues: pixie-io#640 Type of change: /kind feature Test Plan: Tested the integration of this and the parsing logic through unit tests. --------- Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai> Signed-off-by: Chinmay <chinmaysharma1020@gmail.com>
vihangm pushed a commit that referenced this issue on Nov 9, 2023:
#1763) Summary: This PR integrates the mongo stitcher with the new upstream stitching interface using a map of `streamID` and deque of frames. Related issues: #640 Type of change: /kind feature Test Plan: Tested these changes with the upcoming BPF test. Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
vihangm pushed a commit that referenced this issue on Nov 17, 2023:
Summary: This PR integrates the mongo protocol tracing functionality with the socket tracer pipeline. The trace mode is currently set to off and can be toggled later. Related issues: #640 Type of change: /kind feature Test Plan: Tested these changes with the upcoming BPF test. Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
This was referenced Nov 20, 2023
vihangm pushed a commit that referenced this issue on Nov 30, 2023:
Summary: When end-to-end testing MongoDB tracing, the parser encountered `OP_MSG` handshaking frames and was not able to parse them. This was because the old parser searched for a CRUD command in the `OP_MSG` frame's top-level key in the payload, and `OP_MSG` handshaking frames do not contain a key the old parser could interpret. This PR adds support to parse top-level handshaking keys and discards those frames at stitching time. It also correctly prefixes `const` variables in the `types.h` file. The motivation to parse these frames instead of immediately ignoring them at parsing time is to account for the different possible responses to a handshaking request. A handshaking request could lead to a `more_to_come` response where each frame in the `more_to_come` response may not be identifiable as a handshaking frame. It is also possible that the response to a handshaking request contains an `ok` key, which can be misidentified as a non-handshaking response and left stale in the map of response deques. Identifying all of the handshaking request/response frames at stitching time, but not inserting them into the records, ensures that all handshaking frames are cleared from the map and not pushed to the data table. Related issues: #640 Type of change: /kind bug Test Plan: Added a stitcher test Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
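A minimal sketch of the top-level-key check described above. `hello` and `isMaster`/`ismaster` are real MongoDB handshake commands, but the exact key set Stirling recognizes is an assumption here:

```python
# Hypothetical sketch: classify a frame as handshaking by its top-level
# command key so it can be discarded at stitching time.
HANDSHAKE_KEYS = {"hello", "isMaster", "ismaster"}  # assumed set

def is_handshake(top_level_key: str) -> bool:
    return top_level_key in HANDSHAKE_KEYS
```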
This was referenced Nov 30, 2023
vihangm pushed a commit that referenced this issue on Dec 6, 2023:
Summary: This PR adds the BPF test for the mongo protocol tracer. The default tracing mode is on. Related issues: #640 Type of change: /kind feature Test Plan: Added the BPF test Changelog Message: ``` MongoDB query profiling is now supported by Stirling. ``` Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
aimichelle pushed a commit that referenced this issue on Dec 13, 2023:
Summary: This PR adds the pxl script to visualize the MongoDB data table on the UI. This is how the table looks like <img width="1833" alt="Screenshot 2023-11-30 at 11 28 47 AM" src="https://github.com/pixie-io/pixie/assets/62078498/25be18a2-2c71-4888-8224-e188ce518ebd"> Related issues: #640 Type of change: /kind feature Test Plan: Ran the pxl script with `vis.json` in the scratch pad section of the UI Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>
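A minimal PxL sketch of the kind of query such a script might run (PxL executes inside Pixie, not as standalone Python; the table name `mongodb_events` is an assumption, not confirmed by this issue):

```python
# PxL sketch: pull recent MongoDB events and display them (table name assumed).
import px

df = px.DataFrame(table='mongodb_events', start_time='-5m')
px.display(df)
```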
aimichelle pushed a commit that referenced this issue on Mar 11, 2024:
Summary: This PR adds a demo app to test the mongo protocol tracing feature. Related issues: Closes #640 Type of change: /kind feature Test Plan: Uploaded the `px-mongo` demo artifact to the GCS dev bucket. Tested the CLI and then deployed the demo to my cluster. Skaffolded Pixie with the latest mongo tracing changes and used the scratch pad to execute the mongo pxl script to verify data is being collected from the demo. <img width="1401" alt="Screenshot 2023-12-06 at 1 29 39 PM" src="https://github.com/pixie-io/pixie/assets/62078498/57aba6ed-dce5-4721-8c55-c4cd23f9ed0d"> Changelog Message: ``` Adds the `px-mongo` demo to the CLI ``` Signed-off-by: Kartik Pattaswamy <kpattaswamy@pixielabs.ai>