
Stable changes for V3.0 #67

Merged

Anyesh merged 62 commits into master from v3.0.x on Feb 14, 2022
Conversation


Anyesh (Contributor) commented Jan 22, 2022

  • Added background submission
  • Added NDJSON batching
  • Arranged middlewares
  • Added optional custom fields on submit

Anyesh added 30 commits June 23, 2021 10:27
Add warning utility inside the utils sub package

usagelogger
Arrange multipart and warnings utility inside the utils directory

middleware
Anyesh added 19 commits October 20, 2021 20:13
…queue and thread

Added a mechanism that takes the payload, puts it on a queue, and spawns one extra background thread. That thread reads the data from the queue, adds the WAF threat score, and appends it to a list. Later that list is submitted as a batch to reduce network overhead.

http_logger, base_logger
Remove `ml-waf` from logger
Deprecate old root-dir middlewares and move them to the middlewares dir

Breaks everything from v2.x.x
…ocess

Add a queue-empty break at 100, as per the suggestion of @monrax
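The queue-and-thread mechanism the commits above describe can be sketched roughly as follows. This is an illustrative reconstruction, not the library's actual code; the names `submit`, `_worker`, `_score`, and `BATCH_SIZE` are assumptions, and appending to `sent_batches` stands in for the batched network submission:

```python
# Rough sketch: payloads go on a queue; one background thread drains it,
# adds a WAF threat score, and submits accumulated payloads as a batch.
import queue
import threading

BATCH_SIZE = 100              # assumed batch bound
payload_queue = queue.Queue()
sent_batches = []             # stands in for the network submission

def _score(payload):
    # Placeholder for the WAF threat-scoring step.
    payload["waf_score"] = 0.0
    return payload

def _worker():
    batch = []
    while True:
        try:
            payload = payload_queue.get(timeout=0.5)
        except queue.Empty:
            break  # queue drained; background thread can stop
        batch.append(_score(payload))
        if len(batch) >= BATCH_SIZE:
            sent_batches.append(batch)  # one network call per batch
            batch = []
    if batch:
        sent_batches.append(batch)  # flush the partial batch

def submit(payload):
    # Producer side: enqueue; the background worker does the rest.
    payload_queue.put(payload)

worker = threading.Thread(target=_worker, daemon=True)
```

Submitting a whole list in one request is what reduces the per-payload network overhead mentioned in the commit message.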
monrax (Member) commented Jan 24, 2022

I have yet to test it and measure performance, but it looks good! I think there's still the need to add another worker thread to handle the POST request itself, though. We don't expect it to add any significant overhead, but it is still a network I/O operation that should happen in the background (while payloads are being added to the queue but also while they are being processed). In terms of adding a bound to the queue, perhaps we should check its size instead of using a counter variable without a lock. I would suggest using the maxsize argument to initialize the queue itself, but I don't think we want to block at any point.
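Bounding the queue with `maxsize` while never blocking the producer, as the comment suggests, could look like the following sketch. `MAX_PENDING` and `enqueue` are assumed names, not part of the PR:

```python
# Bounded queue via maxsize; put_nowait keeps the caller from ever
# blocking, raising queue.Full instead when the bound is hit.
import queue

MAX_PENDING = 1000  # assumed bound, not from the PR
q = queue.Queue(maxsize=MAX_PENDING)

def enqueue(payload):
    """Enqueue without blocking the caller; report drops on overflow."""
    try:
        q.put_nowait(payload)
        return True
    except queue.Full:
        return False  # caller can count or log the dropped payload
```

Checking `q.full()` or `q.qsize()` would replace the unlocked counter variable, since the queue maintains its own lock internally.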

monrax (Member) commented Jan 24, 2022

I think creating new threads might be too expensive compared to just leaving one daemon thread running since logger initialization. I believe the Queue.get method will block only the worker thread and not the main thread, so while there are no new submissions (and all previous ones have been bundled and sent) that thread will just sleep until there's a new message to submit.

Maybe we can have two queues for the two worker threads: an unbounded one where we will put payloads to be processed (compression, JSON serialization) by a worker thread, and one bounded queue where that same worker will put messages while the other worker thread bundles them and handles the I/O operation. The goal would be that the process of putting payloads is never blocked, but the process of putting processed messages to bundle and send is, so that it would pause when the max bundle size is reached (while that is happening the payloads would just pile up on the unbounded queue). I can work on a first version of how that would look.
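The two-queue design proposed here can be sketched as below. It is a hypothetical illustration of the idea, not code from the branch; the names `raw_q`, `msg_q`, `processor`, `sender`, and `MAX_BUNDLE` are all assumptions, and appending to `out` stands in for the POST request:

```python
# Two-queue pipeline: producers put on an unbounded queue and never
# block; backpressure lives on the bounded queue between the workers.
import json
import queue
import threading

MAX_BUNDLE = 50  # assumed bundle size / bound on the second queue

raw_q = queue.Queue()                    # unbounded: producers never block
msg_q = queue.Queue(maxsize=MAX_BUNDLE)  # bounded: pauses the processor

def processor():
    # Worker 1: serialize payloads. msg_q.put blocks when the bounded
    # queue is full, so raw payloads pile up on raw_q instead.
    while True:
        payload = raw_q.get()
        if payload is None:       # sentinel: shut down the pipeline
            msg_q.put(None)
            break
        msg_q.put(json.dumps(payload))

def sender(out):
    # Worker 2: bundle processed messages and perform the I/O (here,
    # appending an NDJSON string to `out` stands in for the POST).
    bundle = []
    while True:
        msg = msg_q.get()
        if msg is None:
            break
        bundle.append(msg)
        if len(bundle) >= MAX_BUNDLE:
            out.append("\n".join(bundle))
            bundle = []
    if bundle:
        out.append("\n".join(bundle))  # flush the final partial bundle
```

With this split, a slow network only stalls the sender and, via the bounded queue, the processor; callers putting payloads on `raw_q` are never blocked.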

Lastly, is there a reason for us to create a dictionary with the submission url if self.url is already available from the worker thread? (Same with self.skip_compression, self.agent, and the logger version for the headers.)

monrax (Member) commented Jan 26, 2022

I just pushed some changes to a new branch here: 0cd8193 I ended up not using the two queues, but instead just using a list as a buffer. I wanted to create the dispatcher thread when the logger is initialized, and change its internal loop to an infinite loop so that it would sleep when the queue is empty. However, we would need to find a way to "flush" the buffer list before that happens.

As I'm writing this, I realize that a similar result could be achieved by just not checking for the "submission_thread" worker thread existence after putting in the payload, and instead using submit as a dispatcher of a limited number of worker threads that would consume from the same queue, creating their own batches and submitting them accordingly (I think because of this condition I initially assumed the __internal_submission method to be a dispatcher instead of a network I/O worker thread).

One change introduced in this commit is doing the compression after it is all bundled instead of compressing each message and then joining them together (after thinking about it I figured it would be best to do it that way, since compression works better as the amount of data available to look for patterns and similarities in increases).
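The effect described here is easy to demonstrate: compressing one NDJSON bundle beats compressing each message separately, because the compressor can exploit redundancy across messages. The message shape below is synthetic, chosen only to illustrate the point:

```python
# Compare per-message compression vs. compressing the whole bundle once.
import json
import zlib

# Synthetic, repetitive log messages (hypothetical shape).
msgs = [{"id": i, "path": "/api/items", "status": 200} for i in range(500)]

# Strategy A: compress each message individually, then concatenate.
per_message = b"".join(
    zlib.compress(json.dumps(m).encode()) for m in msgs
)

# Strategy B: bundle first (NDJSON-style), then compress the batch once.
bundled = zlib.compress(
    "\n".join(json.dumps(m) for m in msgs).encode()
)

# `bundled` comes out much smaller: cross-message repetition (keys,
# shared values) is visible to the compressor only in strategy B.
```

The gap widens as messages share more structure, which request/response logs typically do.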

Anyesh (Contributor, Author) commented Jan 31, 2022

> I just pushed some changes to a new branch here: 0cd8193 I ended up not using the two queues, but instead just using a list as a buffer. I wanted to create the dispatcher thread when the logger is initialized, and change its internal loop to an infinite loop so that it would sleep when the queue is empty. However, we would need to find a way to "flush" the buffer list before that happens.
>
> As I'm writing this, I realize that a similar result could be achieved by just not checking for the "submission_thread" worker thread existence after putting in the payload, and instead using submit as a dispatcher of a limited number of worker threads that would consume from the same queue, creating their own batches and submitting them accordingly (I think because of this condition I initially assumed the __internal_submission method to be a dispatcher instead of a network I/O worker thread).
>
> One change introduced in this commit is doing the compression after it is all bundled instead of compressing each message and then joining them together (after thinking about it I figured it would be best to do it that way, since compression works better as the amount of data available to look for patterns and similarities in increases).

That commit looks good but could you please check why unit tests are not passing?

@Anyesh Anyesh merged commit ba89843 into master Feb 14, 2022