Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support regex filter #387

Closed
wants to merge 22 commits into from

Conversation

fredagsfys
Copy link
Contributor

No description provided.

@fredagsfys fredagsfys changed the title 316/regex-filter Support regex filter Aug 30, 2021
Copy link
Collaborator

@XAMPPRocky XAMPPRocky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your PR! This mostly looks good to me, I've left a one comment. Once that is addressed this should be good to merge.


impl Filter for RegexBytes {
fn read(&self, ctx: ReadContext) -> Option<ReadResponse> {
let re = Regex::new(self.regex_expression.as_str()).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Regex type should be created in create_filter, stored, and re-used here. This implementation will compile the Regex for each packet.

@fredagsfys
Copy link
Contributor Author

fredagsfys commented Aug 31, 2021 via email

Copy link
Collaborator

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! We have a capture bytes that does something pretty similar to this from what I can tell, could we enhance that filter with a new regex Strategy instead of implementing this as a new filter? https://github.com/googleforgames/quilkin/blob/main/src/filters/capture_bytes/capture.rs#L2-L7

@iffyio iffyio requested a review from markmandel August 31, 2021 07:56
iffyio and others added 2 commits August 31, 2021 09:07
So that we get pinged when PRs are up for review
Co-authored-by: Mark Mandel <markmandel@google.com>
@fredagsfys
Copy link
Contributor Author

fredagsfys commented Aug 31, 2021

Thanks for the PR! We have a capture bytes that does something pretty similar to this from what I can tell, could we enhance that filter with a new regex Strategy instead of implementing this as a new filter? https://github.com/googleforgames/quilkin/blob/main/src/filters/capture_bytes/capture.rs#L2-L7

I was looking at this as well first, but realised pretty soon that the description/idea of how this regex should work differs from what CaptureBytes does today. It will also introduce complexity to configuration, e.g different supporting properties, errors/unsupported combinations.

Example

config:
 strategy: PREFIX
 metadataKey: myapp.com/myownkey
 size: 3
 remove: false

vs

config:
 strategy: REGEX
 metadataKey: myapp.com/myownkey
 expression: "(?P<cstr>[^\x00]+)\x00"
 on_success: "capture | pass"
 on_error: "drop" 

We probably could try to align and merge logic of these two filters, for example PREFIX/SUFFIX drops today as default if size req is not met, but for the rest it is different. I see this more as regex filter utilises capture as a partial feature but also has support for pass.

An idea on how we could align these two filters to one is to support a configuration like this

config:
 strategy: PREFIX | SUFFIX | REGEX
 metadataKey: myapp.com/myownkey
 on_success: "capture | pass"
 on_error: "drop" 
 
 expression: "(?P<cstr>[^\x00]+)\x00" <-- Req strategy REGEX
 size: 3  <-- Req strategy PREFIX | SUFFIX
 remove: false <-- Req strategy PREFIX | SUFFIX

Required properties throws validation errors on startup if missing to set strategy, wdyt?

@iffyio
Copy link
Collaborator

iffyio commented Sep 1, 2021

Ah I see, yeah the only difference I can seems to be the pass feature while all other things seem common to capture filter. in which case we can merge it into the capture filter and make the 'pass' part configurable.
Config-wise, we can ensure that only one of them is provided and that strategy arguments aren't mixed with each other using a one-of schema type (we currently do something similar for the static/dynamic fields of a config).
So a config would look like:

# prefix
config:
  metadataKey: myapp.com/myownkey
  prefix:  # deserialization ensures that exactly one of prefix,suffix,regex is provided
    size: 3
    remove: false

or

# regex
config:
  metadataKey: myapp.com/myownkey
  regex:
    expression: ".*"
    remove: false
    capture: true # or maybe e.g `passthrough: true` since capturing the bytes makes more sense as a default given the filter name

For the on_error: drop, I'd say we ignore that field config-wise since that's already what all filters do when things go wrong (we can introduce options for all affected filters when we do in fact have options in such scenarios)

@markmandel
Copy link
Member

Ah I see, yeah the only difference I can seems to be the pass feature while all other things seem common to capture filter. in which case we can merge it into the capture filter and make the 'pass' part configurable.
Config-wise, we can ensure that only one of them is provided and that strategy arguments aren't mixed with each other using a one-of schema type (we currently do something similar for the static/dynamic fields of a config).

I just came here to say that I like this idea 👍🏻

Giles Heron and others added 11 commits September 7, 2021 14:18
Co-authored-by: Mark Mandel <markmandel@google.com>
* Update custom filter example and add CI

Update the custom filter example to the new API surface, and also setup
CI so that it ensures that it always compiles against dev Quilkin, and
therefore will stay in sync going forward.

Next I want to block out pieces of code and inject them into our docs
via mdbook when updating
https://googleforgames.github.io/quilkin/main/book/filters/writing_custom_filters.html
as well, rather than have them copy pasted.

Work on googleforgames#373

* Review updates.
* Exit on SIGTERM

we currently only listen for SIGINT while k8s sends
a SIGTERM to shutdown a process, so we need to listen for that
too

* shutdown logging

Co-authored-by: Mark Mandel <markmandel@google.com>
* Use hostname as default proxy id

If we don't provide a default proxy id this falls back to using the
hostname instead of uuid (and if that fails, then we use a uuid).

For environments like kubernetes, we would like the proxy's id to match
the pod id so that the management server can easily find pod information
for the proxy that is connected to it (the proxy passes its ID when it
connects to the server)

* use hostname only on linux

Co-authored-by: Mark Mandel <markmandel@google.com>
If we hit connectivity issues, make sure we retry until
the server comes back up.

Co-authored-by: Mark Mandel <markmandel@google.com>
* Docs: Updated Custom Filters

Updates the custom filter documentation to the latest API changes.

This also includes using mdbook's `include` preprocessor to inject the
example code into the documentation.

Note: Because of the use of `include` preprocessor I had to remove the
Markdown lists, as they wouldn't format correctly
(See: rust-lang/mdBook#1564).

Closes googleforgames#373

* Fix CI build
* Review Updates.
@github-actions github-actions bot removed the size/m label Sep 19, 2021
…n into 316/regex-filter"

This reverts commit 44ac12e, reversing
changes made to 0e93c20.
@github-actions github-actions bot added size/m and removed size/xl labels Sep 19, 2021
@markmandel
Copy link
Member

Just checking in to see if you need / want a hand in any way?

@quilkin-bot
Copy link
Collaborator

Build Failed 😭

Build Id: c7c2d223-6237-4246-9901-0c93d2a8a59a

Status: FAILURE

To get permission to view the Cloud Build view, join the quilkin-discuss Google Group.

@fredagsfys
Copy link
Contributor Author

fredagsfys commented Sep 28, 2021

Ah I see, yeah the only difference I can seems to be the pass feature while all other things seem common to capture filter. in which case we can merge it into the capture filter and make the 'pass' part configurable.
Config-wise, we can ensure that only one of them is provided and that strategy arguments aren't mixed with each other using a one-of schema type (we currently do something similar for the static/dynamic fields of a config).

I just came here to say that I like this idea 👍🏻

Thanks @markmandel, tbh I've been swamped at work and private latley. However, I'm now in a state where I'm fighting the language more than anything.

Two initial things I'd like your help with

  • As how I've refactored the configuration, we now support a dynamic strategy property which turns the enum to a struct with properties over a enum variant. This breaks the map_proto_enum in config.rs. Is there a nice alternative to parse this without compromising the nice configuration structure @iffyio proposed?

  • I'm trying to use composition to handle both something now called SizeCapture and RegexCapture joined under Capture trait. How can I get these to work in harmony in as_capture?

Last but not least, feel free to contribute if you think it is taking too much time... It is totally understandable. Rust is a new language for me and this turned out to be bigger than I anticipated in the first place! 😅

@iffyio
Copy link
Collaborator

iffyio commented Oct 5, 2021

Hi @devharis!

I'm trying to use composition to handle both something now called SizeCapture and RegexCapture joined under Capture trait. How can I get these to work in harmony in as_capture?

I think the simpler route would be have to all three strategies implement the Capture trait rather than introduce a new trait. The CaptureBytes filter would no longer need to know about things like remove or size etc, rather those belong to the implementation now that they differ - so something like

struct Suffix {
  remove: bool,
  size: usize,
}
struct Regex {
  regex: Regex,
  remove: bool,
}
impl Capture for Suffix {}
impl Capture for Regex {}

struct CaptureBytes {
  log: Logger,
  capture: Box<dyn Capture>,
  metadata_key: Arc<String>,
}
impl Filter for CaptureBytes {
  fn read(&self) {
    // I think this part works largely the same
    ...
    // Except no extra parameters needed in the `capture()` call
    // since the `self.capture` implementation already has all the
    // needed configuration.
    let token = self.capture.capture(ctx.contents)
    ...
  }
}

This breaks the map_proto_enum in config.rs

the map_proto_enum implementation only converts proto enums to plain rust enums that don't contain any data so likely we won't be able to use it anymore for this filter. We can instead do the mapping ourselves with e.g:

let strategy: Strategy = match protoconfig.strategy {
  ProtoStrategy::Suffix => {
     Strategy::Suffix {
        size: protoconfig.size,
        ...
     }
  },
}

(also sorry about the delayed feedback)

@XAMPPRocky XAMPPRocky added the kind/design Proposal discussing new features / fixes and how they should be implemented label Nov 2, 2021
@XAMPPRocky
Copy link
Collaborator

XAMPPRocky commented Dec 24, 2021

Since this has bitrotted with little activity, and I wanted to add support for regex to test something, I've continued this work in #458 and am going to close this, I'll be sure to add you as a coauthor to it. Thank you again for your PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes kind/design Proposal discussing new features / fixes and how they should be implemented size/m
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants