Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an ID field to filters #174

Closed
iffyio opened this issue Jan 27, 2021 · 19 comments
Closed

Add an ID field to filters #174

iffyio opened this issue Jan 27, 2021 · 19 comments
Labels
area/operations Installation, updating, metrics etc kind/feature New feature or request

Comments

@iffyio
Copy link
Collaborator

iffyio commented Jan 27, 2021

We want to be able to assign an id (name is already taken) to filters - this would allow us to e.g filter in metrics only coming from a particular filter.

  • Update all current filter configs to include id field of type string - the field is optional and we use a default uuid otherwise.
  • We should enforce that any user provided id is at least a valid prometheus label
    https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
  • Update all metrics exported by current filters to include an id set to the filter's assigned ID.
@iffyio iffyio added kind/feature New feature or request good first issue Good for newcomers labels Jan 27, 2021
@markmandel
Copy link
Member

Update all metrics exported by current filters to include an id set to the filter's assigned ID.

Should it be a label on the Filter's assigned metric(s), rather than part of the name? Might be good to see the total metrics in for a given filter, then dig down into each individual one?

I'm assuming here that ids for Filters are probably going to be a small set -- but we can't guarantee it. WDYT?

@markmandel markmandel added the area/operations Installation, updating, metrics etc label Jan 27, 2021
@iffyio
Copy link
Collaborator Author

iffyio commented Jan 27, 2021

Not entirely sure I follow, do you mean rather than e.g packets_dropped{id="foo"} (label), it'll be foo_packets_dropped (name)? Both cases generate the same number of time series so they'll have the same effect with the same number of filters

@markmandel
Copy link
Member

Ah ignore me - you were already saying to use a label, I just didn't quite grok that. We're on the same page. 👍

@XAMPPRocky
Copy link
Collaborator

XAMPPRocky commented May 11, 2021

Update all current filter configs to include id field of type string - the field is optional and we use a default uuid otherwise.

Looking at this now, would it make sense if instead there was some kind of wrapper newtype struct around a filter? I'm just trying to think if there are cases where you explicitly wouldn't want a filter to have an ID, and if not it might be easier to keep it in one place, rather than duplicating the field across each filter. I was thinking something like the following:

struct TrackedFilter<F> {
  id: Option<Uuid>,
  filter: F
}

impl<F: Filter> Filter for TrackedFilter<F> {
    fn read(&self, ctx: ReadContext) -> Option<ReadResponse> {
        self.filter.read(ctx)
    }
    
    fn write(&self, ctx: WriteContext) -> Option<WriteResponse> {
        self.filter.write(ctx)
    }
}

@iffyio
Copy link
Collaborator Author

iffyio commented May 11, 2021

Yeah it would be nice if we can have it in one place. At least all of the builtin filters would include an ID either to tag metrics/log info and a filter that doesn't need it can ignore the value. though custom filters won't necessarily have one.
I'm not sure if the wrapper in this case would be sufficient though since the underlying filter implementation needs access to the ID? both during normal flow but also in the filter factory when initializing metrics

@XAMPPRocky
Copy link
Collaborator

XAMPPRocky commented May 11, 2021

Good points.

though custom filters won't necessarily have one.

Well we could make it a requirement for all filters by adding fn id(&self) -> &Uuid to the Filter trait as a required method, then you would have to always provide an ID, even when you're writing a custom one. That doesn't reduce the boilerplate on the filter creation side as we'll still need fields but it does let us access the ID of the filter without knowing what filter it is.

I think it probably depends on whether you want the encapsulation to be in the filters or in metrics. Like do we want metrics to work with potentially any filter? Or do we want filters supposed to be mostly opaque and have complete control over metrics?

@iffyio
Copy link
Collaborator Author

iffyio commented May 11, 2021

I think it probably depends on whether you want the encapsulation to be in the filters or in metrics. Like do we want metrics to work with potentially any filter? Or do we want filters supposed to be mostly opaque and have complete control over metrics?

Its up to the filter to export the metric they choose to - we pass in the metrics registry and they register any implementation specific metrics, in that sense its mostly opaque yeah.

Currently its not required that a filter has an id since the current use cases of metrics/logs are implementation specific (e.g a filter that doesn't export any metrics doesn't need one). Not sure if there'll be a use case that requires quilkin to know an ID of a filter and we can add it to the trait if that shows up

@XAMPPRocky
Copy link
Collaborator

XAMPPRocky commented May 17, 2021

Something that came up when I was implementing this, is "What ID should a FilterChain get?" since it itself is a Filter.

@iffyio
Copy link
Collaborator Author

iffyio commented May 17, 2021

The FilterChain doesn't need an ID since its mostly an implementation detail, the ID isn't part of the filter api though so that should be fine to ignore it?

@XAMPPRocky
Copy link
Collaborator

The FilterChain doesn't need an ID since its mostly an implementation detail, the ID isn't part of the filter api though so that should be fine to ignore it?

Well FilterChain is part of the public API, and it not having an ID would mean that wouldn't be able to filter the chain out, only its components. I was imagining something like the following "pipeline" of filters, you wouldn't be able to hide all of the filters in the inner filter chain without also hiding the Debug that comes at the start.

FilterChain::new([
    Debug::new(),
    LoadBalancer::new(),
    Authentication::new(),
    FilterChain::new([
        Compression::new(),
        Debug::new(),
        Encryption::new(),
    ]),
    Scriptable::new(),
])

Where as if Filter::id is part of the public API, and FilterChain's also have IDs, they can be treated like any other filter, and you can filter/hide "chains" of filters just as easily. This introspect-ability also relates to #167 as it would be quite easy to implement processing metrics in terms of FilterChain if it can inspect a Filter's ID to map elapsed time stats to an ID, than needing every filter to implement processing metrics manually, it can be given for free if we can inspect them.

@iffyio
Copy link
Collaborator Author

iffyio commented May 17, 2021

That makes sense, I don't think FilterChain was intended to be public (if it is today it shouldn't be since its not mentioned in the docs for example) but yeah we could expose it.
If we do that, then we should do so as a proper filter that isn't special in anyway (so to move it under filters/ etc) - we'll essentially be saying filters are nestable. Also since this isn't dependent on the id issue, I think we can tackle it separately and e.g give filterchain an empty id or so to begin with.

@markmandel
Copy link
Member

Should the id be a String or &str - rather an a Uuid ? Just so that users can specify their own human readable ids?

@XAMPPRocky
Copy link
Collaborator

Is this resolved now?

@markmandel
Copy link
Member

Just peeking through this - we have a name field on the Filter Config:

(source)

/// Filter is the configuration for a single filter
#[derive(Debug, Deserialize, Serialize, Clone)]
#[serde(deny_unknown_fields)]
pub struct Filter {
    pub name: String,
    pub config: Option<serde_yaml::Value>,
}

But I don't think this information makes its way to the Filter when it gets created, at first glance it looks like it should be added to CreateFilterArgs

(source)

/// Arguments needed to create a new filter.
pub struct CreateFilterArgs<'a> {
    /// Configuration for the filter.
    pub config: Option<ConfigType<'a>>,
    /// metrics_registry is used to register filter metrics collectors.
    pub metrics_registry: Registry,
}

So it looks like that work is outstanding at this point.

@markmandel
Copy link
Member

....and as soon as I hit "submit" I realise that was wrong. The name field is for the Filter is for the filter name 🤦🏻

So it doesn't look like any of this work has been done yet.

@XAMPPRocky
Copy link
Collaborator

@markmandel yes but I thought from the discussion in #263 we didn’t want instance specific IDs, and Filter::name already serves as a type identifier.

@markmandel
Copy link
Member

Oh that's right! If a user did something like dynamically generating a uuid or something for each id, it becomes an possible footgun if they are running lots of proxies.

It would be useful though to be able to look at a specific configuration of a filter in a chain and see that that is the one that's causing a performance issue (for example).

But maybe this is one of those things where we close this for now and revisit if people want to explore it in the future -- it's entirely possible that in practice people won't actually use multiple instances of a filter that often.

@jahrlin any suggestions on how to prevent users from creating their own cardinality bomb accidentally? Do we think documentation is good enough (put some big warnings on it)?

@XAMPPRocky
Copy link
Collaborator

It would be useful though to be able to look at a specific configuration of a filter in a chain and see that that is the one that's causing a performance issue (for example).

One way would be to key by the values of the config rather than by the instance, that would provide some kind bound as long as the configuration itself isn’t unbounded.

@XAMPPRocky XAMPPRocky removed the good first issue Good for newcomers label Jul 16, 2021
@XAMPPRocky
Copy link
Collaborator

Closing this issue, as it seems resolved, and we can always open in a new issue if there's more work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operations Installation, updating, metrics etc kind/feature New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants