Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to apply a filter when we are picking a string from a list of strings #9173

Closed
chris48s opened this issue May 16, 2023 · 2 comments · Fixed by #9193
Closed

Ability to apply a filter when we are picking a string from a list of strings #9173

chris48s opened this issue May 16, 2023 · 2 comments · Fixed by #9193

Comments

@chris48s
Copy link
Member

There are a couple of open PRs relating to filtering github tag/release badges.

One is about filtering github tag and release badges to only include tags that start with a prefix #8205
One is about excluding tags that contain some substring #9076

There is also an open issue relating to filtering dockerhub tags by a prefix #8789

We've also got some badges where we've implemented some kind of prefix or suffix filter that may or may not be working well #8714

See also #6275

In general, it seems like there is enough demand for this kind of thing that it is worth us thinking about, but I'm not overly keen to merge either of the currently open PRs. Maybe we add the ability to filter only github tags that start with a prefix, or that don't include a substring. Then someone else comes along and wants to only filter releases that end with a suffix, or we want to apply it to other badges. Feels like we could be heading down a messy road.

I'd rather find a more general solution that allows us the user to say "all the releases that end with the suffix -ubuntu" by specifying some kind of filter query like ?filter=%2A-ubuntu which we can apply to an array of strings anywhere where the task boils down to "pick a string from an array of strings and shove it on a badge". Then that gives us a flexible pattern we can apply in multiple places as this request comes up.

What I don't want to do is:

  • Accept an arbitrary user-supplied regex. Evaluating a user-supplied regex opens us up to a ReDos attack. It is fairly easy for a user to supply a regex which causes us to either spend a long time evaluating it or consume a large amount of memory evaluating it. Regex is powerful, but it is overkill for this task.
  • Invent our own query language/parser. It should be possible to adopt some kind of existing solution to this problem.

So ideally we want to find some kind of "string with wildcards or placeholders" that satisfies the following criteria:

  • At least expressive enough to allow a user to convey stuff like:
    • Does/does not include substring
    • Does/does not start with prefix
    • Does/does not end with suffix
  • Less capable than regex - won't open us up to ReDos or RCE vulns by evaluating user-supplied filters
  • Has an existing javascript implementation

One option I think is worth evaluating is format strings ( https://www.npmjs.com/package/scanf seems like a reasonable implementation)

Glob patterns also spring to mind as a possibility, but they are specifically for paths, and we're mostly not going to be dealing with paths. It is not quite the right fit.

There are also some very widely used wildcard matching libraries:

which we should consider.

I'd also be very open to other suggestions from the community if anyone else can think of something else we should evaluate.

I probably need to do a few proof-of-concepts (proofs-of-concept?) before we pick an approach and move forward.

@chris48s
Copy link
Member Author

I've spent a bit more time playing with these options.

https://www.npmjs.com/package/matcher has only 2 constructs: * represents a wildcard, and if the pattern starts with a !, the whole pattern is negated. That's it.

This is reasonably versatile. It certainly would allow us to cover the cases I've listed above:

  • *substring* / !*substring*
  • preifx* / !prefix*
  • *suffix / !*suffix

It is also intuitive/natural to write the filter expressions. There are some use-cases I can imagine that we couldn't satify with this syntax. For example you couldn't express "starts with a number" or "contains a YYYY-MM-DD formatted date" with this syntax.

https://www.npmjs.com/package/wildcard is very similar to matcher, except it doesn't implement the ! negation feature. We could work round this by allowing ?include= and ?exclude= rather than just a ?filter=. I think the negation feature is useful.

In principle, format strings are a bit more expressive. You could write something like "starts with a number" or "contains a YYYY-MM-DD formatted date".
Format strings don't have the concept of negation, so again we would have to implement that via ?include= and ?exclude= as opposed to ?filter=.
In practice, I've not found a package/implementation that works for our use-case though. I was hoping the sscanf function from https://www.npmjs.com/package/scanf would do the job, but having tried it out, its not quite the right fit. Essentially the use-case I have in mind is what is described in this SO post https://stackoverflow.com/questions/28735316/detect-if-string-matches-format-specifier-string but I've not managed to find a package that implements this and I don't really want to maintain our own implementation with a full plethora of placeholders.

I've also had a look to see if I can find anything else worth considering that I didn't mention in the first post and not found anything new.

Based on this, my inclination is to implement filtering based on https://www.npmjs.com/package/matcher . It is easy to implement, maintained by Sindresorhus and my instinct is that although there are some patterns that would not be possible to express, it is probably flexible enough for our needs.

@calebcartwright
Copy link
Member

Based on this, my inclination is to implement filtering based on https://www.npmjs.com/package/matcher . It is easy to implement, maintained by Sindresorhus and my instinct is that although there are some patterns that would not be possible to express, it is probably flexible enough for our needs.

Sold. We/our users can always make feature requests upstream should such pattern use cases be brought forward with enough motivation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants