NIP-119: AND operator for filters #1365

Open
wants to merge 4 commits into
base: master
from

Conversation

dskvr
Contributor

@dskvr dskvr commented Jul 15, 2024

Filter modifier that has optimization benefits for relays, users and developers. Likely contentious.

Rendered NIP

Discourse

Implementations

Rationale

  • Reduce bandwidth for all: meme AND cat objectively consumes less bandwidth than meme OR cat
  • Reduce clock-time for relays: indexing with AND is fast for all common index formats, and faster than OR for some index formats. (See the Index Efficiency section below)
  • Reduce client-side caching requirements
  • Reduce centralization vectors by reducing or even eliminating the need for centralized REST, GraphQL APIs or specialized relay "feed" endpoints.
  • Give relays the option to be more useful at the protocol level while improving efficiency for all parties.

Considerations

  • New fields for NIP-11.limitations: max_tags_per_and and max_tags_and
  • Benchmarking should be conducted to validate that the bandwidth and protocol-usability benefits supersede the implementation and clock-time cost.

Index Efficiency

| Index Type | AND Operation Efficiency | OR Operation Efficiency | Notes |
| --- | --- | --- | --- |
| B-Tree | High | Moderate | B-Tree indexes are very efficient for AND operations, especially with compound indexes. For OR operations, they are less efficient than for AND, as the database engine might need to traverse multiple paths. |
| Bitmap | High | High | Bitmap indexes excel in both AND and OR operations, particularly for columns with low cardinality. They utilize fast bitwise operations, making them ideal for read-heavy environments. |
| Hash | Not Applicable | Not Applicable | Hash indexes are designed for equality checks and do not directly support range-based queries or optimize for AND/OR operations efficiently. |
| Full-Text | High | High | Optimized for text search, full-text indexes efficiently handle both AND and OR conditions, making them suitable for complex text queries. |
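
As a rough illustration of the AND/OR difference on the relay side, the sketch below models a tag index as a plain in-memory inverted index (tag value to set of event ids). This layout, the helper names and the sample events are assumptions for illustration only, not how any relay mentioned in this thread stores events. AND becomes a set intersection that can start from the smallest posting set and stop early; OR becomes a union that has to touch every posting set.

```python
from collections import defaultdict

def build_index(events):
    """Map each `t` tag value to the set of event ids carrying it."""
    index = defaultdict(set)
    for event in events:
        for tag in event["tags"]:
            if len(tag) >= 2 and tag[0] == "t":
                index[tag[1]].add(event["id"])
    return index

def query_and(index, values):
    """Ids of events carrying every value (set intersection)."""
    # Start from the smallest posting set so the working set only shrinks.
    postings = sorted((index.get(v, set()) for v in values), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for ids in postings[1:]:
        result &= ids
        if not result:
            break  # nothing can match any more
    return result

def query_or(index, values):
    """Ids of events carrying at least one value (set union)."""
    result = set()
    for v in values:
        result |= index.get(v, set())
    return result

# Hypothetical sample data.
events = [
    {"id": "a", "tags": [["t", "meme"], ["t", "cat"]]},
    {"id": "b", "tags": [["t", "meme"]]},
    {"id": "c", "tags": [["t", "cat"]]},
    {"id": "d", "tags": [["t", "dog"]]},
]
index = build_index(events)
and_ids = query_and(index, ["meme", "cat"])
or_ids = query_or(index, ["meme", "cat"])
```

Here `and_ids` holds one event id while `or_ids` holds three, which is also the bandwidth point from the Rationale list. Real index formats behave differently, so this is no substitute for the benchmarking called for under Considerations.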

@alexgleason
Member

Have you implemented a relay? I can't even make tag queries fast as they are.

@dskvr
Contributor Author

dskvr commented Jul 15, 2024

Not yet, no. But I've scraped, researched and monitored them for almost two years.

There is a PR to nostr-rs-relay linked above from @v0l that uses this syntax and it includes benchmarks

I can't even make tag queries fast as they are.

AND operators are (generally) faster for btree and (relatively) equivalent for all other indexing methods (depending on impl. ofc), so if you are using btree indices AND operators via filters are likely faster (depending on case ofc).

@hzrd149
Collaborator

hzrd149 commented Aug 27, 2024

Satellite Node relay now supports AND tag filters satellite-earth/core@15b8132
Although no one is running it yet 😁

@mikedilger
Contributor

I cannot think of a use case where I want to ask a relay for only events that have the same tag twice with two different values.

@dskvr
Contributor Author

dskvr commented Oct 5, 2024

@mikedilger @fiatjaf Really? I can think of an infinite number of cases.

Here's one: { kinds: [2003], "&t": [ "movie", "4k" ] } Re: NIP-35 inspired by @v0l

If a client has to retrieve that combination client-side from a populated dataset, the matching events could easily be <10% of the results returned by #t. However, this isn't the best example, because most events with t,4k will be a movie.

And another: { kinds: [2003], "&t": [ "movie", "4k", "turkish" ] }

easily <0.1%.
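
To make the bandwidth argument concrete, here is a minimal sketch of what a client has to do today compared with what `&t` would allow. The `matches_or` and `matches_and` helpers and the sample events are hypothetical, and the AND semantics (every listed value must be present on the event) are my reading of the examples above rather than a quote from the NIP text.

```python
def tag_values(event, name):
    """Collect the values of all tags with the given name."""
    return {tag[1] for tag in event["tags"] if len(tag) >= 2 and tag[0] == name}

def matches_or(event, name, values):
    """Standard `#<name>` semantics: at least one value is present."""
    return bool(tag_values(event, name) & set(values))

def matches_and(event, name, values):
    """Proposed `&<name>` semantics (assumed): every value is present."""
    return set(values) <= tag_values(event, name)

# Hypothetical kind 2003 events.
events = [
    {"kind": 2003, "tags": [["t", "movie"], ["t", "4k"], ["t", "turkish"]]},
    {"kind": 2003, "tags": [["t", "movie"], ["t", "4k"]]},
    {"kind": 2003, "tags": [["t", "movie"], ["t", "1080p"]]},
    {"kind": 2003, "tags": [["t", "tv"], ["t", "4k"]]},
    {"kind": 2003, "tags": [["t", "movie"]]},
]
wanted = ["movie", "4k", "turkish"]

# Today: download everything `#t` returns, then narrow it down locally.
downloaded = [e for e in events if matches_or(e, "t", wanted)]
kept = [e for e in downloaded if matches_and(e, "t", wanted)]

# With `&t`, the relay would send only the events in `kept`.
```

In this toy dataset the client downloads five events to keep one. The ratio in a real dataset depends entirely on how the tags are distributed.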

@fiatjaf
Member

fiatjaf commented Oct 6, 2024

I think this proposal is reasonable if we limit & to 2 values -- more than that and it becomes infeasible for filtering on the relay side.

But even if it's reasonable I also think it's a slippery slope and soon people will be demanding full SQL support on relays.

@alexgleason
Member

Postgres has a @> operator that is almost identical in functionality to Nostr filters, except it uses AND instead of OR for array values. If anything supporting OR was actually harder, because I had to break the query up into multiple @> statements for each value in the list.

We have been able to work around Nostr filter limitations by changing the events themselves in many cases. I think the torrent category example (or Amazon product filtering, as another hypothetical) is still basically unsolved, and we worked around that by simply not supporting the ability to do it. 😃

I'm not sure it's worth changing NIP-01 filters. I've been shoehorning any extra functionality I need into NIP-50 search extensions.

@dskvr
Contributor Author

dskvr commented Oct 7, 2024

I think this proposal is reasonable if we limit & to 2 values

Why not just leave it to the operator to decide? Maybe I run a relay that stores a unique event kind and I have optimized it to support up to 20 AND filters. Why shouldn't I be able to allow a client to run REQs up to my determined limit?
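
A minimal sketch of what an operator-chosen limit could look like on the relay side. `MAX_AND_VALUES` and `check_and_limits` are hypothetical names, and the sketch assumes the limit counts the values inside one `&` key. It is not tied to the exact meaning of the NIP-11 fields listed under Considerations, which this thread does not spell out.

```python
MAX_AND_VALUES = 20  # hypothetical limit chosen by the relay operator

def check_and_limits(filter_, max_values=MAX_AND_VALUES):
    """Return (ok, reason) after checking every `&`-prefixed key in a filter."""
    for key, values in filter_.items():
        if not key.startswith("&"):
            continue  # only AND tag filters are limited here
        if not isinstance(values, list):
            return False, f"{key}: expected a list of values"
        if len(values) > max_values:
            return False, f"{key}: {len(values)} values exceeds limit of {max_values}"
    return True, ""

ok, reason = check_and_limits({"kinds": [2003], "&t": ["movie", "4k", "turkish"]})
rejected, why = check_and_limits({"&t": ["a", "b", "c"]}, max_values=2)
```

A relay that rejects a filter this way would still need to tell the client why; how it does so is left out of the sketch.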

But even if it's reasonable I also think it's a slippery slope and soon people will be demanding full SQL support on relays.

Possible, and fair point. However, as proposed it's an optimization that just so happens to be a feature. If data is provided to prove that there are optimization benefits for all parties (relays, clients, users), this would establish a baseline that could be used as a measure against filter creep.

I'm not sure it's worth changing NIP-01 filters.

As proposed it's additive and does not "change" NIP-01 filters.

119.md Outdated

- `AND` **MUST** take precedence over `OR`
- Tags used in `AND` **SHOULD NOT** be used in standard `OR` tags [`#`]
- Any tag used in `AND` **SHOULD** be ignored in `OR`
Contributor

@bezysoftware bezysoftware Oct 7, 2024


Can you elaborate on these rules? They say

Tags used in `AND` **SHOULD NOT** be used in standard `OR` tags

and yet in your example filter you specifically use the t tag in both AND and OR filters. Or do you mean tag value?

Also isn't the second point basically the same as the last one?

Contributor Author

@dskvr dskvr Oct 18, 2024


Tags used in AND SHOULD NOT be used in standard OR tags

Yes, the tag value. Will clarify. edit: has been updated. Thanks!

Also isn't the second point basically the same as the last one?

Yes and no. It's there as a defensive implementation hint since #2 will be disobeyed by clients regardless of what the NIP says.

Contributor


Got it, second point is asking clients "please don't do this", third is telling relays "if they do it, ignore it"
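
The relay half of that reading ("if they do it, ignore it") can be sketched as a normalization step applied before matching. `normalize_filter` is a hypothetical helper. Dropping a `#` key once all of its values have been ignored is my own assumption, since the rules quoted above do not say what a relay should do in that case.

```python
def normalize_filter(filter_):
    """Return a copy where each `#x` list omits values already present in `&x`."""
    normalized = dict(filter_)
    for key, and_values in filter_.items():
        if not key.startswith("&"):
            continue
        or_key = "#" + key[1:]
        if or_key not in filter_:
            continue
        # Values already required by AND are ignored in the OR list.
        remaining = [v for v in filter_[or_key] if v not in and_values]
        if remaining:
            normalized[or_key] = remaining
        else:
            # Assumption: with nothing left to OR on, only the AND clause applies.
            del normalized[or_key]
    return normalized

cleaned = normalize_filter({"&t": ["meme", "cat"], "#t": ["meme", "black"]})
and_only = normalize_filter({"&t": ["meme"], "#t": ["meme"]})
```

The input filter is left untouched, so a relay could log the original alongside the normalized form.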

@vitorpamplona
Collaborator

vitorpamplona commented Oct 7, 2024

Can we consider this to be part of NIP-01?

I really dislike having to check if each relay offers custom commands before even enabling the UI for the feature in the client. Maybe that's going to be normal in the future, but we should consider minimizing the array of optional base-level stuff available to clients. Or maybe we go all out with tons of customizations and make checks for custom filter features the norm and in that case, something like NIP-11 must be merged with NIP-01.

@bezysoftware
Contributor

How would current relay implementations signal supporting this if it becomes part of 01? I think making this a separate NIP is better, with NIP 11 signaling. Much like NIP 70, that feels pretty similar.

And perhaps make NIP 11 required.

@vitorpamplona
Collaborator

How would current relay implementations signal supporting this if it becomes part of 01?

It would just be required for all relays to implement it. There is no signaling needed.

My issue is that we are starting to have all these additional base protocol features without having a good signaling mechanism that must be present and accurate in all relays (most NIP-11 documents do not reflect what is implemented in the relay, so NIP-11 is quite useless for now)

@bezysoftware
Contributor

bezysoftware commented Oct 7, 2024

My question was more about the transition period after merging. The current relay implementations wouldn't magically start supporting this (let alone running instances), you would need a mechanism to check for support.

I agree the current state isn't great, but frankly that's always going to be a problem. You'll always have to code defensively and have fallbacks when a relay says one thing (or nothing), but behaves differently.
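
One way to code that defensively on the client, sketched under a few assumptions: the relay's NIP-11 document has already been fetched into a dict, support would be advertised through the `supported_nips` list, and the number 119 from this PR's title does not change. `plan_request` and `local_and_filter` are hypothetical helpers, not part of any existing client library.

```python
import json

AND_NIP = 119  # number from this PR's title; it could change before any merge

def plan_request(relay_info, sub_id, kinds, tag, values):
    """Return (REQ message as JSON text, whether local AND filtering is required)."""
    supported = relay_info.get("supported_nips") or []
    if AND_NIP in supported:
        filter_ = {"kinds": kinds, "&" + tag: values}
        return json.dumps(["REQ", sub_id, filter_]), False
    # Fallback: ask with standard OR semantics and narrow the results locally.
    filter_ = {"kinds": kinds, "#" + tag: values}
    return json.dumps(["REQ", sub_id, filter_]), True

def local_and_filter(events, tag, values):
    """Keep only events that carry every requested value for the tag."""
    wanted = set(values)
    return [
        e for e in events
        if wanted <= {t[1] for t in e["tags"] if len(t) >= 2 and t[0] == tag}
    ]

msg, needs_local = plan_request(
    {"supported_nips": [1, 11, 119]}, "sub1", [2003], "t", ["movie", "4k"]
)
fallback_msg, fallback_local = plan_request({}, "sub1", [2003], "t", ["movie", "4k"])
```

Since a relay may advertise one thing and behave differently, a cautious client could run `local_and_filter` on every response regardless of the flag. It is cheap and makes the result independent of what the relay claims.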

@vitorpamplona
Collaborator

Agree. We can have:

  1. A minimum required set of features for the base protocol + poor signaling support and a few very rare add-ons
  2. An expanded required set of features for the base protocol + poor signaling support and a few very rare add-ons
  3. A minimum required set of features for the base protocol + great signaling support and great add-ons
  4. An expanded required set of features for the base protocol + great signaling support and great add-ons

Today we have 1. If we think this PR is just for VERY RARE use cases where the client and the relay are probably coming from the same developer, then this makes sense.

If we want to see more usage of this, then we have to go for 2 or 3. Since we like simplicity, 4 is not desired on Nostr.

@staab
Member

staab commented Oct 7, 2024

Feature support signaling is a hassle, but that's sort of the only downside. It would be nice to make any additions signal-able by putting them in a new NIP, so relays can signal support if they want to (even though many don't).

@fiatjaf
Member

fiatjaf commented Oct 7, 2024

The most important downside of feature signaling is that it makes changing things a much easier process than currently.

@bezysoftware
Contributor

The most important downside of feature signaling is that it makes changing things a much easier process than currently.

But currently feature signaling is done by adding new NIPs, so.. I don't follow. Do you mean we should have a different way to signal feature support?

@bezysoftware
Contributor

bezysoftware commented Oct 7, 2024

FYI I have this implemented in netstr relay and deployed to a dev instance: wss://relay-dev.netstr.io
Feel free to play with it (e.g. according to the meme sample: ["REQ", "test", { "&t": ["meme", "cat"], "#t": ["white", "black"] }])

@dskvr
Contributor Author

dskvr commented Oct 18, 2024

What's worse than support signaling? A NIP that is merged too quickly and changes n times.

IMO, just leave this open and let relays signal support for it, as is already happening. If it works and is found helpful, the benefits of a merge will be implicit, the need for discussion will be reduced and the effort of merging will be minimized.

I see no reason to rush this NIP (or really, most NIPs).

@bezysoftware
Contributor

I'm personally happy with this NIP and deployed it to production: https://relay.netstr.io/

9 participants