Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Optimize hash queries with lookup table #2933

Merged
merged 11 commits into from
Aug 8, 2024

Conversation

Ivansete-status
Copy link
Collaborator

@Ivansete-status Ivansete-status commented Jul 25, 2024

Description

This PR enhances the performance of the most expensive query that we have at the moment, i.e., ... WHERE messageHash IN (...)

The enhancement leverages the use of a lookup table ( messages_lookup )

Special thanks to @NagyZoltanPeter and @richard-ramos for sharing that great idea ❤️

Changes

  • Change schema and new migration script to add the messages_lookup table
  • Adapt code to populate messaged_lookup table for new messages
  • Adapt getMessages procs when they fetch messages only filtering by the messageHash attribute

Issue

closes #2895

Copy link

This PR may contain changes to database schema of one of the drivers.

If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.

Please make sure the label release-notes is added to make sure upgrade instructions properly highlight this change.

Copy link

github-actions bot commented Jul 25, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2933

Built from 4f31de1

@Ivansete-status Ivansete-status force-pushed the optimize-hash-queries-with-lookup-table branch from a333adf to 8ad5852 Compare August 6, 2024 16:09
@Ivansete-status Ivansete-status marked this pull request as ready for review August 7, 2024 07:59
Copy link
Contributor

@gabrielmer gabrielmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Niiice! LGTM!

Thanks so much! Do we yet have any indication/way to compare how much better this performs than without the lookup table?

Copy link
Contributor

@SionoiS SionoiS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks! 🐎 faster!

Copy link
Member

@richard-ramos richard-ramos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! looking fwd to see this running in the fleet :)

);

-- Put data into lookup table
INSERT INTO messages_lookup (messageHash, timestamp) SELECT messageHash, timestamp from messages;
Copy link
Member

@richard-ramos richard-ramos Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know how slow this operation is, but since it's doing a batch insert, a way to speed up the insertion could be to create the table without the primary key, and then, after the insertion is complete, do:

ALTER TABLE messages_lookup ADD CONSTRAINT messageIndexLookupTable PRIMARY KEY (messageHash, timestamp)

This should be faster because it eliminates the overhead of maintaining the primary key index during each insert operation. And should be 'safe' since the messageHash, timestamp combination should be unique already in the messages table

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(of course, if the insertion takes just a couple of minutes then doing this optimization is an overkill)

Copy link
Collaborator Author

@Ivansete-status Ivansete-status Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thanks! I've applied the suggestion in cb80df4

I think it would take around 15 minutes to complete

@Ivansete-status
Copy link
Collaborator Author

Niiice! LGTM!

Thanks so much! Do we yet have any indication/way to compare how much better this performs than without the lookup table?

Yes, this can be checked through Status . I will suggest to dogfood that in our weekly Status dogfooding session

@Ivansete-status Ivansete-status force-pushed the optimize-hash-queries-with-lookup-table branch from efd0e96 to 62d7f5e Compare August 8, 2024 18:13
@Ivansete-status Ivansete-status merged commit 4d47aa6 into master Aug 8, 2024
9 of 11 checks passed
@Ivansete-status Ivansete-status deleted the optimize-hash-queries-with-lookup-table branch August 8, 2024 19:46
Ivansete-status added a commit that referenced this pull request Aug 8, 2024
* Upgrade Postgres schema to add messages_lookup table
* Perform optimized query for messageHash-only queries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chore: improve postgres query performance
4 participants