chore: Optimize hash queries with lookup table #2933
Conversation
This PR may contain changes to the database schema of one of the drivers. If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues. Please make sure the label
You can find the image built from this PR at
Built from 4f31de1
Force-pushed from a333adf to 8ad5852
Niiice! LGTM!
Thanks so much! Do we yet have any indication/way to compare how much better this performs than without the lookup table?
LGTM Thanks! 🐎 faster!
Awesome! looking fwd to see this running in the fleet :)
-- Put data into lookup table
INSERT INTO messages_lookup (messageHash, timestamp) SELECT messageHash, timestamp FROM messages;
I do not know how slow this operation is, but since it's doing a batch insert, a way to speed up the insertion could be to create the table without the primary key and then, after the insertion is complete, do:

ALTER TABLE messages_lookup ADD CONSTRAINT messageIndexLookupTable PRIMARY KEY (messageHash, timestamp);

This should be faster because it eliminates the overhead of maintaining the primary key index during each insert operation. And it should be safe, since the (messageHash, timestamp) combination should already be unique in the messages table.
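The suggested "bulk insert first, add the primary key after" migration can be sketched end to end. This uses SQLite via Python as a hypothetical stand-in for the PR's Postgres migration (table and column names are from the PR; SQLite builds the uniqueness constraint with CREATE UNIQUE INDEX rather than ALTER TABLE ... ADD CONSTRAINT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Existing wide table holding per-message payloads.
cur.execute(
    "CREATE TABLE messages (messageHash TEXT, timestamp INTEGER, payload BLOB)"
)
cur.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(f"hash{i}", i, b"payload") for i in range(1000)],
)

# 1. Create the lookup table WITHOUT a primary key ...
cur.execute("CREATE TABLE messages_lookup (messageHash TEXT, timestamp INTEGER)")

# 2. ... backfill it in one batch insert (no per-row index maintenance) ...
cur.execute(
    "INSERT INTO messages_lookup (messageHash, timestamp) "
    "SELECT messageHash, timestamp FROM messages"
)

# 3. ... and only then build the unique index in a single pass.
cur.execute(
    "CREATE UNIQUE INDEX messageIndexLookupTable "
    "ON messages_lookup (messageHash, timestamp)"
)
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM messages_lookup").fetchone()[0]
print(count)  # 1000
```

Building the index once over the finished table amortizes the maintenance cost that would otherwise be paid on every inserted row.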
(Of course, if the insertion takes just a couple of minutes, then this optimization is overkill.)
Cool, thanks! I've applied the suggestion in cb80df4
I think it would take around 15 minutes to complete
Yes, this can be checked through Status. I will suggest dogfooding that in our weekly Status dogfooding session
Force-pushed from efd0e96 to 62d7f5e
* Upgrade Postgres schema to add messages_lookup table
* Perform optimized query for messageHash-only queries
Description
This PR enhances the performance of the most expensive query that we have at the moment, i.e.,

... WHERE messageHash IN (...)

The enhancement leverages the use of a lookup table (messages_lookup).

Special thanks to @NagyZoltanPeter and @richard-ramos for sharing that great idea ❤️
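The lookup-table approach can be sketched as follows, again using SQLite via Python as a hypothetical stand-in for the Postgres driver: hash-only queries hit the narrow, indexed messages_lookup table instead of the wide messages table that carries full payloads (the helper name get_message_keys is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Wide table with payloads, plus the narrow lookup table from the PR.
cur.execute(
    "CREATE TABLE messages (messageHash TEXT, timestamp INTEGER, payload BLOB)"
)
cur.execute(
    "CREATE TABLE messages_lookup ("
    "messageHash TEXT, timestamp INTEGER, "
    "PRIMARY KEY (messageHash, timestamp))"
)

rows = [(f"hash{i}", i, b"x" * 64) for i in range(100)]
cur.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
cur.executemany(
    "INSERT INTO messages_lookup VALUES (?, ?)", [(h, t) for h, t, _ in rows]
)

def get_message_keys(hashes):
    """Serve a messageHash-only query from the lookup table alone."""
    placeholders = ",".join("?" * len(hashes))
    return cur.execute(
        f"SELECT messageHash, timestamp FROM messages_lookup "
        f"WHERE messageHash IN ({placeholders})",
        hashes,
    ).fetchall()

result = get_message_keys(["hash3", "hash7"])
print(sorted(result))  # [('hash3', 3), ('hash7', 7)]
```

Because every column the query needs lives in the small indexed table, the WHERE messageHash IN (...) filter never has to touch the payload-bearing rows.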
Changes
* Add messages_lookup table
* Populate the messages_lookup table for new messages
* Adapt the getMessages procs when they fetch messages filtering only by the messageHash attribute
Issue
closes #2895
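The routing rule behind the last change above can be sketched as a small predicate (names are invented for illustration and are not the PR's Nim code): only when messageHash is the sole filter is the query sent to the narrow lookup table.

```python
def pick_table(message_hashes, content_topics=(), start_time=None):
    """Hypothetical routing rule: use the narrow lookup table only when
    messageHash is the sole filter; any other filter needs the full table."""
    if message_hashes and not content_topics and start_time is None:
        return "messages_lookup"
    return "messages"

print(pick_table(["hashA", "hashB"]))                    # messages_lookup
print(pick_table(["hashA"], content_topics=["/topic"]))  # messages
```

Keeping the fallback to the full messages table preserves correctness for every other filter combination while the common hash-only path gets the cheap query.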