-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: postgresql store #157
base: main
Are you sure you want to change the base?
Conversation
Thanks for your contribution, nearly done with my review but I'll go ahead and approve and run the workflow. |
Hey great work, any update on this? |
I'm still in my review, I got wrapped up in some other responsibilities, will get it out ASAP! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay on this review! Mostly looking good, a couple of changes I've requested here.
One note: I noticed that 'static
was used a good bit here which was likely inherited from the sqlite
implementation. I think this is due to how the connection requires an closure with an async block which makes borrowing and using such things really odd.
Recently, proper async closures which will get into Rust stable sometime in 2025 which should help us alleviate our usage of 'static
here (i think!)
// tracing_subscriber::EnvFilter::from_default_env() | ||
// .add_directive(tracing::Level::DEBUG.into()) | ||
// .add_directive("hyper=off".parse().unwrap()) | ||
// ).init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would either leave this out or add this snippet back to the example!
let placeholders = placeholders.join(", "); | ||
|
||
let query_string = &format!( | ||
"INSERT INTO {} ({}) VALUES ({})", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses Rust string interpolation to evaluate the query string which introduces SQL injection risk. Nearly every instance of sql statements made in this crate can be converted to prepared statements to avoid this problem. This is a better practice to have since it will lead to a harder situation of user input leading to problems down the line (imagine prompt engineering -> hacking -> SQL injection 🤯).
This stackoverflow post has more information on this can be refactored.
EDIT: I realized you used prepared statements later, was there a specific reason you didn't use it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed typed, prepared statements. I'm not too familiar with this, might be interested to look into.
.iter() | ||
.map(|(_, v)| &**v) | ||
.chain(std::iter::once(&embedding_vector as &(dyn ToSql + Sync))) | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation seems a bit odd to me, I'm not sure why iterators are used here. I would leave a comment explaining what you are doing to prepare your parameters here.
/// to store embeddings and perform similarity searches. | ||
/// | ||
/// # Example | ||
/// ```rust,ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen this "```rust,ignore"
Does this disable doctests?
<span style="font-size: 48px; margin: 0 20px; font-weight: regular; font-family: Open Sans, sans-serif;"> + </span> | ||
<picture> | ||
<source media="(prefers-color-scheme: dark)" srcset="https://www.postgresql.org/media/img/about/press/elephant.png"> | ||
<source media="(prefers-color-scheme: light)" srcset="https://www.postgresql.org/media/img/about/press/elephant.png"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's the same logo for both, you prolly could just remove the media="(prefers-color-scheme: dark)"
and just keep one <source>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also include basic instructions (esp. what a DATABASE_URL
should be) and some links to install postgres.
let mut map = serde_json::Map::new(); | ||
for (i, column) in row.columns().iter().enumerate().take(rlen - 1) { | ||
let name = column.name(); | ||
let value = serde_json::Value::String(row.get(i)); | ||
map.insert(name.to_string(), value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious what this would look like functionally (copilot assisted 😉)
let mut map = serde_json::Map::new(); | |
for (i, column) in row.columns().iter().enumerate().take(rlen - 1) { | |
let name = column.name(); | |
let value = serde_json::Value::String(row.get(i)); | |
map.insert(name.to_string(), value); | |
} | |
let map = row.columns().iter().enumerate().take(rlen - 1).fold( | |
serde_json::Map::new(), | |
|mut acc, (i, column)| { | |
acc.insert( | |
column.name().to_string(), | |
serde_json::Value::String(row.get(i)), | |
); | |
acc | |
}, | |
); |
Pretty neat, you don't need to accept this unless you like functional design better.
if let Err(e) = connection.await { | ||
tracing::error!("Connection error: {}", e); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example, why are you spawning a task for connecting, why can't you just await the connection directly?
Thanks for the review. I'm still in holiday mode until New Year's, will address the comments on Jan 2 or 3. |
Resolves #5
I've modeled the crate on the SQLite one.
On the DB side, the crate is relying on the pgvector extension. It creates an HSNW index and uses the cosine distane operator to compare embeddings.
Please let me know it this looks ok or if there are any parts of the code that should be changed.