-
Notifications
You must be signed in to change notification settings - Fork 81
Allow to update duplicates on conflict in PostgreSQL #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of remarks about testing and the options passed to the method.
This is a great initiative and could actually extend more support for MySQL as well. To recap:
- add support for
update_duplicates
with a boolean to be consistent with MySQL implementation - consider to pass to
update_duplicates
a list of column to update (supported by both MySQL and PG) instead of the conflicting columns (you may set a primary key for this) - more testing
Book.bulk_insert(*destination_columns, update_duplicates: true) do |worker| | ||
worker.add(...) | ||
worker.add(...) | ||
# ... | ||
end | ||
|
||
# Update duplicate rows (PostgreSQL) | ||
Book.bulk_insert(*destination_columns, update_duplicates: %w[title]) do |worker| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be a little bit misleading this change of argument type according to the adapter.
- I would support the option
update_duplicates:
boolean argument to be consistent in the gem implementation (this is the reason the of original ON CONFLICT DO NOTHING to reproduce the behaviour of INSERT_IGNORE Psql conflict | Made ignore parameter dependent on adapter name #19)
The options to pass to this argument could be:
- the columns to update (supported also in mysql https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html)
- the conflicting columns (by default the primary key for MySQL)
Assuming that some has set correctly its primary keys, I would consider more intuitive that update_duplicates:
will provide the list of the columns to update. This will make the option supported for MySQL as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider more intuitive that
update_duplicates:
will provide the list of the columns to update.
I would normally consider it a more intuitive as well, but Postgres requires inference specification or constraint name (in short: it needs conflicting columns). I've spent a while on it now and I find it hard to come up with satisfying abstraction that will work both for MySQL and Postgres.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about considering the following scenarios:
false
, default%[]
, only selected columnstrue
, all destination columns
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it solves the issue. We have destination columns (data you insert) and we have update_duplicates (unique columns for postgres, which is done automatically by mysql).
%[]
, only selected columns
How it would for MySQL?
true
, all destination columns
How it would work for PostgreSQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @sobstel , thank you for your patience 😄
What I meant is the following:
Let's imagine we have a foos
table with columns a
, b
, c
, d
.
destination_columns = [:a, :b]
# I would see 3 use cases
# PG
Foo.bulk_insert(*destination_columns, update_duplicates: false) #clear
Foo.bulk_insert(*destination_columns, update_duplicates: [:a]) #clear
Foo.bulk_insert(*destination_columns, update_duplicates: true)
# => true will be a shortcut for update_duplicates: destination_columns
# Mysql
Foo.bulk_insert(*destination_columns, update_duplicates: false) #clear
Foo.bulk_insert(*destination_columns, update_duplicates: true) # clear
Foo.bulk_insert(*destination_columns, update_duplicates: [:a])
# => you will update only a
bulk_insert/lib/bulk_insert/worker.rb
Lines 155 to 159 in 91b1a5b
elsif adapter_name =~ /^mysql/i && update_duplicates | |
update_values = @columns.map do |column| | |
"`#{column.name}`=VALUES(`#{column.name}`)" | |
end.join(', ') | |
' ON DUPLICATE KEY UPDATE ' + update_values |
New implementation suggested:
elsif adapter_name =~ /^mysql/i && update_duplicates
columns_scope = \
if update_duplicates.is_a?Array
->(c) { update_duplicates.map(&:to_s).include?(c.name) }
else
->(c) { c }
end
update_values = @columns.select(&columns_scope).map do |column|
"`#{column.name}`=VALUES(`#{column.name}`)"
end.join(', ')
' ON DUPLICATE KEY UPDATE ' + update_values
You can try it out in irb easily:
$ irb
irb(main):001:0> update_duplicates = ['a', :b, 1]
=> ["a", :b, 1]
irb(main):002:0> columns = ["b", "1"]
=> ["b", "1"]
irb(main):003:0> columns_scope = ->(c) { update_duplicates.map(&:to_s).include?(c) }
=> #<Proc:0x00007fa17980ba20@(irb):3 (lambda)>
irb(main):004:0> columns.select(&columns_scope)
=> ["b", "1"]
irb(main):005:0> id = ->(c) { c }
=> #<Proc:0x00007fa1768544d0@(irb):5 (lambda)>
irb(main):006:0> update_duplicates.select(&id)
=> ["a", :b, 1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply and sorry for another long wait. Unfortunately I struggle to find more time to invest into it, sorry.
Actually you have implementation done above. Feel free to update it or even to create separate PR with final solution including your improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to see this goes into code, we cannot use this amazing gem because lack of postgress support
@sobstel I would keep #40 (comment) as an improvement for a separated PR. Can you prove with an example that this implementation is working? If so, I would be OK to pass it
@sobstel's PR is working fine, Inserted value 'Grade 1', updated it 'Grade 2'. @mberlanda , Are you merging this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This work and has been opened since a while. Since it looks it is pretty demanded i will merge but I do not have the rights to release. (cc @jamis could bump a minor version? 🙇 )
@sobstel I guess we need to find some energies and time to create some adapters because we are starting to get too many regexp like adapter_name =~ /\APost(?:greSQL|GIS)/i
.
I agree, but I'm still not able to accommodate any time for this, but I'll try to keep it in mind. |
It adds support for PostgreSQL 9.5+
ON CONFLICT DO UPDATE
(https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.5#INSERT_..._ON_CONFLICT_DO_NOTHING.2FUPDATE_.28.22UPSERT.22.29).
It has been implemented as suggested in #34 (comment).
However, in a hindsight I feel that these things (ignore, update_duplicates, etc) make this simple gem a bit too bloated. You should probably do standard stuff with a gem and use... custom queries for all the others (I ended up with custom script for my use case eventually: https://gist.github.com/sobstel/1b022e17f2f2bb276e766f7fea974b17).
Feel free to merge it if you disagree ;-)