Skip to content

Conversation

jean-frenette-optel
Copy link
Contributor

@jean-frenette-optel jean-frenette-optel commented Feb 23, 2022

This is an attempt to return ids and rows after bulk inserting (Issue : #94)

By choice we only provide the feature for Django 3.0+

jean-frenette-optel and others added 9 commits February 18, 2022 09:39
….0 version of feature flag can_return_rows_from_bulk_insert (namely, can_return_ids_from_bulk_insert) (cf. https://docs.djangoproject.com/en/4.0/releases/3.0/#database-backend-api)
Maybe a bit ambitious, as we completely forsake the SCOPE_IDENTITY strategy (dead code path - we keep the code here, but we could decide not to, really) in favor of OUTPUT strategy.
… let's only set can_return_rows_from_bulk_insert to True and not can_return_ids_from_bulk_insert
…means default values insertion - for n objects of course!), we must still fulfill our contract, and return the appropriate rows.

This means we won't use INSERT INTO (...) DEFAULT VALUES n times, but a single INSERT INTO (...) VALUES (DEFAULT, (...), DEFAULT), (...), (DEFAULT, (...), DEFAULT)
Also: be more thorough re the infamous feature flag rename from Django 3.0
if self.get_returned_fields() and self.can_return_columns_from_insert():
if self.can_return_rows_from_bulk_insert():
if not(self.query.fields):
# There isn't really a single statement to bulk multiple DEFAULT VALUES insertions,
Copy link
Contributor

@marcperrinoptel marcperrinoptel Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little bit of context to explain why we resort to a weird workaround to achieve bulk insertion in the case where there are no self.query.fields (but there are "returning fields"):

  • INSERT...DEFAULT VALUES won't help us: yes it is compatible with OUPUT INSERTED, but it is limited to inserting a single row. It's fine to continue to "fake bulk" like this when no returning fields are desired, but it won't help us here (as there is only one cursor.fetchall() to get all the returning values - unless we start overriding more stuff e.g. execute_sql, which isn't appealing at all)
  • specifying a column list for the INSERT despite the absence of self.query.fields and using value DEFAULT for every column (x every row) isn't an option either, because that column list would basically be: all fields minus auto fields; but having only auto fields is certainly possible (cf. the infamous bulk_create.tests.BulkCreateTests.test_empty_model), which leaves that list empty, and that does not work

SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n)
MERGE INTO %s
USING (SELECT TOP %s * FROM
(SELECT row_number() over (order by a.n) as number
Copy link
Contributor

@marcperrinoptel marcperrinoptel Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it work with just SELECT 1 FROM instead of SELECT row_number() over (order by a.n) as number FROM ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it does not work with SELECT 1 FROM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my suggestion was imprecise, but I've tried:
SELECT 1 as _
instead of
SELECT row_number() over (order by a.n) as number
and it seems to work. Could help getting the trick as lean as possible.

# so we have to use a workaround:
# https://dba.stackexchange.com/questions/254771/insert-multiple-rows-into-a-table-with-only-an-identity-column
result = ["""
WITH empty_rows AS (
Copy link
Contributor

@marcperrinoptel marcperrinoptel Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling it for example eight_fake_rows is a bit more explicit

Co-authored-by: marcperrinoptel <86617454+marcperrinoptel@users.noreply.github.com>
@jean-frenette-optel jean-frenette-optel marked this pull request as ready for review February 24, 2022 14:38
Bulk insert: simplification & refactoring
Copy link
Contributor

@absci absci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution!

@absci absci merged commit c677819 into microsoft:dev Mar 22, 2022
jmah8 added a commit that referenced this pull request Apr 27, 2022
* Only drop necessary indexes during field rename (#97)

* Only drop unique indexes which include the column to be renamed

This avoids unnecessarily slowing down migrations with
drop/recreate of index(es) on potentially large tables
when doing so is not required.

* Test for rename of unique-and-nullable column

This covers the specific case where we are still
dropping a unique index on a column & re-instating.

Also update references to an issue which was ported
to the new fork and fixed there (here).

* Add general test for presence of indexes

This is not a panacea but hopefully increase the coverage
if nothing else by at least being a regression test for
#77

* Basic logging config for debugging testapp tests

* Fix KeyTransformExact applied to all databases (#98)

* Fixed issue #82

* Issue #90 - fix alter nullability for foreign key (#93)

* Issue #90 bug fix

* Update SUPPORT.md (#101)

Remove paragraph pointing users to a the Django mailing list.

[skip ci]

* Add possibility for token authentication (#102)

* Add possibility for token authentication

* Add documentation for TOKEN setting

* Fixed overridden functions not working with other DBs (#105)

Fixes issue #92 and other potential issues caused by overriding Django functions in functions.py.

* Remove unique constraints (fixes #100 and #104) (#106)

* Add support for unique_constraint to introspection, and use it to determine if we should use DROP CONSTRAINT or DROP INDEX when altering or removing a unique (together) constraint. Fixes #100 and #104.

In the sys.indexes table, is_unique_constraint is true only when an actual constraint was created (using ALTER TABLE ... ADD CONSTRAINT ... UNIQUE).
Because this method is not suitable for nullable fields in practice (you cannot have more than one row with NULL), mssql-django always creates CREATE UNIQUE INDEX instead.
django-pyodbc-azure behaved differently and used a unique constraint whenever possible.

The problem that arises is that mssql-django assumes that an index is used to enforce all unique constraints, and always uses DROP INDEX to remove it.
When migrating a codebase from django-pyodbc-azure to mssql-django, this fails because the database contains actual unique constraints that need to be dropped using "ALTER TABLE ... DROP CONSTRAINT ...".

This commit adds support for is_unique_constraint to the introspection, so we can determine if the constraint is enforced by an actual SQL Server constraint or by a unique index.
Additionally, places that delete unique constraints have been refactored to use a common function that uses introspection to determine the proper method of deletion.

* Also treat primary keys as constraints instead of as indexes.

Co-authored-by: Ruben De Visscher <ruben@awingu.com>

* Returning ids/rows after bulk insert (#107)

* Implement can_return_rows_from_bulk_insert feature, returning ids or rows after bulk inserts.

* Since mssql-django supports Django 2.2, we also need the pre-Django 3.0 version of feature flag can_return_rows_from_bulk_insert (namely, can_return_ids_from_bulk_insert) (cf. https://docs.djangoproject.com/en/4.0/releases/3.0/#database-backend-api)

* My alternative changes on SQLInsertCompiler.as_sql.

Maybe a bit ambitious, as we completely forsake the SCOPE_IDENTITY strategy (dead code path - we keep the code here, but we could decide not to, really) in favor of OUTPUT strategy.

* Don't try to use the OUTPUT clause when inserting without fields

* Actually we don't really have to offer the feature for Django 2.2, so let's only set can_return_rows_from_bulk_insert to True and not can_return_ids_from_bulk_insert

* Tentative fix: when there are returning fields, but no fields (which means default values insertion - for n objects of course!), we must still fulfill our contract, and return the appropriate rows.

This means we won't use INSERT INTO (...) DEFAULT VALUES n times, but a single INSERT INTO (...) VALUES (DEFAULT, (...), DEFAULT), (...), (DEFAULT, (...), DEFAULT)
Also: be more thorough re the infamous feature flag rename from Django 3.0

* Using MERGE INTO to support Bulk Insertion of multiple rows into a table with only an IDENTITY column.

* Add a link to a reference web page.

* Attempt to make Django 2.2 tests pass

* Get back to a lighter diff of as_sql function vs. original

* Use a query to generate sequence of numbers instead of using the master....spt_values table.

* Update mssql/operations.py

Co-authored-by: marcperrinoptel <86617454+marcperrinoptel@users.noreply.github.com>

* Simplification & refactoring

Co-authored-by: marcperrinoptel <marc.perrin@optelgroup.com>
Co-authored-by: marcperrinoptel <86617454+marcperrinoptel@users.noreply.github.com>

* The reset_sequences argument in sql_flush must be understood as relative to the tables passed, not absolute. (#112)

The present fix aligns the relevant code snippet with the one from the closest impl. i.e. Oracle - cf. https://github.com/django/django/blob/795da6306a048b18c0158496b0d49e8e4f197a32/django/db/backends/oracle/operations.py#L493

* Adds unit test for issues #110 and #90 (#115)

* Added test to test changing from non-nullable to nullable with unique
index

* Fixed failing unit test on Django 3.1 and lower due to bad import

* Unskip one passed test (#116)

Unit test `bulk_create.tests.BulkCreateTests.test_bulk_insert_nullable_fields` is passing, remove it from expected failures.

* Add offset clause for all Django versions (#117)

Issue #109, missing OFFSET clause causes select_for_update() to fail

* Create devskim.yml (#120)

* Fixed missing ValidatorError import (#121)

* Updated supported Django versions (#122)

* Bump up version to 1.1.3 (#123)

Co-authored-by: Tom Sparrow <793763+sparrowt@users.noreply.github.com>
Co-authored-by: jmah8 <59151084+jmah8@users.noreply.github.com>
Co-authored-by: jean-frenette-optel <95249592+jean-frenette-optel@users.noreply.github.com>
Co-authored-by: Anthony Shaw <anthony.p.shaw@gmail.com>
Co-authored-by: Petter Moe Kvalvaag <pmkv@equinor.com>
Co-authored-by: Ruben De Visscher <github@rubendv.be>
Co-authored-by: Ruben De Visscher <ruben@awingu.com>
Co-authored-by: marcperrinoptel <marc.perrin@optelgroup.com>
Co-authored-by: marcperrinoptel <86617454+marcperrinoptel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants