Strategy for inserting rows with table-valued parameters #2484

With SQL Server you can use table-valued parameters (TVPs) to perform bulk DML very quickly and elegantly. Unfortunately, doing this by hand is tedious (a sketch of the ceremony follows this list):

  1. Create a table type
  2. Write an INSERT statement
  3. Create a DataTable or an IEnumerable<SqlDataRecord>
  4. Execute a command
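
For illustration, here is roughly what that ceremony looks like in ADO.NET today. This is a minimal sketch; the dbo.Customers table, the dbo.CustomerRows table type and the column sizes are hypothetical:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

static class TvpInsert
{
    // Step 3: wrap the rows in SqlDataRecords matching the table type.
    // (An empty TVP must be passed as null, not as an empty enumerable.)
    static IEnumerable<SqlDataRecord> ToRecords(IEnumerable<string> names)
    {
        var meta = new SqlMetaData("Name", SqlDbType.NVarChar, 100);
        foreach (var name in names)
        {
            var record = new SqlDataRecord(meta);
            record.SetString(0, name);
            yield return record;
        }
    }

    // Steps 1-2 happen in SQL:
    //   CREATE TYPE dbo.CustomerRows AS TABLE (Name NVARCHAR(100) NOT NULL);
    // Step 4: bind the rows as a structured parameter and execute.
    public static void Insert(SqlConnection connection, IEnumerable<string> names)
    {
        using (var command = new SqlCommand(
            "INSERT INTO dbo.Customers (Name) SELECT Name FROM @rows;", connection))
        {
            var rows = command.Parameters.AddWithValue("@rows", ToRecords(names));
            rows.SqlDbType = SqlDbType.Structured;
            rows.TypeName = "dbo.CustomerRows";
            command.ExecuteNonQuery();
        }
    }
}
```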

Here is a proposal for how EF could pull off all of this transparently for inserts and deletes in SaveChanges:

Generate a table type programmatically for every table that requires one. There are three issues:

  1. The table type should probably be created in a separate transaction so that concurrent SaveChanges calls do not contend. This is optional, though, because type creation is a one-time initialization.
  2. A type name must be chosen, and the type's structure might change over time. This can be solved by appending a strong hash of the type's structure to its name, making table types immutable: if the table structure changes, a new type is created. The hash would include column names, their order, data types, nullability and the primary key (see the sketch after this list). Old TVP types are simply never cleaned up; they only arise on schema changes, which are rare.
  3. This requires DDL permissions. EF could check for them on startup. Alternatively, the feature could be opt-in.
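
A sketch of the naming scheme from issue 2. The ColumnInfo shape and the TT_ prefix are inventions for illustration; the point is just that the type name is a pure function of the table's structure:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public sealed record ColumnInfo(string Name, string SqlType, bool IsNullable, bool IsPrimaryKey);

public static class TableTypeNames
{
    // The hash covers column names, their order, data types, nullability and
    // the primary key, so any structural change yields a different type name.
    public static string For(string table, IReadOnlyList<ColumnInfo> columns)
    {
        var signature = string.Join("|", columns.Select(c =>
            $"{c.Name}:{c.SqlType}:{(c.IsNullable ? "null" : "notnull")}:{(c.IsPrimaryKey ? "pk" : "-")}"));
        byte[] hash;
        using (var sha = SHA256.Create())
            hash = sha.ComputeHash(Encoding.UTF8.GetBytes(signature));
        // 8 hex chars keep the name short; collisions are vanishingly
        // unlikely for the handful of types a schema will ever produce.
        var suffix = BitConverter.ToString(hash, 0, 4).Replace("-", "");
        return $"TT_{table}_{suffix}";
    }
}

// Creation can then be idempotent, e.g.:
//   IF TYPE_ID(N'dbo.TT_Customers_1A2B3C4D') IS NULL
//       EXEC(N'CREATE TYPE dbo.TT_Customers_1A2B3C4D AS TABLE (...);');
```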

I think all of this would work for deletes as well. Updates are trickier because the set of columns that changes varies from one update to the next. Maybe EF could use a single type for all updates and simply ignore some columns for some of them, as in the sketch below.
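
To make that concrete, a hedged sketch of what a TVP-based update could look like, assuming a hypothetical dbo.CustomerUpdateRows type that carries the primary key plus every updatable column:

```csharp
public static class TvpUpdate
{
    // Hypothetical type: dbo.CustomerUpdateRows(Id INT NOT NULL, Name NVARCHAR(100) NULL).
    // Columns EF wants to ignore for a given batch could be omitted from the
    // generated SET list, or sent with their current values.
    public const string Sql = @"
UPDATE c
SET    c.Name = r.Name
FROM   dbo.Customers AS c
JOIN   @rows AS r ON r.Id = c.Id;";
}
```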

Non-issues:

  1. Generated values (identity, defaults). The OUTPUT clause can return them elegantly (see the sketch after this list).
  2. Performance. For SaveChanges calls with few rows the existing row-by-row strategy should be used. Over the network the TVP strategy is probably better starting at 2-3 rows due to round-trip times; on the same machine I measured the threshold to be 10 rows for a particular workload.
  3. Semantics. I don't think the semantics of a SaveChanges call would be affected in any way. It's simply a "go faster" feature.
  4. "Do we really need this given that SqlBulkCopy exists?": TVPs can kind of compete with SqlBulkCopy. They are an integer factor slower but like 2 orders of magnitude faster than row-by-row inserts. 2 OOM go a long way. Often, this will be good enough. I think SqlBulkCopy usage will drop dramatically once this feature is available in EF.
  5. Topological ordering of DML. I don't see any issues here. Doing an entire table at once should always result in a valid topological order. Alternatively, EF could detect safe cases and fall back to row-by-row otherwise.
  6. Row order. Index uniqueness validation logically happens at the end of a DML statement. Therefore the order of rows in the TVP does not matter for correctness.
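
To make non-issue 1 concrete, a sketch of the insert statement with OUTPUT, reusing the hypothetical schema from above:

```csharp
public static class TvpInsertWithOutput
{
    // OUTPUT returns one row per inserted row, carrying the generated
    // identity value. The order of OUTPUT rows is not guaranteed to match
    // the order of @rows, so to correlate results reliably the TVP can
    // carry a client-assigned ordinal and the statement can be a MERGE,
    // whose OUTPUT clause may also reference source columns.
    public const string Sql = @"
INSERT INTO dbo.Customers (Name)
OUTPUT inserted.Id
SELECT Name FROM @rows;";
}
```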

Performance benefits:

  1. Many-row inserts are much more efficient because SQL Server can "see" all rows at once and maintain all indexes optimally (usually by writing to one index at a time in sorted order).
  2. Fewer round-trips. Over a network a round-trip might cost 0.5 ms.
  3. Less CPU on both client and server.

Who says that EF is not suitable for bulk inserts? Right now that might be the case, but it does not have to stay that way. This would be awesome.

The point of this ticket is to present a viable plan for how this could be implemented in EF. Some inspiration for the dev team.
