Bulk import for efficient importing of data from the client into the database #27333

@roji

Description

Databases usually provide various mechanisms for efficient bulk import of data from the client (e.g. .NET application) into the database. We already have issues tracking improving INSERT performance (#15059, #9118, #10443), but bulk import is a specialized scenario where the native database mechanisms work more efficiently (and often considerably so) than multiple INSERTs. Note also that this is different from bulk copy of data across tables within the database (INSERT ... SELECT, tracked by #27320).

Bulk import only allows targeting a table, so a simple method on DbSet should suffice (LINQ operators don't make sense here):

ctx.Blogs.ImportFrom(blogs);

ImportFrom should accept an IEnumerable parameter; ImportFromAsync should have overloads accepting both IEnumerable and IAsyncEnumerable parameters. The method would pull entity instances and import them into the database table via the provider-specific mechanism.
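A hypothetical usage sketch of the proposed shape (none of these methods exist yet; `ImportFrom`/`ImportFromAsync` and the `StreamBlogsAsync` helper are illustrative only):

```csharp
// Hypothetical: the proposed DbSet import methods, as described above.
using var ctx = new BloggingContext();

List<Blog> blogs = GetBlogsToImport();

// Synchronous import from an IEnumerable<Blog>
ctx.Blogs.ImportFrom(blogs);

// Async overload taking an IEnumerable<Blog>
await ctx.Blogs.ImportFromAsync(blogs);

// Async overload taking an IAsyncEnumerable<Blog>, allowing entities to be
// streamed into the import rather than materialized up front
await ctx.Blogs.ImportFromAsync(StreamBlogsAsync());
```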

Additional notes

  • EF Core can provide a default implementation which simply uses regular INSERTs (without tracking (#9118), and possibly with other optimizations). This would make the API work everywhere, regardless of whether a provider has supplied a specialized implementation.
  • This method would not track the imported entities. Tracking typically isn't what's needed when bulk importing, and the optimized import mechanisms typically don't allow reading back database-generated values (i.e. the IDs).
  • Most databases also provide bulk import mechanisms which take CSV or other textual data. While this may be used under the hood (implementation detail), we wouldn't expose a user-facing API that deals with textual data; the focus is on an API where the user provides entity instances. An EF Core API accepting textual data would have no added value beyond just using the API directly.
    • Similarly, some implementations allow importing data from a file on the database server - we again wouldn't expose an API for this as EF Core has no added value here.
  • We may want to provide some sort of hook for allowing users to pass provider-specific parameterization of the import process (e.g. SqlBulkCopy allows controlling the batch size, the timeout...). For example, the SQL Server provider could expose an additional overload accepting a custom SqlServerBulkCopyOptions, and there would be a way for that overload to pipe the user-provided options down to the provider's implementation.
  • Naming-wise, the word "import" is used because "copy" seems more ambiguous with copying across tables in the database (#27320).
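The provider-specific options hook mentioned above might look roughly like this. `SqlServerBulkCopyOptions` and the extra overload are hypothetical; the settings it mirrors (batch size, timeout) do exist on the real SqlBulkCopy API:

```csharp
// Hypothetical: a SQL Server provider overload accepting provider-specific
// options, which the provider would pipe down to its SqlBulkCopy-based
// implementation.
var options = new SqlServerBulkCopyOptions
{
    BatchSize = 10_000,                  // rows per round-trip
    Timeout = TimeSpan.FromMinutes(5)    // overall operation timeout
};

await ctx.Blogs.ImportFromAsync(blogs, options);
```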

Database support

  • SQL Server (SqlBulkCopy) - takes DataTable or IDataReader as input.
  • PostgreSQL (Npgsql) - specialized API for binary import.
  • MySql
    • MySqlConnector - similar to SQL Server's SqlBulkCopy (but seems to support only DataTable/IEnumerable, no IDataReader).
    • Official driver (seems to be CSV/tab-delimited only).
  • SQLite: apparently no special bulk import API, but there are published tips on making inserts faster. It may make sense to wrap those in a bulk import implementation.
  • Cosmos: this is a good reason to have this abstraction in core rather than relational.
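For illustration, here is roughly what a SQL Server implementation could do under the hood with the existing Microsoft.Data.SqlClient.SqlBulkCopy API (the table and column names are examples, and `blogsDataTable` is assumed to be a DataTable built from the user's entities):

```csharp
using Microsoft.Data.SqlClient;

await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

using var bulkCopy = new SqlBulkCopy(connection)
{
    DestinationTableName = "dbo.Blogs",
    BatchSize = 10_000
};
bulkCopy.ColumnMappings.Add("Name", "Name");
bulkCopy.ColumnMappings.Add("Url", "Url");

// WriteToServerAsync accepts a DataTable or an IDataReader; an EF Core
// implementation would adapt the user's IEnumerable<TEntity> into one of
// these rather than exposing them directly.
await bulkCopy.WriteToServerAsync(blogsDataTable);
```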

Community implementations
