
Add references to IMPORT INTO in relevant docs #5451


Merged: 1 commit merged into master from import-into on Nov 6, 2019

Conversation

lnhsingh (Contributor)

Closes #5180.

@lnhsingh lnhsingh requested a review from jseldess September 18, 2019 20:21
@cockroach-teamcity (Member)

This change is Reviewable

@jseldess jseldess (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jseldess and @lhirata)


v19.2/insert.md, line 12 at r1 (raw file):

## Performance best practices

- <span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.

I don't think we should equate "inserting multiple rows" with "bulk inserting". If you're inserting multiple rows at once, this is still more performant than issuing separate inserts. If you're inserting in bulk, `IMPORT INTO` is better. Can you chat with Roland or the devs on where we should draw the threshold, or how to know which to use?


v19.2/insert.md, line 147 at r1 (raw file):

### Insert multiple rows into an existing table

<span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.

See above.


v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):

### Use `IMPORT INTO` instead of `INSERT` for bulk inserts into existing tables

<span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.

I suspect this is the right guidance here, since we're talking "bulk", but I also suspect we'll still want to guide people when to use multi-row inserts. Same question as above, really.


v19.2/performance-tuning.md, line 913 at r1 (raw file):

{{site.data.alerts.callout_info}}
<span class="version-tag">New in v19.2:</span> To import bulk data from one or more CSV files into an existing table, use the [`IMPORT INTO`](import-into.html) statement.

It's hard to see how this fits with what comes after. Do you want this to be a tip that `IMPORT INTO` is the most performant option for bulk-inserting into an existing table when you can format the data as CSVs?


v19.2/sql-faqs.md, line 12 at r1 (raw file):

- To bulk-insert data into a brand new table, use the [`IMPORT`](import.html) statement.
- <span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.

Depending on what you find, you might still want to mention multi-row inserts as well.

@lnhsingh lnhsingh (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)


v19.2/insert.md, line 12 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

I don't think we should equate "inserting multiple rows" with "bulk inserting". If you're inserting multiple rows at once, this is still more performant than issuing separate inserts. If you're inserting in bulk, `IMPORT INTO` is better. Can you chat with Roland or the devs on where we should draw the threshold, or how to know which to use?

When I talked to @rolandcrosby, he mentioned that it's kind of a grey area and maybe we could lay out that new in 19.2, you can import from CSV into an existing table and let the user decide what to do. Do you think specifying here (and where you commented elsewhere) "To insert bulk data from a CSV file into an existing table, use IMPORT INTO" differentiates itself enough from "inserting multiple rows"?
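For reference, a minimal sketch of the statement under discussion, assuming a hypothetical `users` table and a placeholder storage URL:

```sql
-- Bulk-insert CSV data into an existing table (v19.2 IMPORT INTO syntax).
-- The table name, column list, and URL are illustrative placeholders.
IMPORT INTO users (id, name, city)
    CSV DATA ('gs://acme-bucket/users.csv');
```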

@jseldess (Contributor)


v19.2/insert.md, line 12 at r1 (raw file):

Previously, lhirata wrote…

When I talked to @rolandcrosby, he mentioned that it's kind of a grey area and maybe we could lay out that new in 19.2, you can import from CSV into an existing table and let the user decide what to do. Do you think specifying here (and where you commented elsewhere) "To insert bulk data from a CSV file into an existing table, use IMPORT INTO" differentiates itself enough from "inserting multiple rows"?

Can you look into this more? I'd like us to understand when to tell people to use multi-row inserts vs. `IMPORT INTO`. I suspect multi-row inserts will be for smaller batches, where you get better performance from a single batch (`insert into xxx values (a, 1), (b, 2), (c, 3);`) than from separate inserts (`insert into xxx values (a, 1);`, `insert into xxx values (b, 2);`, `insert into xxx values (c, 3);`). You should ask some engineers, like David Taylor and probably Ben Darnell.
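To make the comparison concrete, a sketch of the two patterns described above, using the hypothetical table `xxx` from the comment (values quoted so the statements parse):

```sql
-- One batched statement: a single round trip and a single write batch.
INSERT INTO xxx VALUES ('a', 1), ('b', 2), ('c', 3);

-- Three separate statements: three round trips, generally slower.
INSERT INTO xxx VALUES ('a', 1);
INSERT INTO xxx VALUES ('b', 2);
INSERT INTO xxx VALUES ('c', 3);
```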

@lnhsingh lnhsingh (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)


v19.2/import-into.md, line 200 at r1 (raw file):

- The table will be taken offline during the incremental import.

Known issues that will be addressed in upcoming alpha releases:

@dt, are these known issues resolved?

@dt dt (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)


v19.2/import-into.md, line 200 at r1 (raw file):

Previously, lhirata wrote…

@dt, are these known issues resolved?

Yes. One additional known limitation: the GC TTL on the table must allow enough time for the import to complete. If it doesn't, and the import hits an issue that causes it to fail after it has run longer than the TTL, there is a risk of the table becoming permanently unavailable: at that point, the data needed to roll back to the pre-import state may have been GC'ed, and the table may be irrecoverable and need to be restored from backup.

@rolandcrosby Best practice is to have regular backups anyway, but given that IMPORT INTO is a new feature, with a correspondingly elevated potential for bugs, plus the known table-loss edge case above, do we need to add any special advice?
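One way to apply the mitigation implied here is to raise the table's GC TTL before starting a long import; a sketch, with a placeholder table name and TTL value:

```sql
-- Keep MVCC history for 24 hours (86400 seconds) so that a failed
-- IMPORT INTO can still roll back to the pre-import state.
ALTER TABLE users CONFIGURE ZONE USING gc.ttlseconds = 86400;
```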

@dt dt (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)


v19.2/import-into.md, line 200 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Yes. One additional known limitation: the GC TTL on the table must allow enough time for the import to complete. If it doesn't, and the import hits an issue that causes it to fail after it has run longer than the TTL, there is a risk of the table becoming permanently unavailable: at that point, the data needed to roll back to the pre-import state may have been GC'ed, and the table may be irrecoverable and need to be restored from backup.

@rolandcrosby Best practice is to have regular backups anyway, but given that IMPORT INTO is a new feature, with a correspondingly elevated potential for bugs, plus the known table-loss edge case above, do we need to add any special advice?

I'd also move "The table will be taken offline during the incremental import" to the top, and/or give it a call-out box, as it is a potentially production-disrupting detail.

@lnhsingh lnhsingh (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)


v19.2/import-into.md, line 200 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I'd also move "The table will be taken offline during the incremental import" to the top, and/or give it a call-out box, as it is a potentially production-disrupting detail.

Removed the alpha-release-related limitations and added a new limitation callout.

@lnhsingh (Contributor, Author)

Note: SQL diagram also needs to be added. The SVG needs to be generated once cockroachdb/cockroach#41214 is merged.

@lnhsingh (Contributor, Author)

Might want to also add to Known Limitations that you can't use `IMPORT INTO` on interleaved tables.

@jseldess jseldess (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)


v19.2/import-into.md, line 200 at r1 (raw file):

Previously, lhirata wrote…

Removed the alpha-release-related limitations and added a new limitation callout.

Updated this list. @dt, does it look accurate and complete?


v19.2/insert.md, line 12 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Can you look into this more? I'd like us to understand when to tell people to use multi-row inserts vs. `IMPORT INTO`. I suspect multi-row inserts will be for smaller batches, where you get better performance from a single batch (`insert into xxx values (a, 1), (b, 2), (c, 3);`) than from separate inserts (`insert into xxx values (a, 1);`, `insert into xxx values (b, 2);`, `insert into xxx values (c, 3);`). You should ask some engineers, like David Taylor and probably Ben Darnell.

After reading the warning at the top of the incremental import page about the feature not being ready for production use, I've changed my mind here. I've reverted to what we had and added a note about the new feature, but with a warning.


v19.2/insert.md, line 147 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

See above.

Reverted and added note, as mentioned above.


v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

I suspect this is the right guidance here, since we're talking "bulk", but I also suspect we'll still want to guide people when to use multi-row inserts. Same question as above, really.

Reverted and added note, as mentioned above.


v19.2/performance-tuning.md, line 913 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

It's hard to see how this fits with what comes after. Do you want this to be a tip that `IMPORT INTO` is the most performant option for bulk-inserting into an existing table when you can format the data as CSVs?

Removed and added qualified note at end of section.


v19.2/performance-tuning-insecure.md, line 763 at r3 (raw file):

{{site.data.alerts.callout_info}}
<span class="version-tag">New in v19.2:</span> To import bulk data from one or more CSV files into an existing table, use the [`IMPORT INTO`](import-into.html) statement.

Removed and added qualified note at end of section.


v19.2/sql-faqs.md, line 12 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Depending on what you find, you might still want to mention multi-row inserts as well.

Reworked to still emphasize multi-row inserts but with a note about import into.

@jseldess (Contributor)

@dt and @rolandcrosby, PTAL one more time.

@jseldess (Contributor)

jseldess commented Nov 4, 2019

@dt, @rolandcrosby, I'd really like to wrap this up soon. Can one of you please check the latest changes?

- `IMPORT INTO` **cannot**:
    - Be used within a [transaction](transactions.html).
    - Replace a row.
    - Add a row that conflicts with a `UNIQUE` [index](indexes.html).
    - Be used on [indexed](indexes.html) or [interleaved](interleave-in-parent.html) tables.
- `IMPORT` can sometimes fail with a "context canceled" error, or can restart itself many times without ever finishing. If this is happening, it is likely due to a high amount of disk contention. This can be mitigated by setting the `kv.bulk_io_write.max_rate` [cluster setting](cluster-settings.html) to a value below your max disk write speed. For example, to set it to 10MB/s, execute the statement sketched after this list.
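The example statement referenced in the last bullet appears to have been dropped from this excerpt; a sketch of what it would look like (the actual docs may format it differently):

```sql
-- Throttle bulk IO writes to 10MB/s to reduce disk contention during IMPORT.
SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10MB';
```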
@dt dt (Member) left a comment

We removed the no-indexes limitation, though the "replace a row" limitation above includes unique secondary indexes. The limitation is "no imported row may conflict with an existing row (in the primary or any unique secondary index) or the import will be rejected."
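To make that rule concrete, a hypothetical schema (names are illustrative): an `IMPORT INTO` this table is rejected if any imported row duplicates an existing `id` (primary key) or an existing `email` (unique secondary index).

```sql
-- Conflicts on the primary key (id) or the unique secondary index (email)
-- cause an IMPORT INTO of overlapping rows to be rejected.
CREATE TABLE accounts (
    id INT PRIMARY KEY,
    email STRING UNIQUE,
    name STRING
);
```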

@@ -184,23 +182,18 @@ Google Cloud:
## Known limitations
@dt dt (Member) left a comment

probably should add "the table's configured GC TTL must be longer than the IMPORT will take to complete"

@@ -125,6 +125,10 @@ This example uses the `users` table from our open-source, fictional peer-to-peer

The [`SHOW RANGE ... FOR ROW`](show-range-for-row.html) statement shows information about a [range](architecture/overview.html#glossary) for a particular row of data. This information is useful for verifying how SQL data maps to underlying ranges, and where the replicas for a range are located.

### Incremental import into an existing table

The [`IMPORT INTO`](import-into.html) statement incrementally imports CSV data into an existing table.
@dt dt (Member) left a comment

I don't think I understand what "incrementally" means in this context.

@rolandcrosby rolandcrosby left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @lhirata, and @rolandcrosby)


v19.2/import-into.md, line 14 at r5 (raw file):

{{site.data.alerts.callout_danger}}
**This is an [experimental feature](experimental-features.html) and should not be used in production due to correctness issues.** See [Known limitations](#known-limitations) below for more information.

We are not labeling IMPORT INTO as experimental for 19.2. The alpha releases were experimental and not ready for production use, but the version in the 19.2 final release is not experimental. There are some known limitations, which may justify calling it a beta feature for this release, but I'm not even sure that that's necessary.


v19.2/insert.md, line 147 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Reverted and added note, as mentioned above.

The known limitations don't preclude production use anymore. I would remove the second sentence.


v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Reverted and added note, as mentioned above.

Same note as above re: known limitations.


v19.2/performance-tuning.md, line 969 at r5 (raw file):

{{site.data.alerts.callout_info}}
<span class="version-tag">New in v19.2:</span> You can also use the [`IMPORT INTO`](import-into.html) statement to bulk-insert CSV data into an existing table. However, note that this is an experimental feature and should not be used in production due to [known limitations](import-into.html#known-limitations).

Same note as above re: known limitations.


v19.2/performance-tuning-insecure.md, line 821 at r5 (raw file):

{{site.data.alerts.callout_info}}
<span class="version-tag">New in v19.2:</span> You can also use the [`IMPORT INTO`](import-into.html) statement to bulk-insert CSV data into an existing table. However, note that this is an experimental feature and should not be used in production due to [known limitations](import-into.html#known-limitations).

Same note as above re: known limitations.


v19.2/sql-faqs.md, line 13 at r5 (raw file):

    {{site.data.alerts.callout_info}}
    <span class="version-tag">New in v19.2:</span> You can also use the [`IMPORT INTO`](import-into.html) statement to bulk-insert CSV data into an existing table. However, note that this is an experimental feature and should not be used in production due to [known limitations](import-into.html#known-limitations).

Same note as above re: known limitations.

@jseldess jseldess (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lhirata, and @rolandcrosby)


v19.2/experimental-features.md, line 130 at r5 (raw file):

Previously, dt (David Taylor) wrote…

I don't think I understand what "incrementally" means in this context.

Removing since this feature isn't actually experimental.


v19.2/import-into.md, line 14 at r5 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

We are not labeling IMPORT INTO as experimental for 19.2. The alpha releases were experimental and not ready for production use, but the version in the 19.2 final release is not experimental. There are some known limitations, which may justify calling it a beta feature for this release, but I'm not even sure that that's necessary.

OK, removed this and refactored other aspects of this PR as well.


v19.2/import-into.md, line 191 at r5 (raw file):

Previously, dt (David Taylor) wrote…

We removed the no-indexes limitation, though the "replace a row" limitation above includes unique secondary indexes. The limitation is "no imported row may conflict with an existing row (in the primary or any unique secondary index) or the import will be rejected."

Revised. PTAL.


v19.2/insert.md, line 147 at r1 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

The known limitations don't preclude production use anymore. I would remove the second sentence.

Done.


v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

Same note as above re: known limitations.

Done.


v19.2/performance-tuning.md, line 969 at r5 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

Same note as above re: known limitations.

Done.


v19.2/performance-tuning-insecure.md, line 821 at r5 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

Same note as above re: known limitations.

Done.


v19.2/sql-faqs.md, line 13 at r5 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

Same note as above re: known limitations.

Done.

@jseldess jseldess merged commit d621681 into master Nov 6, 2019
@jseldess jseldess deleted the import-into branch November 6, 2019 20:01