Add references to IMPORT INTO in relevant docs #5451
Conversation
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/1b8cc3a3754d8a3156b6349b87704b815e455827/
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jseldess and @lhirata)
v19.2/insert.md, line 12 at r1 (raw file):
## Performance best practices - <span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.
I don't think we should equate "inserting multiple rows" with "bulk inserting". If you're inserting multiple rows at once, this is still more performant. If you're inserting "bulk", import into is better. Can you chat with Roland or the devs on where we should draw the threshold, or how to know which to use?
v19.2/insert.md, line 147 at r1 (raw file):
### Insert multiple rows into an existing table <span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.
See above.
v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):
### Use `IMPORT INTO` instead of `INSERT` for bulk inserts into existing tables <span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.
I suspect this is the right guidance here, since we're talking "bulk", but I also suspect we'll still want to guide people when to use multi-row inserts. Same question as above, really.
v19.2/performance-tuning.md, line 913 at r1 (raw file):
{{site.data.alerts.callout_info}} <span class="version-tag">New in v19.2:</span> To import bulk data from one or more CSV files into an existing table, use the [`IMPORT INTO`](import-into.html) statement.
It's hard to see how this fits with what comes after. Do you want this to be a tip that import into is the most performant option for bulk inserting into an existing table when you can format the data as CSVs?
v19.2/sql-faqs.md, line 12 at r1 (raw file):
- To bulk-insert data into a brand new table, use the [`IMPORT`](import.html) statement.
- <span class="version-tag">New in v19.2:</span> To bulk-insert data into an existing table, use the [`IMPORT INTO`](import-into.html) statement.
Depending on what you find, you might still want to mention multi-row inserts as well.
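For reference, a minimal sketch of the two statements being contrasted in that FAQ bullet, with a hypothetical table and placeholder file URLs (not taken from the docs under review):

```sql
-- Bulk-load a brand new table with IMPORT (schema and URL are placeholders):
IMPORT TABLE customers (
    id INT PRIMARY KEY,
    name STRING
)
CSV DATA ('https://example.com/customers.csv');

-- New in v19.2: append CSV data to an existing table with IMPORT INTO:
IMPORT INTO customers (id, name)
CSV DATA ('https://example.com/customers-2019.csv');
```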
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)
v19.2/insert.md, line 12 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
I don't think we should equate "inserting multiple rows" with "bulk inserting". If you're inserting multiple rows at once, this is still more performant. If you're inserting "bulk", import into is better. Can you chat with Roland or the devs on where we should draw the threshold, or how to know which to use?
When I talked to @rolandcrosby, he mentioned that it's kind of a grey area and maybe we could lay out that, new in 19.2, you can import from CSV into an existing table and let the user decide what to do. Do you think specifying here (and where you commented elsewhere) "To insert bulk data from a CSV file into an existing table, use `IMPORT INTO`" differentiates itself enough from "inserting multiple rows"?
v19.2/insert.md, line 12 at r1 (raw file):
Previously, lhirata wrote…
Can you look into this more? I'd like us to understand when to tell people to use multi-row inserts vs import into. I suspect multi-row inserts will be for smaller batches, where you get better performance from a single batch (`insert into xxx values (a, 1), (b, 2), (c, 3);`) than from separate inserts (`insert into xxx values (a, 1);`, `insert into xxx values (b, 2);`, `insert into xxx values (c, 3);`). You should ask some engineers like David Taylor and probably Ben Darnell.
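A runnable sketch of the two patterns being compared above, using the hypothetical `xxx` table from the comment:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE xxx (k STRING, v INT);

-- One multi-row INSERT: a single batch, one round trip, one implicit transaction.
INSERT INTO xxx VALUES ('a', 1), ('b', 2), ('c', 3);

-- Versus three separate INSERTs: three round trips and three implicit transactions.
INSERT INTO xxx VALUES ('a', 1);
INSERT INTO xxx VALUES ('b', 2);
INSERT INTO xxx VALUES ('c', 3);
```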
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)
v19.2/import-into.md, line 200 at r1 (raw file):
- The table will be taken offline during the incremental import.
Known issues that will be addressed in upcoming alpha releases:
@dt, are these known issues resolved?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)
v19.2/import-into.md, line 200 at r1 (raw file):
Previously, lhirata wrote…
@dt, are these known issues resolved?
Yes. Additional known limitation: the GC TTL on the table must allow enough time for the import to complete. If it doesn't, there is a risk of the table becoming permanently unavailable: if the import encounters an issue that causes it to fail after it has run longer than the TTL, the data needed to roll back to the pre-import state may have been GC'ed, and the table may be irrecoverable / need to be restored from backup.
@rolandcrosby Best practice is to have regular backups anyway, but given that IMPORT INTO is a new feature with a correspondingly elevated potential for bugs, plus the known table-loss edge case above, do we need to add any special advice?
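For context, a sketch of how a table's GC TTL is typically inspected and raised so that it exceeds the expected import duration (table name and TTL value are illustrative only):

```sql
-- Check the current zone configuration, including gc.ttlseconds:
SHOW ZONE CONFIGURATION FOR TABLE customers;

-- Raise the GC TTL so it comfortably exceeds the expected IMPORT INTO runtime
-- (90000 seconds = 25 hours; illustrative value only):
ALTER TABLE customers CONFIGURE ZONE USING gc.ttlseconds = 90000;
```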
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)
v19.2/import-into.md, line 200 at r1 (raw file):
Previously, dt (David Taylor) wrote…
Yes. Additional known limitation: the GC TTL on the table must allow enough time for the import to complete. If it doesn't, there is a risk of the table becoming permanently unavailable: if the import encounters an issue that causes it to fail after it has run longer than the TTL, the data needed to roll back to the pre-import state may have been GC'ed, and the table may be irrecoverable / need to be restored from backup.
@rolandcrosby Best practice is to have regular backups anyway, but given that IMPORT INTO is a new feature with a correspondingly elevated potential for bugs, plus the known table-loss edge case above, do we need to add any special advice?
I'd also move the "- The table will be taken offline during the incremental import." bullet to the top and/or give it a call-out box, as it is a potentially production-disrupting detail.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)
v19.2/import-into.md, line 200 at r1 (raw file):
Previously, dt (David Taylor) wrote…
I'd also move the "- The table will be taken offline during the incremental import." bullet to the top and/or give it a call-out box, as it is a potentially production-disrupting detail.
Removed alpha release related limitations / added new limitation callout.
Note: SQL diagram also needs to be added. The SVG needs to be generated once cockroachdb/cockroach#41214 is merged.
Might want to also add to Known Limitations that you can't use …
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/048639d3e26a93b7578d5999727b4e5fee85825f/
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @lhirata and @rolandcrosby)
v19.2/import-into.md, line 200 at r1 (raw file):
Previously, lhirata wrote…
Removed alpha release related limitations / added new limitation callout.
Updated this list. @dt, does it look accurate and complete?
v19.2/insert.md, line 12 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Can you look into this more? I'd like us to understand when to tell people to use multi-row inserts vs import into. I suspect multi-row inserts will be for smaller batches, where you get better performance from a single batch (`insert into xxx values (a, 1), (b, 2), (c, 3);`) than from separate inserts (`insert into xxx values (a, 1);`, `insert into xxx values (b, 2);`, `insert into xxx values (c, 3);`). You should ask some engineers like David Taylor and probably Ben Darnell.
After reading the warning at the top of the incremental import page about the feature not being ready for production use, I've changed my mind here. I've reverted to what we had and added a note about the new feature, but with a warning.
v19.2/insert.md, line 147 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
See above.
Reverted and added note, as mentioned above.
v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
I suspect this is the right guidance here, since we're talking "bulk", but I also suspect we'll still want to guide people when to use multi-row inserts. Same question as above, really.
Reverted and added note, as mentioned above.
v19.2/performance-tuning.md, line 913 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
It's hard to see how this fits with what comes after. Do you want this to be a tip that import into is the most performant option for bulk inserting into an existing table when you can format the data as CSVs?
Removed and added qualified note at end of section.
v19.2/performance-tuning-insecure.md, line 763 at r3 (raw file):
{{site.data.alerts.callout_info}} <span class="version-tag">New in v19.2:</span> To import bulk data from one or more CSV files into an existing table, use the [`IMPORT INTO`](import-into.html) statement.
Removed and added qualified note at end of section.
v19.2/sql-faqs.md, line 12 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Depending on what you find, you might still want to mention multi-row inserts as well.
Reworked to still emphasize multi-row inserts but with a note about import into.
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/8d23c61ed9818fcc7f5cc294d0a08cfc4fb0f1e9/
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/849510a26d18182db3faf95d8bdb8145364d93d8/
@dt and @rolandcrosby, PTAL one more time.
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/08ab16c9c5f92714f751c3e42aa3af5ec9c4ba1b/
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/ad5bea83441b432e67a6e18257be376eb5b1804a/
@dt, @rolandcrosby, I'd really like to wrap this up soon. Can one of you please check the latest changes?
v19.2/import-into.md
Outdated
- `IMPORT INTO` **cannot**:
- Be used within a [transaction](transactions.html).
- Replace a row.
- Add a row that conflicts with a `UNIQUE` [index](indexes.html).
- Currently, `IMPORT INTO` does not work on indexed tables.
- `IMPORT` can sometimes fail with a "context canceled" error, or can restart itself many times without ever finishing. If this is happening, it is likely due to a high amount of disk contention. This can be mitigated by setting the `kv.bulk_io_write.max_rate` [cluster setting](cluster-settings.html) to a value below your max disk write speed. For example, to set it to 10MB/s, execute:
- Be used on [indexed](indexes.html) or [interleaved](interleave-in-parent.html) tables.
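The statement elided at the end of the quoted limitation presumably looks something like the following (the 10MB value is illustrative and should be set below the disk's max write speed):

```sql
-- Throttle bulk IO writes to roughly 10 MB/s to reduce disk contention:
SET CLUSTER SETTING kv.bulk_io_write.max_rate = '10MB';
```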
We removed the no-indexes limitation, though the "replace a row" limitation above includes unique secondary indexes. The limitation is "no imported row may conflict with an existing row (in the primary or any unique secondary index) or the import will be rejected."
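A small sketch of the conflict rule described above, with a hypothetical schema (the failing import itself is shown only as a comment, since it depends on an external CSV file):

```sql
-- Hypothetical table: conflicts can come from the primary key or any UNIQUE index.
CREATE TABLE users (id INT PRIMARY KEY, email STRING UNIQUE);
INSERT INTO users VALUES (1, 'a@example.com');

-- An IMPORT INTO whose CSV contains a row with id = 1 or email = 'a@example.com'
-- conflicts with the existing row, so the import is rejected:
-- IMPORT INTO users (id, email) CSV DATA ('https://example.com/users.csv');
```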
@@ -184,23 +182,18 @@ Google Cloud:
## Known limitations
probably should add "the table's configured GC TTL must be longer than the IMPORT will take to complete"
v19.2/experimental-features.md
Outdated
@@ -125,6 +125,10 @@ This example uses the `users` table from our open-source, fictional peer-to-peer

The [`SHOW RANGE ... FOR ROW`](show-range-for-row.html) statement shows information about a [range](architecture/overview.html#glossary) for a particular row of data. This information is useful for verifying how SQL data maps to underlying ranges, and where the replicas for a range are located.

### Incremental import into an existing table

The [`IMPORT INTO`](import-into.html) statement incrementally imports CSV data into an existing table.
I don't think I understand what "incrementally" means in this context.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jseldess, @lhirata, and @rolandcrosby)
v19.2/import-into.md, line 14 at r5 (raw file):
{{site.data.alerts.callout_danger}} **This is an [experimental feature](experimental-features.html) and should not be used in production due to correctness issues.** See [Known limitations](#known-limitations) below for more information.
We are not labeling IMPORT INTO as experimental for 19.2. The alpha releases were experimental and not ready for production use, but the version in the 19.2 final release is not experimental. There are some known limitations, which may justify calling it a beta feature for this release, but I'm not even sure that that's necessary.
v19.2/insert.md, line 147 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Reverted and added note, as mentioned above.
The known limitations don't preclude production use anymore. I would remove the second sentence.
v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Reverted and added note, as mentioned above.
Same note as above re: known limitations.
v19.2/performance-tuning.md, line 969 at r5 (raw file):
{{site.data.alerts.callout_info}} <span class="version-tag">New in v19.2:</span> You can also use the [`IMPORT INTO`](import-into.html) statement to bulk-insert CSV data into an existing table. However, note that this is an experimental feature and should not be used in production due to [known limitations](import-into.html#known-limitations).
Same note as above re: known limitations.
v19.2/performance-tuning-insecure.md, line 821 at r5 (raw file):
{{site.data.alerts.callout_info}} <span class="version-tag">New in v19.2:</span> You can also use the [`IMPORT INTO`](import-into.html) statement to bulk-insert CSV data into an existing table. However, note that this is an experimental feature and should not be used in production due to [known limitations](import-into.html#known-limitations).
Same note as above re: known limitations.
v19.2/sql-faqs.md, line 13 at r5 (raw file):
{{site.data.alerts.callout_info}} <span class="version-tag">New in v19.2:</span> You can also use the [`IMPORT INTO`](import-into.html) statement to bulk-insert CSV data into an existing table. However, note that this is an experimental feature and should not be used in production due to [known limitations](import-into.html#known-limitations).
Same note as above re: known limitations.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt, @lhirata, and @rolandcrosby)
v19.2/experimental-features.md, line 130 at r5 (raw file):
Previously, dt (David Taylor) wrote…
I don't think I understand what "incrementally" means in this context.
Removing since this feature isn't actually experimental.
v19.2/import-into.md, line 14 at r5 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
We are not labeling IMPORT INTO as experimental for 19.2. The alpha releases were experimental and not ready for production use, but the version in the 19.2 final release is not experimental. There are some known limitations, which may justify calling it a beta feature for this release, but I'm not even sure that that's necessary.
OK, removed this and refactored other aspects of this PR as well.
v19.2/import-into.md, line 191 at r5 (raw file):
Previously, dt (David Taylor) wrote…
We removed the no-indexes limitation, though the "replace a row" limitation above includes unique secondary indexes. The limitation is "no imported row may conflict with an existing row (in the primary or any unique secondary index) or the import will be rejected."
Revised. PTAL.
v19.2/insert.md, line 147 at r1 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
The known limitations don't preclude production use anymore. I would remove the second sentence.
Done.
v19.2/performance-best-practices-overview.md, line 35 at r1 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
Same note as above re: known limitations.
Done.
v19.2/performance-tuning.md, line 969 at r5 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
Same note as above re: known limitations.
Done.
v19.2/performance-tuning-insecure.md, line 821 at r5 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
Same note as above re: known limitations.
Done.
v19.2/sql-faqs.md, line 13 at r5 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
Same note as above re: known limitations.
Done.
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/631662dae9b11b12c87453435601c9a9868b2ae2/
Closes #5180.