Oracle Migration Guide #5092
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)
v19.2/migrate-from-oracle.md, line 8 at r2 (raw file):
---
<span class="version-tag">New in v19.2:</span> This page has instructions for migrating data from Oracle into CockroachDB by [importing](import.html) CSV files. Note that `IMPORT` only works for creating new tables.
as of 19.2 we'll have `IMPORT INTO` as well
v19.2/migrate-from-oracle.md, line 54 at r2 (raw file):
SET VERIFY OFF
SET ARRAYSIZE 10000
SET COLSEP '|'
what happens to literal `|` characters in strings? is there a reason to prefer this over commas or tabs?
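To make the question concrete, here is a minimal sketch of what a `|`-delimited reader sees when a value itself contains `|` and nothing quotes or escapes it. The rows and the `count_fields` helper are hypothetical, made up for illustration; they are not taken from the docs page under review.

```python
import csv
import io


def count_fields(line, delimiter="|"):
    """Return how many fields csv.reader finds in a single unquoted line."""
    return len(next(csv.reader(io.StringIO(line), delimiter=delimiter)))


# Hypothetical rows: both are meant to have three columns,
# but the second row's name contains a literal '|'.
clean_row = "1|Acme Storage|9209"
dirty_row = "2|North|South Depot|6760"

print(count_fields(clean_row))  # 3
print(count_fields(dirty_row))  # 4
```

The second row comes back with one field too many, so the same concern would apply to commas or tabs unless the exported values are quoted.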
v19.2/migrate-from-oracle.md, line 58 at r2 (raw file):
SPOOL '&1'
ALTER SESSION SET nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
this seems to work with cockroachdb but YYYY-MM-DD is more common
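For files that were already exported with the `DD-MON-YYYY HH24:MI:SS` format, a small conversion step could normalize the values instead of re-exporting. This is only a sketch: `to_iso` is a hypothetical helper, and it assumes English month abbreviations, since `%b` in `strptime` depends on the active locale.

```python
from datetime import datetime

ORACLE_FORMAT = "%d-%b-%Y %H:%M:%S"  # matches DD-MON-YYYY HH24:MI:SS
ISO_FORMAT = "%Y-%m-%d %H:%M:%S"     # matches YYYY-MM-DD HH24:MI:SS


def to_iso(value):
    """Convert one Oracle-style timestamp string to the YYYY-MM-DD form."""
    return datetime.strptime(value, ORACLE_FORMAT).strftime(ISO_FORMAT)


print(to_iso("07-AUG-2019 13:27:00"))  # 2019-08-07 13:27:00
```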
v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):
with open(sys.argv[1]) as f:
    reader = csv.reader(f, delimiter="|")
I don't think this will work correctly if there are extra `|` characters in the row besides the delimiter. From what I can see, SQL*Plus doesn't do anything to "escape" column separators when they occur in the data itself. In the second example I set the column separator to the letter 'd', which occurs in the WAREHOUSE_NAME column of various rows, and you can see that there's no way to distinguish separators from data:
SQL> set colsep |
SQL> select * from warehouses where rownum <= 5;
WAREHOUSE_ID|WAREHOUSE_NAME |LOCATION_ID
------------|-----------------------------------|-----------
1|C4d8kmazi0Z44w8ltOsRM6bMGOoLH6 | 9209
2|M7g9q7udw | 6760
3|Fd7BuzN1DBLnyGH | 9971
4|MI9hY2Ik50Dw2DuxRV9PB5Hj | 9546
5|LkZnmIfS7y1pLv3HjKyDtMWnRTiKjs | 3447
SQL> set colsep d
SQL> select * from warehouses where rownum <= 5;
WAREHOUSE_IDdWAREHOUSE_NAME dLOCATION_ID
------------d-----------------------------------d-----------
1dC4d8kmazi0Z44w8ltOsRM6bMGOoLH6 d 9209
2dM7g9q7udw d 6760
3dFd7BuzN1DBLnyGH d 9971
4dMI9hY2Ik50Dw2DuxRV9PB5Hj d 9546
5dLkZnmIfS7y1pLv3HjKyDtMWnRTiKjs d 3447
So I think we need to do something else to split these into columns. I'm not sure which of those `SET` statements above omits the header line in each file, but maybe we should keep it, and use it to determine the widths of the columns. We should also set trimspool to off, so that all lines are the same length. Then we could split each line at those widths, like this (I'm using `struct.unpack` to read each line like a binary file, using a format string like '10sx5s', meaning '10 string bytes, 1 padding byte, 5 string bytes'):
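    import csv
    import struct

    # Read as bytes: struct.unpack needs a bytes buffer in Python 3.
    with open(infile, 'rb') as fin:
        # The header line gives each column's width; columns are separated by one '|' byte.
        header = fin.readline().rstrip(b'\r\n')
        struct_format = 'x'.join(str(len(col)) + 's' for col in header.split(b'|'))
        with open(outfile, 'w', newline='') as fo:
            writer = csv.writer(fo, delimiter='|')
            for line in fin:
                # Every line must be exactly as long as the header (trimspool off).
                fields = struct.unpack(struct_format, line.rstrip(b'\r\n'))
                writer.writerow(field.decode().strip() for field in fields)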
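As a quick check of the fixed-width idea above, the following sketch runs the same `struct.unpack` technique on in-memory lines. The sample table, the `pad` helper, and `split_fixed_width` are hypothetical; the sketch assumes ASCII data, a header line followed by one underline line, and untrimmed lines of equal length.

```python
import struct

WIDTHS = [2, 8, 4]  # hypothetical column widths for the sample below


def pad(values, colsep="|"):
    """Build one fixed-width line the way SQL*Plus pads columns (left-justified here)."""
    return colsep.join(v.ljust(w) for v, w in zip(values, WIDTHS))


def split_fixed_width(lines, colsep="|"):
    """Split data lines at the column widths read from the header line."""
    widths = [len(col) for col in lines[0].split(colsep)]
    # For widths [2, 8, 4] this is '2sx8sx4s': each 'x' skips one separator byte.
    fmt = "x".join(f"{w}s" for w in widths)
    rows = []
    for line in lines[2:]:  # skip the header and the underline line
        fields = struct.unpack(fmt, line.encode("ascii"))
        rows.append([field.decode("ascii").strip() for field in fields])
    return rows


sample = [
    pad(["ID", "NAME", "LOC"]),
    pad(["--", "--------", "----"]),
    pad(["1", "a|b", "9209"]),  # the name contains a literal '|'
    pad(["2", "plain", "6760"]),
]

print(split_fixed_width(sample))
# [['1', 'a|b', '9209'], ['2', 'plain', '6760']]
```

Because the split is by position rather than by separator, the `|` inside `a|b` stays part of the value.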
v19.2/migrate-from-oracle.md, line 143 at r2 (raw file):
You will need to export one CSV file per table, with the following requirements:
- Files must be in [valid CSV format](https://tools.ietf.org/html/rfc4180), with the caveat that the delimiter must be a single character. To use a character other than comma (such as a tab), set a custom delimiter using the [`delimiter` option](import.html#delimiter).
as specified above I don't think we actually meet these requirements
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)
v19.2/migrate-from-oracle.md, line 8 at r2 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
as of 19.2 we'll have `IMPORT INTO` as well
Wasn't sure if I should link to `IMPORT INTO` yet since it's still experimental, but I guess it won't hurt.
v19.2/migrate-from-oracle.md, line 54 at r2 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
what happens to literal `|` characters in strings? is there a reason to prefer this over commas or tabs?
Not sure. @drewdeally, do you know?
v19.2/migrate-from-oracle.md, line 58 at r2 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
this seems to work with cockroachdb but YYYY-MM-DD is more common
Done.
v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):
@rolandcrosby / @drewdeally can one of you help me rewrite the script to implement Roland's suggested changes, since I don't know how to edit it?
The choice of delimiter is up to the user, who will know their data best.
@Amruta-Ranade / @jseldess - can you please review when you have a chance? FYI, @rolandcrosby will be helping me with the delimiter stuff.
Only did a copy-check. Didn't go through the code yet.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)
v19.2/migrate-from-oracle.md, line 148 at r7 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
nit: Should "double quote" be "double-quote"? If yes, change throughout the doc.
No, I think it's "double quote". Keeping as is!
Still some comments from others to resolve, and I have a few suggestions, but this is looking good, @lhirata!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)
v19.2/migrate-from-oracle.md, line 10 at r7 (raw file):
<span class="version-tag">New in v19.2:</span> This page has instructions for migrating data from Oracle into CockroachDB by [importing](import.html) CSV files. Note that `IMPORT` only works for creating new tables. For information on how to add CSV data to existing tables, see [`IMPORT INTO`](import-into.html).
The general steps for migrating from Oracle into CockroachDB are as follows:
This list of the steps isn't adding value, as they are already visible in the sidenav. I'd either describe the stages of the process in prose or leave this out.
v19.2/migrate-from-oracle.md, line 21 at r7 (raw file):
I'd change this a bit:
To illustrate this process, we use the following sample data and tools:
v19.2/migrate-from-oracle.md, line 29 at r7 (raw file):
## Step 1. Export the Oracle schema
Using [Data Pump Export](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sutil/oracle-data-pump-export-utility.html), export the schema:
To clarify that this tool is external, I'd change `Data Pump Export` to `Oracle's Data Pump Export utility`.
v19.2/migrate-from-oracle.md, line 40 at r7 (raw file):
## Step 2. Convert the Oracle schema to SQL
Using [Data Pump Import](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sutil/datapump-import-utility.html), load the DMP file you exported in [Step 1](#step-1-export-the-oracle-schema) and convert it to a SQL file:
Same comment as above.
Also feels unnecessary to reference the step directly before. Maybe just: load the exported DMP file to convert it to a SQL file.
v19.2/migrate-from-oracle.md, line 51 at r7 (raw file):
## Step 3. Export table data
You need to extract each table's data into a data list file (`.lst`). We wrote a simple SQL script(`spool.sql`) to do this:
Add space between `script` and the following parenthetical.
v19.2/migrate-from-oracle.md, line 321 at r7 (raw file):
Repeat the above for each CSV file you want to import.
<!--
I'm in favor of leaving this in, even if it is high-level and similar to what's in a blog. We may want to make this a completely separate page in the future, but for now, it rounds out the migration guidance well.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)
v19.2/migrate-from-oracle.md, line 8 at r2 (raw file):
Previously, lhirata wrote…
Wasn't sure if I should link to `IMPORT INTO` yet since it's still experimental, but I guess it won't hurt.
Done.
v19.2/migrate-from-oracle.md, line 10 at r7 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
This list of the steps isn't adding value, as they are already visible in the sidenav. I'd either describe the stages of the process in prose or leave this out.
Done.
v19.2/migrate-from-oracle.md, line 21 at r7 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
I'd change this a bit:
To illustrate this process, we use the following sample data and tools:
Done.
v19.2/migrate-from-oracle.md, line 29 at r7 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
To clarify that this tool is external, I'd change `Data Pump Export` to `Oracle's Data Pump Export utility`.
Done.
v19.2/migrate-from-oracle.md, line 40 at r7 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Same comment as above.
Also feels unnecessary to reference the step directly before. Maybe just:
load the exported DMP file to convert it to a SQL file.
Done.
v19.2/migrate-from-oracle.md, line 51 at r7 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Add space between `script` and the following parenthetical.
Done.
v19.2/migrate-from-oracle.md, line 321 at r7 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
I'm in favor of leaving this in, even if it is high-level and similar to what's in a blog. We may want to make this a completely separate page in the future, but for now, it rounds out the migration guidance well.
Added back in.
Reviewed 3 of 4 files at r1, 1 of 1 files at r8.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)
@jseldess / @rolandcrosby can you take a final look? thanks!
LGTM, once you get your remaining answers/help from Roland and Drew.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)
Overall looks pretty good, I found a couple things we probably want to clarify/hedge a little more but otherwise I'm happy with this.
Reviewed 1 of 1 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @drewdeally, @glennfawcett, @lhirata, and @robert-s-lee)
v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):
Previously, lhirata wrote…
@rolandcrosby / @drewdeally can one of you help me rewrite the script to implement Roland's suggested changes, since I don't know how to edit it?
Disregard this for now, we will just say to use the workaround.
v19.2/migrate-from-oracle.md, line 123 at r9 (raw file):
reader = csv.reader(f, delimiter="|")
with open(filename+".csv", "w") as fo:
    writer = csv.writer(fo, delimiter="|")
I don't think we need to specify a delimiter when rewriting the file - Python's CSV writer should properly escape delimiters, so comma is fine.
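A minimal sketch of the behavior this comment relies on: with its default settings, Python's `csv.writer` quotes any field that contains a comma or a double quote, so the values survive a round trip. The sample row and the `to_csv_line` helper are hypothetical.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row with csv.writer's defaults (comma delimiter, minimal quoting)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


row = ["1", "Acme, Inc.", 'say "hi"']
line = to_csv_line(row)
print(repr(line))  # '1,"Acme, Inc.","say ""hi"""\r\n'

# Reading the line back recovers the original three values.
print(next(csv.reader(io.StringIO(line))))  # ['1', 'Acme, Inc.', 'say "hi"']
```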
v19.2/migrate-from-oracle.md, line 218 at r9 (raw file):
`XML` | [`JSON`](jsonb.html) [<sup>2</sup>](#considerations)
<a name="considerations"></a>
Have we verified that the Oracle text representation for each of these is importable into CockroachDB? Obviously XML needs to be converted but I'm not sure about others; we should maybe highlight which of these will definitely need the data to be converted pre-import in order to be imported into CockroachDB.
v19.2/migrate-from-oracle.md, line 287 at r9 (raw file):
)
CSV DATA (
    'https://cr-test.s3.us-east-2.amazonaws.com/CUSTOMERS.csv.gz'
is this going to remain accessible? if not, can we use a standard bucket where we share other docs resources?
v19.2/migration-overview.md, line 12 at r9 (raw file):
- MySQL
- Oracle
I'm not sure if we can really say this here but your call.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)
v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
Disregard this for now, we will just say to use the workaround.
Done.
v19.2/migrate-from-oracle.md, line 143 at r2 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
as specified above I don't think we actually meet these requirements
Removed the info about the custom delimiter
v19.2/migrate-from-oracle.md, line 123 at r9 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
I don't think we need to specify a delimiter when rewriting the file - Python's CSV writer should properly escape delimiters, so comma is fine.
Done.
v19.2/migrate-from-oracle.md, line 287 at r9 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
is this going to remain accessible? if not, can we use a standard bucket where we share other docs resources?
Ah, good catch. I was actually going to put placeholders since this is an example and we can assume that the user is going to have their own bucket
v19.2/migration-overview.md, line 12 at r9 (raw file):
Previously, rolandcrosby (Roland Crosby) wrote…
I'm not sure if we can really say this here but your call.
Maybe adding "Oracle (using CSV)" to clarify?
Follow up re: data type mapping: #5236
01bff11 to ca38796
367cfe7 to 5b619ab
Closes #3539.