Oracle Migration Guide #5092

Merged
merged 1 commit into from
Aug 16, 2019

Conversation

lnhsingh
Copy link
Contributor

Closes #3539.

@cockroach-teamcity

This change is Reviewable

@drewdeally drewdeally requested a review from robert-s-lee July 29, 2019 09:13
@lnhsingh lnhsingh requested a review from rolandcrosby July 29, 2019 19:16

@rolandcrosby rolandcrosby left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)


v19.2/migrate-from-oracle.md, line 8 at r2 (raw file):

---

<span class="version-tag">New in v19.2:</span> This page has instructions for migrating data from Oracle into CockroachDB by [importing](import.html) CSV files. Note that `IMPORT` only works for creating new tables.

as of 19.2 we'll have IMPORT INTO as well


v19.2/migrate-from-oracle.md, line 54 at r2 (raw file):

SET VERIFY OFF
SET ARRAYSIZE 10000
SET COLSEP '|'

what happens to literal | characters in strings? is there a reason to prefer this over commas or tabs?
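
To make the concern concrete, here is a minimal sketch of what happens when a spooled value contains a literal `|` and the file is later split on that character. The sample row and the helper name `parse_pipe_delimited` are made up for illustration and are not taken from the guide:

```python
import csv
import io

# Hypothetical spooled row: the middle column holds the text "A|B Logistics".
spooled = "1|A|B Logistics|9209\n"

def parse_pipe_delimited(text):
    """Split pipe-delimited text the way csv.reader(f, delimiter="|") would."""
    return list(csv.reader(io.StringIO(text), delimiter="|"))

rows = parse_pipe_delimited(spooled)
# The row was meant to have 3 columns, but the literal pipe yields 4 fields.
print(len(rows[0]), rows[0])
```

Because SQL*Plus does not quote the value, nothing in the spooled line marks which pipe is data and which is a separator.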


v19.2/migrate-from-oracle.md, line 58 at r2 (raw file):

SPOOL '&1'

ALTER SESSION SET nls_date_format = 'DD-MON-YYYY HH24:MI:SS';

this seems to work with cockroachdb but YYYY-MM-DD is more common
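
If files have already been spooled with the `DD-MON-YYYY` format, the values could be rewritten afterwards rather than re-exported. A rough sketch, assuming English month abbreviations and a hypothetical helper name `oracle_to_iso`:

```python
from datetime import datetime

ORACLE_FORMAT = "%d-%b-%Y %H:%M:%S"  # corresponds to 'DD-MON-YYYY HH24:MI:SS'
ISO_FORMAT = "%Y-%m-%d %H:%M:%S"     # corresponds to 'YYYY-MM-DD HH24:MI:SS'

def oracle_to_iso(value):
    """Convert a 'DD-MON-YYYY HH24:MI:SS' string to 'YYYY-MM-DD HH24:MI:SS'."""
    # %b is locale-dependent; this assumes English abbreviations such as AUG.
    return datetime.strptime(value, ORACLE_FORMAT).strftime(ISO_FORMAT)

print(oracle_to_iso("16-AUG-2019 14:31:00"))
```

Setting `nls_date_format` to the ISO-style format before spooling avoids this extra pass entirely.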


v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):

  with open(sys.argv[1]) as f:
    reader = csv.reader(f, delimiter="|")

I don't think this will work correctly if there are extra | characters in the row besides the delimiter. From what I can see, SQL*Plus doesn't do anything to "escape" column separators when they occur in the data itself. In the second example below I set the column separator to the letter 'd', which occurs in the WAREHOUSE_NAME column of various rows, and you can see that there's no way to distinguish separators from data:

SQL> set colsep |
SQL> select * from warehouses where rownum <= 5;

WAREHOUSE_ID|WAREHOUSE_NAME			|LOCATION_ID
------------|-----------------------------------|-----------
	   1|C4d8kmazi0Z44w8ltOsRM6bMGOoLH6	|	9209
	   2|M7g9q7udw				|	6760
	   3|Fd7BuzN1DBLnyGH			|	9971
	   4|MI9hY2Ik50Dw2DuxRV9PB5Hj		|	9546
	   5|LkZnmIfS7y1pLv3HjKyDtMWnRTiKjs	|	3447

SQL> set colsep d
SQL> select * from warehouses where rownum <= 5;

WAREHOUSE_IDdWAREHOUSE_NAME			dLOCATION_ID
------------d-----------------------------------d-----------
	   1dC4d8kmazi0Z44w8ltOsRM6bMGOoLH6	d	9209
	   2dM7g9q7udw				d	6760
	   3dFd7BuzN1DBLnyGH			d	9971
	   4dMI9hY2Ik50Dw2DuxRV9PB5Hj		d	9546
	   5dLkZnmIfS7y1pLv3HjKyDtMWnRTiKjs	d	3447

So I think we need to do something else to split these into columns. I'm not sure which of those SET statements above omits the header line in each file, but maybe we should keep it, and use it to determine the widths of the columns. We should also set trimspool to off, so that all lines are the same length. Then we could split each line at those widths, like this (I'm using struct.unpack to read each line like a binary file, using a format string like '10sx5s', meaning '10 string bytes, 1 padding byte, 5 string bytes'):

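import csv
import struct

# infile / outfile are placeholders for the spooled .lst file and the CSV to write.
# Read bytes so struct.unpack can slice each line into fixed-width fields.
with open(infile, 'rb') as f:
    # The header line gives the width of every column.
    header = f.readline().rstrip(b'\r\n')
    struct_format = 'x'.join(str(len(col)) + 's' for col in header.split(b'|'))
    with open(outfile, 'w', newline='') as fo:
        writer = csv.writer(fo, delimiter='|')
        for line in f:
            # Requires every line to be as long as the header (trimspool off).
            fields = struct.unpack(struct_format, line.rstrip(b'\r\n'))
            writer.writerow(field.decode().strip() for field in fields)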
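
As a self-contained illustration of the same fixed-width idea, the sketch below runs on an in-memory sample instead of a spooled file. The header, the row, and the helper name `split_fixed_width` are invented for the example, and it assumes single-byte (ASCII) data padded to the header's length:

```python
import struct

def split_fixed_width(header, line, sep="|"):
    """Split `line` at the column widths implied by `header`.

    Assumes every line is padded to the same length as the header
    (for example, with trimspool off) and that the text is ASCII.
    """
    widths = [len(col) for col in header.split(sep)]
    # Widths [4, 10, 5] give '4sx10sx5s': each 'x' skips one separator byte.
    fmt = "x".join(f"{w}s" for w in widths)
    fields = struct.unpack(fmt, line.encode("ascii"))
    return [field.decode("ascii").strip() for field in fields]

# Made-up sample: the NAME column contains a literal '|'.
header = "  ID|NAME      |LOC  "
row    = "   1|A|B Corp  | 9209"
print(split_fixed_width(header, row))
```

Since the split positions come from the header widths rather than from the separator characters, the pipe inside the name stays part of the value.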

v19.2/migrate-from-oracle.md, line 143 at r2 (raw file):

You will need to export one CSV file per table, with the following requirements:

- Files must be in [valid CSV format](https://tools.ietf.org/html/rfc4180), with the caveat that the delimiter must be a single character.  To use a character other than comma (such as a tab), set a custom delimiter using the [`delimiter` option](import.html#delimiter).

as specified above I don't think we actually meet these requirements
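
One cheap way to spot files that break the requirement is to check that every row parses to the same number of fields. This is only a sanity check under the assumption that a stray delimiter changes the field count; the function name `inconsistent_rows` and the sample data are hypothetical:

```python
import csv
import io

def inconsistent_rows(text, delimiter=","):
    """Return (row_number, field_count) for rows whose width differs from the first row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = None
    problems = []
    for number, row in enumerate(reader, start=1):
        if expected is None:
            expected = len(row)  # the first row defines the expected width
        elif len(row) != expected:
            problems.append((number, len(row)))
    return problems

# Second row contains an unescaped '|' inside a value.
sample = "1|Acme|9209\n2|A|B Corp|6760\n"
print(inconsistent_rows(sample, delimiter="|"))
```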

@lnhsingh lnhsingh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)


v19.2/migrate-from-oracle.md, line 8 at r2 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

as of 19.2 we'll have IMPORT INTO as well

Wasn't sure if I should link to IMPORT INTO yet since it's still experimental, but I guess it won't hurt.


v19.2/migrate-from-oracle.md, line 54 at r2 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

what happens to literal | characters in strings? is there a reason to prefer this over commas or tabs?

Not sure. @drewdeally, do you know?


v19.2/migrate-from-oracle.md, line 58 at r2 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

this seems to work with cockroachdb but YYYY-MM-DD is more common

Done.


v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

I don't think this will work correctly if there are extra | characters in the row besides the delimiter…

@rolandcrosby / @drewdeally can one of you help me rewrite the script to implement Roland's suggested changes, since I don't know how to edit it?

@drewdeally

drewdeally commented Aug 7, 2019 via email

@lnhsingh

lnhsingh commented Aug 8, 2019

@drewdeally

drewdeally commented Aug 9, 2019 via email

@lnhsingh

@Amruta-Ranade / @jseldess - can you please review when you have a chance? FYI, @rolandcrosby will be helping me with the delimiter stuff.

@Amruta-Ranade Amruta-Ranade left a comment

Only did a copy-check. Didn't go through the code yet.

@lnhsingh lnhsingh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)


v19.2/migrate-from-oracle.md, line 148 at r7 (raw file):

Previously, Amruta-Ranade (Amruta Ranade) wrote…

nit: Should "double quote" be "double-quote"? If yes, change throughout the doc.

No, I think it's "double quote". Keeping as is!

@jseldess jseldess left a comment

Still some comments from others to resolve, and I have a few suggestions, but this is looking good, @lhirata!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)


v19.2/migrate-from-oracle.md, line 10 at r7 (raw file):

<span class="version-tag">New in v19.2:</span> This page has instructions for migrating data from Oracle into CockroachDB by [importing](import.html) CSV files. Note that `IMPORT` only works for creating new tables. For information on how to add CSV data to existing tables, see [`IMPORT INTO`](import-into.html).

The general steps for migrating from Oracle into CockroachDB are as follows:

This list of the steps isn't adding value, as they are already visible in the sidenav. I'd either describe the stages of the process in prose or leave this out.


v19.2/migrate-from-oracle.md, line 21 at r7 (raw file):

I'd change this a bit:

To illustrate this process, we use the following sample data and tools:


v19.2/migrate-from-oracle.md, line 29 at r7 (raw file):

## Step 1. Export the Oracle schema

Using [Data Pump Export](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sutil/oracle-data-pump-export-utility.html), export the schema:

To clarify that this tool is external, I'd change Data Pump Export to Oracle's Data Pump Export utility.


v19.2/migrate-from-oracle.md, line 40 at r7 (raw file):

## Step 2. Convert the Oracle schema to SQL

Using [Data Pump Import](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sutil/datapump-import-utility.html), load the DMP file you exported in [Step 1](#step-1-export-the-oracle-schema) and convert it to a SQL file:

Same comment as above.

Also feels unnecessary to reference the step directly before. Maybe just: load the exported DMP file to convert it to a SQL file.


v19.2/migrate-from-oracle.md, line 51 at r7 (raw file):

## Step 3. Export table data

You need to extract each table's data into a data list file (`.lst`). We wrote a simple SQL script(`spool.sql`) to do this:

Add space between script and the following parenthetical.


v19.2/migrate-from-oracle.md, line 321 at r7 (raw file):

Repeat the above for each CSV file you want to import.

<!--

I'm in favor of leaving this in, even if it is high-level and similar to what's in a blog. We may want to make this a completely separate page in the future, but for now, it rounds out the migration guidance well.

@lnhsingh lnhsingh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)


v19.2/migrate-from-oracle.md, line 8 at r2 (raw file):

Previously, lhirata wrote…

Wasn't sure if I should link to IMPORT INTO yet since it's still experimental, but I guess it won't hurt.

Done.


v19.2/migrate-from-oracle.md, line 10 at r7 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

This list of the steps isn't adding value, as they are already visible in the sidenav. I'd either describe the stages of the process in prose or leave this out.

Done.


v19.2/migrate-from-oracle.md, line 21 at r7 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

I'd change this a bit:

To illustrate this process, we use the following sample data and tools:

Done.


v19.2/migrate-from-oracle.md, line 29 at r7 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

To clarify that this tool is external, I'd change Data Pump Export to Oracle's Data Pump Export utility.

Done.


v19.2/migrate-from-oracle.md, line 40 at r7 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Same comment as above.

Also feels unnecessary to reference the step directly before. Maybe just: load the exported DMP file to convert it to a SQL file.

Done.


v19.2/migrate-from-oracle.md, line 51 at r7 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Add space between script and the following parenthetical.

Done.


v19.2/migrate-from-oracle.md, line 321 at r7 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

I'm in favor of leaving this in, even if it is high-level and similar to what's in a blog. We may want to make this a completely separate page in the future, but for now, it rounds out the migration guidance well.

Added back in.

@drewdeally drewdeally left a comment

:lgtm:

Reviewed 3 of 4 files at r1, 1 of 1 files at r8.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Amruta-Ranade, @drewdeally, @glennfawcett, @jseldess, @lhirata, @robert-s-lee, and @rolandcrosby)

@lnhsingh

@jseldess / @rolandcrosby can you take a final look? thanks!

@jseldess jseldess left a comment

:lgtm:, once you get your remaining answers/help from Roland and Drew.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)

@rolandcrosby rolandcrosby left a comment

Overall looks pretty good. I found a couple of things we probably want to clarify/hedge a little more, but otherwise I'm happy with this.

Reviewed 1 of 1 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @drewdeally, @glennfawcett, @lhirata, and @robert-s-lee)


v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):

Previously, lhirata wrote…

@rolandcrosby / @drewdeally can one of you help me rewrite the script to implement Roland's suggested changes, since I don't know how to edit it?

Disregard this for now, we will just say to use the workaround.


v19.2/migrate-from-oracle.md, line 123 at r9 (raw file):

    reader = csv.reader(f, delimiter="|")
    with open(filename+".csv", "w") as fo:
      writer = csv.writer(fo, delimiter="|")

I don't think we need to specify a delimiter when rewriting the file - Python's CSV writer should properly escape delimiters, so comma is fine.
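
For reference, a small sketch of the quoting behavior this relies on: with its default settings, `csv.writer` wraps any field containing a comma or a double quote in quotes and doubles embedded quotes. The helper name `to_csv_line` and the sample values are illustrative only:

```python
import csv
import io

def to_csv_line(row):
    """Serialize one row with csv.writer's defaults (comma delimiter, minimal quoting)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()

line = to_csv_line(["1", "Acme, Inc.", 'the "best" warehouse'])
print(line)
# Reading the line back recovers the original three fields.
print(next(csv.reader(io.StringIO(line))))
```

This only protects the output side: the input still has to be split into the right fields before it reaches the writer.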


v19.2/migrate-from-oracle.md, line 218 at r9 (raw file):

`XML` | [`JSON`](jsonb.html) [<sup>2</sup>](#considerations)

<a name="considerations"></a>

Have we verified that the Oracle text representation for each of these is importable into CockroachDB? Obviously XML needs to be converted but I'm not sure about others; we should maybe highlight which of these will definitely need the data to be converted pre-import in order to be imported into CockroachDB.
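
For the XML case, a pre-import conversion could look roughly like the sketch below. The XML-to-JSON mapping used here (attributes and child tags become keys, repeated tags become lists, mixed text goes under `#text`) is just one arbitrary convention chosen for illustration, and the function names are hypothetical; nothing here is prescribed by CockroachDB or by the guide:

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Map an XML element to a dict of its attributes and children.

    A leaf element with no attributes collapses to its text.
    """
    node = dict(element.attrib)
    for child in element:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (element.text or "").strip()
    if text and not node:
        return text
    if text:
        node["#text"] = text
    return node

def xml_to_json(xml_text):
    """Convert an XML document string to a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)

sample = '<order id="7"><item>bolt</item><item>nut</item></order>'
print(xml_to_json(sample))
```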


v19.2/migrate-from-oracle.md, line 287 at r9 (raw file):

       )
   CSV DATA (
        'https://cr-test.s3.us-east-2.amazonaws.com/CUSTOMERS.csv.gz'

is this going to remain accessible? if not, can we use a standard bucket where we share other docs resources?


v19.2/migration-overview.md, line 12 at r9 (raw file):

- MySQL
- Oracle

I'm not sure if we can really say this here but your call.

@lnhsingh lnhsingh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @drewdeally, @glennfawcett, @lhirata, @robert-s-lee, and @rolandcrosby)


v19.2/migrate-from-oracle.md, line 116 at r2 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

Disregard this for now, we will just say to use the workaround.

Done.


v19.2/migrate-from-oracle.md, line 143 at r2 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

as specified above I don't think we actually meet these requirements

Removed the info about the custom delimiter


v19.2/migrate-from-oracle.md, line 123 at r9 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

I don't think we need to specify a delimiter when rewriting the file - Python's CSV writer should properly escape delimiters, so comma is fine.

Done.


v19.2/migrate-from-oracle.md, line 287 at r9 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

is this going to remain accessible? if not, can we use a standard bucket where we share other docs resources?

Ah, good catch. I was actually going to put placeholders, since this is an example and we can assume that the user is going to have their own bucket.


v19.2/migration-overview.md, line 12 at r9 (raw file):

Previously, rolandcrosby (Roland Crosby) wrote…

I'm not sure if we can really say this here but your call.

Maybe adding "Oracle (using CSV)" to clarify?

@lnhsingh

Follow up re: data type mapping: #5236

Closes #3539.

Oracle Migration Guide

Closes #3539.

Fix link

Added info for mapping data types, refactoring app SQL

Edits based on Jesse's feedback

Add delimiter note

Edits based on Roland's feedback

Add note about converting XML

@lnhsingh lnhsingh merged commit a3d5bb8 into master Aug 16, 2019
@lnhsingh lnhsingh deleted the oracle-migration branch August 22, 2019 14:31