Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uploading tables with string columns to the database in Python 3 #176

Open
bechtol opened this issue Oct 22, 2019 · 0 comments
Open

Uploading tables with string columns to the database in Python 3 #176

bechtol opened this issue Oct 22, 2019 · 0 comments

Comments

@bechtol
Copy link

bechtol commented Oct 22, 2019

Python 3 treats all strings as unicode (link), which involves a different number of bytes compared to ascii strings. This causes problems when trying to upload tables with string columns (such as TILENAME) into a VARCHAR2 column in the database, leading to errors like

ORA-12899: value too large for column "BECHTOL"."Y6A1_EXTINCTION"."TILENAME" (actual: 24, maximum: 12)

Here's a more detailed illustration of the issue

import numpy as np
from astropy.table import Table
import easyaccess as ea

print('easyaccess version:', ea.__version__)
print('python version:', sys.version)

query = '''
SELECT tilename 
FROM y6a1_coadd_object_summary
WHERE rownum < 10;
'''

try:
    print('=== Stay within pandas ===')
    connection = ea.connect()
    df_pandas = connection.query_to_pandas(query)
    connection.pandas_to_db(df_pandas, tablename='ea_test_pandas', append=False)
    connection.describe_table('ea_test_pandas') 
    connection.close()
except Exception as e: 
    print(e)
    print('pandas upload error')
    connection.close()
    pass

fileout = 'easyaccess_test.fits'

try:
    print('=== Save to file and then upload directly ===')
    connection = ea.connect()
    connection.query_and_save(query, fileout)
    connection.load_table(fileout, name='ea_test_fileio', memsize=100)
    connection.describe_table('ea_test_fileio')
    connection.close()
except Exception as e: 
    print(e)
    print('file upload error')
    connection.close()
    pass

try:
    print('=== Save to file and then upload via pandas ===')
    data = Table.read(fileout, format='fits')
    df_multistep = data.to_pandas()
    connection = ea.connect()
    connection.pandas_to_db(df_multistep, tablename='ea_test_multistep', append=False)
    connection.describe_table('ea_test_multistep') 
    connection.close()
except Exception as e: 
    print(e)
    print('fileio to pandas upload error')
    connection.close()
    pass

try:
    print('=== Save to file, convert encoding, then upload via pandas ===')
    data = Table.read(fileout, format='fits')
    df_convert = data.to_pandas()
    df_convert['TILENAME'] = np.array(list(map(lambda s: s.decode('ascii'), df_convert['TILENAME'])), dtype=object)
    connection = ea.connect()
    connection.pandas_to_db(df_convert, tablename='ea_test_convert', append=False)
    connection.describe_table('ea_test_convert') 
    connection.close()
except Exception as e: 
    print(e)
    print('convert upload error')
    connection.close()
    pass

This is simple test that downloads the TILENAME from the first 10 rows of the table and then tries to upload to a table in different ways. Among these four examples, on the first and fourth currently work with python 3.

Here’s the critical line that converts the encoding in the dataframe. It’s a little bit clumsy because there isn’t a vectorized function for the encoding conversion.

df_convert['TILENAME'] = np.array(list(map(lambda s: s.decode('ascii'), df_convert['TILENAME'])), dtype=object)

Note that the datatype for the converted column must be object and not a string.
I’ve also checked that using NVARCHAR2 instead of VARCHAR2 as the type in the database does not work (and anyway, we probably don’t want unicode columns in the database).

I’m not sure what is the best place to do this in the code. All the table uploads seem to be using the insert_data function, but the conversion in the pandas DataFrame actually would need to come before that function is called, so it seems the fix would have to be in multiple places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant