Python 3 treats all strings as unicode (link), which means a string can occupy a different number of bytes than the equivalent ascii string. This causes problems when trying to upload tables with string columns (such as TILENAME) into a VARCHAR2 column in the database, leading to errors like
ORA-12899: value too large for column "BECHTOL"."Y6A1_EXTINCTION"."TILENAME" (actual: 24, maximum: 12)
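For a quick illustration of the byte-count difference (a minimal numpy example; exactly how many bytes Oracle ends up counting depends on the client/database character sets):

import numpy as np

# Fixed-width bytes strings reserve one byte per character,
# while numpy unicode strings reserve four (UCS-4):
print(np.dtype('S12').itemsize)  # 12
print(np.dtype('U12').itemsize)  # 48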
Here's a more detailed illustration of the issue
import sys

import numpy as np
from astropy.table import Table
import easyaccess as ea

print('easyaccess version:', ea.__version__)
print('python version:', sys.version)

query = '''
SELECT tilename
FROM y6a1_coadd_object_summary
WHERE rownum < 10;
'''

# 1. Query into a pandas DataFrame and upload it directly
try:
    print('=== Stay within pandas ===')
    connection = ea.connect()
    df_pandas = connection.query_to_pandas(query)
    connection.pandas_to_db(df_pandas, tablename='ea_test_pandas', append=False)
    connection.describe_table('ea_test_pandas')
    connection.close()
except Exception as e:
    print(e)
    print('pandas upload error')
    connection.close()

fileout = 'easyaccess_test.fits'

# 2. Save the query result to a FITS file and upload the file directly
try:
    print('=== Save to file and then upload directly ===')
    connection = ea.connect()
    connection.query_and_save(query, fileout)
    connection.load_table(fileout, name='ea_test_fileio', memsize=100)
    connection.describe_table('ea_test_fileio')
    connection.close()
except Exception as e:
    print(e)
    print('file upload error')
    connection.close()

# 3. Read the FITS file back into pandas and upload
try:
    print('=== Save to file and then upload via pandas ===')
    data = Table.read(fileout, format='fits')
    df_multistep = data.to_pandas()
    connection = ea.connect()
    connection.pandas_to_db(df_multistep, tablename='ea_test_multistep', append=False)
    connection.describe_table('ea_test_multistep')
    connection.close()
except Exception as e:
    print(e)
    print('fileio to pandas upload error')
    connection.close()

# 4. Read the FITS file, decode the string column to dtype=object, then upload
try:
    print('=== Save to file, convert encoding, then upload via pandas ===')
    data = Table.read(fileout, format='fits')
    df_convert = data.to_pandas()
    df_convert['TILENAME'] = np.array(list(map(lambda s: s.decode('ascii'), df_convert['TILENAME'])), dtype=object)
    connection = ea.connect()
    connection.pandas_to_db(df_convert, tablename='ea_test_convert', append=False)
    connection.describe_table('ea_test_convert')
    connection.close()
except Exception as e:
    print(e)
    print('convert upload error')
    connection.close()
This is a simple test that downloads TILENAME from the first 10 rows of the table and then tries to upload it to a new table in several ways. Among these four examples, only the first and fourth currently work with python 3.
Here’s the critical line (from the fourth example) that converts the encoding in the dataframe. It’s a little clumsy because there isn’t a vectorized function for the encoding conversion.
df_convert['TILENAME'] = np.array(list(map(lambda s: s.decode('ascii'), df_convert['TILENAME'])), dtype=object)
Note that the datatype of the converted column must be object, not a numpy string dtype.
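As a possible alternative (just a sketch, assuming the column holds raw bytes as it does after Table.to_pandas() on a FITS file), pandas' Series.str.decode may serve as a vectorized equivalent and also yields an object column of Python str:

# Possible vectorized equivalent of the element-wise decode above,
# assuming df_convert['TILENAME'] contains bytes objects:
df_convert['TILENAME'] = df_convert['TILENAME'].str.decode('ascii')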
I’ve also checked that using NVARCHAR2 instead of VARCHAR2 as the type in the database does not work (and anyway, we probably don’t want unicode columns in the database).
I’m not sure where the best place to do this in the code is. All of the table uploads seem to go through the insert_data function, but the conversion in the pandas DataFrame would need to happen before that function is called, so it seems the fix would have to go in multiple places.
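As a rough sketch of what such a pre-upload conversion could look like (the helper name decode_bytes_columns is hypothetical, not an existing easyaccess function), something like this could be applied to the DataFrame in each upload path before insert_data is reached:

def decode_bytes_columns(df, encoding='ascii'):
    """Hypothetical helper: decode every bytes-valued column to Python str
    (dtype=object) so the values bind as plain VARCHAR2 on upload."""
    out = df.copy()
    for col in out.columns:
        # Only touch object columns whose values are raw bytes.
        if len(out) and out[col].dtype == object and isinstance(out[col].iloc[0], bytes):
            out[col] = out[col].str.decode(encoding)
    return out

# e.g. df_convert = decode_bytes_columns(df_convert) before calling pandas_to_db(...)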