When using to_sql(), continue if duplicate primary keys are detected? #15988
Description
Code Sample, a copy-pastable example if possible
```python
df.to_sql('TableNameHere', engine, if_exists='append', chunksize=900, index=False)
```
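For context, here is a minimal self-contained reproduction (a sketch only: the SQLite file, column names, and seed data are hypothetical, but any backend that enforces a primary key behaves the same way):

```python
import sqlite3

import pandas as pd
from sqlalchemy import create_engine

# Build a table with a primary key and seed it with one row.
conn = sqlite3.connect('demo.db')
conn.execute('CREATE TABLE TableNameHere (id INTEGER PRIMARY KEY, value TEXT)')
conn.execute("INSERT INTO TableNameHere VALUES (1, 'a')")
conn.commit()
conn.close()

engine = create_engine('sqlite:///demo.db')

# id=1 duplicates an existing key; id=2 is new.
df = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})

# Raises sqlalchemy.exc.IntegrityError on the duplicate key, and the
# non-duplicate row (id=2) is never inserted either.
df.to_sql('TableNameHere', engine, if_exists='append', chunksize=900, index=False)
```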
Problem description
I am trying to append a large DataFrame to a SQL table. Some of the rows in the DataFrame duplicate rows already in the SQL table and some do not, but to_sql() stops executing entirely if even one duplicate is detected.
It would make more sense for to_sql(if_exists='append') to warn the user which rows had duplicate keys and continue adding the new rows, rather than stop executing altogether. With large datasets you will often have some duplicates that you simply want to ignore.
Maybe add an argument to ignore duplicates and keep executing? Perhaps an additional if_exists option like 'append_skipdupes'? In the meantime, a possible workaround is sketched below.
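Specifically, newer pandas versions (0.24+, so not the 0.19.2 listed below) accept a method callable in to_sql, and on PostgreSQL it can be combined with INSERT ... ON CONFLICT DO NOTHING to approximate 'append_skipdupes'. A rough sketch, assuming a PostgreSQL engine; the function name insert_skip_duplicates is mine:

```python
from sqlalchemy.dialects.postgresql import insert

def insert_skip_duplicates(table, conn, keys, data_iter):
    # `table` is a pandas SQLTable; `table.table` is the underlying
    # SQLAlchemy Table. Rows whose primary key already exists are
    # silently skipped instead of aborting the whole append.
    rows = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(rows).on_conflict_do_nothing()
    conn.execute(stmt)

# Usage:
# df.to_sql('TableNameHere', engine, if_exists='append', index=False,
#           chunksize=900, method=insert_skip_duplicates)
```

On MySQL the same idea can be expressed with INSERT IGNORE inside the callable, and on other backends the DataFrame can be de-duplicated against the table's existing keys before calling to_sql.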
Output of pd.show_versions()
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None