
Adding (Insert or update if key exists) option to .to_sql #14553

Closed
cdagnino opened this issue Nov 1, 2016 · 59 comments · Fixed by #53264
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@cdagnino

cdagnino commented Nov 1, 2016

Suppose you have an existing SQL table called person_age, where id is the primary key:

id   age
1    18
2    42

and you also have new data in a DataFrame called extra_data

id   age
2    44
3    95

then it would be useful to have an option on extra_data.to_sql() that allows passing the DataFrame to SQL with an INSERT-or-UPDATE behavior on the rows, based on the primary key.

In this case, the id=2 row would get updated to age=44 and the id=3 row would get added.

Expected Output

id   age
1    18
2    44
3    95

(Maybe) helpful code references

I looked at the pandas sql.py source code to come up with a solution, but I couldn't follow it.

Code to replicate the example above

(Apologies for mixing SQLAlchemy and sqlite3.)

import pandas as pd
from sqlalchemy import create_engine
import sqlite3
conn = sqlite3.connect('example.db')
 
c = conn.cursor()
c.execute('''DROP TABLE IF EXISTS person_age;''')
c.execute('''
          CREATE TABLE person_age
          (id INTEGER PRIMARY KEY ASC, age INTEGER NOT NULL)
          ''')
conn.commit()
conn.close()

##### Create original table

engine = create_engine("sqlite:///example.db")
sql_df = pd.DataFrame({'id' : [1, 2], 'age' : [18, 42]})

sql_df.to_sql('person_age', engine, if_exists='append', index=False)


#### Extra data to insert/update

extra_data = pd.DataFrame({'id' : [2, 3], 'age' : [44, 95]})
extra_data.set_index('id', inplace=True)

#### extra_data.to_sql()  with row update or insert option

expected_df = pd.DataFrame({'id': [1, 2, 3], 'age': [18, 44, 95]})
expected_df.set_index('id', inplace=True)
@jorisvandenbossche jorisvandenbossche added Enhancement IO SQL to_sql, read_sql, read_sql_query labels Nov 2, 2016
@jorisvandenbossche
Member

This would be nice functionality, but the main problem is that we want it to be database-flavor independent and based on SQLAlchemy core (not the SQLAlchemy ORM) for inclusion in pandas itself, which will make this difficult to implement.

@TomAugspurger
Contributor

Yeah, I think this is out of scope for pandas since upserts aren't supported by all db engines.

@kjford
Contributor

kjford commented Nov 6, 2016

While an INSERT OR UPDATE isn't supported by all engines, an INSERT OR REPLACE can be made engine-agnostic by deleting rows from the target table for the set of primary keys in the DataFrame index, followed by an insert of all rows in the DataFrame. You'd want to do this in a transaction.
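A minimal sketch of that DELETE-then-INSERT idea with SQLAlchemy core (the function name is illustrative; assumes a reflected Table with a single-column primary key and a DataFrame indexed by that key):

import sqlalchemy as sa

def delete_then_insert(engine, table, df):
    # One transaction: delete clashing keys, then insert everything
    pk = list(table.primary_key.columns)[0]
    with engine.begin() as conn:
        conn.execute(table.delete().where(pk.in_(df.index.tolist())))
        conn.execute(table.insert(), df.reset_index().to_dict(orient='records'))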

@neilfrndes

@TomAugspurger Could we add the upsert option for supported db engines and throw an error for unsupported ones?

@ldacey

ldacey commented Jun 6, 2017

I'd like to see this as well. I am caught between using pure SQL and SQLAlchemy (I haven't gotten this to work yet; I think it has something to do with how I pass the dicts). I use psycopg2 COPY to bulk insert, but I would love to use pd.to_sql for tables where values might change over time, and I don't mind it inserting a bit more slowly.

import sqlalchemy.dialects.postgresql

insert_values = df.to_dict(orient='records')
insert_statement = sqlalchemy.dialects.postgresql.insert(table).values(insert_values)
upsert_statement = insert_statement.on_conflict_do_update(
    constraint='fact_case_pkey',
    set_=df.to_dict(orient='dict')  # likely the problem; see the next comment for a working set_
)

And pure SQL:

def create_update_query(df, table=FACT_TABLE):
    """Builds an INSERT ... ON CONFLICT DO UPDATE query from the
    module-level column and primary-key constants."""
    columns = ', '.join([f'{col}' for col in DATABASE_COLUMNS])
    constraint = ', '.join([f'{col}' for col in PRIMARY_KEY])
    placeholder = ', '.join([f'%({col})s' for col in DATABASE_COLUMNS])
    updates = ', '.join([f'{col} = EXCLUDED.{col}' for col in DATABASE_COLUMNS])
    query = f"""INSERT INTO {table} ({columns})
    VALUES ({placeholder})
    ON CONFLICT ({constraint})
    DO UPDATE SET {updates};"""
    query = ' '.join(query.split())
    return query

def load_updates(df, connection=DATABASE):
    """Uses COPY from STDIN to load to Postgres
     :param df: The dataframe which is written to StringIO, then loaded to the database
     :param connection: Refers to a PostgresHook
    """
    conn = connection.get_conn()
    cursor = conn.cursor()
    df1 = df.where((pd.notnull(df)), None)
    insert_values = df1.to_dict(orient='records')
    for row in insert_values:
        cursor.execute(create_update_query(df), row)
        conn.commit()
    cursor.close()
    del cursor
    conn.close()

@ODemidenko

ODemidenko commented Jun 30, 2017

@ldacey this style worked for me (insert_statement.excluded is an alias to the row of data that violated the constraint):

insert_values = merged_transactions_channels.to_dict(orient='records')
insert_statement = sqlalchemy.dialects.postgresql.insert(orders_to_channels).values(insert_values)
upsert_statement = insert_statement.on_conflict_do_update(
    constraint='orders_to_channels_pkey',
    set_={'channel_owner': insert_statement.excluded.channel_owner}
)

@rajbiswas

@cdagnino This snippet might not work in the case of composite keys; that scenario also has to be handled. I'll try to find a way to do the same.
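A hedged sketch of how a composite key could be handled with the Postgres dialect's on_conflict_do_update (table and column names here are illustrative):

from sqlalchemy.dialects.postgresql import insert

key_cols = ['country_code', 'language']  # the composite primary key
stmt = insert(table).values(df.to_dict(orient='records'))
update_cols = {
    c.name: stmt.excluded[c.name]  # EXCLUDED holds the conflicting row's values
    for c in table.columns if c.name not in key_cols
}
conn.execute(stmt.on_conflict_do_update(index_elements=key_cols, set_=update_cols))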

chromium-infra-bot pushed a commit to catapult-project/catapult that referenced this issue May 1, 2018
Pandas DataFrame.to_sql method has limitation of not being able to
"insert or replace" records, see e.g:
pandas-dev/pandas#14553

Using pandas.io.sql primitives, however, it's not too hard to implement
such a functionality (for the SQLite case only).

Assuming that index columns of the frame have names, this method will
use those columns as the PRIMARY KEY of the table.

Bug: catapult:#4382
Change-Id: I5f70a04c18b998590b1e77e5ff6b89b2d27138af
Reviewed-on: https://chromium-review.googlesource.com/1035266
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Reviewed-by: Charlie Andrews <charliea@chromium.org>
@danich1

danich1 commented May 1, 2018

One way to solve this update issue is to use SQLAlchemy's bulk_update_mappings. This function takes in a list of dictionary values and updates each row based on the table's primary key.

session.bulk_update_mappings(
    Table,
    pandas_df.to_dict(orient='records')
)

@joshhornby

I agree with @neilfrndes; we shouldn't let a nice feature like this go unimplemented just because some DBs don't support it. Is there any chance this feature might happen?

@TomAugspurger
Contributor

Probably, if someone makes a PR. On further consideration, I don't think I'm opposed to this on the principle that some databases don't support it. However, I'm not too familiar with the SQL code, so I'm not sure what the best approach is.

@kjford
Contributor

kjford commented Nov 26, 2018

One possibility is to provide some examples for upserts using the method callable if this PR is introduced: #21401

For postgres that would look something like (untested):

from sqlalchemy.dialects import postgresql

def pg_upsert(table, conn, keys, data_iter):
    # `table` is a pandas SQLTable; `table.table` is the underlying
    # SQLAlchemy Table, and `table.index` holds the index column names
    for row in data_iter:
        row_dict = dict(zip(keys, row))
        stmt = postgresql.insert(table.table).values(**row_dict)
        upsert_stmt = stmt.on_conflict_do_update(
            index_elements=table.index,
            set_=row_dict)
        conn.execute(upsert_stmt)

Something similar could be done for mysql.

@ldacey

ldacey commented Nov 27, 2018

For Postgres I am using execute_values. In my case, my query is a jinja2 template that flags whether I should DO UPDATE SET or DO NOTHING. This has been quite quick and flexible. Not as fast as using COPY or copy_expert, but it works well.

from psycopg2.extras import execute_values

df = df.where(pd.notnull(df), None)
tuples = [tuple(x) for x in df.values]
with pg_conn:
    with pg_conn.cursor() as cur:
        execute_values(cur=cur,
                       sql=insert_query,
                       argslist=tuples,
                       template=None)

@cristianionescu92

@danich1 can you please give an example of how this would work?

I tried to have a look into bulk_update_mappings, but I got really lost and couldn't make it work.

@danich1

danich1 commented Dec 20, 2018

@cristianionescu92 An example would be this:
I have a table called User with the following fields: id and name.

id name
0 John
1 Joe
2 Harry

I have a pandas data frame with the same columns but updated values:

id name
0 Chris
1 James

Let's also assume that we have a session variable open to access the database. By calling this method:

session.bulk_update_mappings(
    User,
    <pandas dataframe above>.to_dict(orient='records')
)

Pandas will convert the table into a list of dictionaries [{id: 0, name: "chris"}, {id: 1, name: "james"}] that SQL will use to update the rows of the table. So the final table will look like:

id name
0 Chris
1 James
2 Harry
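For completeness, one way to open that session (a minimal sketch; User is assumed to be a declaratively mapped class, df the dataframe above):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///example.db')
Session = sessionmaker(bind=engine)
session = Session()

session.bulk_update_mappings(User, df.to_dict(orient='records'))
session.commit()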

@cristianionescu92

Hi @danich1, and thanks a lot for your response. I figured out the mechanics of how the update would work myself. Unfortunately I don't know how to work with a session; I am quite a beginner.

Let me show you what I am doing:

import pypyodbc
from timeit import default_timer as timer
from sqlalchemy import create_engine

# These are 2 functions I found on GitHub (unfortunately I cannot remember
# the link): clean_df_db_dups excludes from a dataframe the rows which
# already exist in an SQL table by checking several key columns, and
# to_sql_newrows inserts the new rows into SQL.
from to_sql_newrows import clean_df_db_dups, to_sql_newrows

engine = create_engine("engine_connection_string")

# Write data to SQL
Tablename = 'Dummy_Table_Name'
Tablekeys = Tablekeys_string
dftoupdateorinsertinSQL = random_dummy_dataframe

# Connect to SQL Server db using pypyodbc
cnxn = pypyodbc.connect("Driver={SQL Server};"
                        "Server=ServerName;"
                        "Database=DatabaseName;"
                        "uid=userid;pwd=password")

start = timer()
newrowsdf = clean_df_db_dups(dftoupdateorinsertinSQL, Tablename, engine, dup_cols=Tablekeys)
newrowsdf.to_sql(Tablename, engine, if_exists='append', index=False, chunksize=140)
end = timer()

tablesize = len(newrowsdf.index)

print('inserted %r rows' % tablesize)

The above code basically excludes from a dataframe the rows which I already have in SQL and only inserts the new rows. What I need is to also update the rows which already exist. Can you please help me understand what I should do next?

@rugg2

rugg2 commented Jul 29, 2019

Motivation for a better TO_SQL
to_sql integrating better with database practices is increasingly valuable as data science grows and gets mixed with data engineering.

upsert is one of them, in particular because many people find that the workaround is to use replace instead, which drops the table, and with it all the views and constraints.

The alternative I've seen among more experienced users is to stop using pandas at this stage, and this tends to propagate upstream and makes the pandas package lose retention among experienced users. Is this the direction pandas wants to go?

I understand we want to_sql to remain database-agnostic as much as possible and use SQLAlchemy core. A method that truncates or deletes instead of a true upsert would still add a lot of value, though.

Integration with Pandas product vision
A lot of the above debate happened before the introduction of the method argument (as mentioned by @kjford, with psql_insert_copy) and the possibility to pass a callable.

I'd happily contribute to either the core pandas functionality or, failing that, documentation on solutions / best practice for achieving upsert functionality within pandas, such as the below:
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

What is the preferred way forward for Pandas core dev / product managers?

@TomAugspurger
Contributor

I think we're open to an implementation that's engine-specific. The proposal to use method='upsert' seems reasonable, but at this point I think we need someone to come up with a clear design proposal.

@pratham2003

pratham2003 commented Aug 20, 2019

I have a similar requirement where I want to update existing data in a MySQL table from multiple CSVs over time.

I thought I could use df.to_sql() to insert the new data into a newly created temporary table and then run a MySQL query to control how to append/update data in the existing table (see the sketch below).

MySQL Reference: https://stackoverflow.com/questions/2472229/insert-into-select-from-on-duplicate-key-update?answertab=active#tab-top

Disclaimer: I started using Python and Pandas only a few days ago.
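A hedged sketch of that staging-table idea for MySQL (table and column names are illustrative):

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:password@host/db')

# Stage the incoming CSV, then merge it into the real table
df = pd.read_csv('new_data.csv')
df.to_sql('staging', engine, if_exists='replace', index=False)

with engine.begin() as conn:
    conn.execute(text('''
        INSERT INTO target (id, age)
        SELECT id, age FROM staging
        ON DUPLICATE KEY UPDATE age = VALUES(age);
    '''))
    conn.execute(text('DROP TABLE staging;'))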

@notsambeck

notsambeck commented Sep 11, 2019

Hey Pandas folk: I have had this same issue, needing to frequently update my local database with records that I ultimately load and manipulate in pandas. I built a simple library to do this - it’s basically a stand-in for df.to_sql and pd.read_sql_table that uses the DataFrame index as a primary key by default. Uses sqlalchemy core only.

https://pypi.org/project/pandabase/0.2.1/
https://github.com/notsambeck/pandabase

This tool is fairly opinionated and probably not appropriate to include in pandas as-is, but for my specific use case it solves the problem... If there is interest in massaging this to make it fit in pandas, I am happy to help.

For now, the following works (in the limited case of current-ish pandas and SQLAlchemy, a named index as primary key, a SQLite or Postgres back end, and supported datatypes):

pip install pandabase
pandabase.to_sql(df, table_name, con_string, how='upsert')

@rugg2

rugg2 commented Sep 11, 2019

Working on a general solution to this with cvonsteg. Planning to come back with a proposed design in October.

@cvonsteg

@TomAugspurger as suggested, @rugg2 and I have come up with the following design proposal for an upsert option in to_sql().

Interface Proposal

2 new variables to be added as a possible method argument in the to_sql() method:

  1. upsert_update - on row match, update row in database (for knowingly updating records - represents most use cases)
  2. upsert_ignore - on row match, do not update row in database (for cases where datasets have overlap, and you do not want to override data in tables)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("connection string")
df = pd.DataFrame(...)

df.to_sql(
    name='table_name', 
    con=engine, 
    if_exists='append', 
    method='upsert_update' # (or upsert_ignore)
)

Implementation Proposal

To implement this, SQLTable class would receive 2 new private methods containing the upsert logic, which would be called from the SQLTable.insert() method:

def insert(self, chunksize=None, method=None):

    # set insert method
    if method is None:
        exec_insert = self._execute_insert
    elif method == "multi":
        exec_insert = self._execute_insert_multi
    # new upsert methods <<<
    elif method == "upsert_update":
        exec_insert = self._execute_upsert_update
    elif method == "upsert_ignore":
        exec_insert = self._execute_upsert_ignore
    # >>>
    elif callable(method):
        exec_insert = partial(method, self)
    else:
        raise ValueError("Invalid parameter 'method': {}".format(method))

    ...
    
    ...

We propose the following implementation, with rationale outlined in detail below (all points are open for discussion):

(1) Engine agnostic using SQLAlchemy core, via an atomic sequence of DELETE and INSERT

  • Only some dbms natively support upsert, and implementations can vary across flavours
  • As a first implementation we believe it'd be easier to test and maintain one implementation across all dbms. In future, if the demand exists, engine-specific implementations can be added.
  • For upsert_ignore these operations would obviously be skipped on matching records
  • It will be worth comparing an engine-agnostic implementation vs engine-specific implementations in terms of performance.

(2) Upsert on Primary Key only

  • Upserts default to primary key clashes unless otherwise specified
  • Some DBMS allow users to specify non-primary-key columns, against which to check for uniqueness. Whilst this grants the user more flexibility, it comes with potential pitfalls. If these columns do not have a UNIQUE constraint, then it is plausible that multiple rows may match the upsert condition. In this case, no upsert should be performed as it is ambiguous as to which record should be updated. To enforce this from pandas, each row would need to be individually assessed to check that only 1 or 0 rows match, before it is inserted. While this functionality is reasonably straightforward to implement, it results in each record requiring a read and a write operation (plus a delete if a 1 record clash found), which feels highly inefficient for larger datasets.
  • In a future improvement, if the community calls for it, we could add the functionality to extend upsert to not only work on the primary key, but also on user specified fields. This is a longer term question for the core-dev team, as to whether Pandas should remain simple to protect users that have a poorly designed database, or have more functionalities.

@rugg2

rugg2 commented Oct 9, 2019

@TomAugspurger, if the upsert proposal designed with @cvonsteg suits you, we will proceed with the implementation in code (incl. tests) and raise a pull request.

Let us know if you would like to proceed differently.

@TomAugspurger
Contributor

TomAugspurger commented Oct 9, 2019 via email

@WillAyd
Member

WillAyd commented Oct 9, 2019

I personally don't have anything against it so think a PR is welcome. One implementation across all DBMs using SQLAlchemy core is certainly how this should start if I am reading your points correctly, and same with just primary keys.

Always easier to start small and focused and expand from there

@bearHunting

need this feature badly.

@agigao

agigao commented Jul 5, 2020

Any news?))

Coming from the Java world, I never thought missing such a simple piece of functionality might turn my codebase upside down.

@GoldstHa

Hi everyone,

I've looked into how upserts are implemented in SQL across dialects and found a number of techniques that can inform design decisions here. But first, I want to warn against using DELETE ... INSERT logic: if there are foreign keys or triggers, other records across the database will end up being deleted or otherwise messed up (see the short demonstration after the table below). In MySQL, REPLACE does the same damage. I've actually created hours of work for myself fixing data because I used REPLACE. So, that said, here are the techniques implemented in SQL:

Dialect      Technique
MySQL        INSERT ... ON DUPLICATE KEY UPDATE
PostgreSQL   INSERT ... ON CONFLICT
SQLite       INSERT ... ON CONFLICT
Db2          MERGE
SQL Server   MERGE
Oracle       MERGE
SQL:2016     MERGE
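As promised above, a minimal sqlite3 demonstration of the DELETE ... INSERT hazard (table and column names are illustrative):

import sqlite3

con = sqlite3.connect(':memory:')
con.execute('PRAGMA foreign_keys = ON;')
con.execute('CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);')
con.execute('''CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(id) ON DELETE CASCADE);''')
con.execute("INSERT INTO person VALUES (1, 'John');")
con.execute('INSERT INTO orders VALUES (10, 1);')

# A DELETE-then-INSERT "upsert" of person 1:
con.execute('DELETE FROM person WHERE id = 1;')  # the cascade fires here
con.execute("INSERT INTO person VALUES (1, 'Johnny');")

print(con.execute('SELECT COUNT(*) FROM orders;').fetchone())  # (0,) - the order is gone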

With wildly varying syntax, I understand the temptation to use DELETE ... INSERT to make the implementation dialect agnostic. But there's another way: we can imitate the logic of the MERGE statement using a temp table and basic INSERT and UPDATE statements. The SQL:2016 MERGE syntax is as follows:

MERGE INTO target_table 
USING source_table 
ON search_condition
    WHEN MATCHED THEN
        UPDATE SET col1 = value1, col2 = value2,...
    WHEN NOT MATCHED THEN
        INSERT (col1,col2,...)
        VALUES (value1,value2,...);

Borrowed from the Oracle Tutorial and adjusted to conform to the SQL Wikibook.

Since every dialect supported by SQLAlchemy supports temp tables, a safer, dialect-agnostic approach to doing an upsert would be to, in a single transaction:

  1. Create a temp table.
  2. Insert the data into that temp table.
  3. Do an UPDATE ... JOIN.
  4. INSERT where the key (PRIMARY or UNIQUE) doesn't match.
  5. Drop the temp table.

Besides being a dialect-agnostic technique, it also has the advantage of being expanded upon by allowing the end-user to choose how to insert or how to update the data as well as on what key to join the data.

While the syntax of temp tables, and update joins might differ slightly between dialects, they should be supported everywhere.

Below is a proof of concept I wrote for MySQL:

import uuid

import pandas as pd
from sqlalchemy import create_engine


# This proof of concept uses this sample database
# https://downloads.mysql.com/docs/world.sql.zip


# Arbitrary, unique temp table name to avoid possible collision
source = str(uuid.uuid4()).split('-')[-1]

# Table we're doing our upsert against
target = 'countrylanguage'

db_url = 'mysql://<user>:<password>@<host>/<db>'

df = pd.read_sql(
    f'SELECT * FROM `{target}`;',
    db_url
)

# Change for UPDATE, 5.3->5.4
df.at[0,'Percentage'] = 5.4
# Change for INSERT
df = df.append(
    {'CountryCode': 'ABW','Language': 'Arabic','IsOfficial': 'F','Percentage':0.0},
    ignore_index=True
)

# List of PRIMARY or UNIQUE keys
key = ['CountryCode','Language']

# Do all of this in a single transaction
engine = create_engine(db_url)
with engine.begin() as con:
    # Create temp table like target table to stage data for upsert
    con.execute(f'CREATE TEMPORARY TABLE `{source}` LIKE `{target}`;')
    # Insert dataframe into temp table
    df.to_sql(source,con,if_exists='append',index=False,method='multi')
    # INSERT where the key doesn't match (new rows)
    con.execute(f'''
        INSERT INTO `{target}`
        SELECT
            *
        FROM
            `{source}`
        WHERE
            (`{'`, `'.join(key)}`) NOT IN (SELECT `{'`, `'.join(key)}` FROM `{target}`);
    ''')
    # Create a doubled list of tuples of non-key columns to template the update statement
    non_key_columns = [(i,i) for i in df.columns if i not in key]
    # Whitespace for aesthetics
    whitespace = '\n\t\t\t'
    # Do an UPDATE ... JOIN to set all non-key columns of target to equal source
    con.execute(f'''
        UPDATE
            `{target}` `t`
                JOIN
            `{source}` `s` ON `t`.`{"` AND `t`.`".join(["`=`s`.`".join(i) for i in zip(key,key)])}`
        SET
            `t`.`{f"`,{whitespace}`t`.`".join(["`=`s`.`".join(i) for i in non_key_columns])}`;
    ''')
    # Drop our temp table.
    con.execute(f'DROP TABLE `{source}`;')

Here, I make the following assumptions:

  1. The structure of your source and destination are the same.
  2. That you want to do simple inserts using the data in your dataframe.
  3. That you want to simply update all non-key columns with the data from your dataframe.
  4. That you don't want to make any changes to data in key columns.

Despite the assumptions, I hope my MERGE-inspired technique informs efforts to build a flexible, robust upsert option.

@raajtilaksarma

I think this is a useful piece of functionality, even if it currently seems out of scope; it is intuitive to expect such a common feature when adding rows to a table.

@Nemecsek

Please think again about adding this function: it is very useful for adding rows to an existing table.
Alas, Pangres is limited to Python 3.7+, so in my case (I am forced to use an old Python 3.4) it is not a viable solution.

@cvonsteg

cvonsteg commented Oct 1, 2020

Thanks, @GoldstHa - that is really helpful input. I will attempt to create a POC for the MERGE-inspired implementation.

@cvonsteg

cvonsteg commented Nov 2, 2020

Given the issues with the DELETE/INSERT approach, and the potential blocker on @GoldstHa's MERGE approach on MySQL DBs, I've done a bit more digging. I have scratched together a proof of concept using the SQLAlchemy update functionality, which looks promising. I will attempt to implement it properly this week in the pandas codebase, ensuring that this approach works across all DB flavours.

Modified Approach Proposal

There have been some good discussions around the API, and how an upsert should actually be called (i.e. via the if_exists argument, or via an explicit upsert argument). This will be clarified soon. For now, this is the pseudocode proposal for how the functionality would work using the SqlAlchemy upsert statement:

Identify primary key(s) and existing pkey values from DB table (if no primary key constraints identified, but upsert is called, return an error)

Make a temp copy of the incoming DataFrame

Identify records in incoming DataFrame with matching primary keys

Split temp DataFrame into records which have a primary key match, and records which don't

if upsert:
    Update the DB table using `update` for only the rows which match
else:
    Ignore rows from DataFrame with matching primary key values
finally:
    Append remaining DataFrame rows with non-matching values in the primary key column to the DB table
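A rough sketch of this flow (names are illustrative, not the eventual pandas API; assumes a single-column primary key that appears as a regular column in the DataFrame):

import pandas as pd
import sqlalchemy as sa

def upsert_sketch(df, table_name, engine, update_on_match=True):
    # Reflect the target table and locate its primary key
    table = sa.Table(table_name, sa.MetaData(), autoload_with=engine)
    pk_cols = [c.name for c in table.primary_key]
    if not pk_cols:
        raise ValueError('upsert requires a primary key on the target table')
    pk = pk_cols[0]

    # Split the incoming frame into matching and non-matching keys
    existing = pd.read_sql(sa.select(table.c[pk]), engine)[pk]
    matches = df[pk].isin(existing)

    with engine.begin() as conn:
        if update_on_match:  # upsert_update; upsert_ignore skips this branch
            for record in df[matches].to_dict(orient='records'):
                conn.execute(
                    table.update()
                    .where(table.c[pk] == record[pk])
                    .values({k: v for k, v in record.items() if k != pk})
                )
        # Rows with non-matching keys are appended either way
        df[~matches].to_sql(table_name, conn, if_exists='append', index=False)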

@Nok021

Nok021 commented Jan 12, 2021

Has it been implemented? I am really looking forward to it.

@lan2720

lan2720 commented Jan 13, 2021

Has it been implemented? I am really looking forward to it.

+1

@erfannariman
Member

erfannariman commented Feb 13, 2021

@cvonsteg not sure if this is in any way helpful for your approach at the moment, but I created a method for ourselves internally (while waiting for this to be available in pandas).

Here's the main class, which creates the MERGE statement generically from the dataframe itself. Furthermore, let me know if I can help with anything.

import logging


class SqlUpsert:
    def __init__(self, table_name, schema, id_cols, columns):
        self.table_name = table_name
        self.schema = schema
        self.id_cols = id_cols
        self.columns = [col.strip() for col in columns]

    def create_on_statement(self):
        on = " AND ".join([f"s.{id_col} = t.{id_col}" for id_col in self.id_cols])
        return on

    def create_update_statement(self):
        update = ", ".join(
            [f"t.{col} = s.{col}" for col in self.columns if col not in self.id_cols]
        )
        return update

    def create_insert_statement(self):
        insert = f"({', '.join(self.columns)})"

        values = ", ".join([f"s.{col}" for col in self.columns])
        values = f"({values})"

        return insert, values

    def create_merge_query(self):
        insert = self.create_insert_statement()
        query = f"""
        CREATE PROCEDURE [UPSERT_{self.table_name}]
        AS
        MERGE {self.schema}.{self.table_name} t
            USING staging.{self.table_name} s
        ON {self.create_on_statement()}
        WHEN MATCHED
            THEN UPDATE SET
                {self.create_update_statement()}
        WHEN NOT MATCHED BY TARGET
            THEN INSERT {insert[0]}
                 VALUES {insert[1]};
        """
        logging.info(query)

        return query
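A hypothetical usage of the class above (schema, table, and column names are made up):

import logging
logging.basicConfig(level=logging.INFO)

upsert = SqlUpsert(
    table_name='person_age',
    schema='dbo',
    id_cols=['id'],
    columns=['id', 'age'],
)
query = upsert.create_merge_query()  # logs and returns the MERGE procedure DDL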

@cvonsteg

Thank you for the input, @erfannariman - I've been a bit busy moving house, but will get to looking into that asap.

@erfannariman
Member

@cvonsteg, if you need any help, please let me know.

@Hassaan-Elahi

Is this feature released? I need it badly.

@jreback
Contributor

jreback commented Feb 19, 2021

this is open, and there is at least one (closed) PR

we would need a complete and fully tested solution

@charles2588

This feature will greatly help; would appreciate any updates if it is going to be available soon.

@espoirMur
Contributor

I am subscribing for updates, and to increase the list of people who need this.

@sdrap

sdrap commented Apr 6, 2021

On behalf of my class in data analysis, as a strong supporter of pandas, I strongly support this new functionality :). The workarounds are kind of ugly across different DBs.

@bragarods

Also posting to follow. I can help with the code if there's WIP.

@oregano90

This would be a great feature!

@cvonsteg

Thank you all for the offers of help - the PR is here. I've reworked it and functionally it should all work now; however, the build still fails due to some housekeeping (and seemingly some unrelated tests). Please feel free to have a look and offer any suggestions or feedback you can think of!

@schild

schild commented Dec 23, 2021

waiting for ...

@MayasMess

I just developed this package: https://github.com/MayasMess/pandas-oop
It uses pangres for the insert, overwrite, or ignore-on-conflict behavior.

@rafagsiqueira

rafagsiqueira commented Sep 30, 2022

Arriving late to the party. While this PR is still waiting to be merged, does anyone have an implementation of the callable method that works with MSSQL?
@erfannariman would you mind explaining how you use that class to upsert a pandas dataframe?

@ddxv

ddxv commented Oct 2, 2022

@rafagsiqueira I modified code similar to this for PostgreSQL; not sure if it will help, but just passing it along:
https://gist.github.com/pedrovgp/b46773a1240165bf2b1448b3f70bed32
