
Very bad performance using insert many #346

Closed
@akashgurava

Description

  • asyncpg version: 0.17.0
  • PostgreSQL version: 10
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?: local
  • Python version: 3.7
  • Platform: Windows
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: Yes
  • If you built asyncpg locally, which version of Cython did you use?: No
  • Can the issue be reproduced under both asyncio and uvloop?: did not use uvloop

I am building a DB for storing financial tick data, so I was testing insert performance for different DBs using different libraries. For Postgres I used psycopg2 in sync mode, aiopg, and asyncpg. I was inserting 120,000 rows of symbol and OHLC data.
I was getting very bad insert performance using asyncpg's executemany.
The code for creating the sample data:

import asyncio
import string
import random
from time import time

import numpy as np
import pandas as pd

import asyncpg
import psycopg2
import aiopg

# Number of securities to insert
SECURITIES = 2000


def string_gen(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))


def generate_random_data():
    """Generate random data for inserting to DB"""
    index = pd.date_range(start='2018-01-01 00:00:01', end='2018-01-01 00:01:00', freq='s')
    
    dflist = []
    for _ in range(SECURITIES):
        data = np.random.rand(len(index), 4)
        data = pd.DataFrame(data, index=index, columns=['open', 'high', 'low', 'close'])
        data['symbol'] = string_gen()
        dflist.append(data)
    data = pd.concat(dflist)
    data.index.name = 'time'
    data = data.reset_index()
    data = data[['time', 'symbol', 'open', 'high', 'low', 'close']]

    return [tuple(x) for x in data.values]
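For reference, the generator above yields SECURITIES × 60 per-second rows (2,000 × 60 = 120,000) of (time, symbol, open, high, low, close) tuples. A minimal stdlib-only sketch of the same row layout, without pandas/numpy (helper names are mine, not from the report):

```python
import random
import string
from datetime import datetime, timedelta

SECURITIES = 2000
SECONDS = 60  # one minute of per-second bars, as in the pandas version


def string_gen(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))


def generate_rows():
    """Build (time, symbol, open, high, low, close) tuples, one per
    symbol per second, mirroring the DataFrame-based generator above."""
    start = datetime(2018, 1, 1, 0, 0, 1)
    rows = []
    for _ in range(SECURITIES):
        symbol = string_gen()
        for s in range(SECONDS):
            ts = start + timedelta(seconds=s)
            o, h, l, c = (random.random() for _ in range(4))
            rows.append((ts, symbol, o, h, l, c))
    return rows


rows = generate_rows()
print(len(rows))  # 120000
```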

For psycopg2, the insert took 5 to 6 seconds:

conn = psycopg2.connect(user='postgres', password='postgres')
cur = conn.cursor()
# Build a "(%s,%s,...)" placeholder template matching the row width
string = '(' + ('%s,' * len(data[0]))[:-1] + ')'
args_str = b','.join(cur.mogrify(string, row) for row in data)
args_str = args_str.decode('utf-8')  # convert byte string to UTF-8
cur.execute("INSERT INTO ohlc (time, symbol, open, high, low, close) VALUES " + args_str)
conn.commit()

asyncpg's executemany took 30 seconds, so I tried building the full INSERT statement with psycopg2's mogrify and executing it through asyncpg; that still took 7 to 8 seconds:

p_conn = psycopg2.connect(user='postgres', password='postgres')
cur = p_conn.cursor()
string = '(' + ('%s,' * len(data[0]))[:-1] + ')'
args_str = b','.join(cur.mogrify(string, row) for row in data)
cur.close()
args_str = args_str.decode('utf-8')
insert_str = "INSERT INTO ohlc (time, symbol, open, high, low, close) VALUES " + args_str
await conn.execute(insert_str)
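For comparison, the slow path reported above (asyncpg's executemany) would have looked something like the following; the schema and connection parameters are assumed from the snippets above, and this is a sketch, not the reporter's exact code:

```python
import asyncio

INSERT_SQL = (
    "INSERT INTO ohlc (time, symbol, open, high, low, close) "
    "VALUES ($1, $2, $3, $4, $5, $6)"
)


async def insert_with_executemany(data):
    # asyncpg is imported inside the function so the module can be
    # read/tested without a database or asyncpg installed.
    import asyncpg
    conn = await asyncpg.connect(user='postgres', password='postgres')
    try:
        # One prepared statement, executed once per row of `data` --
        # the call that took ~30 seconds in the report above.
        await conn.executemany(INSERT_SQL, data)
    finally:
        await conn.close()

# asyncio.get_event_loop().run_until_complete(insert_with_executemany(data))
```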

Similarly, with aiopg I was getting 6 to 7 seconds.
I guess I am doing something wrong, since the DB is the same for all three libraries, so we can rule out the database itself as the cause.
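As a possible alternative for bulk loads of this kind, asyncpg also exposes a binary COPY helper, copy_records_to_table. A sketch under the same assumed schema (this is not part of the original report, just an option worth benchmarking):

```python
import asyncio


async def insert_with_copy(data):
    # asyncpg imported inside the function so the sketch can be read
    # without a database or asyncpg installed.
    import asyncpg
    conn = await asyncpg.connect(user='postgres', password='postgres')
    try:
        # Streams all rows through the COPY protocol in one pass,
        # typically much faster than per-row INSERTs for bulk loads.
        await conn.copy_records_to_table(
            'ohlc',
            records=data,
            columns=['time', 'symbol', 'open', 'high', 'low', 'close'],
        )
    finally:
        await conn.close()

# asyncio.get_event_loop().run_until_complete(insert_with_copy(data))
```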
