What is PyPika?
PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a simple interface for building SQL queries without limiting the flexibility of handwritten SQL. Designed with data analysis in mind, PyPika leverages the builder design pattern to construct queries to avoid messy string formatting and concatenation. It is also easily extended to take full advantage of specific features of SQL database vendors.
Read the docs: http://pypika.readthedocs.io/en/latest/
PyPika supports python 2.7
and 3.3+
. It may also work on pypy, cython, and jython, but is not being tested for these versions.
To install PyPika run the following command:
pip install pypika
The main classes in pypika are :class:`pypika.Query`, :class:`pypika.Table`, and :class:`pypika.Field`.
from pypika import Query, Table, Field
The entry point for building queries is :class:`pypika.Query`. In order to select columns from a table, the table must first be added to the query. For simple queries with only one table, tables and and columns can be references using strings. For more sophisticated queries a :class:`pypika.Table` must be used.
q = Query.from_('customers').select('id', 'fname', 'lname', 'phone')
To convert the query into raw SQL, it can be cast to a string.
str(q)
Using :class:`pypika.Table`
customers = Table('customers')
q = Query.from_(customers).select(customers.id, customers.fname, customers.lname, customers.phone)
Both of the above examples result in the following SQL:
SELECT id,fname,lname,phone FROM customers
Arithmetic expressions can also be constructed using pypika. Operators such as +, -, *, and / are implemented by :class:`pypika.Field` which can be used simply with a :class:`pypika.Table` or directly.
from pypika import Field
q = Query.from_('account').select(
Field('revenue') - Field('cost')
)
SELECT revenue-cost FROM accounts
Using :class:`pypika.Table`
accounts = Table('accounts')
q = Query.from_(accounts).select(
accounts.revenue - accounts.cost
)
SELECT revenue-cost FROM accounts
An alias can also be used for fields and expressions.
q = Query.from_(accounts).select(
(accounts.revenue - accounts.cost).as_('profit')
)
SELECT revenue-cost profit FROM accounts
More arithmetic examples
table = Table('table')
q = Query.from_(table).select(
table.foo + table.bar,
table.foo - table.bar,
table.foo * table.bar,
table.foo / table.bar,
(table.foo+table.bar) / table.fiz,
)
SELECT foo+bar,foo-bar,foo*bar,foo/bar,(foo+bar)/fiz FROM table
Queries can be filtered with :class:`pypika.Criterion` by using equality or inequality operators
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
customers.lname == 'Mustermann'
)
SELECT id,fname,lname,phone FROM customers WHERE lname='Mustermann'
Query methods such as select, where, groupby, and orderby can be called multiple times. Multiple calls to the where method will add additional conditions as
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
customers.fname == 'Max'
).where(
customers.lname == 'Mustermann'
)
SELECT id,fname,lname,phone FROM customers WHERE fname='Max' AND lname='Mustermann'
Filters such as IN and BETWEEN are also supported
customers = Table('customers')
q = Query.from_(customers).select(
customers.id,customers.fname
).where(
customers.age[18:65] & customers.status.isin(['new', 'active'])
)
SELECT id,fname FROM customers WHERE age BETWEEN 18 AND 65 AND status IN ('new','active')
Filtering with complex criteria can be created using boolean symbols &
, |
, and ^
.
AND
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) & (customers.lname == 'Mustermann')
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 AND lname='Mustermann'
OR
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) | (customers.lname == 'Mustermann')
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 OR lname='Mustermann'
XOR
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) ^ customers.is_registered
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 XOR is_registered
Grouping allows for aggregated results and works similar to SELECT
clauses.
from pypika import functions as fn
customers = Table('customers')
q = Query.from_(customers).where(
customers.age >= 18
).groupby(
customers.id
).select(
customers.id, fn.Sum(customers.revenue)
)
SELECT id,SUM(revenue) FROM customers WHERE age>=18 GROUP BY id ORDER BY id ASC
After adding a GROUP BY
clause to a query, the HAVING
clause becomes available. The method
:class:`Query.having()` takes a :class:`Criterion` parameter similar to the method :class:`Query.where()`.
from pypika import functions as fn
payments = Table('payments')
q = Query.from_(payments).where(
payments.transacted[date(2015, 1, 1):date(2016, 1, 1)]
).groupby(
payments.customer_id
).having(
fn.Sum(payments.total) >= 1000
).select(
payments.customer_id, fn.Sum(payments.total)
)
SELECT customer_id,SUM(total) FROM payments
WHERE transacted BETWEEN '2015-01-01' AND '2016-01-01'
GROUP BY customer_id HAVING SUM(total)>=1000
Tables and subqueries can be joined to any query using the :class:`Query.join()` method. When joining tables and subqueries, a criterion must provided containing an equality between a field from the primary table or joined tables and a field from the joining table. When calling :class:`Query.join()` with a table, a :class:`TablerJoiner` will be returned with only the :class:`Joiner.on()` function available which takes a :class:`Criterion` parameter. After calling :class:`Joiner.on()` the original query builder is returned and additional methods may be chained.
history, customers = Tables('history', 'customers')
q = Query.from_(history).join(
customers
).on(
history.customer_id == customers.id
).select(
history.star
).where(
customers.id == 5
)
SELECT history.* FROM history JOIN customers ON history.customer_id=customers.id WHERE customers.id=5
Both UNION
and UNION ALL
are supported. UNION DISTINCT
is synonomous with "UNION`` so and PyPika does not
provide a separate function for it. Unions require that queries have the same number of SELECT
clauses so
trying to cast a unioned query to string with through a :class:`UnionException` if the column sizes are mismatched.
To create a union query, use either the :class:`Query.union()` method or + operator with two query instances. For a union all, use :class:`Query.union_all()` or the * operator.
provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
provider_a.created_time, provider_a.foo, provider_a.bar
) + Query.from_(provider_b).select(
provider_b.created_time, provider_b.fiz, provider_b.buz
)
SELECT created_time,foo,bar FROM provider_a UNION SELECT created_time,fiz,buz FROM provider_b
Using :class:`pypika.Interval`, queries can be constructed with date arithmetic. Any combination of intervals can be used except for weeks and quarters, which must be used separately and will ignore any other values if selected.
from pypika import functions as fn
fruits = Tables('fruits')
q = Query.from_(fruits).select(
fruits.id,
fruits.name,
).where(
fruits.harvest_date + Interval(months=1) < fn.Now()
)
SELECT id,name FROM fruits WHERE harvest_date+INTERVAL 1 MONTH<NOW()
There are several string operations and function wrappers included in PyPika. Function wrappers can be found in the :class:`pypika.functions` package. In addition, LIKE and REGEX queries are supported as well.
from pypika import functions as fn
customers = Tables('customers')
q = Query.from_(customers).select(
customers.id,
customers.fname,
customers.lname,
).where(
customers.lname.like('Mc%')
)
SELECT id,fname,lname FROM customers WHERE lname LIKE 'Mc%'
from pypika import functions as fn
customers = Tables('customers')
q = Query.from_(customers).select(
customers.id,
customers.fname,
customers.lname,
).where(
customers.lname.regex(r'^[abc][a-zA-Z]+&')
)
SELECT id,fname,lname FROM customers WHERE lname REGEX '^[abc][a-zA-Z]+&';
from pypika import functions as fn
customers = Tables('customers')
q = Query.from_(customers).select(
customers.id,
fn.Concat(customers.fname, ' ', customers.lname).as_('full_name'),
)
SELECT id,CONCAT(fname, ' ', lname) full_name FROM customers
Data can be inserted into tables either by providing the values in the query or by selecting them through another query.
By default, data can be inserted by providing values for all columns in the order that they are defined in the table.
customers = Table('customers')
q = Query.into(customers).insert(1, 'Jane', 'Doe', 'jane@example.com')
INSERT INTO customers VALUES (1,'Jane','Doe','jane@example.com')
Multiple rows of data can be inserted either by chaining the insert
function or passing multiple tuples as args.
customers = Table('customers')
q = Query.into(customers).insert(1, 'Jane', 'Doe', 'jane@example.com').insert(2, 'John', 'Doe', 'john@example.com')
customers = Table('customers')
q = Query.into(customers).insert((1, 'Jane', 'Doe', 'jane@example.com'),
(2, 'John', 'Doe', 'john@example.com'))
INSERT INTO customers VALUES (1,'Jane','Doe','jane@example.com'),(2,'John','Doe','john@example.com')
To specify the columns and the order, use the columns
function.
customers = Table('customers')
q = Query.into(customers).columns('id', 'fname', 'lname').insert(1, 'Jane', 'Doe')
INSERT INTO customers (id,fname,lname) VALUES (1,'Jane','Doe','jane@example.com')
Inserting data with a query works the same as querying data with the additional call to the into
method in the
builder chain.
customers, customers_backup = Tables('customers', 'customers_backup')
q = Query.into(customers_backup).from_(customers).select('*')
INSERT INTO customers_backup SELECT * FROM customers
Copyright 2016 KAYAK Germany, GmbH
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.