This repository was archived by the owner on May 17, 2024. It is now read-only.

sqeleton vendoring PoC #480

Closed
wants to merge 1 commit into from
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -29,7 +29,7 @@ dsnparse = "*"
click = "^8.1"
rich = "*"
toml = "^0.10.2"
sqeleton = "0.0.8"
sqeleton = { path = "./vendor/sqeleton", develop = true }
Contributor:

On a side note: I wonder, will this approach also package the files properly into tarballs & wheels and install it in a clean environment that is not git-based (i.e. pip install data-diff)?

mysql-connector-python = {version="8.0.29", optional=true}
psycopg2 = {version="*", optional=true}
snowflake-connector-python = {version="^2.7.2", optional=true}
138 changes: 138 additions & 0 deletions vendor/sqeleton/.gitignore
@@ -0,0 +1,138 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Mac
.DS_Store

# IntelliJ
.idea

# VSCode
.vscode
21 changes: 21 additions & 0 deletions vendor/sqeleton/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 Datafold

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
126 changes: 126 additions & 0 deletions vendor/sqeleton/README.md
@@ -0,0 +1,126 @@
# Sqeleton

**Under construction!**

Sqeleton is a Python library for querying SQL databases.

It consists of -

- A fast and concise query builder, inspired by PyPika and SQLAlchemy

- A modular database interface, with drivers for a long list of SQL databases.

It is comparable to other libraries such as SQLAlchemy or PyPika, in terms of API and intended audience. However, there are several notable ways in which it is different.

## Overview

### Built for performance

- Multi-threaded by default -
The same connection object can be used from multiple threads without any additional setup (see the sketch after this list).

- No ORM
ORMs are easy and familiar, but they encourage bad and slow code. Sqeleton is designed to push the compute to SQL.

- Fast query-builder
Sqeleton's query-builder runs about 4 times faster than SQLAlchemy's.
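
A minimal sketch of the multi-threading claim, reusing only the `connect` / `table` / `this` API shown in the "Basic usage" section below; the in-memory DuckDB URL, table name, and expected output are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

from sqeleton import connect, table, this

db = connect("duckdb://:memory:")             # illustrative backend
tbl = table('my_list', schema={'item': int})

# Set up some data from the main thread
db.query(tbl.create())
db.query(tbl.insert_rows([x] for x in range(100)))

def total(_):
    # Every worker reuses the same `db` object, with no per-thread setup
    return db.query(tbl.select(this.item.sum()), int)

with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(total, range(4))))    # expected: [4950, 4950, 4950, 4950]
```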

### Type-aware

Sqeleton has a built-in feature to query the schemas of the databases it supports.

This feature can also be used to inform the query-builder, either as an alternative to defining the tables yourself, or to validate that your definitions match the actual schema.

The schema is used for validation when building expressions, making sure that the names are correct and that the data-types align.

(Still WIP)

### Multi-database access

Sqeleton is designed to work with several databases at the same time. Its API abstracts away as many implementation details as possible (see the sketch after the list of supported databases below).

Databases we fully support:

- PostgreSQL >=10
- MySQL
- Snowflake
- BigQuery
- Redshift
- Oracle
- Presto
- Databricks
- Trino
- Clickhouse
- Vertica
- DuckDB >=0.6
- SQLite (coming soon)
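
A minimal sketch of running the same query-builder expressions against two different backends, assuming the API from "Basic usage" below; the PostgreSQL URL and credentials are placeholders:

```python
from sqeleton import connect, table, this

tbl = table('my_list', schema={'item': int})

for url in ("duckdb://:memory:",
            "postgresql://user:password@localhost:5432/mydb"):  # placeholder URL
    db = connect(url)   # the same call regardless of backend
    db.query(tbl.create())
    db.query(tbl.insert_rows([x] for x in range(10)))
    print(url, "->", db.query(tbl.select(this.item.sum()), int))
```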

## Documentation

[Read the docs!](https://sqeleton.readthedocs.io)

Or jump straight to the [introduction](https://sqeleton.readthedocs.io/en/latest/intro.html).

### Install

Install using pip:

```bash
pip install sqeleton
```

It is recommended to install the driver dependencies using pip's `[]` syntax:

```bash
pip install 'sqeleton[mysql, postgresql]'
```

Read more in [install / getting started.](https://sqeleton.readthedocs.io/en/latest/install.html)

### Basic usage

```python
from sqeleton import connect, table, this

# Create a new database connection
ddb = connect("duckdb://:memory:")

# Define a table with one int column
tbl = table('my_list', schema={'item': int})

# Make a bunch of queries
queries = [
    # Create table 'my_list'
    tbl.create(),

    # Insert 100 numbers
    tbl.insert_rows([x] for x in range(100)),

    # Get the sum of the numbers
    tbl.select(this.item.sum())
]
# Query in order, and return the last result as an int
result = ddb.query(queries, int)

# Prints: Total sum of 0..100 = 4950
print(f"Total sum of 0..100 = {result}")
```


# TODO

- Transactions

- Indexes

- Date/time expressions

- Window functions

## Possible plans for the future (not determined yet)

- Cache the compilation of repetitive queries for even faster query-building

- Compile control flow, functions

- Define tables using type-annotated classes (SQLModel style)
25 changes: 25 additions & 0 deletions vendor/sqeleton/dev/Dockerfile.prestosql.340
@@ -0,0 +1,25 @@
FROM openjdk:11-jdk-slim-buster

ENV PRESTO_VERSION=340
ENV PRESTO_SERVER_URL=https://repo1.maven.org/maven2/io/prestosql/presto-server/${PRESTO_VERSION}/presto-server-${PRESTO_VERSION}.tar.gz
ENV PRESTO_CLI_URL=https://repo1.maven.org/maven2/io/prestosql/presto-cli/${PRESTO_VERSION}/presto-cli-${PRESTO_VERSION}-executable.jar
ENV PRESTO_HOME=/opt/presto
ENV PATH=${PRESTO_HOME}/bin:${PATH}

WORKDIR $PRESTO_HOME

RUN set -xe \
&& apt-get update \
&& apt-get install -y curl less python \
&& curl -sSL $PRESTO_SERVER_URL | tar xz --strip 1 \
&& curl -sSL $PRESTO_CLI_URL > ./bin/presto \
&& chmod +x ./bin/presto \
&& apt-get remove -y curl \
&& rm -rf /var/lib/apt/lists/*

VOLUME /data

EXPOSE 8080

ENTRYPOINT ["launcher"]
CMD ["run"]
23 changes: 23 additions & 0 deletions vendor/sqeleton/dev/dev.env
@@ -0,0 +1,23 @@
POSTGRES_USER=postgres
POSTGRES_PASSWORD=Password1
POSTGRES_DB=postgres

MYSQL_DATABASE=mysql
MYSQL_USER=mysql
MYSQL_PASSWORD=Password1
MYSQL_ROOT_PASSWORD=RootPassword1

CLICKHOUSE_USER=clickhouse
CLICKHOUSE_PASSWORD=Password1
CLICKHOUSE_DB=clickhouse
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1

# Vertica credentials
APP_DB_USER=vertica
APP_DB_PASSWORD=Password1
VERTICA_DB_NAME=vertica

# To prevent generating sample demo VMart data (more about it here https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/GettingStartedGuide/IntroducingVMart/IntroducingVMart.htm),
# leave VMART_DIR and VMART_ETL_SCRIPT empty.
VMART_DIR=
VMART_ETL_SCRIPT=
@@ -0,0 +1 @@
connector.name=jmx
@@ -0,0 +1 @@
connector.name=memory
@@ -0,0 +1,5 @@
connector.name=postgresql
connection-url=jdbc:postgresql://postgres:5432/postgres
connection-user=postgres
connection-password=Password1
allow-drop-table=true
@@ -0,0 +1 @@
connector.name=tpcds
@@ -0,0 +1 @@
connector.name=tpch