Conversation
Codecov Report
```diff
@@            Coverage Diff            @@
##             main       #9     +/-  ##
========================================
+ Coverage    94.1%   100.0%   +5.9%
========================================
  Files           2        2
  Lines          51       57      +6
========================================
+ Hits           48       57      +9
+ Misses          3        0      -3
```
Continue to review full report at Codecov.
Great-looking tests 👍 Left a comment and a question.
tests/conftest.py (Outdated)
```python
con = engine.connect()
con.execute(
    """CREATE TABLE temp (
    pk BIGINT PRIMARY KEY,
    a REAL NOT NULL,
    b BIGINT NOT NULL,
    c TEXT NOT NULL);"""
)
con.execute(
    """CREATE TABLE temp2 (
    d REAL NOT NULL,
    e BIGINT NOT NULL,
    f TEXT NOT NULL);"""
)
df1.to_sql("temp", con=engine, if_exists="append")
df2.to_sql("temp_nopk", con=engine, if_exists="append", index=False)
```
I typically use a context manager for sqlalchemy connections because it automatically closes the connection.
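Something like this, for example (just a sketch: the engine URL is a placeholder, and on SQLAlchemy 2.x the DDL would also need `engine.begin()` or an explicit `commit()`, since 2.x rolls back uncommitted work when the connection closes):

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite:///test.db")  # placeholder URL

# The connection is closed automatically when the block exits,
# even if the execute() call raises.
with engine.connect() as con:
    con.execute(
        sa.text(
            """CREATE TABLE temp (
            pk BIGINT PRIMARY KEY,
            a REAL NOT NULL,
            b BIGINT NOT NULL,
            c TEXT NOT NULL);"""
        )
    )
```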
Ah yeah, that would be better. I adapted this from the intake-sql tests. I'll change it.
```python
def test_auto_src_partition(temp_db: tuple[str, str, str], df1: pd.DataFrame) -> None:
    """Test automatic partitioning of table."""
```
Did you figure out how it partitions the table?
Not really, at least not in the case of the automatic partitioning. I adapted these test cases from the intake-sql test suite.

For the manual partitions you give it a list of WHERE clauses, which seems kind of janky: you could easily mess them up and have the same records showing up in more than one partition.

Also, because dask can't deal with multi-indexes, and the partitioning is done based on the index, we'll have to think a bit about how to use the partitioning if/when we get to tables that we want to read in their entirety but are too large to work with in memory.

It looks like the behavior is really controlled by dask.dataframe.read_sql_table.
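If I'm reading the dask docs right, a rough sketch of the two modes looks like this (table name, URI, and index values are made up, and the exact `read_sql_table` signature varies between dask versions):

```python
import dask.dataframe as dd

# "Automatic" partitioning: dask queries min/max of the index column
# and splits that range into npartitions equal intervals.
ddf = dd.read_sql_table(
    "temp",
    "sqlite:///test.db",
    index_col="pk",
    npartitions=2,
)

# Manual partitioning: explicit index boundaries. Unlike a list of
# WHERE clauses, adjacent divisions can't overlap, so no record can
# show up in two partitions.
ddf_manual = dd.read_sql_table(
    "temp",
    "sqlite:///test.db",
    index_col="pk",
    divisions=[0, 100, 200],  # two partitions: [0, 100) and [100, 200]
)
```

If that's right, the automatic split is purely linear over the index range, so a skewed index would give unevenly sized partitions.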
Closes #6