This repository was archived by the owner on Jan 12, 2024. It is now read-only.

Flesh out integration tests. #9

Merged · 6 commits merged into main on May 20, 2022

Conversation

@zaneselvans (Member)

Closes #6


codecov bot commented May 19, 2022

Codecov Report

Merging #9 (f8e2ffe) into main (492df1a) will increase coverage by 5.8%.
The diff coverage is 100.0%.

@@           Coverage Diff            @@
##            main       #9     +/-   ##
========================================
+ Coverage   94.1%   100.0%   +5.8%     
========================================
  Files          2        2             
  Lines         51       57      +6     
========================================
+ Hits          48       57      +9     
+ Misses         3        0      -3     
Impacted Files                    Coverage Δ
src/intake_sqlite/sqlite_src.py   100.0% <100.0%> (+7.6%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 492df1a...f8e2ffe.

@zaneselvans zaneselvans marked this pull request as ready for review May 20, 2022 02:58
@zaneselvans zaneselvans requested a review from bendnorman May 20, 2022 03:05
@bendnorman (Member) left a comment

Great-looking tests 👍 Left a comment and a question.

Comment on lines 51 to 66
con = engine.connect()
con.execute(
    """CREATE TABLE temp (
        pk BIGINT PRIMARY KEY,
        a REAL NOT NULL,
        b BIGINT NOT NULL,
        c TEXT NOT NULL);"""
)
con.execute(
    """CREATE TABLE temp_nopk (
        d REAL NOT NULL,
        e BIGINT NOT NULL,
        f TEXT NOT NULL);"""
)
df1.to_sql("temp", con=engine, if_exists="append")
df2.to_sql("temp_nopk", con=engine, if_exists="append", index=False)
@bendnorman (Member)
I typically use a context manager for sqlalchemy connections because it automatically closes the connection.
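For example, the table setup above could be rewritten as something like this (a minimal sketch reusing the DDL from the diff; the connection closes automatically when the block exits):

with engine.connect() as con:
    con.execute(
        """CREATE TABLE temp (
            pk BIGINT PRIMARY KEY,
            a REAL NOT NULL,
            b BIGINT NOT NULL,
            c TEXT NOT NULL);"""
    )
    con.execute(
        """CREATE TABLE temp_nopk (
            d REAL NOT NULL,
            e BIGINT NOT NULL,
            f TEXT NOT NULL);"""
    )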

@zaneselvans (Member, Author)
Ah yeah, that would be better. I adapted this from the intake-sql tests. I'll change it.



def test_auto_src_partition(temp_db: tuple[str, str, str], df1: pd.DataFrame) -> None:
    """Test automatic partitioning of table."""
@bendnorman (Member)
Did you figure out how it partitions the table?

@zaneselvans (Member, Author)

Not really, at least not in the case of the automatic partitioning. I adapted these test cases from the intake-sql test suite.

For the manual partitions you give it a list of WHERE clauses, which seems kind of janky: you could easily mess them up and end up with the same records showing up in more than one partition.
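A hypothetical illustration of that failure mode (these are just the raw WHERE strings, not any particular intake-sql API):

# Non-overlapping clauses: every row lands in exactly one partition.
good_partitions = ["pk < 100", "pk >= 100"]

# Overlapping clauses: a row with pk == 100 satisfies both, so
# concatenating the partitions silently duplicates that record.
bad_partitions = ["pk <= 100", "pk >= 100"]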

Also, because dask can't deal with multi-indexes, and the partitioning is done based on the index, we'll have to think a bit about how to use the partitioning if/when we get to tables that we want to read in their entirety but are too large to work with in memory.

It looks like the behavior is really controlled by dask.dataframe.read_sql_table.
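A minimal sketch of how that function partitions on the index (the database URI and npartitions value here are made up for illustration; divisions can be passed instead to set the partition boundaries explicitly):

import dask.dataframe as dd

# dask splits the table into range partitions over index_col; with
# npartitions it inspects the index limits and picks the boundaries itself.
ddf = dd.read_sql_table(
    "temp",                   # table name
    "sqlite:///example.db",   # database URI (hypothetical path)
    index_col="pk",           # partitioning is keyed on this column
    npartitions=4,            # let dask choose 4 roughly equal index ranges
)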

@zaneselvans zaneselvans merged commit 4fb428c into main May 20, 2022
Development

Successfully merging this pull request may close these issues: Flesh out software tests (#6).