Complete Fetch Phase (EXTERNAL_LINKS disposition + ARROW format) #596


Draft
wants to merge 380 commits into base: fetch-json-inline

Commits (380)
677e66a
fix: ensure open attribute of Connection never fails
varun-edachali-dbx May 21, 2025
4ec8703
introduce databricksClient interface and thrift backend implementation
varun-edachali-dbx May 22, 2025
6ecb6bf
change names of ThriftBackend -> ThriftDatabricksClient in tests
varun-edachali-dbx May 22, 2025
fe2ce17
formatting: black + re-organise backend into new dir
varun-edachali-dbx May 23, 2025
568b1f4
Update CODEOWNERS (#562)
jprakash-db May 21, 2025
ebbd150
remove un-necessary example change
varun-edachali-dbx May 26, 2025
aa8af45
[empty commit] trigger integration tests
varun-edachali-dbx May 26, 2025
e7be76b
introduce normalised sessionId and CommandId for (near) complete back…
varun-edachali-dbx May 27, 2025
7461715
fix: Any is not defined
varun-edachali-dbx May 27, 2025
b295acd
fix: get_session_id_hex() is not defined
varun-edachali-dbx May 27, 2025
8afc5d5
command_handle -> command_id in ExecuteResponse
varun-edachali-dbx May 27, 2025
b8f9146
fix: active op handle -> active command id in Cursor
varun-edachali-dbx May 27, 2025
7c733ee
fixed (most) tests by accounting for normalised Session interface
varun-edachali-dbx May 27, 2025
0917ea1
fix: convert command id to operationHandle in status_request
varun-edachali-dbx May 27, 2025
3fd2a46
decouple session class from existing Connection
varun-edachali-dbx May 20, 2025
03d3ae7
add open property to Connection to ensure maintenance of existing API
varun-edachali-dbx May 20, 2025
fb0fa46
use connection open property instead of long chain through session
varun-edachali-dbx May 20, 2025
e3770cd
trigger integration workflow
varun-edachali-dbx May 20, 2025
3b8002f
fix: ensure open attribute of Connection never fails
varun-edachali-dbx May 21, 2025
f1a350a
fix: de-complicate earlier connection open logic
varun-edachali-dbx May 23, 2025
98b0dc7
Revert "fix: de-complicate earlier connection open logic"
varun-edachali-dbx May 23, 2025
afee423
[empty commit] attempt to trigger ci e2e workflow
varun-edachali-dbx May 23, 2025
2d24fdd
PECOBLR-86 improve logging on python driver (#556)
saishreeeee May 22, 2025
a5561e8
Revert "Merge remote-tracking branch 'upstream/sea-migration' into de…
varun-edachali-dbx May 23, 2025
0d890a5
Reapply "Merge remote-tracking branch 'upstream/sea-migration' into d…
varun-edachali-dbx May 23, 2025
c2aa762
fix: separate session opening logic from instantiation
varun-edachali-dbx May 23, 2025
fe642da
chore: use get_handle() instead of private session attribute in client
varun-edachali-dbx May 24, 2025
394333c
fix: remove accidentally removed assertions
varun-edachali-dbx May 26, 2025
ef07acd
generalise open session, fix session tests to consider positional args
varun-edachali-dbx May 27, 2025
1ef46cf
formatting (black)
varun-edachali-dbx May 27, 2025
76ca997
correct session logic after duplication during merge
varun-edachali-dbx May 27, 2025
afc6f8f
args -> kwargs in tests
varun-edachali-dbx May 27, 2025
f6660ba
delegate protocol version to SessionId
varun-edachali-dbx May 28, 2025
9871a93
ids -> backend/types
varun-edachali-dbx May 28, 2025
595d795
update open session with normalised SessionId
varun-edachali-dbx May 28, 2025
10ee940
remove merge artifacts, account for result set
varun-edachali-dbx May 28, 2025
3d75d6c
fix: import CommandId in client tests
varun-edachali-dbx May 28, 2025
b8e1bbd
expect session_id in protocol version getter
varun-edachali-dbx May 28, 2025
dac08f2
enforce ResultSet return in exec commands in backend client
varun-edachali-dbx May 28, 2025
7b0cbed
abstract Command State away from Thrift specific types
varun-edachali-dbx May 28, 2025
9267ef9
close_command return not used, replacing with None and logging resp
varun-edachali-dbx May 28, 2025
5f00532
move py.typed to correct places (#403)
wyattscarpenter Jul 2, 2024
59ed5ce
Upgrade mypy (#406)
wyattscarpenter Jul 3, 2024
c95951d
Do not retry failing requests with status code 401 (#408)
Hodnebo Jul 3, 2024
335d918
[PECO-1715] Remove username/password (BasicAuth) auth option (#409)
jackyhu-db Jul 4, 2024
ee4f94c
[PECO-1751] Refactor CloudFetch downloader: handle files sequentially…
kravets-levko Jul 11, 2024
1c8bb11
Fix CloudFetch retry policy to be compatible with all `urllib3` versi…
kravets-levko Jul 11, 2024
9de280e
Disable SSL verification for CloudFetch links (#414)
kravets-levko Jul 16, 2024
04b626a
Prepare relese 3.3.0 (#415)
kravets-levko Jul 17, 2024
2a01173
Fix pandas 2.2.2 support (#416)
kfollesdal Jul 26, 2024
b1faa09
[PECO-1801] Make OAuth as the default authenticator if no authenticat…
jackyhu-db Aug 1, 2024
270edcf
[PECO-1857] Use SSL options with HTTPS connection pool (#425)
kravets-levko Aug 22, 2024
8523fd3
Prepare release v3.4.0 (#430)
kravets-levko Aug 27, 2024
763f070
[PECO-1926] Create a non pyarrow flow to handle small results for the…
jprakash-db Oct 3, 2024
1e0d9d5
[PECO-1961] On non-retryable error, ensure PySQL includes useful info…
shivam2680 Oct 3, 2024
890cdd7
Reformatted all the files using black (#448)
jprakash-db Oct 3, 2024
9bdee1d
Prepare release v3.5.0 (#457)
jackyhu-db Oct 18, 2024
cdd7a19
[PECO-2051] Add custom auth headers into cloud fetch request (#460)
jackyhu-db Oct 25, 2024
fcc2da9
Prepare release 3.6.0 (#461)
jackyhu-db Oct 25, 2024
d354309
[ PECO - 1768 ] PySQL: adjust HTTP retry logic to align with Go and N…
jprakash-db Nov 20, 2024
d63544e
[ PECO-2065 ] Create the async execution flow for the PySQL Connector…
jprakash-db Nov 26, 2024
5bbf223
Fix for check_types github action failing (#472)
jprakash-db Nov 26, 2024
9c62b21
Remove upper caps on dependencies (#452)
arredond Dec 5, 2024
7bb7ca6
Updated the doc to specify native parameters in PUT operation is not …
jprakash-db Dec 6, 2024
438a080
Incorrect rows in inline fetch result (#479)
jprakash-db Dec 22, 2024
eb50411
Bumped up to version 3.7.0 (#482)
jprakash-db Dec 23, 2024
2a5b9c7
PySQL Connector split into connector and sqlalchemy (#444)
jprakash-db Dec 27, 2024
d31aa59
Removed CI CD for python3.8 (#490)
jprakash-db Jan 17, 2025
3e62c90
Added CI CD upto python 3.12 (#491)
jprakash-db Jan 18, 2025
f9a6b13
Merging changes from v3.7.1 release (#488)
jprakash-db Jan 18, 2025
a941575
Bumped up to version 4.0.0 (#493)
jprakash-db Jan 22, 2025
032c276
Updated action's version (#455)
newwingbird Feb 27, 2025
d36889d
Support Python 3.13 and update deps (#510)
dhirschfeld Feb 27, 2025
22e5ce4
Improve debugging + fix PR review template (#514)
samikshya-db Mar 2, 2025
7772403
Forward porting all changes into 4.x.x. uptil v3.7.3 (#529)
jprakash-db Mar 7, 2025
8b27150
Updated the actions/cache version (#532)
jprakash-db Mar 7, 2025
398db45
Updated the CODEOWNERS (#531)
jprakash-db Mar 7, 2025
c962b63
Add version check for urllib3 in backoff calculation (#526)
shivam2680 Mar 11, 2025
c246872
[ES-1372353] make user_agent_header part of public API (#530)
shivam2680 Mar 12, 2025
326f338
Updates runner used to run DCO check to use databricks-protected-runn…
madhav-db Mar 12, 2025
37e73a9
Support multiple timestamp formats in non arrow flow (#533)
jprakash-db Mar 18, 2025
3d7123c
prepare release for v4.0.1 (#534)
shivam2680 Mar 19, 2025
132e1b7
Relaxed bound for python-dateutil (#538)
jprakash-db Apr 1, 2025
46090c0
Bumped up the version for 4.0.2 (#539)
jprakash-db Apr 1, 2025
28249c0
Added example for async execute query (#537)
jprakash-db Apr 1, 2025
5ab0a2c
Added urllib3 version check (#547)
jprakash-db Apr 21, 2025
6528cd1
Bump version to 4.0.3 (#549)
jprakash-db Apr 22, 2025
8f7754b
Cleanup fields as they might be deprecated/removed/change in the futu…
vikrantpuppala May 9, 2025
f7d3865
Refactor decimal conversion in PyArrow tables to use direct casting (…
jayantsing-db May 12, 2025
61cc398
[PECOBLR-361] convert column table to arrow if arrow present (#551)
shivam2680 May 16, 2025
554d011
decouple session class from existing Connection
varun-edachali-dbx May 20, 2025
6f28297
add open property to Connection to ensure maintenance of existing API
varun-edachali-dbx May 20, 2025
983ec03
update unit tests to address ThriftBackend through session instead of…
varun-edachali-dbx May 20, 2025
6f3b5b7
chore: move session specific tests from test_client to test_session
varun-edachali-dbx May 20, 2025
29a2840
formatting (black)
varun-edachali-dbx May 20, 2025
0d28b69
use connection open property instead of long chain through session
varun-edachali-dbx May 20, 2025
8cb8cdd
trigger integration workflow
varun-edachali-dbx May 20, 2025
4495f9b
fix: ensure open attribute of Connection never fails
varun-edachali-dbx May 21, 2025
c744117
introduce databricksClient interface and thrift backend implementation
varun-edachali-dbx May 22, 2025
ef5a06b
change names of ThriftBackend -> ThriftDatabricksClient in tests
varun-edachali-dbx May 22, 2025
abbaaa5
fix: remove excess debug log
varun-edachali-dbx May 22, 2025
33765cb
fix: replace thrift_backend with backend in result set param
varun-edachali-dbx May 22, 2025
788d8c7
fix: replace module replacement with concrete mock instance in execut…
varun-edachali-dbx May 22, 2025
4debbd3
formatting: black + re-organise backend into new dir
varun-edachali-dbx May 23, 2025
0e6e215
fix: sql.thrift_backend -> sql.backend.thrift_backend in tests and ex…
varun-edachali-dbx May 23, 2025
925394c
Update CODEOWNERS (#562)
jprakash-db May 21, 2025
4ad6c8d
Enhance Cursor close handling and context manager exception managemen…
madhav-db May 21, 2025
51369c8
PECOBLR-86 improve logging on python driver (#556)
saishreeeee May 22, 2025
9541464
Update github actions run conditions (#569)
jprakash-db May 26, 2025
cbdd3d7
remove un-necessary example change
varun-edachali-dbx May 26, 2025
ca38e95
[empty commit] trigger integration tests
varun-edachali-dbx May 26, 2025
b40c0fd
fix: use backend in Cursor, not thrift_backend
varun-edachali-dbx May 26, 2025
35ed462
fix: backend references in integration tests
varun-edachali-dbx May 26, 2025
37f3af1
fix: thrift_backend -> backend in ResultSet reference in e2e test
varun-edachali-dbx May 26, 2025
09c5e2f
introduce normalised sessionId and CommandId for (near) complete back…
varun-edachali-dbx May 27, 2025
4ce6aab
fix: Any is not defined
varun-edachali-dbx May 27, 2025
307f447
fix: get_session_id_hex() is not defined
varun-edachali-dbx May 27, 2025
802d8dc
command_handle -> command_id in ExecuteResponse
varun-edachali-dbx May 27, 2025
944d446
fix: active op handle -> active command id in Cursor
varun-edachali-dbx May 27, 2025
6338083
fixed (most) tests by accounting for normalised Session interface
varun-edachali-dbx May 27, 2025
3658a91
fix: convert command id to operationHandle in status_request
varun-edachali-dbx May 27, 2025
8ef6ed6
decouple session class from existing Connection
varun-edachali-dbx May 20, 2025
61300b2
add open property to Connection to ensure maintenance of existing API
varun-edachali-dbx May 20, 2025
44e7d17
formatting (black)
varun-edachali-dbx May 20, 2025
d2035ea
use connection open property instead of long chain through session
varun-edachali-dbx May 20, 2025
8b4451b
trigger integration workflow
varun-edachali-dbx May 20, 2025
d21d2c3
fix: ensure open attribute of Connection never fails
varun-edachali-dbx May 21, 2025
21068a3
fix: de-complicate earlier connection open logic
varun-edachali-dbx May 23, 2025
476e763
Revert "fix: de-complicate earlier connection open logic"
varun-edachali-dbx May 23, 2025
1e1cf1e
[empty commit] attempt to trigger ci e2e workflow
varun-edachali-dbx May 23, 2025
b408c2c
PECOBLR-86 improve logging on python driver (#556)
saishreeeee May 22, 2025
73649f2
Revert "Merge remote-tracking branch 'upstream/sea-migration' into de…
varun-edachali-dbx May 23, 2025
a61df99
Reapply "Merge remote-tracking branch 'upstream/sea-migration' into d…
varun-edachali-dbx May 23, 2025
e1a2c0e
fix: separate session opening logic from instantiation
varun-edachali-dbx May 23, 2025
71ba9d5
chore: use get_handle() instead of private session attribute in client
varun-edachali-dbx May 24, 2025
160ba9f
fix: remove accidentally removed assertions
varun-edachali-dbx May 26, 2025
6b3436f
generalise open session, fix session tests to consider positional args
varun-edachali-dbx May 27, 2025
30849dc
formatting (black)
varun-edachali-dbx May 27, 2025
4d455bb
correct session logic after duplication during merge
varun-edachali-dbx May 27, 2025
6fc0834
args -> kwargs in tests
varun-edachali-dbx May 27, 2025
d254e48
delegate protocol version to SessionId
varun-edachali-dbx May 28, 2025
370627d
ids -> backend/types
varun-edachali-dbx May 28, 2025
ca1b57d
update open session with normalised SessionId
varun-edachali-dbx May 28, 2025
6c120c0
Merging changes from v3.7.1 release (#488)
jprakash-db Jan 18, 2025
cdf6865
Support Python 3.13 and update deps (#510)
dhirschfeld Feb 27, 2025
12ce717
Updated the actions/cache version (#532)
jprakash-db Mar 7, 2025
1215fd8
Add version check for urllib3 in backoff calculation (#526)
shivam2680 Mar 11, 2025
dd083f6
Support multiple timestamp formats in non arrow flow (#533)
jprakash-db Mar 18, 2025
8d30436
Added example for async execute query (#537)
jprakash-db Apr 1, 2025
066aef9
Added urllib3 version check (#547)
jprakash-db Apr 21, 2025
1ed3514
decouple session class from existing Connection
varun-edachali-dbx May 20, 2025
ca80f94
formatting (black)
varun-edachali-dbx May 20, 2025
6027fb1
use connection open property instead of long chain through session
varun-edachali-dbx May 20, 2025
7a2f9b5
trigger integration workflow
varun-edachali-dbx May 20, 2025
39294e9
fix: ensure open attribute of Connection never fails
varun-edachali-dbx May 21, 2025
709e910
Revert "fix: de-complicate earlier connection open logic"
varun-edachali-dbx May 23, 2025
1ad0ace
[empty commit] attempt to trigger ci e2e workflow
varun-edachali-dbx May 23, 2025
913da63
PECOBLR-86 improve logging on python driver (#556)
saishreeeee May 22, 2025
d8159e7
Revert "Merge remote-tracking branch 'upstream/sea-migration' into de…
varun-edachali-dbx May 23, 2025
0b91183
Reapply "Merge remote-tracking branch 'upstream/sea-migration' into d…
varun-edachali-dbx May 23, 2025
ff78b5f
fix: separate session opening logic from instantiation
varun-edachali-dbx May 23, 2025
c1d53d2
Enhance Cursor close handling and context manager exception managemen…
madhav-db May 21, 2025
a5a8e51
PECOBLR-86 improve logging on python driver (#556)
saishreeeee May 22, 2025
f7be10c
New Complex type test table + Github Action changes (#575)
jprakash-db May 28, 2025
a888dd6
remove excess logs, assertions, instantiations
varun-edachali-dbx May 28, 2025
29a2985
Merge remote-tracking branch 'origin/sea-migration' into backend-inte…
varun-edachali-dbx May 28, 2025
9b9735e
formatting (black) + remove excess log (merge artifact)
varun-edachali-dbx May 28, 2025
0a8226c
fix typing
varun-edachali-dbx May 28, 2025
42263c4
remove un-necessary check
varun-edachali-dbx May 28, 2025
ac984e4
remove un-necessary replace call
varun-edachali-dbx May 28, 2025
8da84e8
introduce __str__ methods for CommandId and SessionId
varun-edachali-dbx May 28, 2025
f4f27e3
Merge remote-tracking branch 'origin/backend-interface' into fetch-in…
varun-edachali-dbx May 28, 2025
4e3ccce
correct some merge artifacts
varun-edachali-dbx May 28, 2025
04eb8c1
replace match case with if else for compatibility with older python v…
varun-edachali-dbx May 28, 2025
ca425eb
correct TOperationState literal, remove un-necessary check
varun-edachali-dbx May 28, 2025
7a47dd0
chore: remove duplicate def
varun-edachali-dbx May 28, 2025
00d9aeb
correct typing
varun-edachali-dbx May 28, 2025
eecc67d
docstrings for DatabricksClient interface
varun-edachali-dbx May 29, 2025
9800636
stronger typing of Cursor and ExecuteResponse
varun-edachali-dbx May 29, 2025
e07f56c
remove utility functions from backend interface, fix circular import
varun-edachali-dbx May 29, 2025
73fb141
rename info to properties
varun-edachali-dbx May 29, 2025
d838653
newline for cleanliness
varun-edachali-dbx May 29, 2025
6654f06
fix circular import
varun-edachali-dbx May 29, 2025
89425f9
formatting (black)
varun-edachali-dbx May 29, 2025
93e55e8
to_hex_id -> get_hex_id
varun-edachali-dbx May 29, 2025
7689d75
better comment on protocol version getter
varun-edachali-dbx May 29, 2025
1ec8c45
formatting (black)
varun-edachali-dbx May 29, 2025
80b7bc3
Merge remote-tracking branch 'origin/backend-interface' into fetch-in…
varun-edachali-dbx May 29, 2025
904efe7
stricter typing for cursor
varun-edachali-dbx May 29, 2025
c91bc37
correct typing
varun-edachali-dbx May 29, 2025
42a0d08
init sea backend
varun-edachali-dbx May 29, 2025
1056bc2
move test script into experimental dir
varun-edachali-dbx May 29, 2025
d59880c
formatting (black)
varun-edachali-dbx May 29, 2025
16ff4ec
cleanup: removed excess comments, validated decisions
varun-edachali-dbx May 29, 2025
0bba7f1
init sea exec
varun-edachali-dbx May 29, 2025
1da5694
introduce models
varun-edachali-dbx May 30, 2025
f23ef8f
introduce req resp models, update example tester script
varun-edachali-dbx May 30, 2025
95b0781
add unit tests for sea backend
varun-edachali-dbx May 30, 2025
ef09dbe
Merge remote-tracking branch 'origin/sess-sea' into exec-sea
varun-edachali-dbx May 30, 2025
d552695
introduce unit tests for sea backend
varun-edachali-dbx May 30, 2025
b8a170e
typing, change DESCRIBE TABLE to SHOW COLUMNS
varun-edachali-dbx Jun 2, 2025
1003319
remove model redundancies
varun-edachali-dbx Jun 2, 2025
9f11c9d
raise ServerOpError in case of not SUCCEEDED state
varun-edachali-dbx Jun 2, 2025
e1c7091
simplify logging, comments
varun-edachali-dbx Jun 2, 2025
b6d7c0c
review metadata ops
varun-edachali-dbx Jun 2, 2025
611d79f
result compression
varun-edachali-dbx Jun 3, 2025
edd4c87
client side table_types filtering
varun-edachali-dbx Jun 3, 2025
9395141
preliminary table filtering
varun-edachali-dbx Jun 4, 2025
7fdc01d
filters and sea_result_set unit tests
varun-edachali-dbx Jun 4, 2025
2871e05
stronger typing on ResultSet
varun-edachali-dbx Jun 4, 2025
15a8efc
fix type issues
varun-edachali-dbx Jun 4, 2025
f0d9c65
init fetch phase JSON + INLINE
varun-edachali-dbx Jun 4, 2025
c540987
working example script
varun-edachali-dbx Jun 4, 2025
6862929
introduce SeaResultSetQueueFactory
varun-edachali-dbx Jun 4, 2025
6e1b4d3
raise error for invalid data
varun-edachali-dbx Jun 4, 2025
1aec8b9
add metadata tests do example script
varun-edachali-dbx Jun 4, 2025
ad45046
redundancies + op_handle -> command_id
varun-edachali-dbx Jun 4, 2025
448476b
ensure empty data allowed
varun-edachali-dbx Jun 4, 2025
fb35f69
refactor fetch interface
varun-edachali-dbx Jun 5, 2025
2b8b4c4
cleaner result logging
varun-edachali-dbx Jun 5, 2025
8da6f5a
tabletype is 5th col
varun-edachali-dbx Jun 5, 2025
71266b1
remove redundant params (byte limit, catalog, schema) in exec command
varun-edachali-dbx Jun 7, 2025
d7ab57f
init cloud fetch stuffs
varun-edachali-dbx Jun 7, 2025
75c5a62
fixed multi chunk
varun-edachali-dbx Jun 7, 2025
74dd311
move get chunk links into backend
varun-edachali-dbx Jun 8, 2025
db185b9
abstract cloud fetch queue stuff
varun-edachali-dbx Jun 8, 2025
db22f6e
CloudQueueBase -> CloudFetchQueue, CloudFetchQueue -> ThriftCloudFetc…
varun-edachali-dbx Jun 8, 2025
d5322eb
explicitly declare CloudFetchQueue as ABC
varun-edachali-dbx Jun 8, 2025
961873a
clean up SeaCloudFetchQueue
varun-edachali-dbx Jun 8, 2025
b765e33
clean up Queue more
varun-edachali-dbx Jun 8, 2025
7ce9d28
ease log warnings
varun-edachali-dbx Jun 8, 2025
cdc7f42
move sea stuff into sea/ dir
varun-edachali-dbx Jun 9, 2025
54c7f6d
Merge branch 'exec-resp-norm' into cloudfetch-sea
varun-edachali-dbx Jun 10, 2025
dd7c410
add back SeaResltSet (to fix)
varun-edachali-dbx Jun 10, 2025
e116d9b
fix merge artifacts
varun-edachali-dbx Jun 10, 2025
bddef1f
add back og test
varun-edachali-dbx Jun 10, 2025
b404af7
refactors
varun-edachali-dbx Jun 11, 2025
8cf118f
Merge branch 'exec-resp-norm' into cloudfetch-sea
varun-edachali-dbx Jun 11, 2025
a8004a0
fix tests
varun-edachali-dbx Jun 11, 2025
0d4f49a
fix types
varun-edachali-dbx Jun 11, 2025
66c2baa
default to case sensitive comparison in client side table type filtering
varun-edachali-dbx Jun 11, 2025
936713c
lots of loffing
varun-edachali-dbx Jun 11, 2025
d426655
move to new test structure
varun-edachali-dbx Jun 12, 2025
1fef8f3
revert to old script
varun-edachali-dbx Jun 12, 2025
ce08e01
cloudfetch fix?
varun-edachali-dbx Jun 13, 2025
75505cf
remove non cloud fetch form multi chunk test
varun-edachali-dbx Jun 13, 2025
c0bf461
merge
varun-edachali-dbx Jun 13, 2025
597f116
Revert "Merge branch 'sea-res-set' into fetch-json-inline"
varun-edachali-dbx Jun 13, 2025
97d65d3
Reapply "Merge branch 'sea-res-set' into fetch-json-inline"
varun-edachali-dbx Jun 13, 2025
0ae2ab0
Revert "merge"
varun-edachali-dbx Jun 13, 2025
14 changes: 11 additions & 3 deletions .github/workflows/code-quality-checks.yml
@@ -1,7 +1,15 @@
 name: Code Quality Checks
-
-on: [pull_request]
-
+on:
+  push:
+    branches:
+      - main
+      - sea-migration
+      - telemetry
+  pull_request:
+    branches:
+      - main
+      - sea-migration
+      - telemetry
 jobs:
   run-unit-tests:
     runs-on: ubuntu-latest
10 changes: 7 additions & 3 deletions .github/workflows/integration.yml
@@ -1,10 +1,14 @@
 name: Integration Tests
 
 on:
-  push:
+  push:
+    paths-ignore:
+      - "**.MD"
+      - "**.md"
+  pull_request:
     branches:
       - main
-  pull_request:
+      - sea-migration
+      - telemetry
 
 jobs:
   run-e2e-tests:
50 changes: 26 additions & 24 deletions examples/experimental/sea_connector_test.py
@@ -1,51 +1,54 @@
 """
 Main script to run all SEA connector tests.
 
-This script imports and runs all the individual test modules and displays
+This script runs all the individual test modules and displays
 a summary of test results with visual indicators.
 """
 import os
 import sys
 import logging
-import importlib.util
-from typing import Dict, Callable, List, Tuple
+import subprocess
+from typing import List, Tuple
 
 # Configure logging
-logging.basicConfig(level=logging.INFO)
+logging.basicConfig(level=logging.DEBUG)
 logger = logging.getLogger(__name__)
 
 # Define test modules and their main test functions
 TEST_MODULES = [
     "test_sea_session",
     "test_sea_sync_query",
     "test_sea_async_query",
     "test_sea_metadata",
+    "test_sea_multi_chunk",
 ]
 
 
-def load_test_function(module_name: str) -> Callable:
-    """Load a test function from a module."""
+def run_test_module(module_name: str) -> bool:
+    """Run a test module and return success status."""
     module_path = os.path.join(
         os.path.dirname(os.path.abspath(__file__)), "tests", f"{module_name}.py"
     )
 
-    spec = importlib.util.spec_from_file_location(module_name, module_path)
-    module = importlib.util.module_from_spec(spec)
-    spec.loader.exec_module(module)
+    # Handle the multi-chunk test which is in the main directory
+    if module_name == "test_sea_multi_chunk":
+        module_path = os.path.join(
+            os.path.dirname(os.path.abspath(__file__)), f"{module_name}.py"
+        )
 
+    # Simply run the module as a script - each module handles its own test execution
+    result = subprocess.run(
+        [sys.executable, module_path], capture_output=True, text=True
+    )
 
-    # Get the main test function (assuming it starts with "test_")
-    for name in dir(module):
-        if name.startswith("test_") and callable(getattr(module, name)):
-            # For sync and async query modules, we want the main function that runs both tests
-            if name == f"test_sea_{module_name.replace('test_sea_', '')}_exec":
-                return getattr(module, name)
+    # Log the output from the test module
+    if result.stdout:
+        for line in result.stdout.strip().split("\n"):
+            logger.info(line)
 
-    # Fallback to the first test function found
-    for name in dir(module):
-        if name.startswith("test_") and callable(getattr(module, name)):
-            return getattr(module, name)
+    if result.stderr:
+        for line in result.stderr.strip().split("\n"):
+            logger.error(line)
 
-    raise ValueError(f"No test function found in module {module_name}")
+    return result.returncode == 0


@@ -54,12 +57,11 @@ def run_tests() -> List[Tuple[str, bool]]:
 
     for module_name in TEST_MODULES:
         try:
-            test_func = load_test_function(module_name)
             logger.info(f"\n{'=' * 50}")
             logger.info(f"Running test: {module_name}")
             logger.info(f"{'-' * 50}")
 
-            success = test_func()
+            success = run_test_module(module_name)
             results.append((module_name, success))
 
             status = "✅ PASSED" if success else "❌ FAILED"
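The refactor above replaces in-process module loading (importlib) with running each test module in a fresh interpreter via subprocess, so one module's global state or crash cannot affect the others. A minimal sketch of that pattern, independent of the SEA test harness (the `run_script` helper name is ours, not part of the PR):

```python
import subprocess
import sys


def run_script(path: str) -> bool:
    """Run a Python script in a fresh interpreter and report success."""
    # Capture stdout/stderr so the caller can relay them to its own logger,
    # mirroring the run_test_module approach in the diff above.
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True
    )
    for line in result.stdout.strip().splitlines():
        print(f"[stdout] {line}")
    for line in result.stderr.strip().splitlines():
        print(f"[stderr] {line}")
    # A zero exit code is the only success signal the parent relies on.
    return result.returncode == 0
```

The design choice here is that each module communicates only through its exit code and output streams, which keeps the runner agnostic to how individual tests are written.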
223 changes: 223 additions & 0 deletions examples/experimental/test_sea_multi_chunk.py
@@ -0,0 +1,223 @@
"""
Test for SEA multi-chunk responses.

This script tests the SEA connector's ability to handle multi-chunk responses correctly.
It runs a query that generates large rows to force multiple chunks and verifies that
the correct number of rows are returned.
"""
import os
import sys
import logging
import time
import json
import csv
from pathlib import Path
from databricks.sql.client import Connection

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def test_sea_multi_chunk_with_cloud_fetch(requested_row_count=5000):
    """
    Test executing a query that generates multiple chunks using cloud fetch.

    Args:
        requested_row_count: Number of rows to request in the query

    Returns:
        bool: True if the test passed, False otherwise
    """
    server_hostname = os.environ.get("DATABRICKS_SERVER_HOSTNAME")
    http_path = os.environ.get("DATABRICKS_HTTP_PATH")
    access_token = os.environ.get("DATABRICKS_TOKEN")
    catalog = os.environ.get("DATABRICKS_CATALOG")

    # Create output directory for test results
    output_dir = Path("test_results")
    output_dir.mkdir(exist_ok=True)

    # Files to store results
    rows_file = output_dir / "cloud_fetch_rows.csv"
    stats_file = output_dir / "cloud_fetch_stats.json"

    if not all([server_hostname, http_path, access_token]):
        logger.error("Missing required environment variables.")
        logger.error(
            "Please set DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN."
        )
        return False

    try:
        # Create connection with cloud fetch enabled
        logger.info(
            "Creating connection for query execution with cloud fetch enabled"
        )
        connection = Connection(
            server_hostname=server_hostname,
            http_path=http_path,
            access_token=access_token,
            catalog=catalog,
            schema="default",
            use_sea=True,
            user_agent_entry="SEA-Test-Client",
            use_cloud_fetch=True,
        )

        logger.info(
            f"Successfully opened SEA session with ID: {connection.get_session_id_hex()}"
        )

        # Execute a query that generates large rows to force multiple chunks
        cursor = connection.cursor()
        query = f"""
        SELECT
            id,
            concat('value_', repeat('a', 10000)) as test_value
        FROM range(1, {requested_row_count} + 1) AS t(id)
        """

        logger.info(
            f"Executing query with cloud fetch to generate {requested_row_count} rows"
        )
        start_time = time.time()
        cursor.execute(query)

        # Fetch all rows
        rows = cursor.fetchall()
        actual_row_count = len(rows)
        end_time = time.time()
        execution_time = end_time - start_time

        logger.info(f"Query executed in {execution_time:.2f} seconds")
        logger.info(
            f"Requested {requested_row_count} rows, received {actual_row_count} rows"
        )

        # Write rows to CSV file for inspection
        logger.info(f"Writing rows to {rows_file}")
        with open(rows_file, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['id', 'value_length'])  # Header

            # Extract IDs to check for duplicates and missing values
            row_ids = []
            for row in rows:
                row_id = row[0]
                value_length = len(row[1])
                writer.writerow([row_id, value_length])
                row_ids.append(row_id)

        # Verify row count
        success = actual_row_count == requested_row_count

        # Check for duplicate IDs
        unique_ids = set(row_ids)
        duplicate_count = len(row_ids) - len(unique_ids)

        # Check for missing IDs
        expected_ids = set(range(1, requested_row_count + 1))
        missing_ids = expected_ids - unique_ids
        extra_ids = unique_ids - expected_ids

        # Write statistics to JSON file
        stats = {
            "requested_row_count": requested_row_count,
            "actual_row_count": actual_row_count,
            "execution_time_seconds": execution_time,
            "duplicate_count": duplicate_count,
            "missing_ids_count": len(missing_ids),
            "extra_ids_count": len(extra_ids),
            "missing_ids": list(missing_ids)[:100] if missing_ids else [],  # Limit to first 100 for readability
            "extra_ids": list(extra_ids)[:100] if extra_ids else [],  # Limit to first 100 for readability
            "success": success and duplicate_count == 0 and len(missing_ids) == 0 and len(extra_ids) == 0,
        }

        with open(stats_file, 'w') as f:
            json.dump(stats, f, indent=2)

        # Log detailed results
        if duplicate_count > 0:
            logger.error(f"❌ FAILED: Found {duplicate_count} duplicate row IDs")
            success = False
        else:
            logger.info("✅ PASSED: No duplicate row IDs found")

        if missing_ids:
            logger.error(f"❌ FAILED: Missing {len(missing_ids)} expected row IDs")
            if len(missing_ids) <= 10:
                logger.error(f"Missing IDs: {sorted(list(missing_ids))}")
            success = False
        else:
            logger.info("✅ PASSED: All expected row IDs present")

        if extra_ids:
            logger.error(f"❌ FAILED: Found {len(extra_ids)} unexpected row IDs")
            if len(extra_ids) <= 10:
                logger.error(f"Extra IDs: {sorted(list(extra_ids))}")
            success = False
        else:
            logger.info("✅ PASSED: No unexpected row IDs found")

        if actual_row_count == requested_row_count:
            logger.info("✅ PASSED: Row count matches requested count")
        else:
            logger.error(
                f"❌ FAILED: Row count mismatch. Expected {requested_row_count}, got {actual_row_count}"
            )
            success = False

        # Close resources
        cursor.close()
        connection.close()
        logger.info("Successfully closed SEA session")

        logger.info(f"Test results written to {rows_file} and {stats_file}")
        return success

    except Exception as e:
        logger.error(f"Error during SEA multi-chunk test with cloud fetch: {str(e)}")
        import traceback

        logger.error(traceback.format_exc())
        return False


def main():
    # Check if required environment variables are set
    required_vars = [
        "DATABRICKS_SERVER_HOSTNAME",
        "DATABRICKS_HTTP_PATH",
        "DATABRICKS_TOKEN",
    ]
    missing_vars = [var for var in required_vars if not os.environ.get(var)]

    if missing_vars:
        logger.error(
            f"Missing required environment variables: {', '.join(missing_vars)}"
        )
        logger.error("Please set these variables before running the tests.")
        sys.exit(1)

    # Get row count from command line or use default
    requested_row_count = 5000

    if len(sys.argv) > 1:
        try:
            requested_row_count = int(sys.argv[1])
        except ValueError:
            logger.error(f"Invalid row count: {sys.argv[1]}")
            logger.error("Please provide a valid integer for row count.")
            sys.exit(1)

    logger.info(f"Testing with {requested_row_count} rows")

    # Run the multi-chunk test with cloud fetch
    success = test_sea_multi_chunk_with_cloud_fetch(requested_row_count)

    # Report results
    if success:
        logger.info("✅ TEST PASSED: Multi-chunk cloud fetch test completed successfully")
        sys.exit(0)
    else:
        logger.error("❌ TEST FAILED: Multi-chunk cloud fetch test encountered errors")
        sys.exit(1)


if __name__ == "__main__":
    main()
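The core correctness check in the test above is pure set bookkeeping on the fetched row IDs: count duplicates, then compare against the expected contiguous range to find missing and unexpected IDs. That logic can be distilled into a small standalone helper; this is a sketch of the same arithmetic for illustration (the `verify_row_ids` name is ours, not part of the PR):

```python
def verify_row_ids(row_ids, requested_row_count):
    """Return (duplicate_count, missing_ids, extra_ids) for a fetched result.

    A clean multi-chunk fetch of N rows should yield exactly the IDs
    1..N with no repeats: (0, set(), set()).
    """
    unique_ids = set(row_ids)
    # Any ID appearing more than once inflates len(row_ids) past len(unique_ids).
    duplicate_count = len(row_ids) - len(unique_ids)
    expected_ids = set(range(1, requested_row_count + 1))
    missing_ids = expected_ids - unique_ids   # requested but never returned
    extra_ids = unique_ids - expected_ids     # returned but never requested
    return duplicate_count, missing_ids, extra_ids
```

A dropped or re-fetched cloud-fetch chunk shows up here as a block of missing or duplicate IDs, which is why the test checks ID sets rather than only the total row count.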