[DRIVERS-2926] [PYTHON-4577] BSON Binary Vector Subtype Support #1813

Merged
48 commits
245c869
First commit on DRIVERS-2926-BSON-Binary-Vectors
caseyclements Aug 22, 2024
031cd8c
Turns dtype into enum. Adds handling of padding, __eq__. Removal of n…
caseyclements Aug 23, 2024
8d4e8a2
Added docstring and comments
caseyclements Aug 23, 2024
2df0d6b
Changed order of BinaryVector and Binary in bson._ENCODERS to get tes…
caseyclements Aug 23, 2024
315a115
Changed order of BinaryVector and Binary in bson._ENCODERS to get tes…
caseyclements Aug 23, 2024
d74314d
json_util dumps/loads of BinaryVector
caseyclements Aug 23, 2024
27f13c8
Added bson_corpus tests. Needs more, and review of json_util
caseyclements Aug 24, 2024
263f8c7
Removed BinaryVector as separate class. Instead, Binary includes as_v…
caseyclements Sep 12, 2024
f8bcdef
Stop setting _USD_C to False
caseyclements Sep 13, 2024
5435785
mypy fixes
caseyclements Sep 13, 2024
5c4d152
Removed stub vector.json for bson_corpus tests
caseyclements Sep 13, 2024
f86d040
More tests
caseyclements Sep 13, 2024
adcb945
Added description of subtype 9 to bson.Binary docstring
caseyclements Sep 14, 2024
7986cc5
Addressed comments in docstrings.
caseyclements Sep 16, 2024
26b8398
Eased string comparison of exception in xfail in test_bson
caseyclements Sep 16, 2024
28de28a
Updates to docstrings of BinaryVector and BinaryVectorDtype
caseyclements Sep 17, 2024
68235b8
Simplified expected exeption case. Will be refactored with yaml anyway..
caseyclements Sep 17, 2024
e2a1a3c
Added draft of test runner
caseyclements Sep 18, 2024
bf9758a
Added test cases: padding, and overflow
caseyclements Sep 19, 2024
e1590aa
Merge branch 'master' into DRIVERS-2926-BSON-Binary-Vectors
caseyclements Sep 19, 2024
c4c7af7
Cast Path to str
caseyclements Sep 19, 2024
de5a245
Simplified as_vector API
caseyclements Sep 20, 2024
43bcce4
Added test case: list of floats with dtype int8 raises exception
caseyclements Sep 20, 2024
41ee0bb
Set default padding to 0 in test runner
caseyclements Sep 20, 2024
9d52aeb
Updated test_bson for new as_vector API
caseyclements Sep 20, 2024
0d34464
Updated resync-specs.sh to include bson-binary-vector
caseyclements Sep 20, 2024
1d49656
Updated resync-specs.sh and test cases
caseyclements Sep 20, 2024
2af0ca4
Broke tests into 3 files by dtype
caseyclements Sep 20, 2024
c93bae1
Update bson/binary.py
caseyclements Sep 27, 2024
f374b5a
Removed json from test_bson_binary_vector and its jsons
caseyclements Sep 27, 2024
0db9866
Addition of Provision (BETA) specifiers change references to 4.10
caseyclements Sep 30, 2024
0532803
Add references to from_vector() and as_vector()
caseyclements Sep 30, 2024
3edeef6
Add subtype number in changelog
caseyclements Sep 30, 2024
d199597
Raise ValueErrors not AssertionErrors. Bumped from 4.9 to 4.10
caseyclements Sep 30, 2024
abc7cd3
Docstring for as_vector
caseyclements Sep 30, 2024
4550c20
Add slots for BinaryVector
caseyclements Sep 30, 2024
99d44e1
Check subtype before decoding
caseyclements Oct 1, 2024
001636d
Try slots with default padding
caseyclements Oct 1, 2024
637c474
Removed slots arg
caseyclements Oct 1, 2024
2d511f6
Update dataclass
caseyclements Oct 1, 2024
17e1d33
Remove unompressed kwarg from as_vector
caseyclements Oct 1, 2024
ce5f3e3
Changed TypeError to ValueError
caseyclements Oct 1, 2024
edfe972
Updates after removing uncompressed
caseyclements Oct 1, 2024
8aaa2f6
Fixed expected exceptions in invalid test cases
caseyclements Oct 1, 2024
dfb322c
Merge branch 'master' into DRIVERS-2926-BSON-Binary-Vectors
blink1073 Oct 1, 2024
8946daf
padding in now Optional[int] = None
caseyclements Oct 1, 2024
9397129
padding does need to be an integer
caseyclements Oct 1, 2024
913403b
Removed unneeded ugly TYPE_FROM_HEX = {key.value: key for key in Bina…
caseyclements Oct 1, 2024
More tests
caseyclements committed Sep 17, 2024
commit f86d04082087e82fec28337e603a82707a52837b
30 changes: 30 additions & 0 deletions test/bson_corpus/binary.json
@@ -74,6 +74,36 @@
"description": "$type query operator (conflicts with legacy $binary form with $type field)",
"canonical_bson": "180000000378001000000010247479706500020000000000",
"canonical_extjson": "{\"x\" : { \"$type\" : {\"$numberInt\": \"2\"}}}"
},
{
"description": "subtype 0x09 Vector FLOAT32",
"canonical_bson": "170000000578000A0000000927000000FE420000E04000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"JwAAAP5CAADgQA==\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector INT8",
"canonical_bson": "11000000057800040000000903007F0700",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"AwB/Bw==\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector PACKED_BIT",
"canonical_bson": "11000000057800040000000910007F0700",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"EAB/Bw==\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector (Zero-length) FLOAT32",
"canonical_bson": "0F0000000578000200000009270000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"JwA=\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector (Zero-length) INT8",
"canonical_bson": "0F0000000578000200000009030000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"AwA=\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector (Zero-length) PACKED_BIT",
"canonical_bson": "0F0000000578000200000009100000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"EAA=\", \"subType\": \"09\"}}}"
}
],
"decodeErrors": [
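The new corpus entries can be checked by hand. The following is a minimal sketch, using only the standard library, that walks one `canonical_bson` hex string and splits the subtype 9 payload into its dtype byte, padding byte, and values. The helper name `parse_vector_doc` is hypothetical and not part of PyMongo; the dtype byte values (0x03, 0x27, 0x10) are read off the entries above, and the sketch assumes a single-field document with little-endian values.

```python
import base64
import struct

# dtype byte -> (name, struct format char), as read off the corpus entries above
DTYPES = {0x03: ("INT8", "b"), 0x27: ("FLOAT32", "f"), 0x10: ("PACKED_BIT", "B")}


def parse_vector_doc(hex_doc):
    """Parse a one-field BSON document whose value is a binary element."""
    raw = bytes.fromhex(hex_doc)
    (doc_len,) = struct.unpack_from("<i", raw, 0)
    assert doc_len == len(raw), "declared length must match actual length"
    assert raw[4] == 0x05, "element type 0x05 is BSON binary"
    name_end = raw.index(b"\x00", 5)            # field name is a C string
    name = raw[5:name_end].decode()
    (bin_len,) = struct.unpack_from("<i", raw, name_end + 1)
    subtype = raw[name_end + 5]
    payload = raw[name_end + 6 : name_end + 6 + bin_len]
    dtype_name, fmt = DTYPES[payload[0]]        # first payload byte: dtype
    padding = payload[1]                        # second payload byte: padding
    body = payload[2:]
    count = len(body) // struct.calcsize("<" + fmt)
    values = list(struct.unpack(f"<{count}{fmt}", body))
    return name, subtype, dtype_name, padding, values, payload


name, subtype, dtype_name, padding, values, payload = parse_vector_doc(
    "170000000578000A0000000927000000FE420000E04000"
)
print(name, subtype, dtype_name, padding, values)
# The base64 of the payload should match the canonical_extjson entry.
print(base64.b64encode(payload).decode())
```

For PACKED_BIT the values returned here are the raw packed bytes, not individual bits; expanding them to bits is a separate step.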
29 changes: 26 additions & 3 deletions test/test_bson.py
@@ -730,14 +730,17 @@ def test_uuid_legacy(self):
        self.assertEqual(id, transformed)

    def test_vector(self):
        """Tests of subtype 9"""
        # We start with valid cases, across the 3 dtypes implemented.
        # Work with a simple vector that can be interpreted as int8, float32, or ubyte
        list_vector = [127, 7]
        # As INT8, vector has length 2
        binary_vector = Binary.from_vector(list_vector, BinaryVectorDtype.INT8)
        vector = binary_vector.as_vector()
        assert vector.data == list_vector
        # test encoding roundtrip
        assert {"vector": binary_vector} == decode(encode({"vector": binary_vector}))
        # test json roundtrip # TODO - Is this the wrong place?
        # test json roundtrip
        assert binary_vector == json_util.loads(json_util.dumps(binary_vector))

        # For vectors of bits, aka PACKED_BIT type, vector has length 8 * 2
@@ -759,13 +762,33 @@ def test_vector(self):
            len(padded_vec.as_vector(BinaryVectorDtype.INT8).data) == 8 * len(list_vector) - padding
        )

        # It is worthwhile explicitly showing the values encoded to BSON
        padded_doc = {"padded_vec": padded_vec}
        assert (
            encode(padded_doc)
            == b"\x1a\x00\x00\x00\x05padded_vec\x00\x04\x00\x00\x00\t\x10\x03\x7f\x07\x00"
        )
        # and dumped to json
        assert (
            json_util.dumps(padded_doc)
            == '{"padded_vec": {"$binary": {"base64": "EAN/Bw==", "subType": "09"}}}'
        )

        # FLOAT32 is also implemented
        float_binary = Binary.from_vector(list_vector, BinaryVectorDtype.FLOAT32)
        assert all(isinstance(d, float) for d in float_binary.as_vector().data)

    # The C extension was segfaulting on unicode RegExs, so we have this test
    # that doesn't really test anything but the lack of a segfault.
        # Now some invalid cases
        for x in [-1, 257]:
            try:
                Binary.from_vector([x], BinaryVectorDtype.PACKED_BIT)
            except struct.error as e:
                assert str(e) == "ubyte format requires 0 <= number <= 255"

    def test_unicode_regex(self):
        """Tests we do not get a segfault for C extension on unicode RegExs.
        This had been happening.
        """
        regex = re.compile("revisi\xf3n")
        decode(encode({"regex": regex}))
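
The byte string and base64 value asserted for `padded_doc` can be reproduced without PyMongo. The sketch below assembles the same document by hand with `struct`, then illustrates the two other behaviours the test touches: the `8 * len(list_vector) - padding` length of a padded bit vector, and the range check that `struct` applies to unsigned bytes. The helper names (`encode_binary_doc`, `unpack_bits`, `fits_ubyte`) are hypothetical, and the most-significant-bit-first ordering in `unpack_bits` is an assumption; the test only asserts the resulting length.

```python
import base64
import struct


def encode_binary_doc(key, subtype, payload):
    """Hand-assemble a one-field BSON document holding a binary element."""
    element = (
        b"\x05"                              # element type: binary
        + key.encode() + b"\x00"             # field name as a C string
        + struct.pack("<i", len(payload))    # payload length, little-endian int32
        + bytes([subtype])                   # binary subtype
        + payload
    )
    body = element + b"\x00"                 # document terminator
    return struct.pack("<i", len(body) + 4) + body


def unpack_bits(data, padding):
    """Expand packed bytes to bits (MSB first, an assumption), dropping
    the last `padding` bits of the final byte."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return bits[: len(bits) - padding]


def fits_ubyte(x):
    """Report whether struct accepts x as an unsigned byte."""
    try:
        struct.pack("<B", x)
    except struct.error:
        return False
    return True


# PACKED_BIT payload: dtype byte 0x10, padding 3, then the bytes [127, 7]
payload = bytes([0x10, 3]) + struct.pack("<2B", 127, 7)
doc = encode_binary_doc("padded_vec", 9, payload)
print(doc)
print(base64.b64encode(payload).decode())
print(len(unpack_bits(payload[2:], padding=3)))
print([fits_ubyte(x) for x in (-1, 255, 257)])
```

A helper that returns a boolean, like `fits_ubyte`, makes a missing error visible, whereas a `try`/`except` loop on its own passes when nothing is raised.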
