Skip to content

[DB-40533] Add support for new type VECTOR(<dim>,DOUBLE) #182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

alampe3ds
Copy link

Add encode/decode support for new VECTOR datatype.

A result column of VECTOR(, DOUBLE) shows as type datatype.VECTOR_DOUBLE in the metadata description.

It is returned as datatype.Vector with subtype
datatype.Vector.DOUBLE which can be used as a list.

For a parameter to be detected as VECTOR(,DOUBLE) it also needs to use the datatype.Vector datatype. Alternatively, binding a string is also possible - which currently includes binding a list of numbers which will then be converted to string.

This commit also adds tests for the described behavior.

Ansgar Lampe added 3 commits June 17, 2025 17:18
Add encode/decode support for new VECTOR datatype.

A result column of VECTOR(<dim>, DOUBLE) shows as type
datatype.VECTOR_DOUBLE in the metadata description.

It is returned as datatype.Vector with subtype
datatype.Vector.DOUBLE which can be used as a list.

For a parameter to be detected as VECTOR(<dim>,DOUBLE)
it also needs to use the datatype.Vector datatype.
Alternatively, binding a string is also possible -
which currently includes binding a list of numbers which
will then be converted to string.

This commit also adds tests for the described behavior.
- remove whitespace line
- add documentation

# the actual vector: Each value as double in little endian encoding
for val in value:
self.__output += struct.pack('<d', float(val))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work if the value is NULL?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A NULL should be set via putNull, not putVectorDouble.

But assuming someone would put some kind of NULL-value in here: What NULL representation would you expect there? I'll try to outline a few:

  • If value is None this path will not be taken (we only reach here through putValue below which calls putVector nly for instances of Vector
  • If value is an empty list this would encode an empty list. I did not include that in the tests, but I am pretty sure that would trigger a server side error with a VECTOR(0) not being supported or the like. Could add a test and ensure it works correctly.

Copy link
Contributor

@madscientist madscientist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider my comments on detecting the server version in the test.

Otherwise it looks OK to me.

@@ -149,10 +151,37 @@ def __cmp__(self, other):
return -1


class Vector(list):
"""A specific type for SQL VECTOR(<dim>, DOUBLE)
Copy link
Contributor

@madscientist madscientist Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not quite right, is it? A Vector element doesn't have to be a DOUBLE (I mean, it does now but it might not in the future)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It is supposed to be usable with other element types as well. Will fix that.

Comment on lines +135 to +141
# only activate this tests if tested against version 8 or above
cursor.execute("select cast(substring_index(release_ver, '.', 1) as int)"
" from system.nodes limit 1")
row = cursor.fetchone()
database_major_version = row[0]
if database_major_version < 8:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a problem with this, per se.

I think it would be cleaner to obtain this information once somewhere in conftest.py and make it available there, for example maybe in the DATABASE_FIXTURE since that already creates a connection.

The current DATABASE_FIXTURE is the connection arguments but you could add this there and then extract it again in the nuodb_base.NuoBase._setup().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into that. If it isn't extremely hard (which I doubt) I'll do that.

Comment on lines +136 to +137
cursor.execute("select cast(substring_index(release_ver, '.', 1) as int)"
" from system.nodes limit 1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be safer and easier to use geteffectiveversion() and compare based on that integer value, rather than parsing the version string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just didn't know we have that function. Much better idea to use that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants