feat: support union type for basic types #510

Open. Wants to merge 100 commits into base: main.

Commits
5b33175
Add union type and JSON schema conversion
chardoncs May 7, 2025
a7f7e85
Add basic postgres conversion
chardoncs May 7, 2025
7b959b9
Compact lines
chardoncs May 7, 2025
6b6aae9
Workaround for JSON conversion in union
chardoncs May 7, 2025
bb18d97
Add stub impl for Python union conversion
chardoncs May 7, 2025
a2b5906
Add impl for python object union conversion
chardoncs May 8, 2025
0e19865
Update error message
chardoncs May 8, 2025
4cbd632
Rename type item
chardoncs May 8, 2025
69cc231
Add str parsing method
chardoncs May 8, 2025
3b386b6
Add union conversion for Qdrant
chardoncs May 8, 2025
159e8e3
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 8, 2025
606d26e
Add basic string parsing for union type
chardoncs May 8, 2025
c68fd2a
Fix union conversion for Qdrant
chardoncs May 8, 2025
839ee85
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 9, 2025
4752970
Replace if guards with matches
chardoncs May 9, 2025
b16d0f6
Add extra parsing for string
chardoncs May 9, 2025
c404388
Add rustdoc for parsing method
chardoncs May 9, 2025
312ef53
Turn string parsing into a util function
chardoncs May 9, 2025
c2ebd88
Update union parsing for serde value
chardoncs May 9, 2025
703565b
Add vector union type parsing for Qdrant
chardoncs May 9, 2025
d2d02b0
Switch to BTreeSet for union types
chardoncs May 9, 2025
72537b8
Remove nested union detection
chardoncs May 9, 2025
f837121
Remove TODO: Support struct/table
chardoncs May 9, 2025
2785e1c
Add union type helper struct
chardoncs May 9, 2025
a1f47a6
Add comments
chardoncs May 9, 2025
cca7d1f
Update Python type conversion for union type
chardoncs May 9, 2025
f9e9bec
Use reversed iteration for union type matching
chardoncs May 9, 2025
189a52d
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 10, 2025
b1e1084
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 11, 2025
4ec510d
Add test cases for union fmt
chardoncs May 11, 2025
9a31d2e
Update comments
chardoncs May 11, 2025
c168dcc
Add test cases
chardoncs May 11, 2025
2e06096
Remove "undetected JSON" parsing
chardoncs May 11, 2025
1e9d74c
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 12, 2025
780d154
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 13, 2025
0ffe380
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 14, 2025
f384c16
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 15, 2025
4463e52
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 15, 2025
69a0280
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 16, 2025
1d51d5c
Update union analysis in Python API
chardoncs May 16, 2025
efa1861
Add union type encoding for Python API
chardoncs May 16, 2025
6294ecf
Add single type checking for union type analysis
chardoncs May 16, 2025
fd37218
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 17, 2025
052815a
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 18, 2025
66a09ec
Update union type
chardoncs May 18, 2025
385487f
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 18, 2025
f8eb3cc
Add union decoding
chardoncs May 18, 2025
bdd4b16
Revert "Add union decoding"
chardoncs May 18, 2025
534c791
Update encoded type field
chardoncs May 18, 2025
e8b972e
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 19, 2025
beaa1c1
Update union types field in Python
chardoncs May 19, 2025
39b6039
Merge branch 'cocoindex-io:main' into expr-union-type-impl
chardoncs May 20, 2025
fa2dd14
Merge branch 'main' into expr-union-type-impl
chardoncs May 21, 2025
030281c
Update type serialization
chardoncs May 21, 2025
a037d9a
Revert "Update type serialization"
chardoncs May 22, 2025
224cb5b
Merge branch 'main' into expr-union-type-impl
chardoncs May 22, 2025
d4916d2
Add `UnionVariant` and conversions in `BasicValue`
chardoncs May 22, 2025
8869002
Merge branch 'main' into expr-union-type-impl
chardoncs May 23, 2025
7f08070
Add union value binding for Postgres
chardoncs May 23, 2025
0571e8c
Update type guessing for union from python object
chardoncs May 23, 2025
edeabe7
Replace direct return with break
chardoncs May 23, 2025
d194117
Use `Vec` to remove auto-sort
chardoncs May 25, 2025
fe4941b
Revert "Use `Vec` to remove auto-sort"
chardoncs May 25, 2025
0850cdc
Merge branch 'main' into expr-union-type-impl
chardoncs May 25, 2025
2775b28
Merge branch 'main' into expr-union-type-impl
chardoncs May 27, 2025
d649e0a
Merge branch 'main' into expr-union-type-impl
chardoncs May 29, 2025
9677169
Use `Vec` for union type
chardoncs May 29, 2025
bf2811c
Add union processing for KuzuDB
chardoncs May 29, 2025
a5f6c6c
Update Cypher generation for union type
chardoncs May 29, 2025
d48f6c5
Use 0-based index for `val{i}`
chardoncs May 29, 2025
abb920e
Update tuple
chardoncs Jun 2, 2025
09764e2
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 2, 2025
b246485
Take values for JSON conversion for union
chardoncs Jun 2, 2025
34d0ca3
Update variable name
chardoncs Jun 2, 2025
d029def
Use typed value conversion for union in Postgres
chardoncs Jun 2, 2025
ce45aaa
Replace union conversion with error in `from_pg_value()`
chardoncs Jun 2, 2025
2c7d106
Update union conversion for Qdrant
chardoncs Jun 2, 2025
1a78b48
Update `PyErr` message for union
chardoncs Jun 2, 2025
e8ed867
Move `UnionType` to `schema.rs` as `UnionTypeSchema`
chardoncs Jun 2, 2025
f577da7
Use `to_value()` for union value conversion
chardoncs Jun 2, 2025
59f4bce
Use `bail!()` for early return
chardoncs Jun 2, 2025
70d2010
Update error message for union tuple conversion
chardoncs Jun 2, 2025
a913522
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 2, 2025
dd0d48f
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 3, 2025
549dc30
Move union type checking to the loop
chardoncs Jun 3, 2025
240cf16
Replace `.ok_or_else()` with `.unwrap()`
chardoncs Jun 3, 2025
4832227
Update union variant serialization
chardoncs Jun 3, 2025
5c48526
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 5, 2025
2795d4b
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 11, 2025
af45e67
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 12, 2025
d093571
Match quote styling
chardoncs Jun 12, 2025
866865c
Break infinite loops
chardoncs Jun 12, 2025
6de65b5
Added a union test case
chardoncs Jun 12, 2025
4bcc53d
Fix union typing
chardoncs Jun 12, 2025
4157ff0
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 14, 2025
64cf435
Make `union_variant_types` optional
chardoncs Jun 14, 2025
e873f65
Merge branch 'main' into expr-union-type-impl
chardoncs Jun 14, 2025
ae6e5bc
Update test case
chardoncs Jun 14, 2025
d1205cf
Fix JSON seder and decoding
chardoncs Jun 14, 2025
f748cda
Add UUID union test cases
chardoncs Jun 14, 2025
3 changes: 3 additions & 0 deletions python/cocoindex/convert.py
@@ -157,6 +157,9 @@ def decode_vector(value: Any) -> Any | None:

         return decode_vector

+    if src_type_kind == "Union":
+        return lambda value: value[1]
+
     return lambda value: value


18 changes: 18 additions & 0 deletions python/cocoindex/tests/test_convert.py
@@ -477,6 +477,24 @@ def test_field_position_cases(
     assert decoder(engine_val) == PythonOrder(**expected_dict)


+def test_roundtrip_union_simple() -> None:
+    t = int | str | float
+    value = 10.4
+    validate_full_roundtrip(value, t)
+
+
+def test_roundtrip_union_with_active_uuid() -> None:
+    t = str | uuid.UUID | int
+    value = uuid.uuid4().bytes
+    validate_full_roundtrip(value, t)
+
+
+def test_roundtrip_union_with_inactive_uuid() -> None:
+    t = str | uuid.UUID | int
+    value = "5a9f8f6a-318f-4f1f-929d-566d7444a62d"  # it's a string
+    validate_full_roundtrip(value, t)
+
+
 def test_roundtrip_ltable() -> None:
     t = list[Order]
     value = [Order("O1", "item1", 10.0), Order("O2", "item2", 20.0)]
38 changes: 26 additions & 12 deletions python/cocoindex/typing.py

Member

Note that we also need to update make_engine_value_decoder() in convert.py.

Author

@chardoncs May 27, 2025

Actually, I'm confused about this decoding. There seems to be no way to encode a union engine value from a Python value: the Python union type cannot be detected from encode_engine_value(). (Maybe a custom class is needed.)

Also, for decoding, my current implementation of the UnionVariant branch in the Rust function basic_value_to_py_object<'py>() looks like this:

let result = match v {
// snip
        value::BasicValue::UnionVariant { value, .. } => {
            basic_value_to_py_object(py, &value)?
        }
};

In this case, I suspect the decoder built by make_engine_value_decoder() cannot detect the union type in any way, neither through encode_engine_value() nor through TransientFlow::evaluate_async().

But if we return a struct that is converted into a Python value (something like { type: "Str", value: "foo" }), then the decoder can detect it.

Member

In Python, I think there's no common way to get the specific branch for a value annotated with a Union type. E.g. for list[int] | list[float], type information only exists in each list element; nothing directly distinguishes list[int] from list[float].

So for encoding (Python->Rust), we probably have to try the different branches on the Rust side and see which one succeeds. Your current implementation looks good!
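
A minimal Python sketch of the try-each-branch idea, for illustration only. In the PR this logic lives on the Rust side (`basic_value_from_py_object`); the helper names `as_int`, `as_str`, `as_float` and `encode_union` below are hypothetical.

```python
from typing import Any, Callable


def as_int(v: Any) -> int:
    # bool is a subclass of int in Python; reject it so True is not read as 1.
    if isinstance(v, bool) or not isinstance(v, int):
        raise TypeError(f"not an int: {v!r}")
    return v


def as_str(v: Any) -> str:
    if not isinstance(v, str):
        raise TypeError(f"not a str: {v!r}")
    return v


def as_float(v: Any) -> float:
    if not isinstance(v, float):
        raise TypeError(f"not a float: {v!r}")
    return v


def encode_union(value: Any, converters: list[Callable[[Any], Any]]) -> tuple[int, Any]:
    """Try each branch in declaration order; the first success wins."""
    for tag_id, convert in enumerate(converters):
        try:
            return tag_id, convert(value)
        except TypeError:
            continue
    raise TypeError(f"invalid union value: {value!r}")


# Branches for int | str | float, in declaration order.
BRANCHES = [as_int, as_str, as_float]
print(encode_union(10.4, BRANCHES))  # (2, 10.4)
```

Because the first matching branch wins, the order of the variants affects which tag a value receives when more than one branch could accept it.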

For decoding (Rust->Python), since both the Rust and Python sides have consistent type information, I think we can follow an approach similar to how we serialize to JSON. On the Rust side, we can convert the value into a tuple of (tag_id, value); then on the Python side, tag_id tells us the type of the specific branch. What do you think about this? Thanks!
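
A sketch of how a Python-side decoder could use the `(tag_id, value)` tuple to pick the branch. The function `make_union_decoder` and the per-branch decoders are hypothetical, and the sketch assumes a UUID arrives in its text form, which the thread does not specify.

```python
import uuid
from typing import Any, Callable


def make_union_decoder(
    variant_decoders: list[Callable[[Any], Any]],
) -> Callable[[tuple[int, Any]], Any]:
    """Build a decoder that dispatches on the tag sent by the engine."""

    def decode(engine_value: tuple[int, Any]) -> Any:
        tag_id, payload = engine_value
        return variant_decoders[tag_id](payload)

    return decode


# Decoders for str | uuid.UUID | int, in declaration order.
decode = make_union_decoder([str, uuid.UUID, int])

text = "5a9f8f6a-318f-4f1f-929d-566d7444a62d"
print(repr(decode((0, text))))  # stays a str
print(repr(decode((1, text))))  # becomes a uuid.UUID
```

The same payload decodes to different Python types depending only on the tag, which is the information a bare value cannot carry.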

Author

Yes, that should do the trick.

Author

@chardoncs May 29, 2025

On second thought, there is a problem with the B-tree set: it seems we have to use a Vec to preserve the order if tag_id is returned.

Member

Yes, please use Vec - this is compatible with the tag_id approach.
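
To illustrate the ordering problem: a positional tag is only meaningful if both sides enumerate the variants in the same order. The sketch below mimics an ordered set by sorting type names as strings, which is an assumption; the real `BTreeSet` order would follow the Rust `Ord` implementation of the element type.

```python
# Variants in the order the user declared them, e.g. Str | Int64 | Float64.
declared = ["Str", "Int64", "Float64"]

# An ordered set iterates in sorted order, not in declaration order.
sorted_like_a_set = sorted(set(declared))


def tag_of(type_name: str, types: list[str]) -> int:
    """Position of a variant, used as its tag ID."""
    return types.index(type_name)


print(tag_of("Str", declared))           # 0
print(tag_of("Str", sorted_like_a_set))  # 2
```

With a list (a `Vec` on the Rust side), the tag matches the position in the declared union on both sides.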

@@ -155,6 +155,7 @@ class AnalyzedTypeInfo:

     attrs: dict[str, Any] | None
     nullable: bool = False
+    union_variant_types: list[type] | None = None  # For Union


 def analyze_type_info(t: Any) -> AnalyzedTypeInfo:
@@ -175,18 +176,6 @@ def analyze_type_info(t: Any) -> AnalyzedTypeInfo:
         if base_type is Annotated:
             annotations = t.__metadata__
             t = t.__origin__
-        elif base_type is types.UnionType:
-            possible_types = typing.get_args(t)
-            non_none_types = [
-                arg for arg in possible_types if arg not in (None, types.NoneType)
-            ]
-            if len(non_none_types) != 1:
-                raise ValueError(
-                    f"Expect exactly one non-None choice for Union type, but got {len(non_none_types)}: {t}"
-                )
-            t = non_none_types[0]
-            if len(possible_types) > 1:
-                nullable = True
         else:
             break

@@ -205,6 +194,7 @@ def analyze_type_info(t: Any) -> AnalyzedTypeInfo:

     struct_type: type | None = None
     elem_type: ElementType | None = None
+    union_variant_types: typing.List[ElementType] = None
     key_type: type | None = None
     np_number_type: type | None = None
     if _is_struct_type(t):
@@ -254,6 +244,22 @@ def analyze_type_info(t: Any) -> AnalyzedTypeInfo:
         args = typing.get_args(t)
         elem_type = (args[0], args[1])
         kind = "KTable"
+    elif base_type is types.UnionType:
+        possible_types = typing.get_args(t)
+        non_none_types = [arg for arg in possible_types if arg not in (None, types.NoneType)]
+
+        if len(non_none_types) == 0:
+            return analyze_type_info(None)
+
+        nullable = len(non_none_types) < len(possible_types)
+
+        if len(non_none_types) == 1:
+            result = analyze_type_info(non_none_types[0])
+            result.nullable = nullable
+            return result
+
+        kind = 'Union'
+        union_variant_types = non_none_types
     elif kind is None:
         dtype_info = DtypeRegistry.get_by_dtype(t)
         if dtype_info is not None:
@@ -286,6 +292,7 @@ def analyze_type_info(t: Any) -> AnalyzedTypeInfo:
         kind=kind,
         vector_info=vector_info,
         elem_type=elem_type,
+        union_variant_types=union_variant_types,
         key_type=key_type,
         struct_type=struct_type,
         np_number_type=np_number_type,
@@ -345,6 +352,13 @@ def _encode_type(type_info: AnalyzedTypeInfo) -> dict[str, Any]:
         encoded_type["element_type"] = _encode_type(elem_type_info)
         encoded_type["dimension"] = type_info.vector_info.dim

+    elif type_info.kind == 'Union':
+        if type_info.union_variant_types is None:
+            raise ValueError("Union type must have a variant type list")
+        encoded_type['types'] = [
+            _encode_type(analyze_type_info(typ)) for typ in type_info.union_variant_types
+        ]
+
     elif type_info.kind in TABLE_TYPES:
         if type_info.elem_type is None:
             raise ValueError(f"{type_info.kind} type must have an element type")
12 changes: 11 additions & 1 deletion src/base/json_schema.rs
@@ -2,7 +2,7 @@ use crate::prelude::*;

 use crate::utils::immutable::RefList;
 use schemars::schema::{
-    ArrayValidation, InstanceType, ObjectValidation, Schema, SchemaObject, SingleOrVec,
+    ArrayValidation, InstanceType, ObjectValidation, Schema, SchemaObject, SingleOrVec, SubschemaValidation,
 };
 use std::fmt::Write;

@@ -176,6 +176,16 @@ impl JsonSchemaBuilder {
                     ..Default::default()
                 }));
             }
+            schema::BasicValueType::Union(s) => {
+                schema.subschemas = Some(Box::new(SubschemaValidation {
+                    one_of: Some(
+                        s.types.iter()
+                            .map(|t| Schema::Object(self.for_basic_value_type(t, field_path)))
+                            .collect()
+                    ),
+                    ..Default::default()
+                }));
+            }
         }
         schema
     }
19 changes: 19 additions & 0 deletions src/base/schema.rs
@@ -9,6 +9,11 @@ pub struct VectorTypeSchema {
     pub dimension: Option<usize>,
 }

+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+pub struct UnionTypeSchema {
+    pub types: Vec<BasicValueType>,
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
 #[serde(tag = "kind")]
 pub enum BasicValueType {
@@ -56,6 +61,9 @@ pub enum BasicValueType {

     /// A vector of values (usually numbers, for embeddings).
     Vector(VectorTypeSchema),
+
+    /// A union
+    Union(UnionTypeSchema),
 }

 impl std::fmt::Display for BasicValueType {
@@ -82,6 +90,17 @@ impl std::fmt::Display for BasicValueType {
                 }
                 write!(f, "]")
             }
+            BasicValueType::Union(s) => {
+                write!(f, "Union[")?;
+                for (i, typ) in s.types.iter().enumerate() {
+                    if i > 0 {
+                        // Add type delimiter
+                        write!(f, " | ")?;
+                    }
+                    write!(f, "{}", typ)?;
+                }
+                write!(f, "]")
+            }
         }
     }
 }
44 changes: 42 additions & 2 deletions src/base/value.rs

Member

After reading the implementations, I realized I overlooked one thing when creating the original issue: under a union type, value -> JSON -> value may produce a different value after a roundtrip (the value representation carries stronger type information and can uniquely identify the type for basic types, but JSON is more weakly typed).

This can be a problem, because it means a ser-deser roundtrip is no longer transparent (e.g. we cache intermediate computation results for reuse).

Now I'm thinking we could do the following to help:

  • For union typed values, add a number tag to indicate the branch ID, e.g. UnionVariant(tag_id, value), and serialize it as a 2-element array.
  • This means the type representation (UnionType) needs to keep a Vec, so the tag_id can more easily work with it. I still like the idea of sorting possible types though - consider doing the sorting when creating the UnionType.
  • Note that we have another TypedValue serialization logic:

    cocoindex/src/base/value.rs

    Lines 1049 to 1057 in 95c7ece

    impl Serialize for TypedValue<'_> {
        fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
            match (self.t, self.v) {
                (_, Value::Null) => serializer.serialize_none(),
                (ValueType::Basic(_), v) => v.serialize(serializer),
                (ValueType::Struct(s), Value::Struct(field_values)) => TypedFieldsValue {
                    schema: &s.fields,
                    values_iter: field_values.fields.iter(),
                }

    The difference is that these serializations are meant to be consumed outside cocoindex, without cocoindex type info (they are used in the export / dump format now), and not to be deserialized again. So for these, we don't need to carry tag_id along.

Any further discussion on this is welcome.
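
A small Python sketch of the roundtrip problem described above, for a hypothetical union `Uuid | Str`. The functions `to_json_untagged` and `from_json_guessing` are illustrative stand-ins, not cocoindex APIs; they model serialization without a tag and deserialization that has to guess the branch.

```python
import json
import uuid


def to_json_untagged(value):
    # JSON has no UUID type, so a UUID is written as its string form.
    return json.dumps(str(value) if isinstance(value, uuid.UUID) else value)


def from_json_guessing(text):
    """Decode a value of the union Uuid | Str by guessing: try Uuid first."""
    raw = json.loads(text)
    if isinstance(raw, str):
        try:
            return uuid.UUID(raw)
        except ValueError:
            pass
    return raw


original = "5a9f8f6a-318f-4f1f-929d-566d7444a62d"  # a plain string, not a UUID
restored = from_json_guessing(to_json_untagged(original))
print(type(original).__name__, "->", type(restored).__name__)  # str -> UUID
```

The string branch was active before serialization, but the guess picks the UUID branch afterwards, so a cached value would come back with a different type.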

Author

Yes, I understand that transparency can be a problem. My current implementation includes type guessing that may cause inconsistency.

So, my understanding is that the tag_id identifies the current type of the value, and it's serialized together with the value as a JSON array. Is that correct?

Member

Yes, that's correct.
(sorry missed this question yesterday)

Member

Thanks for updating the code! I looked at the latest in-progress implementation. It's still a little different from what I had in mind.

The UnionVariant(tag_id, value) I suggested is a branch of BasicValue. Written out, it may be something like this:

enum BasicValue {
  ...
  UnionVariant{tag_id: usize, value: Box<BasicValue>},
}

e.g. if a union type is Str | Int64, Str has tag ID 0, Int64 has tag ID 1.

Both serialization and deserialization will be more straightforward:

  • To serialize: directly convert it into array [tag_id, serialize(value)]
  • To deserialize (from_json() method): with the tag_id, we can directly get the specific type from the vector in UnionType, then use it to deserialize value.

This will be simpler and more robust: we don't need to guess the BasicType from the BasicValue, which may not always be possible (e.g. zero-length vectors, or a type like Str | Vector[Str | Int64]).
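
The two steps above can be sketched in Python as follows. This is a toy model, not the Rust implementation: the dataclasses `UnionVariant`, `VectorType` and `UnionType` and the functions `serialize` / `deserialize` are hypothetical, and basic types are represented by plain strings such as "Str".

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UnionVariant:
    tag_id: int
    value: Any


@dataclass(frozen=True)
class VectorType:
    element_type: Any


@dataclass(frozen=True)
class UnionType:
    types: tuple


def serialize(value):
    """Needs no type info: the tag travels with the value."""
    if isinstance(value, UnionVariant):
        return [value.tag_id, serialize(value.value)]
    if isinstance(value, list):
        return [serialize(v) for v in value]
    return value


def deserialize(raw, typ):
    """Uses tag_id to look up the branch type instead of guessing."""
    if isinstance(typ, UnionType):
        tag_id, payload = raw
        return UnionVariant(tag_id, deserialize(payload, typ.types[tag_id]))
    if isinstance(typ, VectorType):
        return [deserialize(v, typ.element_type) for v in raw]
    return raw


# Str | Vector[Str | Int64]
T = UnionType(("Str", VectorType(UnionType(("Str", "Int64")))))
value = UnionVariant(1, [UnionVariant(0, "a"), UnionVariant(1, 7)])
text = json.dumps(serialize(value))
print(text)  # [1, [[0, "a"], [1, 7]]]
```

A zero-length vector such as `UnionVariant(1, [])` also roundtrips in this model, since the tag identifies the branch even when there are no elements to inspect.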


Besides, I realized there's one more thing we may need to discuss, which I missed last time.

For deserialization, there are two types of use cases:

  1. Deserialize what we serialized. This guarantees we get the same thing back in a roundtrip.
  2. Deserialize JSON values created by external systems. Currently the only such situation is ValueExtractor in json_schema.rs (e.g. values created by an LLM), where we don't expect the producer to include the tag ID. On the other hand, we don't expect the value type to be preserved in a roundtrip for this case either. So we can guess by trying each possible type during deserialization.

To distinguish these two, we can add a new input to the from_json() method to indicate the mode, e.g. another enum like:

enum DeserializationMode {
  /// Deserialize what we serialized. Guaranteed to preserve original type.
  Internal,
  /// Deserialize JSON values produced by external systems. 
  External,
}
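
A Python sketch of how the two modes might differ for a union value. Only the names `DeserializationMode`, `Internal` and `External` come from the comment above; `union_from_json` and the per-branch parsers are hypothetical.

```python
from enum import Enum, auto


class DeserializationMode(Enum):
    INTERNAL = auto()  # input was produced by our own serializer: [tag_id, value]
    EXTERNAL = auto()  # input comes from an external system: bare value, no tag


def parse_str(raw):
    if not isinstance(raw, str):
        raise ValueError(f"not a string: {raw!r}")
    return raw


def parse_int(raw):
    # bool is a subclass of int in Python; reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError(f"not an integer: {raw!r}")
    return raw


def union_from_json(raw, parsers, mode):
    """Return (tag_id, value) for a union whose branches are given by `parsers`."""
    if mode is DeserializationMode.INTERNAL:
        tag_id, payload = raw
        return tag_id, parsers[tag_id](payload)
    # External: no tag available, so try each branch in order.
    for tag_id, parse in enumerate(parsers):
        try:
            return tag_id, parse(raw)
        except ValueError:
            continue
    raise ValueError(f"no union branch accepts {raw!r}")


STR_OR_INT = [parse_str, parse_int]
print(union_from_json([1, 10], STR_OR_INT, DeserializationMode.INTERNAL))  # (1, 10)
print(union_from_json(10, STR_OR_INT, DeserializationMode.EXTERNAL))       # (1, 10)
```

In the internal mode the branch is read from the data; in the external mode it is inferred, so the result is only as good as the guess.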

Thanks for your patience! Let me know if you have any other thoughts. Thanks!

Author

Thanks for clearing up the misunderstanding. I was actually concerned about serializing the element type.

It is a simple way to eliminate ambiguity.

Author

By the way, I was initially thinking about serializing and deserializing BasicValue without a UnionVariant branch.

The serialization creates the tag_id when BasicValueType is Union:

E.g. BasicValue::Int64(10) with type Str | Int -> { tag_id: 1, value: 10 }

And deser uses the BasicValueType and tag_id to construct BasicValue:

E.g. { tag_id: 1, value: 10 } with type Str | Int -> BasicValue::Int64(10)

Member

@badmonster0 May 22, 2025

Nice thought! This may run into the following difficulties though:

  • During serialization we don't have the BasicValueType. The serialization is designed to work standalone with the Value (otherwise we would need a serialization method that is not standard for serde, and a lot of things would need to change, e.g. other serializable structs using Value as fields).
  • Even if we pass the type down, we still need to guess the type based on the value. It's doable for most cases, but there are still some difficult ones, e.g. when there's a vector, we need to look into the elements, which can be of union type too. We'll introduce more types in the future (e.g. Enum, [FEATURE] Support Enum as a basic type #523), which may add more complexity (e.g. what about Vector[Str] | Vector[Enum1]?).

I prefer avoiding these complexities, so that we don't hit a case we cannot handle some day.
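
A short sketch of why guessing the branch from the value can fail. The function `matching_variants` is hypothetical; it models a union of vector types by the Python type of their elements.

```python
def matching_variants(value: list, element_types: list[type]) -> list[int]:
    """Indices of the vector branches whose element type fits every element."""
    return [
        i
        for i, elem_type in enumerate(element_types)
        if all(isinstance(v, elem_type) for v in value)
    ]


# Union of Vector[Str] | Vector[Int64], modelled by element types.
BRANCHES = [str, int]
print(matching_variants([1, 2], BRANCHES))  # [1]
print(matching_variants([], BRANCHES))      # [0, 1]
```

An empty vector fits every vector branch, so the value alone cannot say which one was intended.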

Member

Note that TypedValue in this file also needs to be updated. For UnionType, we should only serialize the value part, without the tag.
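
A sketch of the export-style serialization described here, where tags are dropped because the output is not meant to be deserialized again. `UnionVariant` and `to_export_json` are hypothetical Python stand-ins for the Rust types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UnionVariant:
    tag_id: int
    value: Any


def to_export_json(value):
    """Serialize for external consumers: keep the payload, drop the tag."""
    if isinstance(value, UnionVariant):
        return to_export_json(value.value)
    if isinstance(value, list):
        return [to_export_json(v) for v in value]
    return value


nested = UnionVariant(1, [UnionVariant(0, "a"), UnionVariant(1, 7)])
print(to_export_json(nested))  # ['a', 7]
```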

@@ -379,6 +379,10 @@ pub enum BasicValue {
     TimeDelta(chrono::Duration),
     Json(Arc<serde_json::Value>),
     Vector(Arc<[BasicValue]>),
+    UnionVariant {
+        tag_id: usize,
+        value: Box<BasicValue>,
+    },
 }

 impl From<Bytes> for BasicValue {
@@ -496,7 +500,8 @@ impl BasicValue {
             | BasicValue::OffsetDateTime(_)
             | BasicValue::TimeDelta(_)
             | BasicValue::Json(_)
-            | BasicValue::Vector(_) => api_bail!("invalid key value type"),
+            | BasicValue::Vector(_)
+            | BasicValue::UnionVariant { .. } => api_bail!("invalid key value type"),
         };
         Ok(result)
     }
@@ -517,7 +522,8 @@ impl BasicValue {
             | BasicValue::OffsetDateTime(_)
             | BasicValue::TimeDelta(_)
             | BasicValue::Json(_)
-            | BasicValue::Vector(_) => api_bail!("invalid key value type"),
+            | BasicValue::Vector(_)
+            | BasicValue::UnionVariant { .. } => api_bail!("invalid key value type"),
         };
         Ok(result)
     }
@@ -539,6 +545,7 @@ impl BasicValue {
             BasicValue::TimeDelta(_) => "timedelta",
             BasicValue::Json(_) => "json",
             BasicValue::Vector(_) => "vector",
+            BasicValue::UnionVariant { .. } => "union",
         }
     }
 }
@@ -890,6 +897,12 @@ impl serde::Serialize for BasicValue {
             BasicValue::TimeDelta(v) => serializer.serialize_str(&v.to_string()),
             BasicValue::Json(v) => v.serialize(serializer),
             BasicValue::Vector(v) => v.serialize(serializer),
+            BasicValue::UnionVariant { tag_id, value } => {
+                let mut s = serializer.serialize_tuple(2)?;
+                s.serialize_element(tag_id)?;
+                s.serialize_element(value)?;
+                s.end()
+            }
         }
     }
 }
@@ -954,6 +967,33 @@ impl BasicValue {
                     .collect::<Result<Vec<_>>>()?;
                 BasicValue::Vector(Arc::from(vec))
             }
+            (v, BasicValueType::Union(typ)) => {
+                let obj: Vec<serde_json::Value> = serde_json::from_value(v)
+                    .map_err(|_| anyhow::anyhow!("Invalid JSON value for union, expect array"))?;
+
+                if obj.len() != 2 {
+                    anyhow::bail!("Invalid union tuple: expect 2 values, received {}", obj.len());
+                }
+
+                let mut obj_iter = obj.into_iter();
+
+                // Take first element
+                let tag_id = obj_iter
+                    .next()
+                    .and_then(|value| value.as_u64().map(|num_u64| num_u64 as usize))
+                    .unwrap();
+
+                // Take second element
+                let value = obj_iter.next().unwrap();
+
+                let cur_type = typ.types.get(tag_id)
+                    .ok_or_else(|| anyhow::anyhow!("No type in `tag_id` \"{tag_id}\" found"))?;
+
+                BasicValue::UnionVariant {
+                    tag_id,
+                    value: Box::new(BasicValue::from_json(value, cur_type)?),
+                }
+            }
             (v, t) => {
                 anyhow::bail!("Value and type not matched.\nTarget type {t:?}\nJSON value: {v}\n")
             }
13 changes: 13 additions & 0 deletions src/ops/storages/kuzu.rs
@@ -123,6 +123,14 @@ fn basic_type_to_kuzu(basic_type: &BasicValueType) -> Result<String> {
             t.dimension
                 .map_or_else(|| "".to_string(), |d| d.to_string())
         ),
+        BasicValueType::Union(t) => format!(
+            "UNION({})",
+            t.types.iter()
+                .enumerate()
+                .map(|(i,typ)| Ok(format!("val{} {}", i, basic_type_to_kuzu(typ)?)))
+                .collect::<Result<Vec<_>>>()?
+                .join(", "),
+        ),
         t @ (BasicValueType::Time | BasicValueType::Json) => {
             api_bail!("{t} is not supported in Kuzu")
         }
@@ -379,6 +387,11 @@ fn append_basic_value(cypher: &mut CypherBuilder, basic_value: &BasicValue) -> R
             }
             write!(cypher.query_mut(), "]")?;
         }
+        BasicValue::UnionVariant { tag_id, value } => {
+            write!(cypher.query_mut(), "union_value(val{}:=", tag_id)?;
+            append_basic_value(cypher, value)?;
+            write!(cypher.query_mut(), ")")?;
+        }
         v @ (BasicValue::Time(_) | BasicValue::Json(_)) => {
             bail!("value types are not supported in Kuzu: {}", v.kind());
         }
9 changes: 9 additions & 0 deletions src/ops/storages/neo4j.rs
@@ -226,6 +226,15 @@ fn basic_value_to_bolt(value: &BasicValue, schema: &BasicValueType) -> Result<Bo
             _ => anyhow::bail!("Non-vector type got vector value: {}", schema),
         },
         BasicValue::Json(v) => json_value_to_bolt_value(v)?,
+        BasicValue::UnionVariant { tag_id, value } => match schema {
+            BasicValueType::Union(s) => {
+                let typ = s.types.get(*tag_id)
+                    .ok_or_else(|| anyhow::anyhow!("Invalid `tag_id`: {}", tag_id))?;
+
+                basic_value_to_bolt(value, typ)?
+            }
+            _ => anyhow::bail!("Non-union type got union value: {}", schema),
+        },
     };
     Ok(bolt_value)
 }
7 changes: 7 additions & 0 deletions src/ops/storages/postgres.rs
@@ -154,6 +154,12 @@ fn bind_value_field<'arg>(
                         builder.push_bind(sqlx::types::Json(v));
                     }
                 },
+                BasicValue::UnionVariant { .. } => {
+                    builder.push_bind(sqlx::types::Json(TypedValue {
+                        t: &field_schema.value_type.typ,
+                        v: value,
+                    }));
+                }
             },
             Value::Null => {
                 builder.push("NULL");
@@ -383,6 +389,7 @@ fn to_column_type_sql(column_type: &ValueType) -> String {
                     "jsonb".into()
                 }
             }
+            BasicValueType::Union(_) => "jsonb".into(),
         },
         _ => "jsonb".into(),
     }
22 changes: 22 additions & 0 deletions src/py/convert.rs
@@ -1,4 +1,5 @@
 use bytes::Bytes;
+use pyo3::exceptions::PyTypeError;
 use numpy::{PyArray1, PyArrayDyn, PyArrayMethods};
 use pyo3::IntoPyObjectExt;
 use pyo3::types::PyAny;
@@ -76,6 +77,9 @@ fn basic_value_to_py_object<'py>(
         value::BasicValue::TimeDelta(v) => v.into_bound_py_any(py)?,
         value::BasicValue::Json(v) => pythonize(py, v).into_py_result()?,
         value::BasicValue::Vector(v) => handle_vector_to_py(py, v)?,
+        value::BasicValue::UnionVariant { tag_id, value } => {
+            (*tag_id, basic_value_to_py_object(py, &value)?).into_bound_py_any(py)?
+        }
     };
     Ok(result)
 }
@@ -162,6 +166,24 @@ fn basic_value_from_py_object<'py>(
                 ))
             }
         }
+        schema::BasicValueType::Union(s) => {
+            let mut valid_value = None;
+
+            // Try parsing the value
+            for (i, typ) in s.types.iter().enumerate() {
+                if let Ok(value) = basic_value_from_py_object(typ, v) {
+                    valid_value = Some(value::BasicValue::UnionVariant {
+                        tag_id: i,
+                        value: Box::new(value),
+                    });
+                    break;
+                }
+            }
+
+            valid_value.ok_or_else(|| {
+                PyErr::new::<PyTypeError, _>(format!("invalid union value: {}, available types: {:?}", v, s.types))
+            })?
+        }
     };
     Ok(result)
 }