-
Notifications
You must be signed in to change notification settings - Fork 120
feat: support union type for basic types #510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
5b33175
a7f7e85
7b959b9
6b6aae9
bb18d97
a2b5906
0e19865
4cbd632
69cc231
3b386b6
159e8e3
606d26e
c68fd2a
839ee85
4752970
b16d0f6
c404388
312ef53
c2ebd88
703565b
d2d02b0
72537b8
f837121
2785e1c
a1f47a6
cca7d1f
f9e9bec
189a52d
b1e1084
4ec510d
9a31d2e
c168dcc
2e06096
1e9d74c
780d154
0ffe380
f384c16
4463e52
69a0280
1d51d5c
efa1861
6294ecf
fd37218
052815a
66a09ec
385487f
f8eb3cc
bdd4b16
534c791
e8b972e
beaa1c1
39b6039
fa2dd14
030281c
a037d9a
224cb5b
d4916d2
8869002
7f08070
0571e8c
edeabe7
d194117
fe4941b
0850cdc
2775b28
d649e0a
9677169
bf2811c
a5f6c6c
d48f6c5
abb920e
09764e2
b246485
34d0ca3
d029def
ce45aaa
2c7d106
1a78b48
e8ed867
f577da7
59f4bce
70d2010
a913522
dd0d48f
549dc30
240cf16
4832227
5c48526
2795d4b
af45e67
d093571
866865c
6de65b5
4bcc53d
4157ff0
64cf435
e873f65
ae6e5bc
d1205cf
f748cda
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
chardoncs marked this conversation as resolved.
Show resolved
Hide resolved
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After reading the implementations, I realized I overlooked one thing when creating the original issue: value -> JSON -> value may get a different value after a roundtrip, under union type (the value representation has stronger type information and can uniquely identify the type for basic types, but JSON are weaker typed). This can be a problem, meaning a ser-deser roundtrip is no longer transparent (e.g. we cache intermediate computation results for reuse). Now I'm thinking maybe we can do the following to aid:
Welcome to any further discussions on this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I understand that transparency can be a problem. My current implementation includes type guessing that may cause inconsistency. So, my understanding is that the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's correct. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for updating the code! Looked at the latest in-progress implementation. This is still a little bit different from what I thought.
enum BasicValue {
...
UnionVariant{tag_id: usize, value: Box<BasicValue>},
} e.g. if a union type is Both serialization and deserialization will be more straightforward:
This will be simpler and more robust: we don't need to guess the Besides, I realized there're one more thing may need discuss - I didn't realize last time. For deserialization, there're two types of use cases:
To distinguish these two, we can add a new input to enum DeserializationMode {
/// Deserialize what we serialized. Guaranteed to preserve original type.
Internal,
/// Deserialize JSON values produced by external systems.
External,
} Thanks for your patience! Let me know if you have any other thoughts. Thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for addressing the misunderstanding. I was actually concerned by serializing the element type. It is a simple way to eliminate ambiguity. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By the way, I was initially thinking about serialize and deserialize The serialization creates the
And deser uses the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice thought! This may have the following difficulties though:
I prefer avoiding these complexities, to avoid hitting a case that we cannot handle some day. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we also need to update
make_engine_value_decoder()
inconvert.py
.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm currently confused by this decoding. It seems like there is no way to encode a union engine value from a Python value. The Python union type cannot be detected from
encode_engine_value()
. (Maybe a custom class is needed)Also, for decoding. Currently, my implementation for the
UnionVariant
branch in the Rust functionbasic_value_to_py_object<'py>()
is like this:In this case, I suspect the union type cannot be detected from the make decoder function in any way, neither through
encode_engine_value()
norTransientFlow::evaluate_async()
.But if we return a struct that is converted into the Python value (such as something like
{ type: "Str", value: "foo" }
), then the make decoder function can detect it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python, I think there's no common way to get the specific branch for a value annotated with Union type, e.g. for
list[int] | list[float]
, type information only exists in each specific list element. There's no type information to directly distinguishlist[int]
andlist[float]
.So, for encoding (Python->Rust), probably we have to try different branches and see which will succeed, on Rust side. Your current implementation looks good!
For decoding (Rust->Python), since both Rust and Python side have consistent type information, I think we can follow similar approach as how we serialize to JSON. On Rust side, we may convert it into a tuple of
(tag_id, value)
, then on Python side usingtag_id
we can find out the type of the specific branch. What do you think about this? Thanks!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should do the trick.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought, there is a problem with the B-Tree set. It seems like we have to use
Vec
to preserve the order iftag_id
is returned.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please use
Vec
- this is compatible with thetag_id
approach.