Skip to content

Converters and scaffolding #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 4, 2022
Merged

Converters and scaffolding #4

merged 10 commits into from
Feb 4, 2022

Conversation

cretz
Copy link
Member

@cretz cretz commented Jan 31, 2022

What was changed

Added initial impl of data converters. Also added some scaffold for the bridge and client, but both are likely to change quite a bit, so no need to review them too hard. Just trying to get eyes before moving on.

@cretz cretz requested review from bergundy and Sushisource January 31, 2022 20:52
"""Base converter to/from single payload/value."""

@abstractmethod
async def encode(self, value: Any) -> Optional[temporalio.api.common.v1.Payload]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced this needs to be an async method, it shouldn't do any IO or CPU intensive operations.
The DataConverter should be async because it's responsible for compression and encryption.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y'all chose to make it async-capable in TypeScript, why not here? What are the concerns with making it async?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs to be async unless we plan to separate out bytes-only stuff soon, as that easily could be cpu heavy for encryption (or need to do IO by reading keys from some source, etc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to make the TS PayloadConverter synchronous.
In TS, async framework code could change the order of workflow execution which could break determinism but that might not apply here, and we'd need to restructure things a bit for that.
I would suggest that users don't do any CPU or IO in the PayloadConverter and instead do it in the higher level DataConverter where multiple payloads could be compressed together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to make the TS PayloadConverter synchronous.

Why? What are the concerns with making it async?

In TS, async framework code could change the order of workflow execution which could break determinism but that might not apply here, and we'd need to restructure things a bit for that.

Why is this a problem for payload converters but not data converters?

I would suggest that users don't do any CPU or IO in the PayloadConverter and instead do it in the higher level DataConverter where multiple payloads could be compressed together.

While we might suggest this, the question is why is it ok for data converters to be async but not payload converters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In TS we're going to separate conversion into 2 steps, one sync that maps object -> payload and runs in workflow context, and the other async that maps payloads -> payloads and runs outside of workflow context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think if we were using subprocess sandboxing or similar we might need to do the same (and we'd also have to take the class of a data converter instead of an instance of it), but I am hoping our sandboxing does not cross some boundary requiring serialization to occur on the workflow side of that boundary. Though a method on a dataclass could provide a way to escape the sandbox. Will think on it.

Even if it does, I think the async payload conversion in the asyncio event loop will be in a predictable order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order is predictable but if you change the workflow data converter it's essentially a breaking non-deterministic change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I don't expect the data converter to be inline with the workflow wrt determinism, nor am I more concerned about this change of a data converter mid-use any more than other types of data converter changes mid-use.

await assert_payload(False, "json/plain", "false")

# Unknown type
with pytest.raises(RuntimeError) as excinfo:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think TypeError is better here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's a type error necessarily vs lack of converter. I have now documented the error.

Comment on lines +393 to +395
# TODO(cretz): Should this be a var that can be changed instead? If so, can it
# be replaced _after_ client creation? We'd just have to fallback to this
# default at conversion time instead of instantiation time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, IMO it'd make sense to keep this and also introduce some kind of set_default so that you don't wipe out the original defaults and they're always easily available if you want to restore them for some reason.

Copy link
Member Author

@cretz cretz Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I'd like to keep it an immutable default. We have ways to set the default at the client level which then applies to all things within anyways. You don't need to set the "default default" I don't think (we don't allow it in Go).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other SDKs have an immutable default converter, but I see Spencer's point about setting the "default default" being more convenient than having to pass data converters everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Keeping it immutable here. I have exposed converters from the default for easy creation of a new one.

Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking comments.

Comment on lines +23 to +31
Args:
values: Values to be converted.

Returns:
Converted payloads. Note, this does not have to be the same number
as values given, but at least one must be present.

Raises:
Exception: Any issue during conversion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale is that's what pycharm defaults to and from my experience is the most popular format.
I don't have a strong preference, my request was based on personal experience.

"""Base converter to/from single payload/value."""

@abstractmethod
async def encode(self, value: Any) -> Optional[temporalio.api.common.v1.Payload]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to make the TS PayloadConverter synchronous.
In TS, async framework code could change the order of workflow execution which could break determinism but that might not apply here, and we'd need to restructure things a bit for that.
I would suggest that users don't do any CPU or IO in the PayloadConverter and instead do it in the higher level DataConverter where multiple payloads could be compressed together.

"""See base class."""
message_type = payload.metadata.get("messageType", b"<unknown>").decode()
try:
value = _sym_db.GetSymbol(message_type)()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have the type hint here I recommend using that and falling back to the messageType header.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint is optional for proto payloads, the message type is not (at least how we're developing this across SDKs), so we might as well use the thing we know is always there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Dmitry had his concerns with breaking compatibility if the message name changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're concerned with message name changes in the payload, he should also be concerned with incompatible type hint changes. As it stands, there are changes a dev can make that can break things.

Comment on lines +393 to +395
# TODO(cretz): Should this be a var that can be changed instead? If so, can it
# be replaced _after_ client creation? We'd just have to fallback to this
# default at conversion time instead of instantiation time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other SDKs have an immutable default converter, but I see Spencer's point about setting the "default default" being more convenient than having to pass data converters everywhere.

@cretz
Copy link
Member Author

cretz commented Feb 4, 2022

Gonna go ahead and merge this given a lack of large outstanding concerns and with an impending review for the client coming.

@cretz cretz merged commit 7b5a69c into temporalio:main Feb 4, 2022
@cretz cretz deleted the initial-impl branch February 4, 2022 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants