[WIP] Allow POJOs as CloudEvent data #198

slinkydeveloper · 2020-07-14T08:57:48Z

This PR allows a CloudEvent to carry a data field that is not a byte[]. The user doesn't need prior knowledge of the data type, and EventFormat/protocol bindings don't need to know it either. When the user extracts the data from the CloudEvent, it can be converted using getData(Class). On the other hand, the CloudEvent might carry an intermediate representation in its data field, like JsonNode for JSON.

In order to support marshalling/unmarshalling from/to byte[], I added the concept of EventDataCodec and implemented it in the jackson module.

Signed-off-by: Francesco Guardiani francescoguard@gmail.com

bsideup · 2020-07-14T09:54:19Z

api/src/main/java/io/cloudevents/CloudEvent.java

+     * @throws IllegalArgumentException if there isn't any unmarshaller from the inner data to T
+     */
+    @Nullable
+    <T> T getData(Class<T> c) throws IllegalArgumentException;


As seen in Jackson and others, Class cannot represent every type (e.g. List<User>). Consider using a type reference instead.
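To make the limitation concrete, here is a minimal sketch of a "super type token". TypeRef and TypeRefDemo are hypothetical names used only for illustration; they are not part of the SDK or of this PR, and Jackson's own TypeReference works along similar lines but is not reproduced here.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical "super type token", similar in spirit to Jackson's TypeReference.
abstract class TypeRef<T> {
    private final Type type;

    protected TypeRef() {
        // An anonymous subclass records its generic superclass,
        // e.g. TypeRef<List<String>>, and that survives erasure.
        ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superType.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

final class TypeRefDemo {
    // Class<T> can only express the raw type: there is no List<String>.class.
    static final Class<?> RAW = List.class;

    // The anonymous subclass keeps the full parameterized type at runtime.
    static final Type FULL = new TypeRef<List<String>>() {}.getType();
}
```

The cost mentioned in the reply still applies: every codec implementer would have to translate such a token into the type representation of its own library.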

I thought about that, but that means we need to define our own type reference? I'm not sure I want to dive into that complexity... And if we define our own type reference, then the EventDataCodec implementer also needs to convert our type reference to its own type reference.

My idea for using Class is:

For the simple use cases, the user just uses getData(c).

For the use cases where the user needs a type reference, they can just use Jackson directly: they get the raw payload with getData(Object.class) and then use the Jackson mapper to map it.

I am personally against this operation altogether, regardless of Type or Class. The CloudEvent type is a representation of the actual cloud event on the wire. It should be as simple and as light as possible, containing only the essentials. We had a very long and contentious debate about it, so let's not go back to square one.
If the user needs a stronger type for its data representation, then there should be another abstraction that performs such a conversion, but only as an option.

bsideup · 2020-07-14T10:04:31Z

@slinkydeveloper previously, CloudEvent was a POJO, but now it may need to store a reference to the encoder/decoder service to perform getData(Class) (de-)serialization, and can't be treated as a simple POJO anymore.

Also, obviously getData(Class) cannot be generic for every Class, which means that an instance of CloudEvent may return from getData(Class) correctly when created by implementation X but fail (due to a missing codec) when created by implementation Y.

Another question is how byte[] getData() should be implemented now. There are two options:

always have byte[] data representation in the implementing object - bad for performance
always ensure that the underlying object can be encoded as byte[] - hard to enforce

In Liiklus, I went with option number 2 for performance reasons, but I can't say I am 100% happy about the API of it :)

Curious to hear your thoughts on these.

slinkydeveloper · 2020-07-14T10:15:42Z

I think I didn't underline the important prerequisite of this PR, as you said:

always ensure that the underlying object can be encoded as byte[] - hard to enforce

If it's a CloudEvent, it must be somehow encodable/decodable to bytes. If you put inside the data field of CloudEvent something that cannot be encoded to bytes, then i suspect it's a bad usage of the CloudEvent abstraction. So IMO this enforcement is necessary.

I see this feature more as a mix of optimization and sugar to use the sdk, but it's not my intention to break the existing assumptions/requirements of CloudEvent:

Optimizes the use case where you create a CloudEvent in memory and it never goes to the wire
Allows a more efficient implementation of an event format (like JSON) that defines some "special" encodings for specific data content types
Provides some "sugar" to invoke the JSON mapper directly from the CloudEvent data field

slinkydeveloper · 2020-07-14T10:21:02Z

@slinkydeveloper previously, CloudEvent was a POJO, but now it may need to store a reference to the encoder/decoder service to perform getData(Class) (de-)serialization, and can't be treated as a simple POJO anymore.

Also, obviously getData(Class) cannot be generic for every Class, which means that an instance of CloudEvent may return from getData(Class) correctly when created by implementation X but fail (due to a missing codec) when created by implementation Y.

So CloudEvent doesn't need to store a reference to the encoder/decoder service (as you can see in the code, I resolve the codec using the data content type), but yes, this change definitely adds a requirement for CloudEvent implementations to have some kind of "codecs store", so I would love to hear some feedback before proceeding with a full implementation.

Of course, an implementation can always fail on getData(Class) saying that it cannot find the codec, but that doesn't sound good.

bsideup · 2020-07-14T10:28:20Z

If you put inside the data field of CloudEvent something that cannot be encoded to bytes, then i suspect it's a bad usage of the CloudEvent abstraction

As we previously figured, CloudEvent is an in-memory representation of a CloudEvent, and it is a valid use case to use it for signalling events without even having to encode/decode them. Given that, I would not say that it is a "bad usage of the CloudEvent abstraction".

getData(Class) implies some sort of a conversion mechanism. Have you considered having CloudEvent<T> with T getData()? Then the decoding shifts to the CloudEvent creator (something like CloudEvent<User> receiveCloudEvent(User.class)), and encoding is done at the writer side, with an option to get the raw data as ByteBuffer, for example.
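A rough sketch of what this suggestion could look like. TypedEvent, SimpleTypedEvent and Receiver are hypothetical names, and passing the decoder as a Function is an assumption made to keep the example free of any mapper library.

```java
import java.util.function.Function;

// Hypothetical typed event: the payload type is fixed when the event is created.
interface TypedEvent<T> {
    String getType();

    T getData();
}

final class SimpleTypedEvent<T> implements TypedEvent<T> {
    private final String type;
    private final T data;

    SimpleTypedEvent(String type, T data) {
        this.type = type;
        this.data = data;
    }

    @Override
    public String getType() {
        return type;
    }

    @Override
    public T getData() {
        return data;
    }
}

final class Receiver {
    // Decoding happens once, on the reading side; the event itself never converts.
    static <T> TypedEvent<T> receive(String type, byte[] wire, Function<byte[], T> decoder) {
        return new SimpleTypedEvent<>(type, decoder.apply(wire));
    }
}
```

A caller that only knows the wire form can pass Function.identity() and get a TypedEvent<byte[]> back.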

bsideup · 2020-07-14T10:30:21Z

Another question to ask: does it really need to be in the API at all. Since it is an optimization, it can as well be implementation specific, when the optimization can be made.

slinkydeveloper · 2020-07-14T10:32:47Z

Have you considered having CloudEvent<T> with T getData()?

I don't like the idea of bringing generics back to the CloudEvent interface. We removed them because they raised the important concern of requiring prior knowledge of the event payload. As a user that receives 3 or 4 different types of events, I don't have prior knowledge of the generic inside the CloudEvent. Also, the whole encoding process had to know how to go from data to T, while now this is not required. Now the user can just read the CloudEvent, get the payload as bytes and that's all, without prior knowledge of the content of data and without registering additional mappers.

slinkydeveloper · 2020-07-14T10:37:51Z

Another question to ask: does it really need to be in the API at all. Since it is an optimization, it can as well be implementation specific, when the optimization can be made.

It needs to be an API in the sense that, if I create an in-memory CloudEvent with MyPojo as data, I want to retrieve it without going through byte[] (and without a downcast) by calling CloudEvent.getData(MyPojo.class). But maybe I'm looking at the problem from the wrong POV...
Another solution could be removing getData(Class) and adding something like Object getRawData(); then the user, based on their own assumptions, can cast the returned value to MyPojo. But that still requires the implementation either to store the data field as byte[] or to define an encoding from its data field to byte[].

bsideup · 2020-07-14T10:38:28Z

As a user that receives 3 or 4 different types of events, I don't have prior knowledge of the generic inside the CloudEvent

For that, you could have CloudEvent<JsonNode> (that can be decoded into a concrete type later, for the record).

Now the user can just read the CloudEvent, get the payload as bytes and that's all, without prior knowledge of the content of data and without registering additional mappers.

This sounds too magical for me given that we're talking about the thin interface in the API module. Think of this interface as the one that defines interoperability between various CloudEvent implementations. You can have as many features as desired in the core module (aka the "default" implementation), but the interface needs to be carefully reviewed to not block other implementations nor make them suffer working around some "features" of it that they can't support.

slinkydeveloper · 2020-07-14T10:40:30Z

CloudEvent<JsonNode>

Still, that's a prior assumption that the payload is json 😄 In sdk-go we have this interesting api usability feature that you can decode the payload without checking/having knowledge of the data content type: getData(MyPojo.class) always returns MyPojo, no matter if data content type is xml or json (of course assuming that MyPojo can be decoded from xml and json)

bsideup · 2020-07-14T10:49:51Z

@slinkydeveloper

Still, that's a prior assumption that the payload is json

Funny enough, readTree in Jackson's (aka "FasterXML") XML support returns... JsonNode 😂 It can also be any other "holder" of a value ready to be deserialized. As long as it decouples the decoding logic from CloudEvent implementations (where getData(Class) is considered a decoding operation)

slinkydeveloper · 2020-07-14T10:56:36Z

How does this option sound: T getData(Class<T>) -> Object getRawData(), and then some static method somewhere (probably in core) like readData(Class)? That doesn't sound problematic, doesn't add any new assumptions, and a naive implementation without encoding/decoding can still store everything as byte[]. And we can still optimize the in-memory CloudEvent use case.

bsideup · 2020-07-14T11:01:30Z

@slinkydeveloper I like it, but I'd still bring back the generic parameter, to avoid unnecessary casting when the type is clear. It would still always be possible to have CloudEvent<Object>.

and then some static method somewhere (probably in core) like readData(Class)

Yeah, this definitely belongs to core. You could even add an instance method to core's implementation, and, if you don't want to expose implementation's type, add an additional interface (to core) that will have something like:

interface SmartCloudEvent<T> extends CloudEvent<T> {
    // <R> is a separate type parameter so it does not shadow the event's <T>
    <R> R readData(Class<R> type);
}

(the name is random and there must be a better one, of course, just I didn't want to spend hours trying to find it 😂 )

bsideup · 2020-07-14T11:02:32Z

Re "naive implementation" - it will indeed always return CloudEvent<byte[]> in such case, which is absolutely valid and self descriptive: "I only know the byte representation".

johanhaleby · 2020-07-14T12:17:22Z

Thanks for looking into this. My use case is that I want to store CloudEvents in MongoDB but hopefully provide a way to avoid double encoding/decoding of the body (see discussion here for context). This means that I'd like to be able to set an instance of org.bson.Document as body.

I kind of like the non-generic api better though (as compared with cloud-events 1.3), it looks less complex and you know that you're dealing with byte[] data, but it's probably too limited to cover cases such as mine. I guess a trade-off has to be made on performance vs complexity like you imply. I would personally be fine with enforcing byte[], but I'm not sure it's generalizable.

How does this option sound: T getData(Class<T>) -> Object getRawData(), and then some static method somewhere (probably in core) like readData(Class)

I don't really understand what you mean here (I may be corrupted by -> in ML languages? :)). Do you mean two different methods, T getData(Class<T>) and Object getRawData() in the CloudEvent interface?

On a different note, I would strongly suggest keeping encoding/decoding/codecs out of the CloudEvent implementation. In my mind the data representation of a cloud event and how it's encoded/decoded are different concerns. Better to pass CloudEvent as a parameter to a function that does the encoding/decoding.

jskeet · 2020-07-20T17:04:32Z

FWIW, I believe the question of "how should CloudEvent objects represent data" is likely to be an important one across multiple languages.

I've been thinking about this myself quite a lot for the C# SDK, but without any concrete progress yet.

olegz · 2020-07-20T17:10:38Z

@jskeet Can't agree with you more. It is certainly an important aspect and must cover many angles of type representation and conversion, including class loader constraints (a Foo that is really a Foo doesn't appear to be a Foo) and more. So, while I am 100% in agreement that it is needed, I just don't believe the question of "how should CloudEvent objects represent data" should be answered by the CloudEvent interface.

slinkydeveloper · 2020-07-27T09:50:47Z

OK, I thought about it and I think I found a solution that reaches the 3 common goals we have:

nice ux for the basic data field unmarshalling while keeping it lazy (aka the data field parsing should not be done while reading the message from the wire)
support having pojos inside the event, so our jackson mapper can optimize json values (and users can optimize their event format impls too like @johanhaleby)
avoid adding new constraints to CloudEvent

That's more or less the idea:

public interface CloudEvent extends CloudEventAttributes, CloudEventExtensions {

    /**
     * The event data
     */
    @Nullable
    <T> T getData();
}

This relaxes the assumptions of getData() (the generic T does nothing more than triggering a downcast to make the api nicer to use) and should not hurt the user too much.

Now with a method in EventDataCodecProvider like readData we can have this UX:

CloudEvent ev = ....
MyPojo pojo = EventDataCodecProvider.deserializeData(ev, MyPojo.class);

This checks the content of getData() and if there is already MyPojo, it just returns it. Otherwise it calls the proper EventDataCodec implementation which invokes, in case of jackson, the mapper to do the crazy stuff of the conversion.
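The lookup described above could be sketched roughly as follows. DataCodec, TextCodec and DataCodecs are hypothetical stand-ins; the PR's actual EventDataCodec and EventDataCodecProvider signatures may differ, and passing the content type and data directly instead of a CloudEvent is a simplification.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec contract, keyed by data content type.
interface DataCodec {
    <T> T deserialize(byte[] bytes, Class<T> target);
}

// Toy codec that only knows how to produce a String.
final class TextCodec implements DataCodec {
    @Override
    public <T> T deserialize(byte[] bytes, Class<T> target) {
        // Class.cast throws ClassCastException for any target other than String's supertypes.
        return target.cast(new String(bytes, StandardCharsets.UTF_8));
    }
}

final class DataCodecs {
    private static final Map<String, DataCodec> BY_CONTENT_TYPE = new HashMap<>();

    static void register(String contentType, DataCodec codec) {
        BY_CONTENT_TYPE.put(contentType, codec);
    }

    static <T> T deserializeData(String contentType, Object data, Class<T> target) {
        if (data == null) {
            return null;
        }
        // Fast path: the event already holds the requested type, no codec needed.
        if (target.isInstance(data)) {
            return target.cast(data);
        }
        if (!(data instanceof byte[])) {
            throw new IllegalArgumentException("Cannot convert " + data.getClass() + " to " + target);
        }
        DataCodec codec = BY_CONTENT_TYPE.get(contentType);
        if (codec == null) {
            throw new IllegalArgumentException("No codec registered for " + contentType);
        }
        return codec.deserialize((byte[]) data, target);
    }
}
```

Note that the failure mode raised earlier in the thread is still there: an event with a content type nobody registered a codec for fails only when the data is requested.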

If the user needs more complex typing when unmarshalling, they can still use Jackson directly pretty easily with:

List<AAA> l = mapper.readValue(ev.getData(), new TypeReference<List<AAA>>() {});

On the write side, protocol binding implementers need to invoke something like EventDataCodecProvider.serializeData(String datacontenttype, Object val, Class<T> target) to convert from whatever is inside data to the target type (which allows optimizations such as avoiding double serialization of JSON).

WDYT about that?

bsideup · 2020-07-27T11:18:02Z

I would strongly advise against using "<T> T". This does not allow knowing the type at runtime and creates a false feeling of being dynamic, although it is not. So, -1 from me (if it counts).
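A small illustration of that objection. LooseEvent is a hypothetical class that mimics the proposed signature; it is not code from the PR.

```java
// Hypothetical event whose accessor lets the caller pick T freely.
final class LooseEvent {
    private final Object data;

    LooseEvent(Object data) {
        this.data = data;
    }

    @SuppressWarnings("unchecked")
    <T> T getData() {
        // After erasure this cast checks nothing at runtime.
        return (T) data;
    }
}

final class LooseEventDemo {
    static boolean failsAtCallSite() {
        LooseEvent ev = new LooseEvent("not a number");
        try {
            // Compiles with no cast and no warning at the call site.
            Integer n = ev.getData();
            return false;
        } catch (ClassCastException e) {
            // Thrown by the cast the compiler inserts at the assignment,
            // far away from the code that stored the wrong type.
            return true;
        }
    }
}
```

The accessor looks type-safe to the caller, but the only check happens wherever the result is first assigned to a concrete type.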

olegz · 2020-07-27T12:16:21Z

As stated before I am also -1 on this one. Returning T creates an additional contractual obligation on the CloudEvent strategy - to become a type converter (directly or not) and that goes against the "separation of concerns", "loose coupling" etc. Basically it's not its job to deal with any type of type conversion. There are type converters, marshallers, transformers, serializers etc., and we should try to stick with these familiar java patterns and idioms. But the CloudEvent should be simple and unassuming.

slinkydeveloper · 2020-08-07T12:30:14Z

So, what are the next steps here? Any proposals on the table?

olegz · 2020-08-07T12:32:25Z

Yes, this PR has been rejected twice with detailed explanation(s) and with no acceptable counterarguments. Personally I am not sure what else to add. Consider closing it and moving on.

johanhaleby · 2020-08-07T13:03:46Z

This is a tough one. Maybe it would be possible to provide an alternative implementation that implements the CloudEvent interface but also allows a generic type (or simply Object). Implementations supporting optimizations like this could do an instanceof check to see if it's possible to avoid direct use of the byte-array. The implementation must obviously support serializing the content to a byte-array as well, but this could be done lazily (and one could use memoization for caching if needed).
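A minimal sketch of such a specialized implementation, under stated assumptions: BytesEvent stands in for the byte[]-only CloudEvent interface, and ObjectBackedEvent and PayloadReader are hypothetical names. The serializer is passed in as a Function so the example does not depend on any mapper.

```java
import java.util.function.Function;

// Hypothetical minimal stand-in for the byte[]-only event interface.
interface BytesEvent {
    byte[] getData();
}

// Hypothetical specialization that also exposes the original object.
final class ObjectBackedEvent implements BytesEvent {
    private final Object value;
    private final Function<Object, byte[]> serializer;
    private byte[] cached; // memoized wire form, computed on first use

    ObjectBackedEvent(Object value, Function<Object, byte[]> serializer) {
        this.value = value;
        this.serializer = serializer;
    }

    Object getObject() {
        return value;
    }

    @Override
    public synchronized byte[] getData() {
        if (cached == null) {
            cached = serializer.apply(value); // serialized lazily, at most once
        }
        return cached;
    }
}

final class PayloadReader {
    // Optimized consumers check for the specialization; everyone else uses the bytes.
    static Object payloadOf(BytesEvent ev) {
        if (ev instanceof ObjectBackedEvent) {
            return ((ObjectBackedEvent) ev).getObject();
        }
        return ev.getData();
    }
}
```

The core interface stays untouched in this arrangement; only code that knows about the specialization benefits from skipping the byte array.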

olegz · 2020-08-07T14:11:36Z

@johanhaleby what you're describing is a very widely used pattern and personally I am +1 for it: basically, having specialized implementations that may provide extra functionality. In this case it doesn't even have to be one. There could be several, or even a hierarchy, but the core interface as far as I am concerned is final and IMHO I do not foresee any changes to it, ever.

olegz · 2020-08-07T14:12:49Z

Also, those specializations should live in core module (not the api)

slinkydeveloper · 2020-08-10T14:36:15Z

The implementation must obviously support serializing the content to a byte-array as well but this could be done lazily (and one could use memoization for caching if needed).

That's what I'm struggling to solve. When you go down the road of allowing a "non byte array", you end up with the requirement of mapping T to a byte array.
One of the core ideas of sdk-java v2 is to keep jackson/mappers/whatever data serializer/deserializer as much as possible out of sight of this codebase, because there is no point in redoing what jackson already does pretty well (and, even more important, to avoid problems like #198 (comment)).

But, in order to fix #196, you need a solution to this problem anyway. We could, as you said, relax the implementation of CloudEvent in core by adding a new method getObjectData() that returns Object, but you still need the mapper to go from Object to byte[]. And, as a side effect, you need to allow Object data in CloudEventBuilder.

The only doable solution I see is implementing only serialization with jackson (from Object to byte[]) and this could also fit straight in the CloudEvent interface just relaxing getData() to return Object

bsideup · 2020-08-10T14:39:08Z

FTR I second the idea of not having the notion of "typed CloudEvent" in api and using custom objects (TypedCloudEvent extends CloudEvent in core?) when typed object is desired.

slinkydeveloper · 2020-08-10T14:44:20Z

"typed CloudEvent"

I'm not proposing any "typed CloudEvent" here 😄 But yeah, that's something that could be done in the future in core, like the getData(Class) in this PR. Although I recognize that this PR diverged a bit from the initial idea, so maybe just relaxing getData() + some serialization code might do the trick. WDYT?

bsideup · 2020-08-10T14:50:27Z

IMO the current one is fine (well, I would still prefer ByteBuffer instead of byte[], or some other memory-efficient option, but that's not the point) and we should not change it to Object. If one wants a CloudEvent that holds a reference to an object and avoids eager serialization, they are free to use a custom type that holds said object next to a binary-friendly getData.

slinkydeveloper · 2020-08-10T15:11:24Z

I'm sorry @bsideup , but TBH I don't get why relaxing getData() in CloudEvent might be a problem. Any specific reason?

bsideup · 2020-08-10T15:23:14Z

@slinkydeveloper because it makes it dynamic. Object getData is a can of worms.

Either make it CloudEvent<T> or keep everything as it is right now, that would be my suggestion.

bsideup · 2020-08-10T15:23:54Z

In fact, even CloudEvent<T> would be problematic because it brings serialization to the table.

slinkydeveloper · 2020-08-11T09:47:18Z

Ok I'm trying a second iteration, stay tuned

johanhaleby · 2020-08-11T11:54:05Z

In Occurrent I've run into a similar issue. My current approach is to add a level of indirection: instead of returning a byte array (byte[]), I return a Data interface. Data has a method that returns byte[], but consumers can add instanceof checks to cast to a specific implementation, which gives access to the data structure before it's serialized/converted to a byte array, for optimization purposes.
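A sketch of that indirection, assuming hypothetical names (Data, BytesData, MapData, Storage) rather than Occurrent's actual types. The toString-based encoding in MapData is a placeholder, not a real wire format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical indirection: the event returns Data instead of byte[].
interface Data {
    byte[] toBytes();
}

// Naive case: the data is already in its wire form.
final class BytesData implements Data {
    private final byte[] bytes;

    BytesData(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public byte[] toBytes() {
        return bytes;
    }
}

// Structured case: keeps the document and serializes only when asked.
final class MapData implements Data {
    private final Map<String, Object> fields;

    MapData(Map<String, Object> fields) {
        this.fields = fields;
    }

    Map<String, Object> fields() {
        return fields;
    }

    @Override
    public byte[] toBytes() {
        // Placeholder encoding for the sketch only.
        return fields.toString().getBytes(StandardCharsets.UTF_8);
    }
}

final class Storage {
    // A store that understands MapData skips the byte[] round trip.
    static Object toStorable(Data data) {
        if (data instanceof MapData) {
            return ((MapData) data).fields();
        }
        return data.toBytes();
    }
}
```

Consumers that know nothing about MapData keep working through toBytes(), so the optimization stays opt-in.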

slinkydeveloper · 2020-08-11T12:41:19Z

@johanhaleby this is an approach adopted by other sdks too, eg: https://github.com/cloudevents/sdk-rust/blob/master/src/event/data.rs#L5

I think it might be interesting to explore it. I personally like this kind of union type a lot.

slinkydeveloper · 2020-08-11T12:51:55Z

Second try: #208

slinkydeveloper · 2020-10-08T09:04:01Z

Replaced by #250

WIP allow POJOs as CloudEvent data

3c2b1f6

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>

slinkydeveloper added the enhancement label Jul 14, 2020

slinkydeveloper added this to the 2.0.0-milestone2 milestone Jul 14, 2020

slinkydeveloper mentioned this pull request Jul 14, 2020

How to best deal with conversions/seralization to other formats that byte[]? #196

Closed

Fix CloudEvent#getData()

3af01ee

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>

bsideup reviewed Jul 14, 2020


CloudEvent#getRawData() -> CloudEvent#getData(Object.class)

4621b28

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>

slinkydeveloper mentioned this pull request Aug 11, 2020

[WIP] Allow POJOs as CloudEvent data, round 2 #208

Closed

slinkydeveloper mentioned this pull request Aug 13, 2020

[WIP] Allow POJOs as CloudEvent data, round 3 #211

Closed

olegz mentioned this pull request Aug 18, 2020

Much harder with V2 than V1 to send binary CloudEvents #212

Closed

slinkydeveloper removed this from the 2.0.0-milestone2 milestone Sep 1, 2020

slinkydeveloper added the discussion label Sep 1, 2020

slinkydeveloper mentioned this pull request Sep 28, 2020

CloudEvents with Avro payloads, on Kafka #238

Closed

slinkydeveloper mentioned this pull request Oct 7, 2020

Introduce CloudEventData #250

Merged

slinkydeveloper closed this Oct 29, 2020
