Skip to content

Conversation

@twyatt
Copy link
Contributor

@twyatt twyatt commented Jul 29, 2020

Add support for ignoring elements in encoded CBOR that are not present in the Kotlin class representation.

Closes #935

How it works

When an unknown element is encountered, decoder.skipElement() is invoked until a known element is encountered.

Skipping of an element begins at the first byte of the element's value.

Bytes are processed to determine element type (and corresponding length), to ultimately
determine how many bytes can be skipped. The general process is demonstrated by the following pseudo-code:

lengthStack represents stack of "number of elements" at each depth.

fun skipElement() {
  do {
    when (element) {
      0xFF -> lengthStack.removeLast(); prune() // removes last `indefinite` in stack
      type is indefinite -> lengthStack.add(indefinite)
      type is collection -> lengthStack.add(collection.length) // e.g. array, list, map
      type is primitive -> skip(primitive.length); prune()
    }
  } while (lengthStack.isNotEmpty())
}

fun prune() {
  for (length in lengthStack[last]..lengthStack[0]) { // iterate over `lengthStack` values in reverse order
    if (length is indefinite) return // stop "pruning"
    length--
    if (length == 0) pop() // remove last `length` from stack
  }
}

Examples

The following examples show the hex notation of CBOR encoded payloads with the cooresponding lengthStack after processing the bytes in that row.

Array of objects

The encoded CBOR 82A16161636B656BA16161636B656B would undergo the following skipElement (decoding) process:

1 2 3 CBOR lengthStack Action(s) Taken
82 array(2) [2] pushed 2 (child count of 82: A1, A1) on stack
A1 map(1) [2, 2] pushed 2 (child count of A1: "a", "kek") on stack
61 61 text(1) "a" [2, 1] jumped over "a", decremented last stack item
63 6B656B text(3) "kek" [1] jumped over "kek", no more children of A1, decrement stack items
A1 map(1) [1, 2] pushed 2 (child count of A1: "a", "kek") on stack
61 61 text(1) "a" [1, 1] jumped over "a", decremented last stack item
63 6B656B text(3) "kek" [] jumped over "kek", no more children of A1, decremented stack items

Array with nested indefinite length array

The encoded CBOR 829FA16161636B656BFF6162 would undergo the following skipElement (decoding) process:

1 2 3 4 CBOR lengthStack Action(s) Taken
82 array(2) [2] pushed 2 (child count of 82: array(*), "b") on stack
9F array(*) [2, -1] pushed indefinite on stack
A1 map(1) [2, -1, 2] pushed 2 (child count of A1: "a", "kek") on stack
61 61 text(1) "a" [2, -1, 1] jumped over "a", decremented last stack item
63 6B656B text(3) "kek" [2, -1] jumped over "kek", no more children of A1, decrement stack items (from end to start; stopped at -1)
FF primitive(*) [1] encountered break, no more children of 9F, decrement stack items
61 62 text(1) "b" [] jumped over "b", no more children of 82, decremented stack items

@twyatt twyatt force-pushed the CBOR/ignoreUnknownKeys branch 2 times, most recently from cb5e48a to a4a477f Compare July 31, 2020 02:10
@twyatt twyatt force-pushed the CBOR/ignoreUnknownKeys branch from a4a477f to e62191e Compare July 31, 2020 06:15
@twyatt twyatt marked this pull request as ready for review July 31, 2020 06:43
Test case was covered in:
`testDecodeCborWithUnknownKeysInSealedClasses`
@qwwdfsad qwwdfsad self-requested a review August 11, 2020 15:53
@twyatt

This comment has been minimized.

@qwwdfsad
Copy link
Member

Small update: we are busy with releasing Kotlin 1.4.0 and release candidate of serialization right now.
I'll return to your PR right after the release (~middle of this week)

@twyatt twyatt force-pushed the CBOR/ignoreUnknownKeys branch from 6f42ad1 to 84b99e4 Compare August 18, 2020 01:52
@twyatt

This comment has been minimized.

@sandwwraith
Copy link
Member

@twyatt Sorry for not keeping in touch. I see you've rebased the branch, this should be enough of help. I'll try to return to you soon; I think this is an important feature for 1.0, so we'll try to put it there

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR and explanations — they were really helpful. I'd left some comments, mainly about exceptions, but the rest of PR is fine.

}

fun skipElement() {
val lengthStack = mutableListOf<Int>()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a little concern that MutableList<Int> may cause performance problems due to boxing, but this may be just a premature optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a multiplatform-friendly dynamically sizing stack I could use in its place? Or would I have to roll my own?

If needed I could try to build a basic stack implementation to avoid the auto-boxing but I'm not confident my implementation would be optimal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there's an IntArrayBuilder for deserializing IntArrays, but it is hidden for now. I think we can go with MutableList until we have a clear benchmarks that show if this is a real problem. @qwwdfsad WDYT?

@sandwwraith
Copy link
Member

The build check has failed — seems you have to run ./gradlew knit to re-generate documentation tests

@twyatt

This comment has been minimized.

val index = if (cbor.ignoreUnknownKeys) {
val knownIndex: Int
while (true) {
if (isDone()) return CompositeDecoder.DECODE_DONE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check could be lifted to the very first line of the method to simplify the control flow and make it slightly easier to read

Copy link
Contributor Author

@twyatt twyatt Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately decoder.skipElement() (line 248) will skip the current value element and we'll need to check isDone() on the next loop to keep the previous short-circuit behavior (when beginning to read the next element value).

I tried lifting isDone() line (just to be sure) and it does in fact cause a number of tests to fail.

Would it be better to remove some of the code that is duplicated in the top-most if branches by having it always enter the while loop?:

override fun decodeElementIndex(descriptor: SerialDescriptor): Int {
    val index: Int

    while (true) {
        if (isDone()) return CompositeDecoder.DECODE_DONE
        val elemName = decoder.nextString()
        readProperties++

        if (cbor.ignoreUnknownKeys) {
            val elementIndex = descriptor.getElementIndex(elemName)
            if (elementIndex == CompositeDecoder.UNKNOWN_NAME) {
                decoder.skipElement()
            } else {
                index = elementIndex
                break
            }
        } else {
            index = descriptor.getElementIndexOrThrow(elemName)
            break
        }
    }

    decodeByteArrayAsByteString = descriptor.isByteString(index)
    return index
}

I could additionally create a SerialDescriptor.getElementIndexOrSkip(String) extension function, but it would need access to the decoder so could not live next to the existing getElementIndexOrThrow extension function (which is top-level)?


I agree that the control flow is a bit difficult to follow in this function, so any other suggestions/strategies to refactor and simplify it are appreciated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather stick with the current version — in the simple case (when ignoreUnknownKeys=false) it does not bother you with infinite loop. I think we can live with 3 lines of code duplicated.

twyatt and others added 3 commits September 24, 2020 22:31
Co-authored-by: Leonid Startsev <sandwwraith@users.noreply.github.com>
Co-authored-by: Leonid Startsev <sandwwraith@users.noreply.github.com>
Co-authored-by: Leonid Startsev <sandwwraith@users.noreply.github.com>
Copy link
Contributor Author

@twyatt twyatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I've addressed or responded to all the comments/feedback.

Additionally, added tests for non-happy-paths and executed/committed ./gradlew knit.

val index = if (cbor.ignoreUnknownKeys) {
val knownIndex: Int
while (true) {
if (isDone()) return CompositeDecoder.DECODE_DONE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather stick with the current version — in the simple case (when ignoreUnknownKeys=false) it does not bother you with infinite loop. I think we can live with 3 lines of code duplicated.

@twyatt twyatt requested a review from sandwwraith October 3, 2020 00:54
@twyatt
Copy link
Contributor Author

twyatt commented Oct 5, 2020

Pulled in the latest dev but am no longer able to confirm everything is working:

./gradlew clean knit check

...

> Task :dokkaHtmlMultiModule FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':dokkaHtmlMultiModule'.
> java.nio.file.Files.writeString(Ljava/nio/file/Path;Ljava/lang/CharSequence;[Ljava/nio/file/OpenOption;)Ljava/nio/file/Path;

I get the same failure when attempting the same command on dev (@ d3d2dca), so could just be something broken on my local machine setup w/ the new Dokka. 🤷

@sandwwraith
Copy link
Member

@twyatt What version of JDK do you use? Apparently this method is available only since Java 11.

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the check for you with JDK 11 — it passes

@sandwwraith sandwwraith merged commit f7f1bcc into Kotlin:dev Oct 5, 2020
@sandwwraith
Copy link
Member

Thank you for your contribution, hope it'll help a lot of users. It was a pleasure working with you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants