Skip to content

Input/Output refactor, breaking changes #74

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Jul 16, 2018
Merged

Input/Output refactor, breaking changes #74

merged 24 commits into from
Jul 16, 2018

Conversation

ghik
Copy link
Contributor

@ghik ghik commented Jul 9, 2018

This PR contains some controversial, breaking but IMHO necessary changes in serialization framework:

  • InputType has been dropped.

It was used only to check if Input contains null so it was replaced by simple isNull method. This change is primarily motivated by the fact that some "simple" types may need to use array/object encoding, e.g. #64 - this way array/object values would not be properly distinguishable from "simple" values and inputType method would have to lie in some way.

  • GenCodec for Option has been changed to use unwrapped encoding with null representing None.

My original motivation for using list-based encoding was to be able to express null and Some(null) values for Option. It was additionally supported by an assumption that we don't care too much about serialized format as long as deserializes properly. This is no longer true as GenCodec is now being used for various purposes and may affect the way our OSS is used outside the company (e.g. through Udash).

Currently, the only way to ensure unwrapped encoding of optional values is to use Opt/OptArg which I definitely don't want to impose on people using our libraries. For this reason I see no other choice but to introduce this breaking change and somehow manage the breakage in our own use cases which primarily involve serialization to MongoDB. Therefore, I left an ugly backdoor that Input and Output implementations may use to retain the previous encoding of Option for compatibility purposes - see LegacyOptionEncodingInput/LegacyOptionEncodingOutput.

  • Introduced JsonOptions for better control of output format of JsonStringOutput (timestamp format, binary data format, etc.). See documentation of JsonOptions for more details.

The main breaking changes in JSON serialization is that timestamps are now by default serialized as ISO formatted strings, e.g. "2010-04-10T08:50:41.262Z" and byte arrays are serialized as actual JSON (signed) byte arrays instead of hex strings. This of course can be adjusted in JsonOptions to use previous format.

  • Introduced general purpose Timestamp type that serves as a minimal, platform-independent representation of UTC based instant.

@ghik ghik requested review from Starzu, ddworak and mkej July 9, 2018 18:44

import scala.scalajs.js

trait JsInterop {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A JsInterop companion would be nice, not everyone is so keen to inherit helpers or import the whole world from commons.

import com.avsystem.commons.misc.TimestampConversions

trait JavaTimeInterop {
implicit def instantTimestampConversions(instant: Instant): TimestampConversions =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add JDate to the mix, it's not my favourite either, but it's still widely used. It's also the only JVM time type present in SJS as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already in JBasicUtils because JDate is available also in ScalaJS while JavaTimeInterop is JVM only.

*
* @param millis milliseconds since UNIX epoch, UTC
*/
class Timestamp(val millis: Long) extends AnyVal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for it not to be final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value classes are always final

*
* @param millis milliseconds since UNIX epoch, UTC
*/
class Timestamp(val millis: Long) extends AnyVal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that we'll finally have a general-purpose cross-compiled timestamp representation in commons, but it will be often useless for SJS frontends without DateTime-like formatting. Our go-to solution for now is to transform JVM formats to Moment.js formats, which binds us to JDate most of the time anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have a sensible cross-compiled, immutable, comparable, serializable and human-readable type that could be put into RPC interfaces.

I know that in order to show it to the user, fronted will have to convert it to something raw (millis, js.Date) so it can be passed to moment.js or whatever other formatting library. However, this can now happen at the edge of our codebase and JDate will no longer pollute it, most importantly our RPC interfaces.

import scala.concurrent.duration.{FiniteDuration, TimeUnit}

/**
* Millisecond-precision, general purpose, cross compiled timestamp representation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instant has nanosecond resolution. JS has performance.now() with kinda-usually-microseconds resolution (browser dependent). I wouldn't introduce a new, less precise representation as an intermediate one.

Copy link
Contributor Author

@ghik ghik Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am aware of the precision loss but I think it's still worth the cost of having to remember about it.

I'm not sure what you mean by this intermediate precision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that this new representation has neat conversions to all the different "times" one can have to use. Its loss of precision disables us from using it as an intermediate format.

Copy link
Contributor Author

@ghik ghik Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precision loss happens only at converting Instant to Timestamp - all the other conversions are fine. I can change the name of toTimestamp extension method on Instant to truncateToTimestamp so that it's clear that it loses precision. WDYT?

I don't think we need cross-compiled instant data type with more than millisecond precision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sgtm

sealed trait JsonDateFormat
object JsonDateFormat {
/**
* Specifies that a timestamp should be represented in ISO 8861 format with UTC time zone,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case JsonBinaryFormat.ByteArray =>
val builder = new mutable.ArrayBuilder.ofByte
val li = readList()
while (li.hasNext) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Inputs/Outputs could look nicer if e.g. ListInputs were Traversable(Once)

Copy link
Contributor Author

@ghik ghik Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely not, I will not make it inherit collections API. Inputs and Outputs are as raw as possible.

case JsonBinaryFormat.HexString =>
val hex = checkedValue[String](JsonType.string)
val result = new Array[Byte](hex.length / 2)
var i = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(0 until result.length).map(... should not be much slower and you could even go all the way .toArray

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want this as raw and fast as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark it then 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will. But on the first sight your version should be much slower because not only will it create intermediate collection but also every byte will be boxed in that collection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing foreach was relatively harmless but map + toArray cut off more than half of benchmark score. Ultimately I made a tail-recursive loop.

sb.toString
}
}

trait BaseJsonOutput {
protected final def writeJsonString(builder: JStringBuilder, str: String): Unit = {
protected final def indent(builder: JStringBuilder, options: JsonOptions, depth: Int): Unit =
options.indentSize match {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.foreach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't want that lambda in here - this is called often and I want this as fast as possible.

package serialization

/**
* This ugly workaround has been introduced when standard `Option` encoding changed from zero-or-one element list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't be so hard on yourself, seen worse than that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link

@Starzu Starzu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the performance of GenCodec[Option[T]] with these LegacyOptionEncoding input and output. I think we should check it with some benchmarks.


final class TimestampConversions(private val millis: Long) extends AnyVal {
def toJsDate: js.Date = new js.Date(millis.toDouble)
def toJDate: JDate = new JDate(millis)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toTimestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes


def parse(string: String): Long = {
def fail = throw new ReadFailure(s"invalid ISO instant: $string")
if (regex.test(string)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this regex test makes the whole method much slower. This affects JsonStringInput, so could you at least measure the difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very naive benchmark shows no significant difference (5-20%).
I want this to be secured with this regex because I want this to work as closely as possible in both JS and JVM.


final class TimestampConversions(private val millis: Long) extends AnyVal {
def toInstant: Instant = Instant.ofEpochMilli(millis)
def toJDate: JDate = new JDate(millis)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toTimestamp


implicit def optionCodec[T: GenCodec]: GenCodec[Option[T]] = create[Option[T]](
locally {
case i: LegacyOptionEncodingInput if i.legacyEncodingEnabled =>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern matching looks like something slow. Could you create a benchmark to compare this version with clean codec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just an instanceof check - I don't see a reason why would be particularly slow.
But yeah, measuring is always good.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 will check that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you were right - now I just left a method in Input & Output instead of having it in separate traits.

/**
* Specifies format used by `JsonStringOutput.writeBinary` / `JsonStringInput.readBinary` to represent byte arrays.
*/
sealed trait JsonBinaryFormat
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see lots of JS generated from this file. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a ValueEnum but I wanted to leave possibility to make a case class there.

while (i < result.length) {
result(i) = ((reader.fromHex(hex.charAt(2 * i)) << 4) | reader.fromHex(hex.charAt(2 * i + 1))).toByte
i += 1
def loop(i: Int): Unit = if (i < result.length) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tailrec?

import japgolly.scalajs.benchmark.gui.BenchmarkGUI
import org.scalajs.dom._

object Main {
def main(args: Array[String]): Unit = {
val body = document.getElementById("body")
BenchmarkGUI.renderSuite(body)(Benchmarks.suite)
BenchmarkGUI.renderSuite(body)(IsoInstantBenchmarks.suite)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BenchmarkGUI.renderMenu(body)(
  IsoInstantBenchmarks.suite,
  JsonBenchmarks.suite
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants