Serialize floats with decimal point and don't assume Int if rounds #1443

Closed

Conversation

@johnno1962 (Contributor) commented Feb 17, 2018

Hi Apple,

I’ve been having some problems encoding and then decoding Doubles using JSONSerialization on Linux which do not occur with the same code on Darwin. The problem seems to be that the value is serialized without a decimal point, so deserialization treats it as an Int, which causes problems with a cast to Double further down the line. The fix makes sure floating-point values are serialized with a decimal point, and also removes the code which assumes a number is an Int if it happens to round. I’ve adjusted the test so it continues to run. I'm not sure what the wider effects of this change might be, but it resolves the problem I’m seeing and seems more in the spirit of JSON. Oddly, Darwin doesn’t serialize with the decimal point either and its deserialization is fine, though it seems to use CFNumber rather than Any.

Backs out #914 so the following works on Linux:

import Foundation

let json = try! JSONSerialization.data(withJSONObject: [10.0])
print(String(data: json, encoding: .utf8)!)   // "[10]": serialized without a decimal point
let array = try! JSONSerialization.jsonObject(with: json) as! [Any]
print(array[0] as! Double)                    // traps on Linux: the element deserialized as Int

@xwu (Contributor) commented Feb 19, 2018

As you mention, the current behavior of swift-corelibs-foundation does line up with Darwin Foundation, which also doesn't serialize the decimal point. So I don't think this change fixes the underlying issue that's causing swift-corelibs-foundation to behave differently.

@johnno1962 (Contributor, Author) commented Feb 19, 2018

It’s a difficult problem to solve given the implementation, but I still feel that if a value is received with a decimal point in the JSON it should be represented as a Double, rather than randomly converted to an Int whenever it equals its rounded value, as is the case now. Serialising with the decimal point is a pragmatic fix, at the end of the day, for the case where new Foundation talks to new Foundation, and it should do no harm.
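
For reference, the heuristic at issue has roughly this shape (a hypothetical sketch, not the actual JSONSerialization source; bridgeNumber is an illustrative name):

// Hypothetical sketch of the rounding heuristic being objected to here.
func bridgeNumber(_ parsed: Double) -> Any {
    if parsed == parsed.rounded() {
        return Int(parsed)   // "10.0" comes back as Int 10 (and this traps beyond Int.max)
    }
    return parsed            // "10.5" comes back as Double 10.5
}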

@parkera (Contributor) commented Feb 19, 2018

@johnno1962 I assume the problem here is that you can't predict if a given Double will be encoded as an Int or not?

@johnno1962 (Contributor, Author) commented Feb 19, 2018

Hi, yes, that’s the nub of the problem, for which this is only a workaround really. The real solution is to change the compiler/language so that Ints can be force-downcast to Double, which wouldn’t seem unreasonable, or to use NSNumber and implement its magic casting powers on Linux.

let a = 10 as Double // is OK
print(10 as! Double) // is not
print(NSNumber(value: 10) as! Double) // OK on Darwin but not Linux

Seems to be something you could fix in the runtime somewhere in swift::swift_dynamicCast

@johnno1962 (Contributor, Author) commented Feb 20, 2018

@parkera, I’ve prepared a patch to the Swift runtime allowing you to cast an Int in an Any to Double: johnno1962/swift@430f52b

Somehow this doesn’t feel like the sort of thing Swift does, but given the constraints (you have no way of guaranteeing a Double will be encoded with a decimal point, and deserialisation uses the Any type) I can’t think of an alternative if we’re to avoid users having to special-case this on Linux.

I've raised a Jira to explain the issue and, if you agree, can propose the change to the Swift repo. The following would then work:

    let a: Any = Int.max
    print(a as! Double)

@spevans (Contributor) commented Feb 20, 2018

I think the way it works on macOS is that it uses NSNumber, which, as you note above, can be cast to Double and Int on Darwin but not on Linux. Maybe fixing that would be the best solution, as it would also be useful in other situations.

@johnno1962 (Contributor, Author) commented Feb 20, 2018

This could be one solution, but replicating on Linux the unique casting powers NSNumber has on Darwin would be difficult, as casting is implemented in the core stdlib and NSNumber is part of Foundation. I quite like the use of “Any” instead in new Foundation; this is the one case I’ve found where it lets the side down. Would being able to cast from Int to Double be such a bad thing?

@parkera (Contributor) commented Feb 20, 2018

Swift bridging is a combination of features up and down the stack, including the compiler, runtime, standard library and the Foundation overlay on Darwin (by adopting certain protocols). None of it is enabled for Linux, which does require us to do some unfortunate workarounds.

On ObjC/Darwin the type is indeed decoded as NSNumber, which allows the caller to get whatever kind of number they like out of it (somewhat like an imaginary AnyNumber).

@xwu (Contributor) commented Feb 21, 2018

A lot to unpack here.

First, a minor point: 10 as Double is actually how you write that the literal value 10 is meant to be of type Double (as has multiple meanings in Swift). Double(10) is how you convert a literal value 10 of type Int to Double. People get these confused very frequently, even in code committed to the Swift project itself. It doesn't help that 10 is Double is treated as a type-checking operation that's always false, and that 10 as! Double is treated as a type-casting operation with a diagnostic that can mislead users.
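
Concretely:

let a = 10 as Double  // the literal 10 is simply given type Double; no conversion happens
let b = Double(10)    // converting initializer: an Int value 10 is converted to Double
// Both a and b are 10.0, but only the second line performs a numeric conversion.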

Second, Double can't represent all values of Int exactly, nor vice versa; it would be incorrect to allow one to be cast to the other, even through Any. In fact, your changes to Swift are incorrect (reinterpreting the bit pattern of an integer doesn't give you the floating-point value), and they are also memory-unsafe on 32-bit systems for any value.
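
The standard library's failable initializers make the mismatch observable:

print(Double(exactly: Int.max) as Any)  // nil: Int.max has no exact Double representation
print(Int(exactly: 10.5) as Any)        // nil: the fractional part would be lost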

Third, JSON has no concept of integers or floating-point values: there is only a language-independent "number value." It's not really correct to say that a Swift Double is being "encoded as an Int" in JSON, as there's no such type; it's merely being encoded as a number. Therefore, inherent to the JSON format itself, there's no guarantee that a number value encoded from any language will roundtrip with the same type without relying on idiosyncrasies of specific JSON implementations. NSNumber is arguably the right fit for Swift's JSON implementation.

Solving the as? cast for Linux could be one approach to this problem. Alternatively, I wonder if it would be feasible for the Linux implementation to deserialize all values as Double by default (rather than Int, then falling back to Double). This offers total predictability for the type of deserialized JSON values and matches native JavaScript behavior:

var json = '{"count": 18014398509481985}';
var obj = JSON.parse(json);
console.log(obj.count); // output: 18014398509481984

Users can always use Int.init(exactly:) if they want integers (just like they can with NSNumber).
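
Under that scheme a caller wanting an integer might write something like this (a sketch; the value is the one from the JavaScript snippet above):

let value: Any = 18014398509481984.0   // what all-Double deserialization would hand back
if let d = value as? Double, let count = Int(exactly: d) {
    print(count)                       // 18014398509481984
}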

@johnno1962 (Contributor, Author) commented Feb 21, 2018

I'm not sure about your second point there. Sure, not all Int values can be represented as Double without loss of precision, but I think the "as?" patch is correct and has been tested as working. It reinterprets what is guaranteed to be a long in memory as a double; using long takes into account the difference between 32-bit and 64-bit systems. Decoding all values as Double is an idea with some merit, but it leaves subtle incompatibilities between the Darwin and Linux versions, not to mention the conversion errors. I'd go for the minor change allowing an Int in an Any to be cast to Double, even if it is decidedly "not the Swift way". The question is: should it be available on Darwin as well, for consistency and testing?

@xwu (Contributor) commented Feb 21, 2018

In Swift, CLong is a synonym for Int and its size varies between architectures, and reinterpreting a Double bit pattern as an integer does not give you the integral value.

@johnno1962 (Contributor, Author) commented Feb 21, 2018

Eh? I'm not with you at all. The patch is in C++:

    } else if (descriptor == &NOMINAL_TYPE_DESCR_SYM(Si) &&
      targetDescriptor == &NOMINAL_TYPE_DESCR_SYM(Sd)) {
      // Allow downcast of Int to Double for JSON deserialization on Linux
      // https://bugs.swift.org/browse/SR-7037
      *reinterpret_cast<double *>(destination) = *reinterpret_cast<long *>(source);
      if (flags & DynamicCastFlags::TakeOnSuccess) {
        sourceType->vw_destroy(source);
      }
      return true;
    } else {

If descriptor == &NOMINAL_TYPE_DESCR_SYM(Si), the thing pointed to by source is a long, which can simply be cast to a double (targetDescriptor == &NOMINAL_TYPE_DESCR_SYM(Sd)) to achieve the conversion. Using long is a way to get int64_t on LP64 and int32_t otherwise. It would be helpful if there were a typedef I could use in this case instead of long and double. This makes the following work in Swift, where it would currently fail:

let a: Any = 10
print(a as! Double)

@xwu (Contributor) commented Feb 21, 2018

I understand the code in the patch. Reinterpreting an int32_t as a double is obviously problematic. What does that line of code print?

@johnno1962 (Contributor, Author) commented:

10.0

@xwu (Contributor) commented Feb 21, 2018

I don't see how; I get 4.94066e-323. Where does the actual conversion happen?

@johnno1962 (Contributor, Author) commented:

It is a patch in swift/stdlib/public/runtime/Casting.cpp in:

 static bool _dynamicCastStructToStruct(OpaqueValue *destination,
                                        OpaqueValue *source,
                                        const StructMetadata *sourceType,
                                        const StructMetadata *targetType,
                                        DynamicCastFlags flags) {

The line doing the actual conversion is old style:

      *reinterpret_cast<double *>(destination) = *reinterpret_cast<long *>(source);

@xwu (Contributor) commented Feb 21, 2018

The patch reinterprets a pointer to a long as a pointer to a double. That does not convert between integers and floating-point values. The bit pattern representing 10 as a long represents 4.94066e-323 as a double.

Ah, got it. You're using C++ to perform the conversion on assignment after dereferencing.
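
The same distinction, expressed in Swift:

print(Double(bitPattern: 10))  // ≈ 4.94066e-323: the bit pattern of 10 read as a Double
print(Double(10))              // 10.0: an actual numeric conversion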

@johnno1962 (Contributor, Author) commented Feb 21, 2018

It dereferences the pointers, i.e. the * before the reinterpret_cast.

@xwu (Contributor) commented Feb 21, 2018

This conversion, as I said, is not OK, as it succeeds whether or not the values can be represented exactly.

Swift does not have a model of how to cast between different numeric types and changes to that effect are anything but "small."

@johnno1962 (Contributor, Author) commented Feb 21, 2018

A fair criticism, though whether this should crash a program when the number has been represented to 15 significant digits is debatable. There is worse going on in JSONSerialization, however, e.g. this line:
https://github.com/apple/swift-corelibs-foundation/blob/master/Foundation/JSONSerialization.swift#L845. What would it do for numbers > Int.max? It should be removed IMHO, which is part of this PR.
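
For the record, a playground answer to that question (assuming the heuristic converts with Int(_:)):

let big = 1.8e19                  // greater than Double(Int.max)
print(big == big.rounded())       // true, so the rounding heuristic would take the Int path...
// print(Int(big))                // ...which traps: "Double value cannot be converted to Int
//                                //  because the result would be greater than Int.max"
print(Int(exactly: big) as Any)   // nil: the failable initializer reports the problem instead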

@xwu (Contributor) commented Feb 21, 2018

JSONSerialization can do what it wants, because JSON doesn't require any particular handling of numbers. Whatever Darwin Foundation does is fine (but that line of code is not a great idea, I agree).

By contrast, Swift's model for casting between numeric types is an expansive issue. In no case would it be acceptable for a cast to succeed between numeric types where it causes data loss. Using C++ to perform an integer-to-floating-point conversion in this way is also not the right solution (LLVM intrinsics are used in Swift). In general, touching Swift's type-casting code for JSON serialization is not the way to go.

@johnno1962 (Contributor, Author) commented Feb 21, 2018

Data-loss issues aside (they are inherent in the conversion if a developer asks for it), could you convert that line to use LLVM intrinsics that detect data loss, or point me to an example, please? It does stand out as being rather naïve.

@xwu (Contributor) commented Feb 23, 2018

Conversions between numeric types can be lossy (that is, those that use converting initializers such as Double(_: Int)); casts are never lossy in Swift.

See SE-0170 for the semantics of NSNumber bridging, which even in its relaxed form today requires as? to allow the safe expression of the value. You'll note that this was approved as a unique "middle ground" compromise to align the behavior of Swift and Objective-C. It's definitely not a behavior that's considered appropriate for Swift types.
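
On Darwin that looks like this; a minimal illustration of the SE-0170 behavior (output comments assume a 2018-era Swift 4.x toolchain):

import Foundation

print(NSNumber(value: 1024) as? Int8 as Any)  // nil: 1024 is not representable in Int8
print(NSNumber(value: 10.0) as? Int as Any)   // Optional(10): exactly representable, so it succeeds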

You'll find that NSNumber bridging is now implemented in Swift by calling the appropriate init(exactly:) initializers. You can easily dig through the standard library's implementations of init(exactly:) to find the corresponding LLVM intrinsics that are called; if I recall, at least some of those checks are native Swift.

In any case, you'll see that this is very involved code and that there is a detailed design underlying NSNumber bridging; it is not at all like the method that you wrote. For Swift Foundation to have parity, as @parkera and @spevans wrote, the solution is to tackle NSNumber bridging so that the already written bridging code can actually be used on Linux, which involves addressing a known issue that'll also benefit all other NS* types here.

By the looks of it, it's not exactly an easy problem, but it is a tractable one. The Swift code for Objective-C bridging, its swift-corelibs-foundation counterpart, and the SILGen bits and optimizer bits are all copiously commented.

@johnno1962 (Contributor, Author) commented Feb 23, 2018

Even though I have advocated for it (with due reservation; see above), I don’t expect the “Int in Any downcastable to Double” hack to be welcomed into the Swift runtime. NSNumber is the longer-term solution, being what deserialising JSON into arbitrary types needs, and things seem further along than I had realised.

I’m not convinced myself that there is an easy, hard-and-fast rule that should be applied, such as “casts are never lossy in Swift”. I have tried to draw a distinction between two forms of lossy conversion: invalid and imprecise. Clearly, trying to shoehorn NSNumber(value: 1024) into an Int8 should fail, but for me 9.2233720368547758e+18 is a valid Double representation of 9223372036854775807 (Int.max) within the limitations inherent in the type. The situation on Darwin seems to be in flux, and in some edge cases incorrect 😱. Using the following function in a playground:

func tickle(_ i: Int) {
  let json = try! JSONSerialization.data(withJSONObject: [i])
  print(String(data: json, encoding: .utf8)!)
  let array = try! JSONSerialization.jsonObject(with: json) as! [Any]
  print(type(of: array[0]))
  print("As Int: ", array[0] as? Int as Any)
  print("As Double: ", array[0] as? Double as Any)
  print("Precise: ", Int((array[0] as? Double ?? 0)-1000)==i-1000)
}

The following is in line with my expectations on Xcode 9.2 (all values OK for all versions):

tickle(Int.max)
[9223372036854775807]
NSDecimalNumber
As Int:  Optional(9223372036854775807)
As Double:  Optional(9.2233720368547758e+18)
Precise:  false

This value seems to be applying the new SE-0170 policy, that the value must be exact, on Xcode 9.3b3:

tickle(Int.max/10)
[922337203685477580]
__NSCFNumber
As Int:  Optional(922337203685477580)
As Double:  nil
Precise:  false

And then there is this value, which manages to be wrong three ways in all versions on Sierra!

tickle(Int.max/9)
[1024819115206086200]
NSDecimalNumber
As Int:  nil
As Double:  Optional(1.0248191152060863e+18)
Precise:  false

All values fall into line with pre/post SE-0170 expectations on High Sierra. Go figure.

[9223372036854775807]
__NSCFNumber
As Int:  Optional(9223372036854775807)
As Double:  nil
Precise:  false
[1024819115206086200]
__NSCFNumber
As Int:  Optional(1024819115206086200)
As Double:  nil
Precise:  false
[922337203685477580]
__NSCFNumber
As Int:  Optional(922337203685477580)
As Double:  nil
Precise:  false

So where does this leave this PR? Since short-term casting hacks while we wait for NSNumber are unacceptable, I’d still apply the PR as originally proposed, even if it does not solve the underlying problem: it is a small step in the right direction and removes some incorrect code.

@millenomi (Contributor) commented May 25, 2018

NSNumber-powered casts are very close to landing — see #1526.

Given that, @johnno1962, should we dupe this there?

@johnno1962 (Contributor, Author) commented:

Good work on NSNumber, @millenomi. You can close this now, though keep in mind the bug I pointed out where large whole Doubles are converted incorrectly to Ints; that may no longer be relevant.

@millenomi (Contributor) commented:

There may still be work to do in this area later (e.g. emitting NSNumbers rather than specifically-typed numeric types, to align with Darwin).

@johnno1962 (Contributor, Author) commented Jul 13, 2018

I’m leaving this open as it’s not resolved, though the groundwork has been done getting NSNumber casts to work for Swift 4.2. I'd recommend a patch along the following lines instead (covered in PR #1633):
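
A hypothetical sketch of that direction only (the definitive version is the diff in PR #1633; makeNumber is an illustrative name): have the deserializer hand back NSNumber rather than committing to Int or Double, so callers get SE-0170 as? semantics on both platforms.

import Foundation

// Hypothetical sketch; not the actual PR #1633 code.
func makeNumber(_ text: String) -> Any? {
    if let integer = Int(text) {
        return NSNumber(value: integer)   // whole numbers still parse as integers...
    }
    if let double = Double(text) {
        return NSNumber(value: double)    // ...but both cases surface as NSNumber
    }
    return nil
}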

@xwu (Contributor) commented Jul 13, 2018

@johnno1962 Darwin Foundation, as you know, always uses reference types; we can't change the public API, but I agree it makes sense to align behavior by using reference types (i.e. always emitting NSNumber) in corelibs-foundation too, as @millenomi mentions.

@johnno1962 (Contributor, Author) commented:

Resolved by #1634

@parkera (Contributor) commented Jul 16, 2018

Great!
