Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABI stability across major ICU4X versions #3790

Closed
Manishearth opened this issue Aug 4, 2023 · 22 comments
Closed

ABI stability across major ICU4X versions #3790

Manishearth opened this issue Aug 4, 2023 · 22 comments
Assignees
Labels
A-ffi Area: FFI, WebAssembly, Transpilation C-ffi-infra Component: Diplomat, horizontal FFI

Comments

@Manishearth
Copy link
Member

Manishearth commented Aug 4, 2023

I do not recall us discussing this1 (correct me if wrong). We should come up with a policy.

Basically, are we allowed to break the C ABI across ICU4X versions? The practical effect of this would be that people using old ICU4X bindings with a new ICU4X crate can continue to stick to the bindings they use. The reverse is not necessarily guaranteed since new ICU4X bindings may use new APIs under the hood that aren't present in the binary.

We do have this policy for minor ICU4X versions: it is possible to use ICU4X 1.1 headers with a compiled icu_capi 1.2, in theory. We do not test this.

In practical effects to ICU4X developers, this means

  • No renaming ffi types
  • No renaming ffi methods without leaving the old one around
  • No changing the parameters of ffi methods without leaving the old one around, or changing the parameters to explicitly ABI-compatible things

Discuss with:

Optional:

Footnotes

  1. I recall us discussing what to do across minor versions and the answer was "definitely no breakage"

@Manishearth Manishearth added the discuss Discuss at a future ICU4X-SC meeting label Aug 4, 2023
@Manishearth
Copy link
Member Author

Personally I'm moderately in favor of such a policy. People tend to use our FFI by checking in our bindings, so it's nice to be able to allow crate updates without requiring binding updates.

We already are on board with breaking the API surface of the high-level FFI APIs across major versions; and we plan to do so in 2.0 in a major way for C++ and JS.

A common mitigation strategy (available once we move to cpp2/js2) is using #[diplomat::attr(disable)] to hide old versions of APIs, and #[diplomat::attr(rename)] to make things look seamless at the higher level where desired.

For example, ICU4XCollatorOptions, if we have sufficient Diplomat features (rust-diplomat/diplomat#247, rust-diplomat/diplomat#271), can be at the C++/JS level just ICU4XCollatorOptions, but under the hood at the C level it would be ICU4XCollatorOptionsV1, v2, etc, and methods consuming it would also be _v1, etc, like we do today. When we add a V2, we transparently hide V1 stuff for higher-level APIs, and diplomat-rename the V2 APIs to match the unsuffixed ones. We keep V1 APIs around but attr(disable)d.

I would like to sketch out all the cases where we would potentially want to break ABI beforehand before making this decision, and decide on mitigation strategies where necessary.

Here are the cases I can think of:

  • Options structs, like above. I think we have a path for handling them, but it's contingent on a bunch of Diplomat features
  • Moving over to new Diplomat types, like optional or iterators. Probably has similar mitigation strategies, but it's unclear.
  • Changing how we prefix everything. I think the current strategy of "we name everything ICU4XFoo" and "all methods get prefixed with the struct name" is fine. I kinda would like to autogenerate that in diplomat but that's doable without breakage. We may want to instead fo ICU4X_ or something. I think the current thing is fine.

@sffc sffc added A-ffi Area: FFI, WebAssembly, Transpilation C-ffi-infra Component: Diplomat, horizontal FFI discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band labels Oct 19, 2023
@Manishearth
Copy link
Member Author

I feel like we already decided on this. Diplomat ABI renames make it possible to retain stability whilst adding new APIs. I think we have a good path forward here.

@sffc @zbraniecki @robertbastian does this seem acceptable?

@sffc
Copy link
Member

sffc commented Feb 29, 2024

There's more to discuss here, including how to deal with options structs and things that take options structs, version suffixes like CollatorOptionsV1, ...

@Manishearth
Copy link
Member Author

Manishearth commented Feb 29, 2024

Yes, I think we have a clear path for those as well in the face of Diplomat renames but I'll type it up later.

@Manishearth
Copy link
Member Author

@sffc , @robertbastian , and I came up with a set of principles for ABI stability. We have not yet fully figured out mechanisms

(Notation: @ is used here to mark an ABI name)

Principles:

For a given major version, FFI bindings are stable for whatever stability means in that language. E.g. JS may avoid versioning options bags by just adding new fields over time, whereas everyone else may need to use ICU4XCollatorOptionsV1, ICU4XCollatorOptionsV2.

ABI names are stable across versions, however they may be removed across versions. In other words, if you call the ABI function @MyStruct_foo() it will continue to work as long as the symbol exists.

The C backend is somewhat special; it should never be confusing around its relationship with the ABI. It is fine for the C backend to diverge from the ABI: we may have MyStruct_create mapping to @MyStruct_create_v1 and @MyStruct_create_v2 in a future version. We may also have MyStruct_create mapping to @MyStruct_make. What we cannot have is MyStruct_create mapping to @MyStruct_make and MyStruct_make mapping to @MyStruct_somethingElse.

In other words, we should never add a rename that would lead to a C backend name matching an ABI name that it does not map to.

To a lesser extent, we want to make sure the C++ and C backends are not too confusingly different.

Agreed on principles above: @sffc @Manishearth @robertbastian

@Manishearth
Copy link
Member Author

We also discussed having some global prefix/version suffix in the ABI, and did not come to consensus on this. This policy doesn't preclude doing this in the future.

@Manishearth
Copy link
Member Author

Yes, I think we have a clear path for those as well in the face of Diplomat renames but I'll type it up later.

Thinking about this more, this only works in backends like JS. The policy above allows us to do this kind of thing.

@Manishearth
Copy link
Member Author

Potential mechanisms for achieving this policy:

C aliases, Self-policing, limit non-blanket renames

We treat the C backend as higher level than just "ABI", though the code may still retain the ability to generate ABI headers for CPP-C2.

We use __attribute__((alias("foo")) for cases where the C name does not match the ABI name.

In ICU4X, we try to avoid having cases where the C backend name mismatches the ABI name. This does not preclude us from using renames: for example what we might do is have FixedDecimalFormatter, abi_renamed to ICU4XFixedDecimalFormatter, and c renamed to ICU4XFixedDecimalFormatter, but unrenamed (and namespaced) in other backends.

This means each rename involving C besides the default ICU4X-prefixing one should be carefully scrutinized. We use this scrutiny to avoid any clashes.

When we are

Obfuscated ABI names

We obfuscate ABI names such that it is abundantly clear that they are not the same as the real name. Something like a global c_rename = "__icu4x_internal_{}_v1", and we can bump suffixes in a local rename if we decide to reuse a "name".

(This is a partial design but people can fill this in)

@sffc sffc added this to the ICU4X 2.0 milestone Mar 8, 2024
@sffc
Copy link
Member

sffc commented Mar 8, 2024

I very much like an Obfuscated ABI Names approach. Advantages include

  1. Abundantly clear whether you are looking at API or ABI.
  2. More flexibility to rename/change functions without worrying about re-using or clashing with the previous names of things. We can and do want to change the shape (the number and types of arguments) of existing functions.

A few ways to do obfuscating:

Apply name mangling with argument and return value types

Basically, the signature of a function is a product of its arguments. If we encode this into the ABI name, we can get a globally unique signature.

Example: ICU4XFormatter_create__Locale_FormatterOptions_Result

Pros:

  • Guaranteed globally unique signature
  • No extra annotations needed

Cons:

  • Mangled names might not be very readable
  • @Manishearth pointed out that it is a big task to invent a name mangling scheme that works the way we want it to work

Explicit version

This approach could also solve the struct versioning problem. We add an attribute #[diplomat::version(X)] to every type. The C ABI always bakes the version number into all of the symbols. Other languages can choose to use or ignore the version number. For example, C and C++ structs can use the version number, but JS does not need to use it for structs.

Example: ICU4XFormatter_create__1

Pros:

  • Clean and readable names
  • Solves the struct versioning problem

Cons:

  • Requires adding a version attribute everywhere

Hash

We can pass the function signature through a hashing algorithm and then append the hash to the function name.

Example: ICU4XFormatter_create__a1b2c3d4e5f67890

Pros:

  • Fairly clean and readable names
  • No extra annotations needed

Cons:

  • The name cannot be manually constructed by a human; requires running the engine to figure out what the names will be

@sffc
Copy link
Member

sffc commented Mar 8, 2024

Another thing to note, since I don't think it was explicitly stated above: like with backwards-compatible data, we can build a DLL with both the old and the new symbols so that it can be dropped in-place to some binary that is expecting to find the old symbols.

I'm not advocating that we go out of our way to figure out exactly how to do this in the 2.0 release (I'm okay simply dropping the old symbols in this release), but when we have someone who needs it, it is a tractable problem to solve.

@Manishearth
Copy link
Member Author

This approach could also solve the struct versioning problem. We add an attribute #[diplomat::version(X)] to every type. The C ABI always bakes the version number into all of the symbols. Other languages can choose to use or ignore the version number. For example, C and C++ structs can use the version number, but JS does not need to use it for structs.

I would like the ABI and the C backend to be the same where possible and this is a divergence. That said, I'm not strongly opposed to this as long as it can be designed as a convenient alias for a pairing of abi_rename and rename. I am opposed to introducing Yet Another Way of doing renaming unless it can properly use the same infrastructure.

(We can support a {version} metavariable in the c_rename infra)

We can pass the function signature through a hashing algorithm and then append the hash to the function name.

This is still an argument-based name mangling scheme. This has the same "designing a good one is hard" con of the first name mangling approach. Ensuring stability for a hash based scheme where the set of hashables is growing over time and some of the things are equivalent (a bunch of our slice types are equivalent over FFI) is not easy and easy to get wrong.

I am very opposed to any automatic mangling scheme: this makes it much easier for things to automatically change under our feet without us realizing it. The current design is quite clear about what actions can change the abi names.

I very much like an Obfuscated ABI Names approach. Advantages include

Abundantly clear whether you are looking at API or ABI.
More flexibility to rename/change functions without worrying about re-using or clashing with the previous names of things. We can and do want to change the shape (the number and types of arguments) of existing functions.

I don't consider the first one to be too important. It's only a problem for one backend (C) and it's a backend where you probably want to be looking at the headers anyway.

I'm also not convinced of the utility of the second one: I really prefer this being an explicit choice. I'm not convinced we can come up and maintain with something that is reliable, really I'm not convinced of my ability to not accidentally break this when making changes to Diplomat.

@sffc
Copy link
Member

sffc commented Mar 13, 2024

(We can support a {version} metavariable in the c_rename infra)

I do not think a global {version} is the best approach. It causes the old FFI functions to be invalidated as soon as a new version is released. I would prefer the version tag, in whichever form we choose, to live as long as the signature of the function is ABI stable.

@Manishearth
Copy link
Member Author

Manishearth commented Mar 13, 2024

That's not what I'm saying: I'm saying that we can have a #[version] attribute that gets inherited the usual way but does nothing on its own, and you can reference its value in rename attributes. So we can set global1 version(1), and a global abi_rename = "icu4x_{type}_{method}_v{version}", and then we can piecewise apply version overrides to individual types and modules.

Footnotes

  1. Not entirely global; will need to be per-bridge-module since this stuff is consumed by the macro

@Manishearth
Copy link
Member Author

Discussion:

Concrete proposal: abi_rename = "icu4x_{type}_{method}_mv{mv}", with inherited #[methodversion] attribute that can be applied everywhere.

Types like CollatorOptions will be named CollatorOptionsV1 by default, but with #[diplomat::attr(struct_compat_blah_bikeshed, rename = "CollatorOptions")], and backends like JS with struct compat will be fine.

The ABI type "CollatorOptions" should be the latest and greatest.

May need defaults support in JS and Dart.

  • Rust: CollatorOptionsV1::default(), CollatorOptionsV2::default(), Collator::foo_v1(CollatorOptionsV1), Collator::foo_v2(CollatorOptionsV2)
  • ABI: icu4x_CollatorOptionsV1_default_mv1, icu4x_CollatorOptionsV2_default_mv1, icu4x_Collator_foo_v1_mv1(cov1), icu4x_Collator_foo_v2_mv1(cov2)
  • C++: CollatorOptionsV1::default(), CollatorOptionsV2::default(), Collator::foo1(), Collator::foo2() or overloads
  • C: as ABI
  • JS, Dart: CollatorOptions::default(), Collator::foo. We hide V1.

Achieved by:

  • global #[methodversion = 1] on all bridge modules forever
  • global abi_rename = "icu4x_{type}_{method}_mv{mv} on all bridge modules forever
    • this needs a bunch of work on the abi rename infra but is doable
    • also ideally abi_rename on a type does not clear abi_rename patterns
  • methods taking multiple versioned options are named foo_v1, etc.
  • #[diplomat::attr(non_exhaustive_structs, rename = "CollatorOptions")] on CollatorOptionsV2
  • #[diplomat::attr(non_exhaustive_structs, rename = "foo")] on Collator::foo_v2
  • #[diplomat::attr(non_exhaustive_structs, disable)] on CollatorOptionsV1 and Collator::foo_v1

The method version suffixes do not solve the options bag problem, but they are capable of solving the problem of replacing functions.

LGTM: @Manishearth, @sffc, @robertbastian

@Manishearth
Copy link
Member Author

Manishearth commented Mar 13, 2024

We may also want the default to be methodversion = 0.

@sffc sffc removed discuss Discuss at a future ICU4X-SC meeting discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band labels Mar 13, 2024
@sffc
Copy link
Member

sffc commented Mar 13, 2024

We may also want the default to be methodversion = 0.

I think methodversion = 1 is probably still better because

  1. foo_mv1 does not look like it is the second method version
  2. foo_mv0 could look like or be used for a pre-release / experimental method version

@Manishearth
Copy link
Member Author

On the diplomat side, this plan can be achieved more easily by:

  • global abi_rename = "icu4x_{0}_mv1 on all bridge modules forever
    • also ideally abi_rename on a type does not clear abi_rename patterns (could be done in the future, not 2.0)
  • For methods taking versionable options:
    • They are named foo_v1 in icu_capi
    • #[diplomat::attr(non_exhaustive_structs, rename = "foo")]
  • For versionable options bags:
    • They are named CollatorOptionsV1 in icu_capi
    • #[diplomat::attr(non_exhaustive_structs, rename = "CollatorOptions")]
  • When there is a V2:
  • #[diplomat::attr(non_exhaustive_structs, rename = "CollatorOptions")] on CollatorOptionsV2
  • #[diplomat::attr(non_exhaustive_structs, rename = "foo")] on Collator::foo_v2
  • #[diplomat::attr(non_exhaustive_structs, disable)] on CollatorOptionsV1 and Collator::foo_v1

On the plus side, this needs no new Diplomat work. However, ideally we get to 2.0 with:

  • support for Option fields in structs
  • Maybe: support for defaultable enums and fields specifically for JS/Dart/etc where structs and args may be null/undefined.

@robertbastian
Copy link
Member

In this scheme, are methods ever going to have _mv2?

@Manishearth
Copy link
Member Author

@robertbastian yes, methods may change for reasons unrelated to struct versioning

@Manishearth
Copy link
Member Author

(I confirmed this with Shane yesterday, he wanted to keep method versioning. I vaguely recall this being discussed during the meeting that decided this, the set of use cases was broader than struct versioning)

@Manishearth
Copy link
Member Author

The overall description of what mv is for is that we're hoping to have cross-ICU4X versions stable ABI1, but not stable FFI API, which may mean updating methods and using renames with mv.

Footnotes

  1. Where a C function of a given name has the same ABI. C functions may still be removed across major versions.

@robertbastian robertbastian self-assigned this Jul 11, 2024
@sffc
Copy link
Member

sffc commented Jul 11, 2024

For example, mv would allow us to change the signature of constructors to use baked data instead of all the workarounds we had to implement in 1.x. It allows us also to change function signatures, such as adding an additional argument, without creating ABI ambiguity.

It prevents someone bringing an old dylib and using it with newer code that uses the same function names with different signatures.

ICU4C does this, too, but it uses each major version in the ABI, like u_foo_75 for the ICU 75 version of u_foo. This proposal is better because the version number should only change if the method signature changed, which should be much more rare.

Manishearth pushed a commit that referenced this issue Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi Area: FFI, WebAssembly, Transpilation C-ffi-infra Component: Diplomat, horizontal FFI
Projects
Status: Done
Development

No branches or pull requests

4 participants