Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU4X struct construction with compiled_data has a run-time cost compared accessing static POD #5187

Open
hsivonen opened this issue Jul 8, 2024 · 14 comments
Assignees
Labels
2.0-breaking Changes that are breaking API changes A-performance Area: Performance (CPU, Memory) C-data-infra Component: provider, datagen, fallback, adapters

Comments

@hsivonen
Copy link
Member

hsivonen commented Jul 8, 2024

Steps to reproduce

Clone the repos https://github.com/hsivonen/rust-url , https://github.com/hsivonen/idna_adapter , and https://github.com/hsivonen/idna_mapping next to each other.

Check out the branch icu4x-baked-cost for idna_mapping.

Run cargo bench to_ascii_cow_plain in the idna directory of rust-url with the four cases consisting of:

  1. rust-url with branches icu4x-baked-cost-eager and icu4x-baked-cost-deferred
  2. idna_adapter with branches icu4x-baked-cost-unicode-rs and icu4x-baked-cost-icu4x

Actual results

When the idna_adapter branch is the unicode-rs version, eager vs. deferred branch of rust-url doesn't matter.

When the idna_adapter branch is the icu4x version, eager vs. deferred branch of rust-url matters: deferred is faster.

Discussion

This shows that eagerly instantiating compiled_data ICU4X objects via functions that are const still has a run-time cost that shows up in the benchmark that stays on an ASCII fast path that doesn't actually use the ICU4X objects. The unicode-rs case avoids this problem by there being no pointer chasing of static data until actual use, so the case that doesn't need any Unicode data doesn't pay any cost of the static data chasing.

This shows that users of ICU4X that don't need dynamic data loading still pay a cost for ICU4X having (even compiled out) dynamic data loading support compared to an imaginable case where ICU4X data was laid out as static POD the way unicode-rs data is.

Trying to defer ICU4X struct instantiation until first use adds code complexity to do perfectly: with loops and branches, this would end up involving having to track whether the instantiation is done and paying the cost of branching on that bookkeeping. Putting the ICU4X structs, which are const-constructible, in static doesn't work, because whether the ICU4X structs are Send and Sync depends on options on icu_provider, so the baked_data case can't count on the structs being Send and Sync without foiling the design option that allows them not to be.

Ideally, a code structure that const constructs compiled_data-flavored ICU4X structs and then doesn't use them would not have a measurable run-time performance penalty.

@hsivonen hsivonen added A-performance Area: Performance (CPU, Memory) C-data-infra Component: provider, datagen, fallback, adapters labels Jul 8, 2024
@hsivonen hsivonen changed the title ICU4X struct construction with complied_data has a run-time cost compared accessing static POD ICU4X struct construction with compiled_data has a run-time cost compared accessing static POD Jul 8, 2024
@sffc
Copy link
Member

sffc commented Jul 8, 2024

The relevant struct whose construction is being deferred is

/// An adapter between a Unicode back end an the `idna` crate.
pub struct Adapter {
    mapper: Uts46Mapper,
    canonical_combining_class: CanonicalCombiningClassMap,
    general_category: CodePointMapDataBorrowed<'static, GeneralCategory>,
    bidi_class: CodePointMapDataBorrowed<'static, icu_properties::BidiClass>,
    joining_type: CodePointMapDataBorrowed<'static, icu_properties::JoiningType>,
}

The first two use DataPayload, and the last three use &'static references. We know DataPayload has a small but nonzero cost, which is a motivator behind CodePointMapDataBorrowed being a separate type with its own compiled_data constructors. Since the other two structs (Uts46Mapper and CanonicalCombiningClassMap) are not locale-sensitive, we could add Borrowed versions of these with the free constructors, avoiding DataPayload (and Yoke).

@sffc
Copy link
Member

sffc commented Jul 8, 2024

Putting in the 2.0 milestone in case we want to consider doing a comprehensive treatment of all ICU4X types that could have fully borrowed constructors.

@robertbastian @Manishearth

@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Jul 8, 2024
@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Jul 8, 2024
@hsivonen hsivonen added the 2.0-breaking Changes that are breaking API changes label Jul 10, 2024
@robertbastian
Copy link
Member

robertbastian commented Jul 11, 2024

Since the other two structs (Uts46Mapper and CanonicalCombiningClassMap) are not locale-sensitive, we could add Borrowed versions of these

Locale-sensitivity is not a requirement to have a borrowed constructor, only for that borrowed constructor to use a singleton instead of having to go through a DataPayload branch. We could make every single one of our compiled-data constructors return borrowed types in order to never having to branch on DataPayload after construction.

@sffc
Copy link
Member

sffc commented Jul 11, 2024

Yes, "has singleton data marker" is more precise than "not locale sensitive".

@robertbastian
Copy link
Member

No it's the same, but it's not a requirement.

@hsivonen
Copy link
Member Author

I started making borrowed versions of the normalizer structs. I pushed my non-compiling WIP to https://github.com/hsivonen/icu4x/tree/borrowednormalizer in case anyone wants to take a look while I'm away. I got stuck trying to clone/zero_from crate::provider::Baked::SINGLETON_CANONICAL_COMPOSITIONS_V1_MARKER.canonical_compositions in a const context.

@sffc sffc added discuss-priority Discuss at the next ICU4X meeting and removed discuss Discuss at a future ICU4X-SC meeting labels Jul 23, 2024
@sffc
Copy link
Member

sffc commented Jul 23, 2024

I'd like to discuss if we want this to block 2.0 or not. I think it can be safely done after 2.0 but might require deprecating functions.

@Manishearth
Copy link
Member

Manishearth commented Jul 23, 2024

I think it shouldn't block 2.0 unless we can come up with a way to do this by tweaking the DataPayload architecture somehow. I do think we should try thinking about that: for example we could totally have a feature gated thing that swaps DataPayload with an &'static ref, it already exists as a variant but the other variants could be disabled (either as a global icu_provider decision, or as a per-crate feature flag that works by swapping between DataPayload and StaticDataPayload which otherwise have identical APIs.

So let's block 2.0 on ideation and picking a solution, but if the solution can be done without a breaking change we leave it be.

@sffc
Copy link
Member

sffc commented Jul 23, 2024

If we give non-singleton baked data a way to return &'static things, it would prevent us from doing #5230 in a non-breaking way. So I think we should decide on the direction pre-2.0.

@Manishearth
Copy link
Member

Yep!

@hsivonen
Copy link
Member Author

hsivonen commented Aug 7, 2024

I think it shouldn't block 2.0 unless we can come up with a way to do this by tweaking the DataPayload architecture somehow.

That seems like a much bigger deal than making Char16Trie support ZeroFrom, which does not look to me like a big deal, since the only field of the struct is a ZeroVec.

@sffc
Copy link
Member

sffc commented Aug 8, 2024

  • @hsivonen - Manish said that DataPayload requires some design stuff. We could make Char16Trie support ZeroFrom to avoid tweaking architecture.
  • @Manishearth - If we can find a surgical fix for this specifically (Char16Trie is probably the main culprit), I support it
  • @sffc DecomposingNormalizer: Repeated calls to .get() is slowing things down.
  • @hsivonen Downstream consequence of creating DecomposingNormalizer is these constructors. ComposingNormalizer is harder since it has Char16Trie as a field. Got stuck around there.
  • @sffc - Currently you have
pub struct ComposingNormalizer {
    decomposing_normalizer: DecomposingNormalizer,
    canonical_compositions: DataPayload<CanonicalCompositionsV1Marker>,
}

pub struct DecomposingNormalizer {
    decompositions: DataPayload<CanonicalDecompositionDataV1Marker>,
    supplementary_decompositions: Option<SupplementPayloadHolder>,
    tables: DataPayload<CanonicalDecompositionTablesV1Marker>,
    supplementary_tables: Option<DataPayload<CompatibilityDecompositionTablesV1Marker>>,
    decomposition_passthrough_bound: u8, // never above 0xC0
    composition_passthrough_bound: u16,  // never above 0x0300
}

enum SupplementPayloadHolder {
    Compatibility(DataPayload<CompatibilityDecompositionSupplementV1Marker>),
    Uts46(DataPayload<Uts46DecompositionSupplementV1Marker>),
}

What you need is

pub struct ComposingNormalizerBorrowed<'a> {
    decomposing_normalizer: DecomposingNormalizerBorrowed<'a>,
    canonical_compositions: &'a CanonicalCompositionsV1<'a>,
}

pub struct DecomposingNormalizerBorrowed<'a> {
    decompositions: &'a CanonicalDecompositionDataV1<'a>,
    supplementary_decompositions: Option<SupplementPayloadHolderBorrowed<'a>>,
    tables: &'a CanonicalDecompositionTablesV1<'a>>,
    supplementary_tables: Option<&'a CompatibilityDecompositionTablesV1<'a>>,
    decomposition_passthrough_bound: u8, // never above 0xC0
    composition_passthrough_bound: u16,  // never above 0x0300
}

enum SupplementPayloadHolderBorrowed<'a> {
    Compatibility(&'a CompatibilityDecompositionSupplementV1<'a>),
    Uts46(&'a Uts46DecompositionSupplementV1<'a>),
}

@sffc sffc removed the discuss-priority Discuss at the next ICU4X meeting label Aug 15, 2024
@hsivonen
Copy link
Member Author

Normalizer PR in #5413 along the lines of the previous comment here.

@sffc
Copy link
Member

sffc commented Sep 17, 2024

@hsivonen Is this done?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0-breaking Changes that are breaking API changes A-performance Area: Performance (CPU, Memory) C-data-infra Component: provider, datagen, fallback, adapters
Projects
Status: Being worked on
Development

No branches or pull requests

4 participants