Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Sedate IXDTF (extended ISO) string parsing function #2127

Open
sffc opened this issue Jun 27, 2022 · 20 comments · Fixed by #4646
Open

Add Sedate IXDTF (extended ISO) string parsing function #2127

sffc opened this issue Jun 27, 2022 · 20 comments · Fixed by #4646
Assignees
Labels
C-datetime Component: datetime, calendars, time zones discuss Discuss at a future ICU4X-SC meeting discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band good first issue Good for newcomers help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-enhancement Type: Nice-to-have but not required

Comments

@sffc
Copy link
Member

sffc commented Jun 27, 2022

We should add a function that parses ISO-8601 strings. We should follow the Temporal grammar:

https://tc39.es/proposal-temporal/#sec-temporal-iso8601grammar

i.e., we should implement a string parser conforming to IETF Sedate that parses a string into its components (date, time, time zone, calendar), and use this parser in APIs and anywhere else we parse date strings. Think about where to put this code in the crate structure.

We should start with the DateTime production, and work our way up towards CalendarDateTime and TemporalZonedDateTimeString.

Examples of strings that are accepted as part of the DateTime grammar:

  • 2022-06-27T16:31:00.000
  • 2022-06-27t16:31:00.000
  • 2022-06-27 16:31:00.000
  • 20220627 16:31:00.000
  • 20220627 163100.000
  • 20220627 163100
  • 20220627 1631
@sffc sffc added T-enhancement Type: Nice-to-have but not required good first issue Good for newcomers help wanted Issue needs an assignee backlog C-datetime Component: datetime, calendars, time zones S-medium Size: Less than a week (larger bug fix or enhancement) labels Jun 27, 2022
@sffc sffc added this to the ICU4X 1.1 milestone Aug 9, 2022
@sffc sffc removed the backlog label Aug 9, 2022
@sffc
Copy link
Member Author

sffc commented Aug 9, 2022

Note: currently we have https://unicode-org.github.io/icu4x-docs/doc/icu_datetime/mock/fn.parse_gregorian_from_str.html

We should start with a design doc for this. We likely want to end up with a parser type named something like IetfDateTimeStringParser that is able to support the whole grammar.

@sffc
Copy link
Member Author

sffc commented Aug 9, 2022

Question: do we want to have specific, stricter parsers for each individual type, as well as the fully-featured one conforming to IETF Sedate?

@sffc
Copy link
Member Author

sffc commented Jun 22, 2023

We need to design how the parser will be architected so that it can be easily implemented.

Discuss with:

Optional:

@sffc sffc added the discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band label Jun 22, 2023
@robertbastian
Copy link
Member

Do we want to leverage an existing implementation, such as the time crate?

@nekevss
Copy link
Contributor

nekevss commented Feb 28, 2024

Hi! FWIW, we have an initial parsing implementation in temporal_rs. Although, it admittedly needs more testing and clean up.

@Manishearth
Copy link
Member

Do we want to leverage an existing implementation, such as the time crate?

Open to doing so with a Cargo feature.

Though I think this is the kind of thing that is okay for us to expose if we wish.

@sffc
Copy link
Member Author

sffc commented Feb 29, 2024

I think a SEDATE parser should be its own crate. Maybe it lives in our repo, maybe in the temporal_rs repo, maybe but maybe not in the time or chrono repo.

@sffc sffc changed the title Add ISO-8601 string parsing function Add Sedate IXDTF (extended ISO) string parsing function Feb 29, 2024
@sffc
Copy link
Member Author

sffc commented Feb 29, 2024

I changed the title of this issue to emphasize that we're talking about IXDTF, which is not functionality I'm aware of in Rust except for temporal_rs mentioned above.

@nekevss
Copy link
Contributor

nekevss commented Feb 29, 2024

Just to expand on my above comment. From what I can tell after reading through the IETF doc, I think the parser in temporal_rs can parse Sedate IXDTF or, at least, is fairly close to it. There might be a couple things missing, and the API definitely needs some work if the decision were to use that parser.

I'm pretty neutral on whether the functionality lives in icu4x, temporal_rs, or its own crate. I mostly figured that I would mention that it exists if there was interest in using the currently existing one as a base or just cleaning it up and using what's there. 😄

For reference, here are all the tests for that parser.

@sffc
Copy link
Member Author

sffc commented Feb 29, 2024

Thanks for the reference! Looking over that code, it seems to be well-written and suitable as a dependency of the icu4x project. I think it would be worthwhile pulling this code into an ixdtf crate (it looks like we are currently squatting that crate name). Seems harmless to include the Duration parsing code as well. The parser should return the raw IsoParseRecord type so that it isn't specific to the types in either icu_calendar or temporal_rs.

@nekevss
Copy link
Contributor

nekevss commented Mar 1, 2024

Sounds good. I'll move the code over to the directory. update it with the above comments., and submit a PR

@sffc
Copy link
Member Author

sffc commented Mar 15, 2024

Discussion: after @nekevss's PR is landed, we will add ixdtf as an optional, default dependency of icu_calendar and use where necessary.

LGTM: @sffc @Manishearth @robertbastian

@sffc
Copy link
Member Author

sffc commented Mar 27, 2024

#4646 basically fixes this issue but there were some minor follow-ups. I'm re-opening this issue to track that, or we can move those to another issue and re-close this one.

@sffc sffc reopened this Mar 27, 2024
@nekevss
Copy link
Contributor

nekevss commented Mar 27, 2024

I can definitely work on the follow ups for the ixdtf as needed. I think it makes sense to have a separate issue to track progress on any follow ups needed on ixdtf.

@sffc
Copy link
Member Author

sffc commented Apr 23, 2024

Schedule release review for the IXDTF crate

Start: 11:01

Options:

  1. Ship as 0.x when @sffc and @nekevss feel it is ready
  2. Ship as 0.x after an API review with the team
  3. Ship as 1.0 after an API review with the team

Discussion:

  • @nekevss - A 1.0 release would make sense if we can point to RFC 9557. It is soon to be published but it's not published yet.
  • @sffc - It sounds like we could start with 1 or 2 and be prepared to ship 1.0 when RFC 9557 is out.
  • @robertbastian - I'm happy with any conclusion of the domain experts (@sffc and @nekevss) because it doesn't put requirements on the ICU4X project itself

@sffc
Copy link
Member Author

sffc commented May 2, 2024

Checklist:

  • The crate should fully conform with the ICU4X Style Guide:
    • The crate should have a complete library header as shown in boilerplate.md with Clippy passing
    • The names of exported types should conform to the recommendations in the Style Guide
    • All Error types should be Copy
    • All constructors should take options structs by value
    • All constructors should take arguments in the following order: Provider, Locale, Options
    • All constructors should have the standard set of overloads for provider types
    • Runtime dependencies should be minimal and able to be disabled with Cargo features if possible
    • Any TODO, FIXME, todo!, unimplemented!, or other placeholders should either be resolved or link to an issue number. (It is okay to ship a small amount of code with tech debt comments, but anything having to do with code correctness should be resolved)
  • The crate should have a conventional Cargo.toml file:
    • Cargo.toml should use license, not license-file
    • The description should be useful
    • Correct repo link, authors, categories, include
    • Correct docs.rs and cargo-all-features metadata settings
    • A rust-version field with the MSRV of this crate in accordance with the current ICU4X policies on MSRV.
    • The crate should have an std feature if (and only if) it contains code that depends on std, such as file I/O or implementing the Error trait
    • Audit all features so that any foo/bar in a feature is foo?/bar when foo is an optional dep; if you intend to enable foo, add two entries
    • Use dep: for enabling dependencies
  • The crate should be fully documented
    • Every exported function should have docs coverage
    • There should be a crate-level example that illustrates a common use case for the component with the heading # Examples
    • All options and conditional code paths should have a corresponding docs test with the heading # Examples
    • All functions that are conditional on a Cargo feature should say so (last line before # Examples): ✨ *Enabled with the `alloc` Cargo feature.*
    • Compiled data constructors should say "with compiled data" in the first sentence and should have a Cargo feature alert following the above syntax.
  • The APIs should follow ICU4X style
    • All options bags should be Copy (and contain references if they need to). Exceptions can be made by discussion.
  • The data structs should fully follow ZeroVec style
    • Deserialization should not have a "zero-copy violation" in the make-testdata test
    • Constructors should avoid allocating memory in the common case
    • Opaque blobs of data should be avoided if possible (instead use VarZeroVec, ZeroMap, etc.)
    • Data structs should not be panicky to load/deserialize and conform to data_safety.md
  • The component should be fully integrated with ICU4X tooling
    • There should be an overview Criterion benchmark
    • There should be individual Criterion benchmarks for interesting or performance critical code paths
    • There should be at least one example plumbed with the icu_benchmark_macros
    • The component should be fully covered by FFI with no unjustified suppressions nor entries in missing_apis.txt
  • The component should encourage i18n best practices
    • The component should produce correct results in all locales as generated by icu4x-datagen
    • Where applicable, the component should be consistent with ECMA-402 and UTS#35
    • Any gaps in i18n quality should be fixed, or, if that is not possible, they should have tracking issues and a concrete, resourced path forward. The intent is to not ship components with known i18n correctness problems and no plan to fix them in an upcoming release
    • The API design should receive sign-off from a non-ICU4X i18n expert such as Markus Scherer
  • Add the new features to the changelog in the "Unreleased" section

@sffc
Copy link
Member Author

sffc commented May 2, 2024

  • @nekevss - Should we release the 1.0 when the RFC is out?
  • @sffc - I think we said yes; maybe we could incubate at a pre-1.0 for a release? It seems rushed if we do the 1.0 release so soon after the implementation landed, even if the RFC is done.
  • @sffc - How about this. Once temporal_rs and icu_calendar are both using IXDTF, we can consider that sufficient experience to be able to make a 1.0 release of ixdtf.

Plan:

1a. @sffc to finish the IXDTF checklist.
1b. @sffc to add ixdtf as a depedency of the icu_calendar crate and use it
1c. @nekevss to bring the docs up to par with the checklist requirements
2. Release ixdtf at 0.x
3. Add ixdtf as a depedency of the temporal_rs crate and use it
4. We consider icu_calendar and temporal_rs experience to be sufficient for making a 1.0 release of ixdtf

@sffc
Copy link
Member Author

sffc commented Jul 19, 2024

Note from #5260: ParserError should be ParseError

sffc pushed a commit that referenced this issue Aug 6, 2024
Updating the error of `ixdtf` from `ParserError` to `ParseError` was
mentioned in #2127 via feedback from #5260.
@robertbastian
Copy link
Member

What is missing here?

@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

Graduation of the ixdtf crate, I think, but that should probably be moved to its own standalone issue.

I think we're pretty much done with the general ICU4X datetime parsing functions now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones discuss Discuss at a future ICU4X-SC meeting discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band good first issue Good for newcomers help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-enhancement Type: Nice-to-have but not required
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants