Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec for human readable date and time parsing #845

Closed
Tracked by #250
kgodey opened this issue Nov 26, 2021 · 14 comments
Closed
Tracked by #250

Spec for human readable date and time parsing #845

kgodey opened this issue Nov 26, 2021 · 14 comments
Assignees
Labels
type: enhancement New feature or request work: backend Related to Python, Django, and simple SQL work: documentation Improvements or additions to documentation work: frontend Related to frontend code in the mathesar_ui directory

Comments

@kgodey
Copy link
Contributor

kgodey commented Nov 26, 2021

Problem

We'd like to support parsing human readable dates and times when users are interacting with data with date/time types. For example, a user might want to filter on a column "within the next 3 months" or "2 hours ago". However, we don't have a clear spec for what formats to support and how to make sure parsing is consistent across the frontend and backend.

Proposed solution

We should write a spec for what formats we will support and how to make sure it's consistent between the frontend and backend. We want frontend parsing so that we can validate user input without being slowed down by a round trip to the backend. We also want backend parsing so that we can parse data in bulk imports.

The desired output is a spec on the wiki's Architecture section that outlines a solution that works for both frontend and backend. Once the spec is ready, we will create new issues to implement it.

Additional context

  • Please see discussion in Handle TIME data type in the backend #426
  • While writing the spec, keep in mind that we may not want identical parsers for the frontend and backend. For example, if a user is typing "2 days ago" in a filter in the frontend, that's happening in real time and we can derive a date from it. However, if the user is bulk importing data, we may not want to assume that the data is real time (e.g. they might export a CSV with "2 days ago" in it and then upload it to Mathesar 5 days later). Please think through use cases like that.
@kgodey kgodey added type: enhancement New feature or request status: triage labels Nov 26, 2021
@kgodey kgodey added this to the [08] Initial Data Types milestone Nov 26, 2021
@kgodey kgodey added ready Ready for implementation work: backend Related to Python, Django, and simple SQL work: documentation Improvements or additions to documentation and removed status: triage labels Nov 26, 2021
@kgodey kgodey added the work: frontend Related to frontend code in the mathesar_ui directory label Nov 26, 2021
@mathemancer
Copy link
Contributor

mathemancer commented Dec 10, 2021

Here's the docs page with all info about how PostgreSQL parses dates and times: https://www.postgresql.org/docs/13/datetime-appendix.html

Here's the page with info about date/time types: https://www.postgresql.org/docs/13/datatype-datetime.html

@dmos62
Copy link
Contributor

dmos62 commented Dec 13, 2021

So far we've been talking about human-language (human-readable is a bit of a misnomer, since a human can read ISO8601) relative time (e.g. 5 hours ago) and relative period (e.g. within last 5 hours) specification.

  • Do we want to do other similar stuff, like "last Thursday"? It would be good to enumerate what uses we want to support;
  • Notice how these relative specifications do not explicitly include part of the data necessary for full parsing:
    • for example, "two hours ago": what is the absolute point in time (the now) we're relating to here? We can silently imply that it's the time of parsing, but in case of batch import, we could have the user choose "the now" to be used per column;
    • "within the last week": does the week start/end Sunday or Monday? Parsing requires locale information;
    • "one day ago": do we mean the date or the date-time? This one's easy: we can look at column's type;
  • Notice that "within last hour" is not a type in Postgres: the Postgres interval type is "some length of time not relative to any point in time", like "1 hour".

It's good for the user to get a lot of feedback about what implications we're making about his specification. I'm sometimes surprised by a device defaulting to weeks starting on Sundays (relevant when parsing "since last week" for example). Similarly, it's good to communicate to a user to what type (is it a date, a time, a period, a relative time, a relative period) his specifications will be parsed to.

The most holistic approach might be to introduce new types for relative time/period specifications: "5 minutes ago" specifies a point in time relative to another unspecified point in time: so we save it as such (relative point in time) and let the user cast it to an absolute point in time if he wants to (he might not want to); if he does cast he can then specify the absolute point in time required to turn a relative point in time to an absolute.

@mathemancer
Copy link
Contributor

You've pointed out a number of things that we will need to specify.

One quick note about:

The most holistic approach might be to introduce new types for relative time/period specifications: "5 minutes ago" specifies a point in time relative to another unspecified point in time: so we save it as such (relative point in time) and let the user cast it to an absolute point in time if he wants to (he might not want to); if he does cast he can then specify the absolute point in time required to turn a relative point in time to an absolute.

I don't think the relative time has meaning beyond the INTERVAL type without the absolute time it's measured from. I.e., "5 minutes ago" is an interval "-5 minutes" summed with the current time. If we're not going to realize the absolute time, it's just an interval.

@dmos62
Copy link
Contributor

dmos62 commented Dec 15, 2021

If we're not going to realize the absolute time, it's just an interval.

Good point. I didn't realise INTERVAL can be negative.

@mathemancer
Copy link
Contributor

mathemancer commented Jan 27, 2022

After fiddling / struggling with the INTERVAL type quite a bit, my preferred option is to use a string conforming to ISO8601 as the canonical representation for all date, time, datetime (timestamp), and duration (interval) types. When querying / selecting, this would be what's returned by the API. For inserts, the front end could either send along the string as the user inputs it, or optionally do some validation and then submit either the original string or a modified version conforming to ISO8601.

Questions:

  • @pavish @seancolsen Are you okay with modifying and prettifying that format for display to the user? The alternative is to prettify it in PostgreSQL and display the result, but it's less flexible for the UI that way.
  • @kgodey @ghislaineguerin Should we support ISO week date system? If so, would it be okay to add that feature later (or when requested by a user)?

Ideas I tried / partially implemented, then discarded:

  • a custom class that has fields like "year", "month", "day", etc.
    • This is pretty complicated at the DB level if we want to avoid installing custom functions (I don't want to do that since it's a default type that we should support even without custom functions).
    • This requires shared knowledge between the UI and API about the expected fields. Perfectly fine, but unneeded in our case since there's a predefined ISO format
    • This would result in a kind of Pseudotype JSON column that contains a bunch of metadata that the front end would have to go through and only display the desired parts.
  • use the DB representation and parsing in both directions
    • super simple to implement (obviously)
    • front end doesn't need to do any parsing in either direction.
    • not very flexible for front end if we want to change format.

@seancolsen
Copy link
Contributor

@mathemancer

Are you okay with modifying and prettifying that format for display to the user?

Just to make sure I understand your question correctly, you're proposing that we standardize the API to always transfer date, time, date-time, and duration data as a string in conformance with ISO-8601. Correct? This would be opposed to, say, transferring a date as a unix timestamp number or as a formatted string like "Jan 27, 2022". If I'm understanding correctly, then yes! That sounds great!

@mathemancer
Copy link
Contributor

mathemancer commented Jan 27, 2022

@seancolsen Yes, that's what I'm proposing. Note that the ISO8601 format is pretty ugly by default.

Edit: The API will accept non-conforming strings and use the default PostgreSQL functionality to attempt to understand them (the PostgreSQL implementation is quite featureful). The returned value will always be ISO8601 conforming.

@ghislaineguerin
Copy link
Contributor

@mathemancer I think it’s ok to have it later unless it creates inconsistencies for the user. If when we introduce it later we break things then it might be better to have it as part of this. I mean for the week date system.

@mathemancer
Copy link
Contributor

@mathemancer I think it’s ok to have it later unless it creates inconsistencies for the user. If when we introduce it later we break things then it might be better to have it as part of this. I mean for the week date system.

It wouldn't break anything to introduce later. It would be strictly additive. The reason I want to hold off is that only a few countries care about week numbers, and it adds a bunch of pain to the implementation. However, it is a feature that Germans using the product would probably expect.

@mathemancer mathemancer self-assigned this Jan 27, 2022
@kgodey
Copy link
Contributor Author

kgodey commented Jan 27, 2022

I think it's fine to use ISO8601 representations in the API and rely on the frontend to do date/time parsing when interacting with the API.

We will still need to parse dates, times, and durations in the backend while doing imports. @mathemancer, it sounds like we're going to rely on default Postgres parsing here. Can you document what formats it supports or link to it when you write the spec? We should make sure that the frontend parsing is the same so that users can expect the same behavior however they enter dates.

It's fine to defer support for the ISO week date system, please create a ticket to track the work and put it in the future consideration milestone.

@pavish
Copy link
Member

pavish commented Jan 28, 2022

I also agree that it is better to always use one particular format, in this case ISO8601, for data returned by our APIs.

The frontend can format it for display however required.

@mathemancer
Copy link
Contributor

@mathemancer, it sounds like we're going to rely on default Postgres parsing here. Can you document what formats it supports or link to it when you write the spec?

Sure thing.

@kgodey kgodey added status: started and removed ready Ready for implementation labels Feb 25, 2022
@kgodey
Copy link
Contributor Author

kgodey commented Feb 25, 2022

@mathemancer Please close this once you're done with the spec updates.

@kgodey
Copy link
Contributor Author

kgodey commented Mar 18, 2022

This was resolved in mathesar-foundation/mathesar-wiki#40

@kgodey kgodey closed this as completed Mar 18, 2022
Repository owner moved this from Started to Done in [NO LONGER USED] Mathesar work tracker Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request work: backend Related to Python, Django, and simple SQL work: documentation Improvements or additions to documentation work: frontend Related to frontend code in the mathesar_ui directory
Projects
No open projects
Development

No branches or pull requests

6 participants