-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser improvements #202
Parser improvements #202
Conversation
well, naming that's the hardest part :D I'll commit changes when option name is chosen. |
any ideas about that option name? |
No, I'm happy to leave it for now, we can consider changing it any time before a full 1.0 is released (some day). I'll merge this later. Thanks for the work! |
Change SeparatorSlashOrDash to separate Slash and Dash Add more separator types (Dot, Colon, Space, Quote, T, W)
Create Date and Time classes Add Sign class
Nice :) about next PR, I'm currently thinking how to properly implement timezone parsing. This current implementation doesn't actually parse it at all and just pass it to generic |
This is really great. Thanks a lot for the work! I pinged @mojombo on Twitter about giving you commit bit (or moving the repo to my account, which might make more sense in the long run), hopefully we can get that actioned at some point. |
I don't really need commit access. Just when I started reviewing this I noticed how poorly (IMHO) some things are implemented that I wanted to fix it once and for all :D So basically idea is that you would throw in any commonly used date/time format and it would parse it successfully. |
It appears that TZ database includes even abbreviations. And they can be accessed from TZInfo gem, only thing is that different abbreviations were used at different times so one can represent multiple offsets depending at which time and in which country you're in. I quickly generated all abbreviations to possible offsets gists/TimezoneAbbr_To_Offsets.yaml |
Current implementation have reached it's limits and it's very hard to add some specific new date/time format parsing, thus there's need for some refactoring. While general design/idea with Tokens and Tags are very good there's missing some steps in implementation and due that over years there have been made several hacks (especially in
pre_normalize
). It's not good to use so many regexp pre-processing if it all could be done with proper parser.Also currently codebase is quite complicated in some areas and thus it's not easy to contribute. It requires lot of time to properly investigate how it does work.
So my idea is to implement proper parsing and refactor some code parts. It would make it much easier to add new formats for parsing. Some formats which are parsed with current implementation - they does work because of pure luck I would say and overall this library seems quite hackish...
For example I wondered why this date
2013-08-05T22:55:03+03:00
even worked, then I noticed that05T22:55:03+03:00
is parsed as TimeZone and whole:text
is passed to genericTime.parse
. I think that's not good at all...Ok, now more about my idea. It's quite simple, first we divide whole input string by smallest possible tokens (like numbers, separators (dash, colon, etc.)) then we group them together so they make Date, Time and TimeZone (I haven't decided yet what would be best way for this grouping) and only finally we handle those groups based on which combination is most likely correct.
Also I think there are way too many handlers and probably most are unneeded and kinda duplicate code as in end it doesn't matter if it's ScalarMonth or RepeaterMonthName because that's same thing. Basically handlers should just create date/time from already parsed numbers.
Now about this PR. These are first changes in refactoring about which I was talking. I'll send some few more PRs, but it will take some time and changes probably will be quite big, even this PR isn't small, but here I barely changed anything.
About changes in this PR.
Handler.match
so pattern array can contain arrays of tags. It means it will match any of them (simple or), eg.Chronic::Handler.new([[:separator_on, :separator_at], :scalar], :handler)
will matchon 2
orat 2
SeparatorSlashOrDash
to separateSeparatorSlash
andSeparatorDash
\d
all the time, can do it only once-
and+
options
as 2nd arg to anyRepeater.new
otherwise it wasn't possible for them to receive any options:hours24
option to be able to force 24-hour clock (resolves Force 24h mode #201). When:hours24
istrue
then there's no ambiguous time and it will be always correct. If it'sfalse
then03:00:00
will be treated as 12-hour clock, but whennil
it's same behavior as currently was.Handler.match
and:hours24
Some changes in this PR aren't really used yet such as so many Separators, but they'll be needed in future.
All tests does pass and everything is backwards compatible. But still should review carefully as I consider any date/time format for which there's no tests as unsupported (because I don't know about them). Also I think it's good time to release new version now as current state is somewhat okay.