Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing of units from strings #298

Merged
merged 3 commits into from
Jan 25, 2020
Merged

Conversation

c42f
Copy link
Contributor

@c42f c42f commented Jan 17, 2020

Generalize the handling of @u_str and the unit extension module
context to allow the same machinery to be used for a new parseunit
function which does the same thing at runtime.

Fixes #214 in a basic kind of way, and fixes #272 as a side effect of generalizing @u_str.

I'm fairly happy with the implementation here, but I wonder whether there's a better naming for parseunit. For example, we could possibly do something like overload Base.parse(::Unit, str) instead if that's appropriate. Arguably this could extend more naturally to Base.parse(::Quantity, str)). Thoughts?

Generalize the handling of `@u_str` and the unit extension module
context to allow the same machinery to be used for a new `parseunit`
function which does the same thing at runtime.

Fixes PainterQubits#272 as a side effect of generalizing `@u_str`.
src/user.jl Outdated
parseunit(unitmodule::Module, str) = parseunit([unitmodule], str)
function parseunit(unitmods, str)
ex = Meta.parse(str)
eval(lookup_units(unitmods, ex))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I believe this use of eval is a rare circumstance where it's probably warranted: we want to evaluate the expression containing the small number of allowed_funcs where we've already sanitized the expression ex and fairly carefully resolved all symbols ourselves. In addition, we want to use Unitful.eval in particular, taking the allowed_funcs bindings from the scope of Unitful.

@codecov-io
Copy link

codecov-io commented Jan 17, 2020

Codecov Report

Merging #298 into master will increase coverage by 0.17%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #298      +/-   ##
=========================================
+ Coverage   78.23%   78.4%   +0.17%     
=========================================
  Files          15      15              
  Lines        1075    1079       +4     
=========================================
+ Hits          841     846       +5     
+ Misses        234     233       -1
Impacted Files Coverage Δ
src/Unitful.jl 100% <ø> (ø) ⬆️
src/user.jl 95.52% <100%> (+0.94%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3d7357...6561a3b. Read the comment docs.

test/runtests.jl Outdated
@test parseunit("m") == m
@test parseunit("m,s") == (m,s)
@test parseunit("1.0") == 1.0
@test parseunit("m/s") == m/s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a test about ms should exist as well, to see whether it is interpreted as miliseconds?

How would you type the units of meters times seconds? In general in the tests presented here I don't see any test regarding parsing a multiplication of units (which is a reasonable way to make units as well).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: I guess by using multiplication sign *, but still this could/should be one test case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could add that. As you can see, I just converted the existing u_str tests (which use the same infrastructure).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The likelihood that someone would make a change causing "ms" to parse as meters times seconds is basically zero. You'd really have to go out of your way to make that happen. I don't mind if we add it though.

src/user.jl Outdated
parseunit([unit_module(s),] string)

Parse a string as a unit type. The format for `string` must be a valid Julia
expression, and any identifiers will be looked up in the context `unit_module`
Copy link

@Datseris Datseris Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose some doc improvement here: after valid Julia expression add "(i.e. multiplication between units is specified with *, division between units with / and space characters have no meaning (if between operators)". After this parenthesis add a full stop, to make the sentence easier to read, and continue with "Any identifiers...".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "The string will be parsed with standard Julia syntax" ? What you've described is not special to Unitful.

I was actually a bit more confused by the first sentence. Perhaps that could read "Parse a string for Unitful quantities, units, or dimensions."

@Datseris
Copy link

As a user or a developer of another Julia package that wants to add support for Unitful (namely NCDatasets.jl), how would I extend this function to handle cases more used in my field, e.g. using spaces to separate units that are multiplied?

@c42f
Copy link
Contributor Author

c42f commented Jan 17, 2020

how would I extend this function to handle cases more used in my field

So you are using NetCDF, yes? Do you know exactly what the rules are for parsing valid NetCDF unit strings? My impression is that they're defined by the implementation of the udunits package, which supposedly implements the formal grammar available at https://www.unidata.ucar.edu/software/udunits/udunits-current/doc/udunits/udunits2lib.html#Grammar.

So the robust and complete way to do it would be to write a proper parser which follows that exact grammar. But this seems potentially NetCDF specific so it's not clear that it belongs in Unitful itself.

For some cases you might get away with simple string substitutions to turn your NetCDF unit strings into julia syntax, which could then be passed to parseunit with an appropriate unit module. Though I admit this is not very satisfying and could lead to confusing error messages for invalid unit strings.

@Datseris
Copy link

Datseris commented Jan 17, 2020

I've commented this in the issue page, but here is an idea: make parseunit defined as :

parseunit([modules], str, preprocess = identity)

# inside parse unit:
s = preprocess(str)
# then give s to the macro

This would allow users and package developers to give a pre-processing function in a natural way, without Unitful.jl ever having to worry about what whackadoodle functions the users would use?

And for some pre-processing functions, if you find them helpful enough, you could add them in the docs of Unitful.jl?

@c42f
Copy link
Contributor Author

c42f commented Jan 17, 2020

But I'm not sure how this is better than simply writing parseunit(modules, preprocess(str)) which is more orthogonal and compositional?

@Datseris
Copy link

I guess the first way explicitly shows how you would extend the parsing functionality, while the second has to be explicitly mentioned in the docs? But sure, we can move forward with your way.

@Datseris
Copy link

Datseris commented Jan 17, 2020

orthogonal to the current discussion: can the first argument of parseunit be also made a Vector{<:Unit} so that only units in this vector are searched for matching?

@ajkeller34
Copy link
Collaborator

Thanks so much for this long-desired functionality!

I'm fairly happy with the implementation here, but I wonder whether there's a better naming for parseunit. For example, we could possibly do something like overload Base.parse(::Unit, str) instead if that's appropriate. Arguably this could extend more naturally to Base.parse(::Quantity, str). Thoughts?

The new implementation is really nice. It is a little funny that parseunit can return a Units or a Quantity, but that's not a problem to me. I suppose something along the lines of what you're describing would be a natural extension of Base.parse, and make parsing Quantitys feel a bit more first-class.

It seems like Base.parse(type, str) expects a concrete type as the first argument, for example parse(Complex, "1.0 + 2.0im") does not work. That raises two questions:

  1. Would we also require a concrete type, or would parse(Quantity, "1.0m") be expected to work?
  2. Should e.g. parse(typeof(u"1.0cm"), "2.0km") work?

I suppose since parse(Complex{Float64}, "1+2im") evidently works and converts to Complex{Float64}, probably the answer to the second question is that it should work and result in a Quantity in units of cm. The answer to the first question is not as obvious to me, since we would want to maintain a similar level of convenience to parseunit(str) but maybe there were clearly articulated reasons why parse(Complex, str) was not implemented. I was not able to find any relevant discussion. I would lean towards implementing parse(Quantity, str) and parse(Units, str).

Finally, would you imagine implementing three-arg versions, Base.parse(::Module, Quantity, str) or Base.parse(::Module, Units, str), similar to the two-arg version of parseunit?

@c42f
Copy link
Contributor Author

c42f commented Jan 23, 2020

parseunit can return a Units or a Quantity

Oh right, somehow I missed this! I guess an alternative spelling could be uparse in somewhat-analogy to ustrip and uconvert; it can be a "unit aware" parsing function.

maybe there were clearly articulated reasons why parse(Complex, str) was not implemented

I'm not sure about the reasoning though I think it's unclear what the concrete return type should be for cases like parse(Complex, "1"). Either it defaults to a fixed type like ComplexF64 or it's inferred using some kind of rules. If it's inferred then parse becomes inherently type unstable (as is the case for parseunit).

I would lean towards implementing parse(Quantity, str) and parse(Units, str)

Yes I think I may agree with this. Units is not exported which makes the latter somewhat inconvenient spelling. Should we export it?

Finally, would you imagine implementing three-arg versions, Base.parse(::Module, Quantity, str) or Base.parse(::Module, Units, str), similar to the two-arg version of parseunit?

I think the output type should always come first (that's the Base.parse convention), so it would be best to have Base.parse(Quantity, str) for consistency. That way generic code can still arguably use parse(T, str) with T being a Quantity or not. Then we need to decide whether the unit module context is passed as a keyword or not. There's some precedent for that; Base supports parse(Int, str, base=16), so I think we could have parse(Quantity, str, reference_units=[Unitful, ...]) or some such thing.

@ajkeller34
Copy link
Collaborator

Just some brief feedback:

  • 👍 to a keyword argument for parse. Perhaps unitful_modules?
  • 👍 to uparse, nice name.
  • I was sort of wavering on parse(::Type{Quantity}, str). Providing just one type-unstable way to parse Quantitys (uparse) seems clean and this way parse wouldn't have to stray too far from Base behavior, in that Quantity is not a concrete type. Ultimately though, I can see the need for a function like uparse that would complain if unexpected input were encountered. For example you could imagine trying to parse a Quantity and winding up with Units if the numbers were missing, etc. So I think we should do it.
  • I'm on board with exporting Units.

Let me know how much you want to tackle here. I really appreciate what you've done already. Perhaps we can change parseunit to uparse, and then the parse stuff could go into a separate PR, if you prefer?

@c42f
Copy link
Contributor Author

c42f commented Jan 23, 2020

I had a bit of a look at what it would take to do "something sensible" with Base.parse. I do think there's value in it, but I also think it's a fairly wide design space and it might be a lot more work to explore properly.

For example, what do we do about MixedUnits? I got a surprise when a half baked version of parse(Units, "dB") ended up with an error because dB is not a Units. Is there an appropriate abstract type here which could be used by the naive user?

* Perhaps `unitful_modules`

Right, though it might make sense to support more general "unit contexts" in the future other than fixing on modules? For example it could be nice to support @Datseris's idea of a vector as a collection of allowed units.

* Ultimately though, I can see the need for a function like `uparse` that would complain if unexpected input were encountered.

We could have uparse(Units, str) for that, and uparse(str) as the base version which just tries to figure out what you mean?

Let me know how much you want to tackle here. I really appreciate what you've done already.

Thanks, I appreciate it! Sometimes a simple looking thing turns out to be a deeper rabbit hole than you might expect. For now I'd be fairly happy to simply rename parseunit to uparse and develop further from there. Maybe swapping the order of arguments to look more like parse though might be appropriate.

@c42f
Copy link
Contributor Author

c42f commented Jan 24, 2020

Ok, I've renamed parseunits to uparse and I think that's an improvement.

This might be good to merge now. I did rearrange a bit and added a unit_context keyword argument for potential consistency with an associated version of parse(). Happy to tweak that or swap out for a better name if people can think of one.

@ajkeller34 ajkeller34 merged commit 0b7b443 into PainterQubits:master Jan 25, 2020
@ajkeller34
Copy link
Collaborator

Beautiful, thank you!

@alhirzel
Copy link

HECK YEAH, thank you @c42f!

@c42f c42f deleted the cjf/unit-parsing branch February 3, 2020 06:19
@caseykneale
Copy link

uparse just saved my day!

@pranshumalik14
Copy link

It'll also be useful to have this kind of type comparison: uparse("1cm") <: u"m". Instead, what we get is: ERROR: TypeError: in <:, expected Type, got a value of type Quantity{Int64,𝐋,Unitful.FreeUnits{(cm,),𝐋,nothing}}

@alhirzel
Copy link

alhirzel commented Dec 12, 2020

@pranshumalik14 depending on what you want to do, you may be able to rely on Unitful to make a conversion for you assuming that the result of uparse is a Length. For instance, the following works and returns true:

dimension(uparse("1cm")) == dimension(1u"m")

and the following return approximately 0.01:

ustrip(Float64, u"m", uparse("1cm"))
ustrip(Float64, u"m", uparse("0.03280839895ft"))

@henry2004y
Copy link

Is this described in the document? If not then it would be nice to mention it as it is very useful for other packages that depends on Unitful.jl! I found uparse only through a discourse question link.

@sostock
Copy link
Collaborator

sostock commented Dec 14, 2020

No, uparse is currently not documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

@u_str macro hygiene? Add function for run-time parsing
9 participants