Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBNFUnitFormat won't accept the letter "e" as a unit alias. #250

Open
wnreynoldsWork opened this issue Aug 21, 2019 · 15 comments
Open

EBNFUnitFormat won't accept the letter "e" as a unit alias. #250

wnreynoldsWork opened this issue Aug 21, 2019 · 15 comments

Comments

@wnreynoldsWork
Copy link

The following code:

`import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.ResourceBundle;

import tech.units.indriya.format.EBNFUnitFormat;
import tech.units.indriya.format.SymbolMap;
import tech.units.indriya.unit.Units;

/**

  • Simple code to demonstrate use of aliases with the EBNFUnitFormat
    */
    public class TestEBNF {

    private static final String BUNDLE_NAME = "tech.units.indriya.format.messages"; //$NON-NLS-1$

    /**

    • @param args the command line arguments
      */
      public static void main(String[] args) throws UnsupportedEncodingException {
      // TODO code application logic here
      SymbolMap map = SymbolMap.of(ResourceBundle.getBundle(BUNDLE_NAME, Locale.ROOT));
      map.alias(Units.COULOMB.multiply(1.60217662e-19), "e");

      EBNFUnitFormat unitParser = EBNFUnitFormat.getInstance(map);

      PrintStream utfOut = new PrintStream(System.out, true, "UTF-8");
      utfOut.println("Dimension of e: " + unitParser.parse("e").getDimension());

    }

}`

Fails with the following error:

`Exception in thread "main" javax.measure.format.MeasurementParseException: tech.units.indriya.internal.format.TokenException: Encountered " "e" "e "" at line 1, column 1.
Was expecting one of:
"(" ...
...
<FLOATING_POINT> ...
...
<NAT_LOG> ...
<UNIT_IDENTIFIER> ...
...
<FLOATING_POINT> ...
...

at tech.units.indriya.format.EBNFUnitFormat.parse(EBNFUnitFormat.java:272)
at tech.units.indriya.format.EBNFUnitFormat.parse(EBNFUnitFormat.java:281)
at tech.units.indriya.format.EBNFUnitFormat.parse(EBNFUnitFormat.java:286)
at com.stellarscience.unitparsertest.TestEBNF.main(TestEBNF.java:35)`

The grammar for the EBNF format indicates that a single initial character is allowed, and that all letters are allowed initial characters.

@keilw
Copy link
Member

keilw commented Aug 22, 2019

I think "e" is a special case because it also exists in the textual representation of numbers like "1.60217662e-19". If you have a unit called "e" the "1.60217662e" would make it difficult to tell apart whether it is a number or a quantity.

@keilw keilw added the analysis label Aug 22, 2019
@wnreynoldsWork
Copy link
Author

I don't think it's that hard, given that you seem to be using a JavaCC parser. I've implemented a number of parsers that handle floating point numbers and variables named 'e' or 'E' using ANTLR. I can send a simple example if that would help. If you could make your JavaCC grammar (I couldn't find it in the repo), I'm pretty sure I could provide an example.

@wnreynoldsWork
Copy link
Author

Here's a fragment of an ANTLR grammar that will parse strings, including 'e' into a WORD token as well as full floating point numbers with exponential notation. Again, if you could provide the JavaCC grammar, I think I could add something like this so that you could distinguish 'e' in a floating point number from 'e' as a variable name.

`
// Number parsing
INTEGER: PLUS_OR_MINUS? DIGITS;

REAL
: (INTEGER POINT? DIGITS? (E INTEGER)) | (INTEGER POINT) | (INTEGER? POINT DIGITS)
;

E
: 'e' | 'E'
;

PLUS_OR_MINUS
: (PLUS|MINUS)
;

PLUS
: '+'
;

MINUS
: '-'
;

POINT
: '.'
;

WORD: LETTER (LETTER|DIGIT|'_'|'-'|'.'|'/'|'\')*;
LETTER: [a-zA-Z];
DIGITS: DIGIT+;
DIGIT: [0-9];
`

@keilw
Copy link
Member

keilw commented Aug 22, 2019

We haven't been using this for a long time, EBNFUnitParser was maintained manually. Do you think we could use the example mentioned here with something like https://github.com/phax/ph-javacc-maven-plugin?

@wnreynoldsWork
Copy link
Author

I've never used it, but I don't see why not. We used the maven antlr plugin as well as some custom ant tasks to build the parsers in our project.

@keilw
Copy link
Member

keilw commented Aug 22, 2019

You think Ant is necessary? It may not be a problem either because at least in https://github.com/unitsofmeasurement/unit-api we already use it for the Java Mondule system (there could be newer Maven plugins for that soon)

@wnreynoldsWork
Copy link
Author

I don't know if ant is necessary. Our application did some pretty involved stuff (we used antlr to parse a proprietary grammar format, converting the proprietary grammar to a an antlr grammar and then built a parser from the generated using antlr, so we had to create some custom ant tasks).

Just running JavaCC (or ANTLR) once to create parser code for the application should be fairly straightforward (with the caveat that I have not used JavaCC myself).

@keilw
Copy link
Member

keilw commented Aug 22, 2019

This could be used as a starting point btw, it was not directly used in JSR 275 either but the codebase still exists: https://github.com/unitsofmeasurement/jsr-275/blob/master/src/main/javacc/javax/measure/unit/format/UnitParser.jj

A generated output should more or less act as drop-in replacement for UnitFormatParser (which still shows traces of JavaCC, but it had long been manually updated)

@wnreynoldsWork
Copy link
Author

Bingo. I'll have a look.

@keilw
Copy link
Member

keilw commented Aug 22, 2019

Thanks, btw, for anything beyond code snippets and exchanging issue tickets, because Indriya is the RI of JSR 385, to raise actual PRs you may have to join the JCP at least as Associate Member, or are you already? See https://github.com/unitsofmeasurement/unit-api/wiki/Contribution-Guide

@wnreynoldsWork
Copy link
Author

I am not, I'm just some random guy on the internet. I'll have a look.

@wnreynoldsWork
Copy link
Author

Still reviewing this grammar (and learning JavaCC), but it seems the problem is the grammar supports log(unit), ln(unit) and e^unit. While the parser accepts these constructs, it does not seem to process them correctly (they return incorrect dimensions).

`import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.ResourceBundle;

import tech.units.indriya.format.EBNFUnitFormat;
import tech.units.indriya.format.SymbolMap;

/**

  • Simple code to demonstrate use of aliases with the EBNFUnitFormat
    */
    public class TestEBNF {

    private static final String BUNDLE_NAME = "tech.units.indriya.format.messages"; //$NON-NLS-1$

    /**

    • @param args the command line arguments
      */
      public static void main(String[] args) throws UnsupportedEncodingException {
      // TODO code application logic here
      SymbolMap map = SymbolMap.of(ResourceBundle.getBundle(BUNDLE_NAME, Locale.ROOT));

      EBNFUnitFormat unitParser = EBNFUnitFormat.getInstance(map);

      PrintStream utfOut = new PrintStream(System.out, true, "UTF-8");
      utfOut.println("Unit of lnW: " + unitParser.parse("log(W)"));
      utfOut.println("Dimension of lnW: " + unitParser.parse("log(W)").getDimension());
      utfOut.println("Unit of e^W: " + unitParser.parse("e^W"));
      utfOut.println("Dimension of e^W: " + unitParser.parse("e^W").getDimension());
      utfOut.println("Unit of W: " + unitParser.parse("W"));
      utfOut.println("Dimension of W: " + unitParser.parse("W").getDimension());

    }

}
`

The output of this test code is:

Unit of lnW: [W?] Dimension of lnW: [L]²·[M]/[T]³ Unit of e^W: [W?] Dimension of e^W: [L]²·[M]/[T]³ Unit of W: W Dimension of W: [L]²·[M]/[T]³

I don't know how to interpret [W?] (this is printing out in unicode), but the dimensions are certainly incorrect. Is supporting these transcendental units part of the spec? Would it be possible to eliminate ln, log and e^ from the unit parser?

@wnreynoldsWork
Copy link
Author

Still reviewing this.

The problem is that defining an alias 'e' conflicts with the reserved character 'e' used for Euler's number. The parser allows constructs like 'e^identifier'. With a unit aliased 'e', the parser doesn't know what to do, and throws an error. BTW, is not listed as a reserved character in the javadoc for EBNFUnitFormat ("log", "log" and "ln" are).

So how to solve?

  1. (above) get rid of log, ln and e as reserved words. Update javadoc.
  2. Instead of using 'e' as a reserved character and looking for constructs like e^thing, replace with exp(thing). The sequence 'exp' becomes reserved, but this is less problematic, imho. Fix parser to support exp, log, ln, update javadoc.
  3. Leave unchanged, update javadoc, fix parser to support e^, log, ln.

I'll try to get some small demo parsers of each case along with a maven build script. Note that I will not try to correct the rest of the codebase to support transcendentals of units.

I've asked the bosses about whether I can sign the stuff necessary to be a JSR member, will keep you posted. If I can't, I'll just send the demo code directly to Werner instead of doing a PR.

@wnreynoldsWork
Copy link
Author

Ok, I've prepared a repo that demonstrates how to build JavaCC UnitParser grammars using Maven. I've provided several simplified grammars that eliminate features from EBNF like exponents, logarithms, sums and the mix construct. Candidly, I don't quite understand how these constructs work in a unit parsing context - see my comments above. At the very least, they should be documented with use cases - I didn't see anything describing them in the JSR document at https://download.oracle.com/otndocs/jcp/units-2_0-pr-spec/index.html.

This repo also provides a simple demo of setting up aliases using the EBNF parser. See the README.

https://github.com/wnreynoldsWork/EBNFParserDemos

@keilw keilw changed the title EBNFUnitParser won't accept the letter "e" as a unit alias. EBNFUnitFormat won't accept the letter "e" as a unit alias. Sep 27, 2020
@keilw
Copy link
Member

keilw commented Sep 27, 2020

EBNFUnitFormat is an implementation detail, so the parsing details would not have a place in the Specification of the Unit API, if anywhere then a future version of https://github.com/unitsofmeasurement/uom-guide.
We might include a generated grammar in a version beyond MR1, but I guess for 2.1 it is a bit risky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants