EBNFUnitFormat won't accept the letter "e" as a unit alias. #250

wnreynoldsWork · 2019-08-21T23:31:35Z

The following code:

`import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.ResourceBundle;

import tech.units.indriya.format.EBNFUnitFormat;
import tech.units.indriya.format.SymbolMap;
import tech.units.indriya.unit.Units;

/**

Simple code to demonstrate use of aliases with the EBNFUnitFormat
*/
public class TestEBNF {

private static final String BUNDLE_NAME = "tech.units.indriya.format.messages"; //$NON-NLS-1$

/**
- @param args the command line arguments
  */
  public static void main(String[] args) throws UnsupportedEncodingException {
  // TODO code application logic here
  SymbolMap map = SymbolMap.of(ResourceBundle.getBundle(BUNDLE_NAME, Locale.ROOT));
  map.alias(Units.COULOMB.multiply(1.60217662e-19), "e");
  
  EBNFUnitFormat unitParser = EBNFUnitFormat.getInstance(map);
  
  PrintStream utfOut = new PrintStream(System.out, true, "UTF-8");
  utfOut.println("Dimension of e: " + unitParser.parse("e").getDimension());
}

}`

Fails with the following error:

`Exception in thread "main" javax.measure.format.MeasurementParseException: tech.units.indriya.internal.format.TokenException: Encountered " "e" "e "" at line 1, column 1.
Was expecting one of:
"(" ...
...
<FLOATING_POINT> ...
...
<NAT_LOG> ...
<UNIT_IDENTIFIER> ...
...
<FLOATING_POINT> ...
...

at tech.units.indriya.format.EBNFUnitFormat.parse(EBNFUnitFormat.java:272)
at tech.units.indriya.format.EBNFUnitFormat.parse(EBNFUnitFormat.java:281)
at tech.units.indriya.format.EBNFUnitFormat.parse(EBNFUnitFormat.java:286)
at com.stellarscience.unitparsertest.TestEBNF.main(TestEBNF.java:35)`

The grammar for the EBNF format indicates that a single initial character is allowed, and that all letters are allowed initial characters.

The text was updated successfully, but these errors were encountered:

keilw · 2019-08-22T10:54:21Z

I think "e" is a special case because it also exists in the textual representation of numbers like "1.60217662e-19". If you have a unit called "e" the "1.60217662e" would make it difficult to tell apart whether it is a number or a quantity.

wnreynoldsWork · 2019-08-22T16:18:54Z

I don't think it's that hard, given that you seem to be using a JavaCC parser. I've implemented a number of parsers that handle floating point numbers and variables named 'e' or 'E' using ANTLR. I can send a simple example if that would help. If you could make your JavaCC grammar (I couldn't find it in the repo), I'm pretty sure I could provide an example.

wnreynoldsWork · 2019-08-22T16:59:59Z

Here's a fragment of an ANTLR grammar that will parse strings, including 'e' into a WORD token as well as full floating point numbers with exponential notation. Again, if you could provide the JavaCC grammar, I think I could add something like this so that you could distinguish 'e' in a floating point number from 'e' as a variable name.

`
// Number parsing
INTEGER: PLUS_OR_MINUS? DIGITS;

REAL
: (INTEGER POINT? DIGITS? (E INTEGER)) | (INTEGER POINT) | (INTEGER? POINT DIGITS)
;

E
: 'e' | 'E'
;

PLUS_OR_MINUS
: (PLUS|MINUS)
;

PLUS
: '+'
;

MINUS
: '-'
;

POINT
: '.'
;

WORD: LETTER (LETTER|DIGIT|'_'|'-'|'.'|'/'|'\')*;
LETTER: [a-zA-Z];
DIGITS: DIGIT+;
DIGIT: [0-9];
`

keilw · 2019-08-22T17:57:13Z

We haven't been using this for a long time, EBNFUnitParser was maintained manually. Do you think we could use the example mentioned here with something like https://github.com/phax/ph-javacc-maven-plugin?

wnreynoldsWork · 2019-08-22T17:59:08Z

I've never used it, but I don't see why not. We used the maven antlr plugin as well as some custom ant tasks to build the parsers in our project.

keilw · 2019-08-22T18:05:18Z

You think Ant is necessary? It may not be a problem either because at least in https://github.com/unitsofmeasurement/unit-api we already use it for the Java Mondule system (there could be newer Maven plugins for that soon)

wnreynoldsWork · 2019-08-22T18:09:18Z

I don't know if ant is necessary. Our application did some pretty involved stuff (we used antlr to parse a proprietary grammar format, converting the proprietary grammar to a an antlr grammar and then built a parser from the generated using antlr, so we had to create some custom ant tasks).

Just running JavaCC (or ANTLR) once to create parser code for the application should be fairly straightforward (with the caveat that I have not used JavaCC myself).

keilw · 2019-08-22T18:10:30Z

This could be used as a starting point btw, it was not directly used in JSR 275 either but the codebase still exists: https://github.com/unitsofmeasurement/jsr-275/blob/master/src/main/javacc/javax/measure/unit/format/UnitParser.jj

A generated output should more or less act as drop-in replacement for UnitFormatParser (which still shows traces of JavaCC, but it had long been manually updated)

wnreynoldsWork · 2019-08-22T18:12:24Z

Bingo. I'll have a look.

keilw · 2019-08-22T18:14:53Z

Thanks, btw, for anything beyond code snippets and exchanging issue tickets, because Indriya is the RI of JSR 385, to raise actual PRs you may have to join the JCP at least as Associate Member, or are you already? See https://github.com/unitsofmeasurement/unit-api/wiki/Contribution-Guide

wnreynoldsWork · 2019-08-22T18:21:35Z

I am not, I'm just some random guy on the internet. I'll have a look.

wnreynoldsWork · 2019-08-22T20:15:53Z

Still reviewing this grammar (and learning JavaCC), but it seems the problem is the grammar supports log(unit), ln(unit) and e^unit. While the parser accepts these constructs, it does not seem to process them correctly (they return incorrect dimensions).

`import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.ResourceBundle;

import tech.units.indriya.format.EBNFUnitFormat;
import tech.units.indriya.format.SymbolMap;

/**

Simple code to demonstrate use of aliases with the EBNFUnitFormat
*/
public class TestEBNF {

private static final String BUNDLE_NAME = "tech.units.indriya.format.messages"; //$NON-NLS-1$

/**
- @param args the command line arguments
  */
  public static void main(String[] args) throws UnsupportedEncodingException {
  // TODO code application logic here
  SymbolMap map = SymbolMap.of(ResourceBundle.getBundle(BUNDLE_NAME, Locale.ROOT));
  
  EBNFUnitFormat unitParser = EBNFUnitFormat.getInstance(map);
  
  PrintStream utfOut = new PrintStream(System.out, true, "UTF-8");
  utfOut.println("Unit of lnW: " + unitParser.parse("log(W)"));
  utfOut.println("Dimension of lnW: " + unitParser.parse("log(W)").getDimension());
  utfOut.println("Unit of e^W: " + unitParser.parse("e^W"));
  utfOut.println("Dimension of e^W: " + unitParser.parse("e^W").getDimension());
  utfOut.println("Unit of W: " + unitParser.parse("W"));
  utfOut.println("Dimension of W: " + unitParser.parse("W").getDimension());
}

}
`

The output of this test code is:

Unit of lnW: [W?] Dimension of lnW: [L]²·[M]/[T]³ Unit of e^W: [W?] Dimension of e^W: [L]²·[M]/[T]³ Unit of W: W Dimension of W: [L]²·[M]/[T]³

I don't know how to interpret [W?] (this is printing out in unicode), but the dimensions are certainly incorrect. Is supporting these transcendental units part of the spec? Would it be possible to eliminate ln, log and e^ from the unit parser?

wnreynoldsWork · 2019-08-22T21:13:21Z

Still reviewing this.

The problem is that defining an alias 'e' conflicts with the reserved character 'e' used for Euler's number. The parser allows constructs like 'e^identifier'. With a unit aliased 'e', the parser doesn't know what to do, and throws an error. BTW, is not listed as a reserved character in the javadoc for EBNFUnitFormat ("log", "log" and "ln" are).

So how to solve?

(above) get rid of log, ln and e as reserved words. Update javadoc.
Instead of using 'e' as a reserved character and looking for constructs like e^thing, replace with exp(thing). The sequence 'exp' becomes reserved, but this is less problematic, imho. Fix parser to support exp, log, ln, update javadoc.
Leave unchanged, update javadoc, fix parser to support e^, log, ln.

I'll try to get some small demo parsers of each case along with a maven build script. Note that I will not try to correct the rest of the codebase to support transcendentals of units.

I've asked the bosses about whether I can sign the stuff necessary to be a JSR member, will keep you posted. If I can't, I'll just send the demo code directly to Werner instead of doing a PR.

wnreynoldsWork · 2019-08-24T21:42:33Z

Ok, I've prepared a repo that demonstrates how to build JavaCC UnitParser grammars using Maven. I've provided several simplified grammars that eliminate features from EBNF like exponents, logarithms, sums and the mix construct. Candidly, I don't quite understand how these constructs work in a unit parsing context - see my comments above. At the very least, they should be documented with use cases - I didn't see anything describing them in the JSR document at https://download.oracle.com/otndocs/jcp/units-2_0-pr-spec/index.html.

This repo also provides a simple demo of setting up aliases using the EBNF parser. See the README.

https://github.com/wnreynoldsWork/EBNFParserDemos

keilw · 2020-09-27T18:00:20Z

EBNFUnitFormat is an implementation detail, so the parsing details would not have a place in the Specification of the Unit API, if anywhere then a future version of https://github.com/unitsofmeasurement/uom-guide.
We might include a generated grammar in a version beyond MR1, but I guess for 2.1 it is a bit risky.

wnreynoldsWork mentioned this issue Aug 21, 2019

Surprising parse behavior of quotient units. #249

Closed

keilw added the analysis label Aug 22, 2019

keilw changed the title ~~EBNFUnitParser won't accept the letter "e" as a unit alias.~~ EBNFUnitFormat won't accept the letter "e" as a unit alias. Sep 27, 2020

keilw added deferred format labels Sep 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EBNFUnitFormat won't accept the letter "e" as a unit alias. #250

EBNFUnitFormat won't accept the letter "e" as a unit alias. #250

wnreynoldsWork commented Aug 21, 2019

keilw commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019 •

edited

Loading

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019 •

edited

Loading

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 24, 2019

keilw commented Sep 27, 2020

EBNFUnitFormat won't accept the letter "e" as a unit alias. #250

EBNFUnitFormat won't accept the letter "e" as a unit alias. #250

Comments

wnreynoldsWork commented Aug 21, 2019

keilw commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019 • edited Loading

wnreynoldsWork commented Aug 22, 2019

keilw commented Aug 22, 2019 • edited Loading

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 22, 2019

wnreynoldsWork commented Aug 24, 2019

keilw commented Sep 27, 2020

keilw commented Aug 22, 2019 •

edited

Loading

keilw commented Aug 22, 2019 •

edited

Loading