CSV I/O are failing #1049

msteindorfer · 2017-01-27T17:46:50Z

On commit cba868b, the tests in lang::rascal::tests::library::lang::csv::CSVIOTests are failing.

What it boils down to is that the TypeFactory seems to get a name that it rejects, and thus throws an exception (cf. https://github.com/usethesource/rascal-value/blob/master/src/main/java/org/rascalmpl/value/type/TypeFactory.java#L262).

When lines 262-263 are commented out, the test cases pass. Thus my question is: do we have a faulty specification or a faulty implementation?

Note: the behaviour occurred while working on a separate branch (cf. issue #1042). I'm not sure what triggered the current failure, or whether a code path is now exercised that was not tested before.


msteindorfer · 2017-01-30T11:01:09Z

After digging deeper: the CSV IO method readInferAndBuild (https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/lang/csv/IO.java#L167) is used, which in turn calls normalizeLabel (https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/lang/csv/IO.java#L525).

normalizeLabel 'escapes' column names by prepending a backslash. The 'escaped' column names are later used as labels when creating a tuple, where they fail the valid identifier check.
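A minimal sketch of the failure mode described above, using stand-in code rather than the actual rascal or rascal-value sources. Both `normalizeLabel` and `isIdentifier` below are simplified assumptions: the real `normalizeLabel` in csv/IO.java may do more than prepend a backslash, and the real check in TypeFactory may accept a different character set.

```java
// Hypothetical stand-ins; not the actual rascal / rascal-value code.
class EscapedLabelSketch {

    // Assumed behaviour: 'escape' a column name by prepending a backslash.
    static String normalizeLabel(String columnName) {
        return "\\" + columnName;
    }

    // Assumed rule: a letter or underscore, followed by letters, digits or underscores.
    static boolean isIdentifier(String label) {
        if (label.isEmpty()) {
            return false;
        }
        char first = label.charAt(0);
        if (!Character.isLetter(first) && first != '_') {
            return false;
        }
        for (int i = 1; i < label.length(); i++) {
            char c = label.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '_') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String raw = "int";
        String escaped = normalizeLabel(raw);
        // The raw column name passes the check; the escaped one does not,
        // because its first character is a backslash.
        System.out.println(raw + " valid: " + isIdentifier(raw));
        System.out.println(escaped + " valid: " + isIdentifier(escaped));
    }
}
```

Under these assumptions, any label that went through the escaping step would be rejected by a factory method that validates its labels.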

I would be glad if any of you could have a look. Based on my current findings, it seems to be a CSV serialization problem that should be independent of the new binary relation encoding. The question is: why was the same fault not triggered before?

DavyLandman · 2017-01-30T15:37:16Z

on master this happens:

rascal>writeFile(|file:///tmp/test.csv|, "int,str\n1,2\n")
ok
rascal>getCSVType(|file:///tmp/test.csv|)
readCSV inferred the relation type: rel[int \int,int \str]
type[value]: type(
  set(tuple([
        label(
          "\\int",
          int()),
        label(
          "\\str",
          int())
      ])),
  ())

DavyLandman · 2017-01-30T15:42:56Z

perhaps you should compare the pom.xml file with origin/master?

for example:

                <dependency>
                        <groupId>org.rascalmpl</groupId>
                        <artifactId>value</artifactId>
-                       <version>0.6.3-SNAPSHOT</version>
+                       <version>0.5.5-SNAPSHOT</version>
                </dependency>

                        <groupId>com.github.ben-manes.caffeine</groupId>
                        <artifactId>caffeine</artifactId>
-                       <version>2.3.5</version>
+                       <version>1.3.1</version>

and dependencies removed etc. so that should be corrected first, since we are looking at a bigger difference from master than intended.

msteindorfer · 2017-01-31T10:07:55Z

Update on the issue: the current master branch accepts any label on a tuple or relation type unchecked, by using a constructor that does not enforce the isIdentifier check (https://github.com/usethesource/rascal-value/blob/master/src/main/java/org/rascalmpl/value/type/TypeFactory.java#L283).

This has the effect that certain type construction snippets fail ...

final Type relationType = getTypeFactory().relType(keyType, keyLabel, valType, valLabel);

... and others succeed ...

final Type tupleType = getTypeFactory().tupleType(new Type[] {keyType, valType}, new String[] {keyLabel, valLabel});

final Type relationType = getTypeFactory().relTypeFromTuple(tupleType);
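The two construction paths above can be modeled with a toy factory. This is only a sketch: the method names `relType` and `tupleType` mirror the thread, but `LabeledType`, the string-based field types, and the `isIdentifier` rule are assumptions made for illustration, not the rascal-value implementation.

```java
// Toy model of the inconsistency; bodies are assumptions, not rascal-value code.
class FactoryPathsSketch {

    // Hypothetical result type: field types and labels kept as plain strings.
    static final class LabeledType {
        final String[] fieldTypes;
        final String[] labels;

        LabeledType(String[] fieldTypes, String[] labels) {
            this.fieldTypes = fieldTypes;
            this.labels = labels;
        }
    }

    // Assumed identifier rule.
    static boolean isIdentifier(String label) {
        return label.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    // Path 1: validates every label before building the type.
    static LabeledType relType(String keyType, String keyLabel, String valType, String valLabel) {
        for (String label : new String[] {keyLabel, valLabel}) {
            if (!isIdentifier(label)) {
                throw new IllegalArgumentException("invalid label: " + label);
            }
        }
        return new LabeledType(new String[] {keyType, valType}, new String[] {keyLabel, valLabel});
    }

    // Path 2: accepts the labels unchecked.
    static LabeledType tupleType(String[] fieldTypes, String[] labels) {
        return new LabeledType(fieldTypes, labels);
    }

    public static void main(String[] args) {
        String keyLabel = "\\int";
        String valLabel = "\\str";

        LabeledType viaTuple = tupleType(new String[] {"int", "int"}, new String[] {keyLabel, valLabel});
        System.out.println("unchecked path accepted: " + String.join(",", viaTuple.labels));

        try {
            relType("int", keyLabel, "int", valLabel);
            System.out.println("checked path accepted the labels");
        } catch (IllegalArgumentException e) {
            System.out.println("checked path rejected: " + e.getMessage());
        }
    }
}
```

The same pair of labels yields a type on one path and an exception on the other, so whether the fault shows up depends only on which factory method a caller happens to use.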

jurgenvinju · 2017-01-31T10:52:48Z

good find. since the new reification implementation we use another factory method which does not do the additional check. the check should be done indeed.

it seems that the bug you are looking at is independent from your contribution/merge, but some code was reverted back due to your merge?

msteindorfer · 2017-01-31T11:00:44Z

No code was reverted back, but the binary relation data type lazily calculates the precise dynamic type according to the snippets presented above.

While debugging the code I was also surprised again by the way dynamic types and (static) labels on tuples, relations and maps interact, which is another potential source of problems and should be discussed separately.

jurgenvinju · 2017-01-31T12:45:06Z

ah yes, now I understand. The lub of tuple type loses labels if they are not exactly the same for both types.

No discussion necessary: the labels should disappear completely from dynamic types since they break structural equality and canonical representation. Labels for relations and maps should become a strict static compiler feature of Rascal and not of vallang/rascal-values. That can be done just before or after we drop the interpreter which depends on the dynamic labels.
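The label behaviour of lub mentioned above can be illustrated with a small sketch. It models only the labels of two tuple types of equal arity and ignores the field types entirely; `lubLabels` is a hypothetical helper, not a method of the real Type class.

```java
import java.util.Arrays;

// Illustrative only: models how labels survive a lub, not the real Type.lub.
class LabelLubSketch {

    // Labels are kept only when both sides carry exactly the same labels;
    // otherwise the result is unlabeled, represented here by null.
    static String[] lubLabels(String[] left, String[] right) {
        if (left != null && right != null && Arrays.equals(left, right)) {
            return left;
        }
        return null;
    }

    public static void main(String[] args) {
        String[] a = {"key", "val"};
        String[] b = {"key", "val"};
        String[] c = {"key", "value"};

        System.out.println(Arrays.toString(lubLabels(a, b)));
        System.out.println(Arrays.toString(lubLabels(a, c)));
    }
}
```

In this model a single differing label is enough to drop all labels from the result, which is why labels on dynamic types need special care to be propagated.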

msteindorfer · 2017-01-31T12:59:56Z

Labels were not lost in this case. However they need a lot of special attention to be propagated. Thus I agree with your comment above.

jurgenvinju · 2017-01-31T13:01:59Z

You don't use the Type.lub method then? and do we now know a fix for the problem?

jurgenvinju · 2017-01-31T13:03:59Z

and I think in this case we will do so much lub that it needs a short-cut if (this == other) return this;
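A sketch of the proposed short-cut on a toy type class. The `Type` class, the `VALUE` top type, and the `fullComparisons` counter are assumptions for illustration; the short-cut only pays off if equal types are usually the same object, which is assumed here and not verified against the real implementation.

```java
// Toy type lattice; not the rascal-value Type class.
class LubShortcutSketch {

    // Counts how often the slower, structural comparison is reached.
    static int fullComparisons = 0;

    static final class Type {
        final String name;

        Type(String name) {
            this.name = name;
        }

        Type lub(Type other) {
            // Short-cut: identical objects are their own least upper bound.
            if (this == other) {
                return this;
            }
            fullComparisons++;
            // Simplified slow path: equal names keep the type, anything else widens to VALUE.
            return name.equals(other.name) ? this : VALUE;
        }
    }

    static final Type VALUE = new Type("value");
    static final Type INT = new Type("int");
    static final Type STR = new Type("str");

    public static void main(String[] args) {
        System.out.println(INT.lub(INT).name);
        System.out.println(INT.lub(STR).name);
        System.out.println("full comparisons: " + fullComparisons);
    }
}
```

When a relation is built element by element and most elements share one type object, nearly every lub call would return through the identity check.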

msteindorfer · 2017-01-31T13:14:33Z

The former: yes I do use Type.lub.
The latter: yes, I know how to fix it. It's a matter of specifying and enforcing what are valid labels and what not. See issue #1050.

That the CSV tests currently fail is legit, because they uncover a real issue that we need to fix.

The difference is that the first snippet below checks label validity internally (by calling isIdentifier) and fails, because the labels are prefixed with '\' in csv/IO.java:

final Type relationType = getTypeFactory().relType(keyType, keyLabel, valType, valLabel);

Whereas the second snippet works because it doesn't check the validity of labels (no invocation of isIdentifier):

final Type tupleType = getTypeFactory().tupleType(new Type[] {keyType, valType}, new String[] {keyLabel, valLabel});

final Type relationType = getTypeFactory().relTypeFromTuple(tupleType);

To summarize: either a) the prefixing of column names with '\' is wrong (while reading in CSV files), or b) we have to reflect in the validity check (see isIdentifier) that we do allow labels starting with '\'.

jurgenvinju · 2017-01-31T14:33:58Z

ok great. it's (a), something is wrong there.

msteindorfer · 2017-02-01T13:02:00Z

Quoting @jurgenvinju:

the prefixing can be ignored because there is no clash with Rascal parameters if the code that uses the labels is not Rascal. The prefixing is strictly a Rascal syntax issue on the surface, there exist no labels internally which start with a \.
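The separation described in the quote can be sketched as two small helpers: one that keeps the internal label free of any backslash, and one that adds the backslash only when the label is printed as Rascal source. Both helpers and the `RESERVED` set are hypothetical; the set is an illustrative subset and not the actual list of Rascal reserved words.

```java
import java.util.Set;

// Hypothetical helpers; not part of rascal or rascal-value.
class SurfaceEscapeSketch {

    // Illustrative subset only; assumed to clash with Rascal reserved words.
    static final Set<String> RESERVED = Set.of("int", "str", "bool");

    // Internal labels never start with a backslash: strip one if it is present.
    static String internalLabel(String columnName) {
        return columnName.startsWith("\\") ? columnName.substring(1) : columnName;
    }

    // The backslash is added only when rendering the label as Rascal syntax.
    static String toSurfaceSyntax(String label) {
        return RESERVED.contains(label) ? "\\" + label : label;
    }

    public static void main(String[] args) {
        String label = internalLabel("\\int");
        System.out.println("internal: " + label);
        System.out.println("surface:  " + toSurfaceSyntax(label));
        System.out.println("surface:  " + toSurfaceSyntax("name"));
    }
}
```

With this split, a validity check such as isIdentifier only ever sees the internal form, and the escaping stays a concern of whatever prints Rascal code.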

msteindorfer added bug question labels Jan 27, 2017

DavyLandman assigned jurgenvinju Jan 27, 2017

msteindorfer mentioned this issue Jan 31, 2017

Specification for valid identifiers of labeled tuples, relations, and maps #1050

Closed

msteindorfer added a commit that referenced this issue Feb 1, 2017

Merge branch 'trie-relations'.

be769a1

Relates to issues #1042 and #1049.

msteindorfer closed this as completed in e199c76 Feb 1, 2017

DavyLandman added a commit that referenced this issue Feb 2, 2017

Fixes #1049 for the compiler clone

9aaa69d
