Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV I/O are failing #1049

Closed
msteindorfer opened this issue Jan 27, 2017 · 13 comments
Closed

CSV I/O are failing #1049

msteindorfer opened this issue Jan 27, 2017 · 13 comments
Assignees

Comments

@msteindorfer
Copy link
Member

On commit cba868b , the tests in lang::rascal::tests::library::lang::csv::CSVIOTests are failing.

What it boils down to is that the TypeFactory seems to get a name that it rejects and thus throws and exception (cf. https://github.com/usethesource/rascal-value/blob/master/src/main/java/org/rascalmpl/value/type/TypeFactory.java#L262).

When commenting out lines 262-263 then the test cases play out nicely. Thus my question is: do we have a faulty specification or implementation?

Note, the behaviour occured when working on separate branch (cf. issue #1042). I'm not sure if what triggered the current or if a path now exercised that didn't used to get tested before.

@msteindorfer
Copy link
Member Author

After digging deeper: the method readInferAndBuild from CSV IO (https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/lang/csv/IO.java#L167) is used that further calls normalizeLabel (https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/lang/csv/IO.java#L525).

Normalize label 'escapes' column names by prepending a backslash. The 'escaped' column names are later used to create a tuple, which fails the valid identifier check.

I would be glad if any of you could have a look. Based from my current finding it seems a CSV serialization problem that should be independent from the new binary relation encoding. The question is, why was the same fault not triggered before?

@DavyLandman
Copy link
Member

on master this happens:

rascal>writeFile(|file:///tmp/test.csv|, "int,str\n1,2\n")
ok
rascal>getCSVType(|file:///tmp/test.csv|)
readCSV inferred the relation type: rel[int \int,int \str]
type[value]: type(
  set(tuple([
        label(
          "\\int",
          int()),
        label(
          "\\str",
          int())
      ])),
  ())

@DavyLandman
Copy link
Member

perhaps you should compare the pom.xml file with origin/master?

for example:

                <dependency>
                        <groupId>org.rascalmpl</groupId>
                        <artifactId>value</artifactId>
-                       <version>0.6.3-SNAPSHOT</version>
+                       <version>0.5.5-SNAPSHOT</version>
                </dependency>
                        <groupId>com.github.ben-manes.caffeine</groupId>
                        <artifactId>caffeine</artifactId>
-                       <version>2.3.5</version>
+                       <version>1.3.1</version>

and dependencies removed etc. so that should be corrected first. since we are looking at a bigger difference between master than intended.

@msteindorfer
Copy link
Member Author

Update on the issue: the current master branch of accepts any label on a tuple or relation type unchecked by using a certain constructor that does no enforce the isIdentifier check (https://github.com/usethesource/rascal-value/blob/master/src/main/java/org/rascalmpl/value/type/TypeFactory.java#L283).

This has the effects that certain type construction snippets fail ...

final Type relationType = getTypeFactory().relType(keyType, keyLabel, valType, valLabel);

... and others succeed ...

final Type tupleType = getTypeFactory().tupleType(new Type[] {keyType, valType}, new String[] {keyLabel, valLabel});

final Type relationType = getTypeFactory().relTypeFromTuple(tupleType);

@jurgenvinju
Copy link
Member

good find. since the new reification implementation we use another factory method which does not do the additional check. the check should be done indeed.

it seems that the bug you are looking at is independent from your contribution/merge, but some code was reverted back due to your merge?

@msteindorfer
Copy link
Member Author

No code was reverted back, but the binary relation data type lazily calculates the precise dynamic type according to the snippets presented above.

While debugging the code I was also surprised again by the way dynamic types and (static) labels on tuples, relations and maps interact, which is another potential source of problems and should be discussed separately.

@jurgenvinju
Copy link
Member

ah yes, now I understand. The lub of tuple type loses labels if they are not exactly the same for both types.

No discussion necessary: the labels should disappear completely from dynamic types since they break structural equality and canonical representation. Labels for relations and maps should become a strict static compiler feature of Rascal and not of vallang/rascal-values. That can be done just before or after we drop the interpreter which depends on the dynamic labels.

@msteindorfer
Copy link
Member Author

Labels were not lost in this case. However they need a lot of special attention to be propagated. Thus I agree with your comment above.

@jurgenvinju
Copy link
Member

You don't use the Type.lub method then? and do we now know a fix for the problem?

@jurgenvinju
Copy link
Member

and I think in this case we will do so much lub that it needs a short-cut if (this == other) return this;

@msteindorfer
Copy link
Member Author

msteindorfer commented Jan 31, 2017

The former: yes I do use Type.lub.
The latter: yes, I know how to fix it. It's a matter of specifying and enforcing what are valid labels and what not. See issue #1050.

That the CSV tests currently fail is legit, because they uncover a real issue that we need to fix.

The difference really is that the first following snippet internally checks validity (calling isIdentifier) for labels and fails (because labels are prefixed with '' originating from csv/IO.java):

final Type relationType = getTypeFactory().relType(keyType, keyLabel, valType, valLabel);

Whereas the second snippet works because it doesn't check validity of labels (no invocation of isIdentifier:

final Type tupleType = getTypeFactory().tupleType(new Type[] {keyType, valType}, new String[] {keyLabel, valLabel});

final Type relationType = getTypeFactory().relTypeFromTuple(tupleType);

To summarize: a) either the prefixing of column names with '\' is wrong (while reading in CSV files), or b) we have reflect that we do allow labels starting with '\' in the validity check (see isIdentifier).

@jurgenvinju
Copy link
Member

ok great. it's (a), something is wrong there.

msteindorfer added a commit that referenced this issue Feb 1, 2017
@msteindorfer
Copy link
Member Author

Quoting @jurgenvinju:

the prefixing can be ignored because there is no clash with Rascal parameters if the code that uses the labels is not Rascal. The prefixing is strictly a Rascal syntax issue on the surface, there exist no labels internally which start with a \.

DavyLandman added a commit that referenced this issue Feb 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants