Fix passing str variable to readFrame/readMatrix #598

MarcusParadies · 2023-09-04T08:44:47Z

Fixes #582

I have also added some end2end tests, but don't really know where to put them. Any suggestions?

corepointer

Clever move, adding type inference to do the type inference ;-) I like it. Concerning the test cases - I don't see any in the PR yet. You could put them to the read kernel tests, to the function call tests or to the type inference tests? Since you're mainly fixing the read call I'd say it would be ok to put it to the first option but I don't really have a strong opinion about it.

corepointer · 2023-09-04T10:18:06Z

src/ir/daphneir/DaphneInferTypesOpInterface.cpp

+        // If an individual value type was specified per column
+        // (fmd.isSingleValueType == false), then this silently uses the
+        // type of the first column.


This comment about isSingleValueType is not correct and can be removed.

Just moved it over, but I can also remove it alltogether.

The comment referred to reading a matrix from a file that has a schema defined. So it is not utterly wrong. You could move it a few lines up to the code that handles reading matrices.

corepointer · 2023-09-04T10:20:04Z

src/compiler/lowering/SpecializeGenericFunctionsPass.cpp

@@ -188,6 +188,7 @@ namespace {
        std::multimap<std::string, func::FuncOp> specializedVersions;
        std::set<func::FuncOp> visited;
        std::set<func::FuncOp> called;
+        std::set<func::FuncOp> templateFunctions;


This extra set for templateFunctions (and its unconditional use in line 299) is not clear to me right away (I did not test-run the code, maybe it would become clear then). Could you explain why this is needed?

Initially, I thought the check isFunctionTemplate would already do the trick, but apparantely it does something else. I basically remove the "template function", which is specialized from considerations in later passes (i.e., doing a type inference on such a function doesn't make much sense imho). Template functions are those, which are specialized, that's why I add them unconditionally. According to the test suite this should be fine, but I feel this will require more thorough testing (independent from the reported bug).

In general there are quite a few question marks in this pass. For example the constant parameter propagation leads to X functions with identical code (except for the constant parameter of course) when invoking the UDF X times.

I've played around with the code a bit now and see what you mean. The generation of a separate function for each invocation (even if it is the same parameter) is a bit weird and could be improved.

corepointer · 2023-09-04T10:23:00Z

src/compiler/lowering/SpecializeGenericFunctionsPass.cpp

@@ -408,7 +410,7 @@ void SpecializeGenericFunctionsPass::runOnOperation() {
        entryFunctions.push_back(entry.second);
    }
    for(const auto &function : entryFunctions) {
-        if(isFunctionTemplate(function) || visited.count(function))
+        if(isFunctionTemplate(function) || visited.count(function) || templateFunctions.count(function))


Yes, the invocations of count() were there before, so doing it the same is quite alright and tbh I'm unsure if using empty() is just a matter of taste or if it would yield a (minor?) performance benefit.

count has logarithmic complexity whereas empty has constant complexity, so there should be a slight advantage here. In general, I tried to change as little as possible although the entire pass probably needs a bigger rewrite anyways.

The count actually makes sense here, as we're checking for that specific function in the set and not if there are any functions at all.

MarcusParadies · 2023-09-05T07:30:27Z

Clever move, adding type inference to do the type inference ;-) I like it. Concerning the test cases - I don't see any in the PR yet. You could put them to the read kernel tests, to the function call tests or to the type inference tests? Since you're mainly fixing the read call I'd say it would be ok to put it to the first option but I don't really have a strong opinion about it.

Done.

pdamme

Thanks for this fix @MarcusParadies. Overall, it looks very good to me. Moving the retrieval of the file meta data from the DaphneDSL parser to the type/property inference is a very good idea, since it gives us more flexibility.

Required before the merge:

DaphneInferTypesOpInterface.cpp: daphne::ReadOp::inferTypes() does not only infer the (value) type, but also the column labels in case of a frame. However, frame label inference is done by another interface (DaphneInferFrameLabelsOpInterface). I recommend implementing that interface for ReadOp as well. Even though it means reading the metadata file once more, I think it would be better from a division-of-concerns point of view. Note that we followed the same principle with DaphneInferShapeOpInterface and DaphneInferSparsityOpInterface.

Optional improvements:

While the immediate problem of passing str variable to readFrame/readMatrix does not work #582 is fixed by this PR, the problem is slightly more general: the file name used in readMatrix()/readFrame() may remain unknown until some point. With the current code, the error reported in passing str variable to readFrame/readMatrix does not work #582 can still be caused, e.g., by the following script:
```
i = 1;
filename = "test/api/cli/io/ReadCsv" + i + ".csv";
m = readMatrix(filename);
print(m);
```
This is because here, the file name is not known yet when the first inference pass starts, but only after the first round of constant propagation. I think we could easily tolerate this case by making the type inference for ReadOp return an unknown value type if the file name is not constant (yet). For the script above, the type inference should be successful after a few passes then.
It would be great to add a comment line to testReadFrame.daphne and testReadMatrix.daphne that clarifies what is the crucial point that is tested here. We do that in many/most script-level test cases as it may help to understand why a test case was written that way later. I.e., it's about passing the path to the UDF, such that it is not known at parse-time.

Regarding the changes to SpecializeGenericFunctionsPass.cpp: I was also a bit surprised, like @corepointer. But yes, this pass anyway needs a rewrite.

pdamme · 2023-09-05T15:28:30Z

By the way, the place for the script-level test cases is fine :) .

MarcusParadies · 2023-09-07T06:52:30Z

I incorporated everything so far. It might be a good idea to think about limiting what can be passed as path name to the read functions. We can always have cases where we cannot compile-time-know the string path, which would lead currently to errors since no type deduction etc. can be performed.

For now it might be most reasonable to restrict the usage of string literals representing paths to be "compile-time-known" (whatever that specifically means). For trivial cases, like the one posted by @pdamme, the compiler might be able to do that, but think about dynamic path generation in loops... my 2 cents: restrict that (for now) and throw a proper, meaningful error message.

corepointer · 2023-09-07T09:06:07Z

Generating paths in a loop to read files with sequence numbers is the prime concern of this issue to be resolved for one of our use case partners.

pdamme

Thanks for the improvements, @MarcusParadies, looks good to me. The required point about the frame label inference has been addressed. So I think, given the initial issue description, this is a valid fix and could be merged.

Regarding (non-)compile-time known paths: The example script above could be made working (*). But, yes, there will always be cases where we cannot know the path at compile-time, and we will need a way to cope with that. For now, the idea of an understandable error message sounds good to me.

Indeed, generating paths in loops has been hinted by @m-birke in #582. I assume that means something like

for(i in 1:10) {
  m = readMatrix("path/to/my/file_" + i + ".csv");
  print(m);
}

That would be a bit more difficult to get working, since the value of i is not a compile-time constant. Some possible solutions include:

passing a hint on the value type to readMatrix() (could mean inconsistencies with the actual file meta data)
using another abstraction for loading data sets split into multiple files, such that it is done by just one read-call
dynamic recompilation for each loop iteration

But I think all of that is beyond the scope of this PR.

(*) How to make the example in testReadMatrix_DynamicPath.daphne work: The expression "test/api/cli/io/ReadCsv" + i + ".csv" should be resolved at compile-time through constant folding, which is implemented for various DaphneIR ops in src/ir/daphneir/DaphneDialect.cpp.

Running the scripts as bin/daphne --explain property_inference test/api/cli/io/testReadMatrix_DynamicPath.daphne yields the following IR output:

IR after inference:
module {
  func.func @main() {
    %0 = "daphne.constant"() {value = false} : () -> i1
    %1 = "daphne.constant"() {value = true} : () -> i1
    %2 = "daphne.constant"() {value = ".csv"} : () -> !daphne.String
    %3 = "daphne.constant"() {value = 1 : si64} : () -> si64
    %4 = "daphne.constant"() {value = "test/api/cli/io/ReadCsv"} : () -> !daphne.String
    %5 = "daphne.cast"(%3) : (si64) -> !daphne.String
    %6 = "daphne.concat"(%4, %5) : (!daphne.String, !daphne.String) -> !daphne.String
    %7 = "daphne.concat"(%6, %2) : (!daphne.String, !daphne.String) -> !daphne.String
    %8 = "daphne.read"(%7) : (!daphne.String) -> !daphne.Matrix<?x?x!daphne.Unknown>
    "daphne.print"(%8, %1, %0) : (!daphne.Matrix<?x?x!daphne.Unknown>, i1, i1) -> ()
    "daphne.return"() : () -> ()
  }
}

Constant folding is implemented for ConcatOp (daphne.concat). However, constant folding only works if all inputs are already constants. In this case, %5 is not a constant, but the result of a daphne.cast. CastOp (daphne.cast) also supports constant folding, but the case for casting to a string hasn't been implemented yet. I think this part of the constant folding is the missing piece to make the example work.

pdamme · 2023-09-08T18:38:41Z

Nevertheless, making read with a generated file name inside a loop work is something we should address soon, maybe using one of the possible approaches mentioned above.

corepointer · 2023-09-08T23:04:05Z

Another workaround that just came to my mind to mitigate the problem when generating path names in a loop would be to unroll the loop at compile time. In the example above all required knowledge to generate paths 1 to 10 is already in the code.

This commit fixes a crash when passing "dynamic strings" that have not yet been materialized in the paring stage to the getMetaData() function. The current solution is to move this functionality out of the parsing stage and into the type inference stage of the DAPHNE compiler. Incorporated review suggestions - Implement DaphneInferFrameLabelsOpInterface for ReadOp - Added comment lines to test scripts - Fixed IsConstant Helper function for string types - Added further example for dynamic path computation and fixed null string problem - still does not work since the string cannot be (yet) computed at compile time Fixes daphne-eu#582 Closes daphne-eu#598

This commit fixes two compiler warnings. - an unnecessary declaration - a control flow issue that would resultin a missing return value

corepointer · 2023-09-11T08:04:51Z

LGTM - all tests passing here.
Thx @MarcusParadies for fixing this and @pdamme for the additional review.

corepointer reviewed Sep 4, 2023

View reviewed changes

pdamme self-requested a review September 5, 2023 14:37

pdamme requested changes Sep 5, 2023

View reviewed changes

pdamme approved these changes Sep 8, 2023

View reviewed changes

MarcusParadies and others added 2 commits September 11, 2023 09:39

[MINOR] Fixed compiler warnings

96fbecb

This commit fixes two compiler warnings. - an unnecessary declaration - a control flow issue that would resultin a missing return value

corepointer force-pushed the 582-fix-passing-str-variable-to-readFrame-and-readMatrix branch from 6c5a2c0 to 96fbecb Compare September 11, 2023 08:03

corepointer merged commit 96fbecb into daphne-eu:main Sep 11, 2023
2 checks passed

MarcusParadies deleted the 582-fix-passing-str-variable-to-readFrame-and-readMatrix branch September 11, 2023 09:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix passing str variable to readFrame/readMatrix #598

Fix passing str variable to readFrame/readMatrix #598

MarcusParadies commented Sep 4, 2023

corepointer left a comment

corepointer Sep 4, 2023

MarcusParadies Sep 4, 2023

corepointer Sep 4, 2023

MarcusParadies Sep 5, 2023

corepointer Sep 4, 2023

MarcusParadies Sep 4, 2023

corepointer Sep 4, 2023

corepointer Sep 4, 2023

MarcusParadies Sep 4, 2023

corepointer Sep 4, 2023

MarcusParadies commented Sep 5, 2023

pdamme left a comment

pdamme commented Sep 5, 2023

MarcusParadies commented Sep 7, 2023

corepointer commented Sep 7, 2023

pdamme left a comment

pdamme commented Sep 8, 2023

corepointer commented Sep 8, 2023

corepointer commented Sep 11, 2023

Fix passing str variable to readFrame/readMatrix #598

Fix passing str variable to readFrame/readMatrix #598

Conversation

MarcusParadies commented Sep 4, 2023

corepointer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MarcusParadies commented Sep 5, 2023

pdamme left a comment

Choose a reason for hiding this comment

pdamme commented Sep 5, 2023

MarcusParadies commented Sep 7, 2023

corepointer commented Sep 7, 2023

pdamme left a comment

Choose a reason for hiding this comment

pdamme commented Sep 8, 2023

corepointer commented Sep 8, 2023

corepointer commented Sep 11, 2023