-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix passing str variable to readFrame/readMatrix #598
Fix passing str variable to readFrame/readMatrix #598
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever move, adding type inference to do the type inference ;-) I like it. Concerning the test cases - I don't see any in the PR yet. You could put them to the read kernel tests, to the function call tests or to the type inference tests? Since you're mainly fixing the read call I'd say it would be ok to put it to the first option but I don't really have a strong opinion about it.
// If an individual value type was specified per column | ||
// (fmd.isSingleValueType == false), then this silently uses the | ||
// type of the first column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment about isSingleValueType is not correct and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moved it over, but I can also remove it alltogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment referred to reading a matrix from a file that has a schema defined. So it is not utterly wrong. You could move it a few lines up to the code that handles reading matrices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -188,6 +188,7 @@ namespace { | |||
std::multimap<std::string, func::FuncOp> specializedVersions; | |||
std::set<func::FuncOp> visited; | |||
std::set<func::FuncOp> called; | |||
std::set<func::FuncOp> templateFunctions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra set for templateFunctions (and its unconditional use in line 299) is not clear to me right away (I did not test-run the code, maybe it would become clear then). Could you explain why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I thought the check isFunctionTemplate
would already do the trick, but apparantely it does something else. I basically remove the "template function", which is specialized from considerations in later passes (i.e., doing a type inference on such a function doesn't make much sense imho). Template functions are those, which are specialized, that's why I add them unconditionally. According to the test suite this should be fine, but I feel this will require more thorough testing (independent from the reported bug).
In general there are quite a few question marks in this pass. For example the constant parameter propagation leads to X functions with identical code (except for the constant parameter of course) when invoking the UDF X times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've played around with the code a bit now and see what you mean. The generation of a separate function for each invocation (even if it is the same parameter) is a bit weird and could be improved.
@@ -408,7 +410,7 @@ void SpecializeGenericFunctionsPass::runOnOperation() { | |||
entryFunctions.push_back(entry.second); | |||
} | |||
for(const auto &function : entryFunctions) { | |||
if(isFunctionTemplate(function) || visited.count(function)) | |||
if(isFunctionTemplate(function) || visited.count(function) || templateFunctions.count(function)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the invocations of count() were there before, so doing it the same is quite alright and tbh I'm unsure if using empty() is just a matter of taste or if it would yield a (minor?) performance benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count
has logarithmic complexity whereas empty
has constant complexity, so there should be a slight advantage here. In general, I tried to change as little as possible although the entire pass probably needs a bigger rewrite anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The count actually makes sense here, as we're checking for that specific function in the set and not if there are any functions at all.
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix @MarcusParadies. Overall, it looks very good to me. Moving the retrieval of the file meta data from the DaphneDSL parser to the type/property inference is a very good idea, since it gives us more flexibility.
Required before the merge:
DaphneInferTypesOpInterface.cpp
:daphne::ReadOp::inferTypes()
does not only infer the (value) type, but also the column labels in case of a frame. However, frame label inference is done by another interface (DaphneInferFrameLabelsOpInterface
). I recommend implementing that interface forReadOp
as well. Even though it means reading the metadata file once more, I think it would be better from a division-of-concerns point of view. Note that we followed the same principle withDaphneInferShapeOpInterface
andDaphneInferSparsityOpInterface
.
Optional improvements:
- While the immediate problem of passing str variable to readFrame/readMatrix does not work #582 is fixed by this PR, the problem is slightly more general: the file name used in
readMatrix()
/readFrame()
may remain unknown until some point. With the current code, the error reported in passing str variable to readFrame/readMatrix does not work #582 can still be caused, e.g., by the following script:This is because here, the file name is not known yet when the first inference pass starts, but only after the first round of constant propagation. I think we could easily tolerate this case by making the type inference fori = 1; filename = "test/api/cli/io/ReadCsv" + i + ".csv"; m = readMatrix(filename); print(m);
ReadOp
return an unknown value type if the file name is not constant (yet). For the script above, the type inference should be successful after a few passes then. - It would be great to add a comment line to
testReadFrame.daphne
andtestReadMatrix.daphne
that clarifies what is the crucial point that is tested here. We do that in many/most script-level test cases as it may help to understand why a test case was written that way later. I.e., it's about passing the path to the UDF, such that it is not known at parse-time.
Regarding the changes to SpecializeGenericFunctionsPass.cpp
: I was also a bit surprised, like @corepointer. But yes, this pass anyway needs a rewrite.
By the way, the place for the script-level test cases is fine :) . |
I incorporated everything so far. It might be a good idea to think about limiting what can be passed as path name to the read functions. We can always have cases where we cannot compile-time-know the string path, which would lead currently to errors since no type deduction etc. can be performed. For now it might be most reasonable to restrict the usage of string literals representing paths to be "compile-time-known" (whatever that specifically means). For trivial cases, like the one posted by @pdamme, the compiler might be able to do that, but think about dynamic path generation in loops... my 2 cents: restrict that (for now) and throw a proper, meaningful error message. |
Generating paths in a loop to read files with sequence numbers is the prime concern of this issue to be resolved for one of our use case partners. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvements, @MarcusParadies, looks good to me. The required point about the frame label inference has been addressed. So I think, given the initial issue description, this is a valid fix and could be merged.
Regarding (non-)compile-time known paths: The example script above could be made working (*). But, yes, there will always be cases where we cannot know the path at compile-time, and we will need a way to cope with that. For now, the idea of an understandable error message sounds good to me.
Indeed, generating paths in loops has been hinted by @m-birke in #582. I assume that means something like
for(i in 1:10) {
m = readMatrix("path/to/my/file_" + i + ".csv");
print(m);
}
That would be a bit more difficult to get working, since the value of i
is not a compile-time constant. Some possible solutions include:
- passing a hint on the value type to
readMatrix()
(could mean inconsistencies with the actual file meta data) - using another abstraction for loading data sets split into multiple files, such that it is done by just one
read
-call - dynamic recompilation for each loop iteration
But I think all of that is beyond the scope of this PR.
(*) How to make the example in testReadMatrix_DynamicPath.daphne
work: The expression "test/api/cli/io/ReadCsv" + i + ".csv"
should be resolved at compile-time through constant folding, which is implemented for various DaphneIR ops in src/ir/daphneir/DaphneDialect.cpp
.
Running the scripts as bin/daphne --explain property_inference test/api/cli/io/testReadMatrix_DynamicPath.daphne
yields the following IR output:
IR after inference:
module {
func.func @main() {
%0 = "daphne.constant"() {value = false} : () -> i1
%1 = "daphne.constant"() {value = true} : () -> i1
%2 = "daphne.constant"() {value = ".csv"} : () -> !daphne.String
%3 = "daphne.constant"() {value = 1 : si64} : () -> si64
%4 = "daphne.constant"() {value = "test/api/cli/io/ReadCsv"} : () -> !daphne.String
%5 = "daphne.cast"(%3) : (si64) -> !daphne.String
%6 = "daphne.concat"(%4, %5) : (!daphne.String, !daphne.String) -> !daphne.String
%7 = "daphne.concat"(%6, %2) : (!daphne.String, !daphne.String) -> !daphne.String
%8 = "daphne.read"(%7) : (!daphne.String) -> !daphne.Matrix<?x?x!daphne.Unknown>
"daphne.print"(%8, %1, %0) : (!daphne.Matrix<?x?x!daphne.Unknown>, i1, i1) -> ()
"daphne.return"() : () -> ()
}
}
Constant folding is implemented for ConcatOp
(daphne.concat
). However, constant folding only works if all inputs are already constants. In this case, %5
is not a constant, but the result of a daphne.cast
. CastOp
(daphne.cast
) also supports constant folding, but the case for casting to a string hasn't been implemented yet. I think this part of the constant folding is the missing piece to make the example work.
Nevertheless, making |
Another workaround that just came to my mind to mitigate the problem when generating path names in a loop would be to unroll the loop at compile time. In the example above all required knowledge to generate paths 1 to 10 is already in the code. |
This commit fixes a crash when passing "dynamic strings" that have not yet been materialized in the paring stage to the getMetaData() function. The current solution is to move this functionality out of the parsing stage and into the type inference stage of the DAPHNE compiler. Incorporated review suggestions - Implement DaphneInferFrameLabelsOpInterface for ReadOp - Added comment lines to test scripts - Fixed IsConstant Helper function for string types - Added further example for dynamic path computation and fixed null string problem - still does not work since the string cannot be (yet) computed at compile time Fixes daphne-eu#582 Closes daphne-eu#598
This commit fixes two compiler warnings. - an unnecessary declaration - a control flow issue that would resultin a missing return value
6c5a2c0
to
96fbecb
Compare
LGTM - all tests passing here. |
Fixes #582
I have also added some end2end tests, but don't really know where to put them. Any suggestions?