Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UNITY] Fix the symbolic var handling #15973

Merged
merged 1 commit into from
Oct 24, 2023
Merged

Conversation

tqchen
Copy link
Member

@tqchen tqchen commented Oct 24, 2023

This PR fix a bug that only appears on windows on symbolic var handling. For the function parameter var with Tuple struct info, directly iterate over that variable will cause out of bound error(which is properly handled in linux but not necessarily in windows due to how iteration is handled).

Note that the var itself is not iterable and likely it relied on a sugar that var[i] dispatches to tuple_getitem and when it gets out of bound the exception get caught by the python iterator.

This PR fixes it by directly iterate over the struct info array in the annotation.

This PR fix a bug that only appears on windows on symbolic var handling.
For the function parameter var with Tuple struct info, directly
iterate over that variable will cause out of bound error(which is
properly handled in linux but not necessarily in windows due to how iteration is handled).

Note that the var itself is not iterable and likely it relied on a sugar that
var[i] dispatches to tuple_getitem and when it gets out of bound the exception
get caught by the python iterator.

This PR fixes it by directly iterate over the struct info array in the annotation.
@tqchen
Copy link
Member Author

tqchen commented Oct 24, 2023

This fixes a bug introduced in #15563 and was revealed recently in windows that involves transform with symbolic shape

@junrushao
Copy link
Member

junrushao commented Oct 24, 2023

I was a bit curious why the handling is different on Windows? Does it mean OOB handling is somewhat different? Is this gonna affect all usecases in iterating an Array or so?

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna get this hotfix in quickly to unblock this issue: mlc-ai/mlc-llm#1112

@junrushao junrushao merged commit bcdbc3e into apache:unity Oct 24, 2023
14 of 15 checks passed
@tqchen
Copy link
Member Author

tqchen commented Oct 24, 2023

This is due to iterating over a Var(instead of array), and likely other system treated exception as stop iteration (we were lucky)

@Lunderberg
Copy link
Contributor

I was a bit curious why the handling is different on Windows? Does it mean OOB handling is somewhat different? Is this gonna affect all usecases in iterating an Array or so?

I'm curious, too, as I don't see a reason why the iteration would be platform-dependent.

The failure mode I know of isn't platform-dependent, but normalization-dependent. If a relax expression will produce a relax.Tuple (e.g. The result of R.split), but doesn't yet have a TupleStructInfo from normalization, then this __getitem__ call will always return a value. Because the iteration never throws StopIteration, the loop proceeds indefinitely. I have run into issues in iterating over relax objects that produce a tuple, but which haven't yet been normalized to have a TupleStructInfo.

@Lunderberg
Copy link
Contributor

This is due to iterating over a Var(instead of array), and likely other system treated exception as stop iteration (we were lucky)

I think this should be legal behavior, as a python function shouldn't need to be aware of whether its input is an explicit relax.Tuple, a relax.Var with relax.TupleStructInfo, or a relax.Call producing a relax.Tuple, and should be able to handle all three cases with the same user-facing interface.

Long-term, I think we could make it more robust by moving the struct-info inference from the Normalizer into object constructors for the majority of cases. That way, we would avoid the case of having an undefined StructInfo, when the struct info is required to iterate over an object.

@Lunderberg
Copy link
Contributor

Though in this case, I'm not sure why this check would be platform-dependent. This already provides a struct info for the parameter whenever accessing an instance with known struct info, and checks for the bounds, and so [param.struct_info for param in my_tuple] should be equivalent to my_tuple.struct_info.fields.

@tqchen
Copy link
Member Author

tqchen commented Oct 24, 2023

The code is not platform dependent and the error is still throws but python's iterator do not treat this as a stop signal in windows. Regardless of the behavior, it is not a good idea to rely on throw(that is unrelated to StopIteration) to handle iteratables. Especially when the object is not intended to be used as an Iterable

Moving to object constructor would incur non-trivial cost since normalization is state dependent. It is good to setup expectation that object constructor is a constructor and not couple that with normalization, which would complicate the expectations when setting up the object and is also impossible as some normalization generates extra objects that needs a builder context.

For this particular case, I think relying on iterating over a Var or Expr is deangerous (they are not intended to be iterables).

Coming back to a general rationale. It is better to have well-defined (and easy to reason) restricted behavior than general behavior that are harder to reason about (and sometimes cause unexpected behaviors ), assuming the restricted behavior (although sometimes slightly more verbose) can solve the problem. The simplicity in code also helps debugging when others look at the behavior (one had to dig into stack trace deeply to find out the cause when the general behavior is used).

So when handling function input, which has a tuple struct info, iterating over the struct info fields(that is known to be an array and is iterable) is much safer

@Lunderberg
Copy link
Contributor

For this particular case, I think relying on iterating over a Var or Expr is deangerous (they are not intended to be iterables).

If that is the case, we should either remove the definition of ExprWithOp.__getitem__, or implement ExprWithOp.__iter__ that throws a type error. Any python object that defines __getitem__ is iterable, and produces obj[i] with i ranging from zero to the first value that results in IndexError being thrown. However, either method of removing iteration over a relax.Expr would result in breaking user code.

As a middle option, I think we could improve this case by providing an explicit __iter__ that checks whether the object has TupleStructInfo. If it does, we allow iteration over a well-defined range, removing the dependency on a thrown exception. If it doesn't, then we throw a TypeError immediately without iterating. Any user code that relies on the current behavior will continue to work, and any code that would either produce an infinite loop or produce out-of-bounds tuple access will instead throw an earlier error message.

@tqchen
Copy link
Member Author

tqchen commented Oct 24, 2023

ExprWithOp.__iter__ throw is a better option here i think since iterating over Expr can leads to surprises.

@Lunderberg
Copy link
Contributor

It is good to setup expectation that object constructor is a constructor and not couple that with normalization

I should have clarified that it was only the initialization of the Expr::struct_info_ field that I think should be part of an object constructor, and not the other normalization steps (e.g. flattening of SeqExpr) that are clearly context-dependent. @slyubomirsky and I were looking into it last week, and we're not aware of any location that relies on non-local information for shape inference.

@Lunderberg
Copy link
Contributor

since iterating over Expr can leads to surprises.

Maybe this is a lack of imagination on my side, but for expressions that have TupleStructInfo, both the tuple dimension and the type of each element are known, and so I'm not coming up with any potential surprises that would result from this. Certainly it would be good to guard against the known failure modes of out-of-bounds access or unknown internal types, but requiring TupleStructInfo seems like a sufficiently narrow contract to avoid surprises.

@tqchen
Copy link
Member Author

tqchen commented Oct 24, 2023

it is mainly about the possible surprises and confusions in this case (again simple predicable behavior vs general, perhaps predicable behavior that needs extra hops of reasoning and engineering support), as we would need to take an extra set of reasonings to be able to deduce that we can iterate over a Expr with TupleStructInfo and it is also non obvious from typing. So in such cases, directly iterate over the struct info fields is still more readable and clear

@Lunderberg
Copy link
Contributor

Maybe then there's a different set of expectations that we're bringing in. For me, I expect a language-level variable to be usable in any context where an expression could have been used directly (e.g. print(5) and x=5; print(x) should be equivalent). As a result, I expect a relax.Var to be usable in any context where its bound value could have been used directly. Being able to iterate over a relax.Tuple, but not being able to iterate over a relax.Var that is bound to a relax.Tuple, would therefore be very surprising.

@tqchen
Copy link
Member Author

tqchen commented Oct 24, 2023

there is a difference between compile time cosntruct and runtime construct, for runtime it certainly makes sense (and I agree with your example, which sets expecations of the runtime behavior), e.g. an Object in runtime needs to be subscrible it is an Array regardless if it is an Var or Tuple Expr in compile time.

We are looking at compile time construct atm and manipulate the data structure for analysis, in such cases, it is good to clarify that normal Expr is not iterable and this also helps to setup a clear expectation. For exmaple, in compile time x is a variable and when we print the IRNode of x it can certainly be different from printing the ConstantLiteral 5

@Lunderberg
Copy link
Contributor

I agree that there's a difference, especially with respect to printing the AST. My worry is that requiring distinct APIs in order to interact with each of those distinct types means that each function becomes incredibly fragile to use. Because the same user-facing concept of "a tuple of relax expressions" may be represented as (a,b), as relax.Tuple([a,b]), or as builder.emit(relax.Tuple([a,b])), every function may have a different requirement, and a user cannot reliably predict which option is required for a specific function. The more we can make these three representations act equivalently, by allowing for common python idioms in the implementation, the less a user needs to be aware of the internal distinctions, and the easier it is for a new user to get started.

I've put together two draft PRs, #15975 and #15976, with the two possible fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants