Add checks for loops being created on nodes #2029

Open: wants to merge 4 commits into base: main

fnRaihanKibria

Addressing issue #1932

The cause of the crash is a buffer overrun in ShaderGraph::topologicalSort() in the line _nodeOrder[count++] = node;. This is caused by a loop edge being present on the 'color_mix' node of the graph object this was called on, which made the sort algorithm malfunction.
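As a rough illustration of why a loop edge is a problem for a topological sort, here is a minimal, self-contained sketch of a Kahn-style sort. It is not the MaterialX implementation: `Edges` and `topologicalOrder` are hypothetical names, and the graph is a plain adjacency list. The sketch appends to a growing vector instead of writing through a counter into a pre-sized array, and it verifies that every node was emitted before returning.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical adjacency list: edges[i] holds the nodes downstream of node i.
using Edges = std::vector<std::vector<std::size_t>>;

// Kahn-style topological sort that reports cycles instead of misbehaving.
std::vector<std::size_t> topologicalOrder(const Edges& edges)
{
    const std::size_t n = edges.size();

    // Count incoming edges per node; at() rejects out-of-range targets.
    std::vector<std::size_t> inDegree(n, 0);
    for (const auto& downstream : edges)
        for (std::size_t d : downstream)
            ++inDegree.at(d);

    // Nodes with no incoming edges can be emitted immediately.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (inDegree[i] == 0)
            ready.push(i);

    std::vector<std::size_t> order;
    order.reserve(n);
    while (!ready.empty())
    {
        const std::size_t node = ready.front();
        ready.pop();
        order.push_back(node);
        for (std::size_t d : edges[node])
            if (--inDegree[d] == 0)
                ready.push(d);
    }

    // A node on a cycle (including a self-loop) never reaches in-degree zero.
    if (order.size() != n)
        throw std::runtime_error("Graph contains a cycle; no topological order exists");
    return order;
}
```

With a self-loop on one node, that node's in-degree never drops to zero, so the sort ends early and the final size check turns the problem into an exception.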

Unfortunately, I was unable to determine in the time I have available why this loop is created. I didn't want to add a change that hides the underlying problem, which still needs to be fixed. Instead, this PR adds checks in two locations in the code that throw exceptions when the creation of a loop is detected, to aid in fixing the real issue and perhaps help diagnose other problems in the future. With this PR the graph editor no longer crashes; instead, the error message "Upstream node 'color_mix' has itself as downstream node, creating a loop" is printed to the console.

linux-foundation-easycla bot commented Sep 27, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@jstone-lucasfilm
Member

This looks like a great first step, thanks @fnRaihanKibria! I'm CC'ing @niklasharrysson for his expertise with this code, and we can discuss next steps.

@jstone-lucasfilm
Member

@fnRaihanKibria Have you looked into the new errors in our test suite that occur with these proposed changes? Independent of future review from the MaterialX team, these errors will be important to resolve!

Here's a snippet of text from the errors that we're seeing with these changes:

-------------------------------------------------------------------------------
GenReference: OSL Reference
-------------------------------------------------------------------------------
D:\a\MaterialX\MaterialX\source\MaterialXTest\MaterialXRenderOsl\GenReference.cpp(23)
...............................................................................

D:\a\MaterialX\MaterialX\source\MaterialXTest\MaterialXRenderOsl\GenReference.cpp(137): FAILED:
  CHECK( failedGeneration == false )
with expansion:
  true == false

You should be able to debug this issue by running the MaterialXTest project in a Debug build, where you can place breakpoints or print statements to learn more about what's happening.

@fnRaihanKibria
Author

I get the same test failure locally on Windows. Log file contents show one of the new exceptions being thrown for the "GenReference: OSL Reference" test:

Skip generating reference for'surfacematerial'
Skip generating reference for'volumematerial'
Error generating OSL reference for 'constant_float' : 
Tried to create looping connection on node constant_float from output: constant_float_value to input: constant_float_out
Error generating OSL reference for 'constant_color3' : 
Tried to create looping connection on node constant_color3 from output: constant_color3_value to input: constant_color3_out
Error generating OSL reference for 'constant_color4' : 
Tried to create looping connection on node constant_color4 from output: constant_color4_value to input: constant_color4_out
Error generating OSL reference for 'constant_vector2' : 
Tried to create looping connection on node constant_vector2 from output: constant_vector2_value to input: constant_vector2_out
Error generating OSL reference for 'constant_vector3' : 
Tried to create looping connection on node constant_vector3 from output: constant_vector3_value to input: constant_vector3_out
Error generating OSL reference for 'constant_vector4' : 
Tried to create looping connection on node constant_vector4 from output: constant_vector4_value to input: constant_vector4_out
Error generating OSL reference for 'constant_boolean' : 
Tried to create looping connection on node constant_boolean from output: constant_boolean_value to input: constant_boolean_out
Error generating OSL reference for 'constant_integer' : 
Tried to create looping connection on node constant_integer from output: constant_integer_value to input: constant_integer_out
Error generating OSL reference for 'constant_matrix33' : 
Tried to create looping connection on node constant_matrix33 from output: constant_matrix33_value to input: constant_matrix33_out
Error generating OSL reference for 'constant_matrix44' : 
Tried to create looping connection on node constant_matrix44 from output: constant_matrix44_value to input: constant_matrix44_out
Error generating OSL reference for 'constant_string' : 
Tried to create looping connection on node constant_string from output: constant_string_value to input: constant_string_out
Skip generating reference for'constant_filename'
Skip generating reference for unimplemented node 'curveadjust_float'
Skip generating reference for unimplemented node 'curveadjust_color3'
Skip generating reference for unimplemented node 'curveadjust_color4'
Skip generating reference for unimplemented node 'curveadjust_vector2'
Skip generating reference for unimplemented node 'curveadjust_vector3'
Skip generating reference for unimplemented node 'curveadjust_vector4'
Skip generating reference for unimplemented node 'mix_displacementshader'
Skip generating reference for unimplemented node 'mix_volumeshader'
Skip generating reference for'dot_filename'

Let me check if I can find out why.

@fnRaihanKibria
Author

The problem was that the loop check was too loose. The failing unit test had a situation where a graph node's input sockets were connected to its own output sockets (caused by a bypass during optimization), which triggered the exception because it is technically the same node. I added a check to ignore such cases for graph nodes, which fixes the test. The bug in this ticket still occurs and still throws, because it runs into the (now stricter) exception case.
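The refined check can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `Node`, its `isGraph` flag, and `checkConnection` are hypothetical stand-ins. Only the error message text is taken from the PR description.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a shader node; not the MaterialX ShaderNode class.
struct Node
{
    std::string name;
    bool isGraph;  // true when the node is itself a graph with input and output sockets
};

// Throws if connecting upstream -> downstream would put a self-loop on a regular node.
void checkConnection(const Node& upstream, const Node& downstream)
{
    // Different nodes: nothing to check in this sketch.
    if (&upstream != &downstream)
        return;

    // A graph may wire one of its own input sockets to one of its own
    // output sockets (for example after a bypass), so that case is allowed.
    if (upstream.isGraph)
        return;

    throw std::runtime_error("Upstream node '" + upstream.name +
                             "' has itself as downstream node, creating a loop");
}
```

Comparing by identity and exempting graph nodes keeps the socket-to-socket case from the unit test legal while still rejecting a regular node that feeds itself.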

@jstone-lucasfilm
Member

Thanks for addressing those issues, @fnRaihanKibria, and I'd be interested in thoughts from @niklasharrysson on the correctness of this pull request.

@niklasharrysson
Contributor

Thank you for finding this @fnRaihanKibria, and for your proposed fix.

I will investigate this further and see if I can find the cause of the loop being created.

@niklasharrysson
Contributor

The cause of this issue is a limitation in the support for cascading nodegraphs.

When there are multiple nodegraphs connected to each other, these may be flattened into a single large ShaderGraph for codegen. But there is no handling of the case where nodes internal to these graphs share the same name, so when connections are constructed and nodes are referenced by name, this breaks down.

A typical case is copy/paste of a graph as in #1932, where all nodes in the copy share names with the original.

So the symptom is that a cycle is found in the graph, but it's actually caused by nodes not being uniquely named.
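A minimal sketch of how a name collision can surface as a loop, under stated assumptions: `NodeRef`, `FlatGraph`, `flatKey`, `flatten`, and `hasSelfLoop` are hypothetical, the graph names in the usage note are made up, and the real flattening code is certainly more involved. Prefixing keys with the graph name is used here only to show the contrast; it is not a fix proposed in this conversation.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reference to a node inside a named nodegraph.
struct NodeRef
{
    std::string graph;
    std::string node;
};

// Hypothetical flattened graph: node keys plus (upstream, downstream) edges by key.
struct FlatGraph
{
    std::set<std::string> nodes;
    std::vector<std::pair<std::string, std::string>> edges;
};

// Key used to look a node up after flattening.
std::string flatKey(const NodeRef& ref, bool qualifyNames)
{
    return qualifyNames ? ref.graph + "/" + ref.node : ref.node;
}

// Flattens connections between nodegraphs into one graph keyed by node name.
FlatGraph flatten(const std::vector<std::pair<NodeRef, NodeRef>>& connections,
                  bool qualifyNames)
{
    FlatGraph flat;
    for (const auto& c : connections)
    {
        const std::string up = flatKey(c.first, qualifyNames);
        const std::string down = flatKey(c.second, qualifyNames);
        flat.nodes.insert(up);    // duplicates collapse into one entry
        flat.nodes.insert(down);
        flat.edges.emplace_back(up, down);
    }
    return flat;
}

// True if any edge starts and ends at the same key.
bool hasSelfLoop(const FlatGraph& flat)
{
    for (const auto& e : flat.edges)
        if (e.first == e.second)
            return true;
    return false;
}
```

Connecting 'color_mix' in a hypothetical graph "NG_original" to 'color_mix' in its pasted copy "NG_copy" yields a single node with a self-loop when keys are bare names, and two distinct nodes with no loop when keys are qualified.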

I think the best fix for this is to stop flattening nodegraphs and instead preserve the nodegraph hierarchies when creating the codegen ShaderGraph. There are other benefits to this as well. For example, it would help with an issue we've discussed before about limiting the number of uniforms that are generated for a shader (if nodegraphs are not flattened, it's much easier to keep internal parameters private).

I will take a closer look at getting this work started as soon as possible.
