Skip to content

Fix pydot tests #2019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 10, 2025
Merged

Fix pydot tests #2019

merged 5 commits into from
Jun 10, 2025

Conversation

lkk7
Copy link
Contributor

@lkk7 lkk7 commented Jul 16, 2024

No description provided.

Copy link

codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.78%. Comparing base (0418dab) to head (551d6b5).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2019      +/-   ##
==========================================
- Coverage   84.81%   84.78%   -0.03%     
==========================================
  Files          46       46              
  Lines        8368     8366       -2     
  Branches     1962     1962              
==========================================
- Hits         7097     7093       -4     
- Misses        803      806       +3     
+ Partials      468      467       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lkk7
Copy link
Contributor Author

lkk7 commented Jul 17, 2024

The test fails for the same reason it failed for me locally: https://github.com/common-workflow-language/cwltool/actions/runs/9964048376/job/27541027817

The reason is that cwltool returns these strings with a path attached to them, while it wasn't expected.
It doesn't seem to be pydot related, does it?

@mr-c
Copy link
Member

mr-c commented Jul 17, 2024

@lkk7 Interesting, I've triggered a re-run of the CI on the main branch to see if there has been a regression related to one of our dependencies since July 12th

https://github.com/common-workflow-language/cwltool/actions/runs/9905754128

@mr-c
Copy link
Member

mr-c commented Jul 17, 2024

@lkk7 When we pin to a pydot before version 3, all the CI tests still pass: https://github.com/common-workflow-language/cwltool/actions/runs/9976419287/job/27568552139

@lkk7
Copy link
Contributor Author

lkk7 commented Aug 19, 2024

That's an old issue, but could we retry the CI now? Or has this been solved already?

@mr-c mr-c force-pushed the fix-tests-pydot branch from c5ce229 to 80e64f9 Compare August 20, 2024 07:32
@mr-c
Copy link
Member

mr-c commented Aug 20, 2024

@lkk7 Since you started this PR we capped the pydot version to be less than 3. So now that we've rebased the PR I had to add a revert of that pydot version cap commit. We can now compare the pending test results (with pydot 3.x) to the results without pydot 3.x.

@mr-c
Copy link
Member

mr-c commented Aug 20, 2024

@lkk7 Looks like there is a behavior change with pydot 3.x (3.0.1, specifically)

    [
        (
  -         '"command_line_tool"',
  -         '"file_output"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_output"',
        ),
        (
  -         '"file_input"',
  -         '"nested_workflow"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_input"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#nested_workflow"',
        ),
        (
  -         '"nested_workflow"',
  -         '"operation"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#nested_workflow"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
        ),
        (
  -         '"operation"',
  -         '"command_line_tool"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
        ),
        (
  -         '"operation"',
  -         '"string_output"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#string_output"',
        ),
    ]

https://github.com/common-workflow-language/cwltool/actions/runs/10468140236/job/28988377402?pr=2019#step:8:82

@lkk7
Copy link
Contributor Author

lkk7 commented Aug 20, 2024

I can't pinpoint the exact fix yet, but the bug appears in cwlrdf.py in this function:

def printdot(
    wf: Process,
    ctx: ContextType,
    stdout: IO[str],
) -> None:
    cwl_viewer: CWLViewer = CWLViewer(printrdf(wf, ctx, "n3"))
    stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))  ###  <<< HERE 

It's all about those file paths.

wf.metadata['id'] equals 'file:///<your_path>/three_step_color.cwl', and in pydot 2.0.0 it's removed from the string as expected.

In 3.0.0+ the cwl_viewer.dot() contains these paths in a changed form: file:"///<your_path>/three_step_color.cwl#string_output". There is an additional quote that causes the replacement not to work.

@ferdnyc, Could you confirm my suspicion. I know you were not involved in this one, but you made the change that affect this (ea0131517db2a0dbf58ec0f8bb3bd7b40bbd279d). The context of this bug is not important, but – file:///<path>#<...> changed to file:"///<path>#<...>". Is it because now pydot thinks it's a port, like node1:port1?

@mr-c
Copy link
Member

mr-c commented Jun 4, 2025

ping @lkk7 @@ferdnyc , thank you

@lkk7
Copy link
Contributor Author

lkk7 commented Jun 7, 2025

After a year, I've looked at this again, and I think again that this is because of the handling of port syntax, (the : sign which is used in the failing test).

The thing happens here in cwltool:

def printdot(
    wf: Process,
    ctx: ContextType,
    stdout: IO[str],
) -> None:
    cwl_viewer: CWLViewer = CWLViewer(printrdf(wf, ctx, "n3"))
    stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))

As we see in the failed test log posted above:

        (
  -         '"command_line_tool"',
  -         '"file_output"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
  +         'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_output"',
        ),
...

The problem occurs specifically here:

stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))`

The replacement isnt working anymore, because it wants to erase file:///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool, but it encounters file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool".

I'm not sure if we can just revert back to the previous behavior which ignored it, because theoretically it's unsafe. But it's possible that the quoting behavior is too tight and I'll mention it in the upcoming PR.

An ugly alternative is to adjust the .replace() call, here in cwltool, to just work around the problem.

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 9, 2025

Apologies, just seeing this for the first time. The port conjecture is likely; the way to overcome that would be to explicitly quote the ID string. (So, pydot.Node('"file:///<path>#command_line_tool"') rather than pydot.Node("file:///<path>#command_line_tool").) A quoted string shouldn't be parsed apart into an id:port value, because the correct quoting there would be '"id":"port"'.

That's consistent with graphviz's own parser, since:

graph G {
  # Either of these will break (syntax error),
  # since it can't parse slashes in an unquoted port value
  file:///some/path#fragment [label="\N"];
  file:///some/path [label="\N"];
}
graph G {
  # These are all exactly equivalent,
  # any will create a node named "file"
  file:somepath [label="\N"];
  "file":"somepath" [label="\N"];
  file:"///some/path#fragment" [label="\N"];
}
graph G {
  # This will create a node named "file:///some/path#fragment"
  "file:///some/path#fragment" [label="\N"];
}

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 9, 2025

Without knowing more about the context, I can't say where the behavior changed, but if Pydot 2.x was processing input differently it was probably doing it wrong in ways that would break on valid syntax. Like node IDs with :port components, which even though they're completely superfluous (ports are only meaningful for edges, not nodes) are still valid according to the grammar. A lot of that stuff was cleaned up for 3.x.

@lkk7 lkk7 force-pushed the fix-tests-pydot branch from 5dd4d08 to 1ed32af Compare June 9, 2025 19:55
@lkk7
Copy link
Contributor Author

lkk7 commented Jun 9, 2025

@ferdnyc Thanks for the suggestion! Of course we can quote it. Why didn't I think of that?

@lkk7
Copy link
Contributor Author

lkk7 commented Jun 9, 2025

@mr-c

Could you approve the workflow? The latest commit should make the tests pass.

Disclaimer: This change only satisfies the tests, which aren't 100% exhaustive.
If you want the tested output to match the expected output exactly) (which it doesn't right now, as there are some small differences), then the code needs to be refactored. Maybe the expected graph is not up-to-date or something. But this is outside the scope of the current issue, as it has been this way for a long time.
And generally, it works. But I'm not that experienced in cwltool to check it in detail.

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 9, 2025

@lkk7

Why didn't I think of that?

Because then I wouldn't get the chance to sound Clever™!

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 9, 2025

Without knowing more about the context, I can't say where the behavior changed, but if Pydot 2.x was processing input differently it was probably doing it wrong in ways that would break on valid syntax. Like node IDs with :port components, which even though they're completely superfluous (ports are only meaningful for edges, not nodes) are still valid according to the grammar. A lot of that stuff was cleaned up for 3.x.

Actually, it looks like these are edges, since even Pydot 4.x won't actually output the port part of a Node definition (as I said, it's meaningless); but the difference is in how the Edge endpoints are output (not parsed).

Pydot does a lot of DWIMming. So if the user creates a technically-invalid edge endpoint like file:///some/path#fragment, instead of throwing a syntax error like Graphviz would, we'll just add the necessary quotes for you.

But whereas Pydot 2.x, 3.x, and 4.x would all output a simple set of endpoints like pydot.Edge("a:p1", "b:p2") the same way:

>>> import pydot
>>> pydot.__version__
'2.0.0'
>>> e = pydot.Edge("a:p1", "b:p2")
>>> e.to_string()
'a:p1 -- b:p2;'

The difference is in how they DWIM endpoints that do require quoting.

Pydot 2.x just threw quotes around the entire endpoint string, making it unnecessarily difficult to create endpoints with port values:

>>> import pydot
>>> pydot.__version__
'2.0.0'
>>> e = pydot.Edge("a:p2!", "b:p4!")
>>> e.to_string()
'"a:p2!" -- "b:p4!";'
>>> e = pydot.Edge("a?:p1", "b?:p2")
>>> e.to_string()
'"a?:p1" -- "b?:p2";'

Pydot 3.x+ is smarter, and knows how to properly quote an id:port-format endpoint:

>>> import pydot
>>> pydot.__version__
'4.0.1.dev0'
>>> e = pydot.Edge("a:p2!", "b:p4!")
>>> e.to_string()
'a:"p2!" -- b:"p4!";'
>>> e = pydot.Edge("a?:p2", "b?:p4")
>>> e.to_string() 
'"a?":p2 -- "b?":p4;'

...which does mean that now, when you don't want that colon to represent the id:port split, you need to wrap the entire ID in quotes.

@mr-c
Copy link
Member

mr-c commented Jun 9, 2025

@mr-c\n\nCould you approve the workflow? The latest commit should make the tests pass

Done! But I see test errors

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 9, 2025

(Also, note that they will all — including 4.0.1.dev0 — screw up an id:port:compass-format endpoint unless it's properly quoted going in.) This is a bug:

>>> import pydot
>>> pydot.__version__
'4.0.1.dev0'
>>> e = pydot.Edge("a?:p2:nw", "b?:p4:se")
>>> e.to_string()
'"a?:p2":nw -- "b?:p4":se;'
>>> # The workaround, for now
>>> e = pydot.Edge('"a?":p2:nw', '"b?":p4:se')
>>> e.to_string()
'"a?":p2:nw -- "b?":p4:se;'

Fortunately, that's a relatively obscure feature, mostly used by other code that fully quotes all inputs by default.

@ferdnyc
Copy link
Contributor

ferdnyc commented Jun 9, 2025

@mr-c\n\nCould you approve the workflow? The latest commit should make the tests pass

Done! But I see test errors

cwltool/cwlviewer.py: note: In member "_set_inner_edges" of class "CWLViewer":
cwltool/cwlviewer.py:102:21: error: Module has no attribute
"quote_id_if_necessary"  [attr-defined]
                        pydot.quote_id_if_necessary(str(inner_edge_row["so...
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Mmm, to appease Mypy this should probably use pydot.core.quote_id_if_necessary.

@lkk7
Copy link
Contributor Author

lkk7 commented Jun 10, 2025

I think it was about the pydot.pyi missing the new quote_id_if_necessary. Added it.
If that doesn't help, then maybe on old pydot version is cached somewhere in the CI process. Or, ferdnyc is right as always.

@mr-c

@mr-c

This comment was marked as outdated.

Copy link
Member

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @lkk7 & @ferdnyc !

@mr-c mr-c enabled auto-merge (squash) June 10, 2025 09:08
@mr-c mr-c disabled auto-merge June 10, 2025 09:16
@mr-c mr-c merged commit 6976e99 into common-workflow-language:main Jun 10, 2025
44 of 45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants