Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental fast validator code path, using cwl-utils #1720

Merged
merged 31 commits into from
Sep 13, 2022
Merged

Conversation

tetron
Copy link
Member

@tetron tetron commented Aug 26, 2022

This is a work in progress.

Requires schema salad codegen improvements in common-workflow-language/schema_salad#587 which will need to be applied to cwl-utils.

On a very large workflow I was testing with, the validation time went
120 seconds to 20 seconds.

Requires schema salad codegen improvements, these are pending

On a very large workflow I was testing with, the validation time went
120 seconds to 20 seconds.

This is a work in progress.
@tetron
Copy link
Member Author

tetron commented Aug 26, 2022

To-do:

  • fix the bugs that are breaking conformance tests (16 failures from the draft v1.2.1 tests as of this writing)
  • fixing type annotations

@codecov
Copy link

codecov bot commented Aug 26, 2022

Codecov Report

Merging #1720 (2aa44d0) into main (1d23218) will decrease coverage by 8.92%.
The diff coverage is 91.01%.

@@            Coverage Diff             @@
##             main    #1720      +/-   ##
==========================================
- Coverage   66.61%   57.68%   -8.93%     
==========================================
  Files          89       99      +10     
  Lines       15856    11479    -4377     
  Branches     4190     2310    -1880     
==========================================
- Hits        10562     6622    -3940     
- Misses       4206     4368     +162     
+ Partials     1088      489     -599     
Impacted Files Coverage Δ
setup.py 0.00% <ø> (ø)
cwltool/update.py 72.50% <75.00%> (-0.05%) ⬇️
cwltool/context.py 97.05% <83.33%> (-0.54%) ⬇️
cwltool/load_tool.py 86.66% <93.05%> (+1.53%) ⬆️
cwltool/argparser.py 95.21% <100.00%> (+0.41%) ⬆️
cwltool/main.py 74.29% <100.00%> (+2.31%) ⬆️
cwltool/cwltool/cwlviewer.py
cwltool/cwltool/workflow.py
cwltool/cwltool/loghandler.py
cwltool/cwltool/__main__.py
... and 112 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@lgtm-com
Copy link

lgtm-com bot commented Aug 26, 2022

This pull request introduces 1 alert when merging f6d8198 into 0e2ced5 - view on LGTM.com

new alerts:

  • 1 for Wrong name for an argument in a class instantiation

@tetron
Copy link
Member Author

tetron commented Aug 27, 2022

With parser from latest codegen fixes, now passing the entire v1.2.1-proposed conformance suite.

@lgtm-com
Copy link

lgtm-com bot commented Aug 27, 2022

This pull request introduces 1 alert when merging b145e3c into 0e2ced5 - view on LGTM.com

new alerts:

  • 1 for Wrong name for an argument in a class instantiation

@mr-c
Copy link
Member

mr-c commented Aug 27, 2022

Great progress!

Requires schema salad codegen improvements in common-workflow-language/schema_salad#587 which will need to be applied to cwl-utils.

Can we get a draft cwl-utils PR with the new parsers?

@tetron
Copy link
Member Author

tetron commented Aug 29, 2022

Can we get a draft cwl-utils PR with the new parsers?

As soon as I can make mypy happy

@tetron
Copy link
Member Author

tetron commented Aug 31, 2022

We probably also want a CI action that run the conformance tests with --fast-validator

@lgtm-com
Copy link

lgtm-com bot commented Sep 1, 2022

This pull request introduces 1 alert when merging b00b17e into 2a2216b - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Sep 1, 2022

This pull request introduces 1 alert when merging cf846bc into 2a2216b - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Sep 2, 2022

This pull request introduces 1 alert when merging 3adbcbc into 2a2216b - view on LGTM.com

new alerts:

  • 1 for Unused import

@mr-c
Copy link
Member

mr-c commented Sep 2, 2022

For the docs, we need to explain if there are any downsides. (Are error messages less helpful? Are some types of errors not caught by this code path?)

@tetron
Copy link
Member Author

tetron commented Sep 2, 2022

For the docs, we need to explain if there are any downsides. (Are error messages less helpful? Are some types of errors not caught by this code path?)

There are two main gaps currently:

  1. It doesn't do link checking (so things like a dangling 'source' field won't be reported, it'll fail at runtime)
  2. Runtime errors won't have file/row/column at all

Validation errors do include file/row/column but are somewhat less polished.

@mr-c
Copy link
Member

mr-c commented Sep 2, 2022

For the docs, we need to explain if there are any downsides. (Are error messages less helpful? Are some types of errors not caught by this code path?)

There are two main gaps currently:

1. It doesn't do link checking (so things like a dangling 'source' field won't be reported, it'll fail at runtime)

2. Runtime errors won't have file/row/column at all

Validation errors do include file/row/column but are somewhat less polished.

Confirmed by editing tests/echo_broken_outputs.cwl to be a v1.2 file:

$ cwltool --fast-validator --validate tests/echo_broken_outputs.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/echo_broken_outputs.cwl' to 'file:///home/michael/cwltool/tests/echo_broken_outputs.cwl'
ERROR Tool definition failed validation:
- tried CommandLineTool but
  Trying 'CommandLineTool'
    the `outputs` field is not valid because:
      Expected a list, was <class 'NoneType'>
- tried ExpressionTool but
  Not a ExpressionTool
- tried Workflow but
  Not a Workflow
- tried Operation but
  Not a Operation
- tried array<CommandLineTool | ExpressionTool | Workflow | Operation> but
  Expected a list, was <class 'dict'>
$ cwltool  --validate tests/echo_broken_outputs.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/echo_broken_outputs.cwl' to 'file:///home/michael/cwltool/tests/echo_broken_outputs.cwl'
ERROR Tool definition failed validation:
tests/echo_broken_outputs.cwl:10:1: "outputs" section is not valid.

A candidate for a unit test

@lgtm-com
Copy link

lgtm-com bot commented Sep 2, 2022

This pull request introduces 1 alert when merging 80271c4 into f43ca87 - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Sep 2, 2022

This pull request introduces 1 alert when merging 8c1c82d into f43ca87 - view on LGTM.com

new alerts:

  • 1 for Unused import

@mr-c
Copy link
Member

mr-c commented Sep 2, 2022

ooh, it doesn't catch the empty requirements section in tests/wc-tool-bad-reqs.cwl

$ cwltool  --validate tests/wc-tool-bad-reqs.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/wc-tool-bad-reqs.cwl' to 'file:///home/michael/cwltool/tests/wc-tool-bad-reqs.cwl'
ERROR Tool definition failed validation:
tests/wc-tool-bad-reqs.cwl:6:1: If 'requirements' is present then it must be a list or
                                map/dictionary, not empty.
$ cwltool --fast-validator --validate tests/wc-tool-bad-reqs.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/wc-tool-bad-reqs.cwl' to 'file:///home/michael/cwltool/tests/wc-tool-bad-reqs.cwl'
tests/wc-tool-bad-reqs.cwl is valid CWL.

and default file objects are not checked either

$ cwltool  --validate tests/wf/default-dir5.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/wf/default-dir5.cwl' to 'file:///home/michael/cwltool/tests/wf/default-dir5.cwl'
tests/wf/default-dir5.cwl:20:13: Warning: Field `location` contains undefined reference to
                                 `file:///home/michael/cwltool/tests/wf/inp1`
tests/wf/default-dir5.cwl:20:13: Warning: Field `location` contains undefined reference to
                                 `file:///home/michael/cwltool/tests/wf/inp1`
tests/wf/default-dir5.cwl is valid CWL.
$ cwltool --fast-validator --validate tests/wf/default-dir5.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/wf/default-dir5.cwl' to 'file:///home/michael/cwltool/tests/wf/default-dir5.cwl'
tests/wf/default-dir5.cwl is valid CWL.

And extensions don't work either. We should error on combining --fast-validator and --enable-ext

$ cwltool  --enable-ext --validate tests/wf/generator/pytoolgen.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/wf/generator/pytoolgen.cwl' to 'file:///home/michael/cwltool/tests/wf/generator/pytoolgen.cwl'
tests/wf/generator/pytoolgen.cwl is valid CWL.

$ cwltool --fast-validator --enable-ext --validate tests/wf/generator/pytoolgen.cwl
INFO /home/michael/cwltool/env3.10/bin/cwltool 3.1.20220902151754
INFO Resolved 'tests/wf/generator/pytoolgen.cwl' to 'file:///home/michael/cwltool/tests/wf/generator/pytoolgen.cwl'
ERROR Tool definition failed validation:
- tried CommandLineTool but
  Not a CommandLineTool
- tried ExpressionTool but
  Not a ExpressionTool
- tried Workflow but
  Not a Workflow
- tried Operation but
  Not a Operation
- tried array<CommandLineTool | ExpressionTool | Workflow | Operation> but
  Expected a list, was <class 'dict'>

It is also intolerant of extra id fields, like in https://www.commonwl.org/v1.2/CommandLineTool.html#CommandInputRecordSchema found in tests/wf/schemadef-bug-1473.cwl

@tetron tetron marked this pull request as ready for review September 9, 2022 20:13
@tetron tetron marked this pull request as draft September 10, 2022 16:23
@tetron tetron marked this pull request as ready for review September 12, 2022 13:27
Copy link
Member

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

odd that we are missing test coverage at https://github.com/common-workflow-language/cwltool/pull/1720/files#diff-7bc82a2b4dbfbc386a1c1d22dac00140d782b959e858cd99978fc3ee193d15f2R273 and beyond ; can you look into that? Is test coverage not being collected in the --fast-parser github action run of the conformance tests?

README.rst Outdated Show resolved Hide resolved
tetron and others added 2 commits September 13, 2022 12:13
Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
@tetron
Copy link
Member Author

tetron commented Sep 13, 2022

I don't actually know how we enable coverage data to be collected?

@mr-c
Copy link
Member

mr-c commented Sep 13, 2022

I don't actually know how we enable coverage data to be collected?

cat > "${CWLTOOL_WITH_COV}" <<EOF

@tetron
Copy link
Member Author

tetron commented Sep 13, 2022

Well now it's failing on "pinging codecov" 🤦

@mr-c
Copy link
Member

mr-c commented Sep 13, 2022

Well now it's failing on "pinging codecov" facepalm

Do you have a link? I'm not seeing that..

@tetron
Copy link
Member Author

tetron commented Sep 13, 2022

Do you have a link? I'm not seeing that..

It was one of the earlier test iterations (py36 I think), but it seems to have just been a transient failure.

@mr-c
Copy link
Member

mr-c commented Sep 13, 2022

Coverage is being recorded for cwltool/cwltool/*.py and cwltool/*.py https://app.codecov.io/gh/common-workflow-language/cwltool/pull/1720 ; this is probably a source of confusion

@mr-c
Copy link
Member

mr-c commented Sep 13, 2022

The diff coverage is 91.01%.

Much better! (it was previously 33%)

@mr-c mr-c enabled auto-merge (squash) September 13, 2022 18:05
@mr-c mr-c merged commit cc3b260 into main Sep 13, 2022
@mr-c mr-c deleted the pa/cwlutil-validator branch September 13, 2022 18:11
GlassOfWhiskey pushed a commit that referenced this pull request Sep 15, 2022
On a very large workflow I was testing with, the validation time went
120 seconds to 20 seconds.

* conformance testing: use CWLTOOL_OPTIONS
* CWLTOOL_OPTIONS: ignore an empty string
* Add loadingContext.skip_resolve_all with note
* conformance tests: report CWLTOOL_OPTIONS
* CI: include extra options in coverge classname
* conformance coverage: normalize paths
* Bump cwl-utils version requirement

Co-authored-by: Michael R. Crusoe <1330696+mr-c@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants