Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Validate interface variable names are python-valid #2538

Merged

Conversation

ddl-rliu
Copy link
Contributor

Tracking issue

flyteorg/flyte#5511

Why are the changes needed?

Rationale: flyte inputs names are "like" python variable names, which should not use special characters like . (see my.value in the repro code

Expected behavior: The workflow should actually fail at compile time at step 1. (e.g. perform extra validation)

What changes were proposed in this pull request?

Workflow should fail at compile time at step 1, when python-invalid variable name is used like "my.value".

How was this patch tested?

Setup process

import logging

from flytekit import ContainerTask, kwtypes, task, workflow

logger = logging.getLogger(__file__)

calculate_ellipse_area_shell = ContainerTask(
    name="ellipse-area-metadata-shell",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(**{"my.value": float, "c": float}),
    outputs=kwtypes(area=float, metadata=str),
    image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    command=[
        "./calculate-ellipse-area.sh"
    ],
)

@task
def report_all_calculated_areas(
    area_shell: float,
    metadata_shell: str,
):
    logger.info(f"shell: area={area_shell}, metadata={metadata_shell}")

@workflow
def wf(a: float, b: float):
    area_shell, metadata_shell = calculate_ellipse_area_shell(**{"my.value": a, "c": b})

    # Report on all results in a single task to simplify comparison
    report_all_calculated_areas(
        area_shell=area_shell,
        metadata_shell=metadata_shell,
    )

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@ddl-rliu ddl-rliu changed the title [fix] Validate interface variable names [fix] Validate interface variable names are python-valid Jun 27, 2024
@ddl-rliu ddl-rliu force-pushed the rliu.validate_interface_var_names branch 2 times, most recently from cc4d830 to 631dc40 Compare June 27, 2024 23:18
Comment on lines 74 to 75
if not k.isidentifier():
raise ValueError(f'Input must be valid Python identifier: {k!r}')
Copy link
Contributor Author

@ddl-rliu ddl-rliu Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds validation on input naming in Interface. Note that this is already done on the outputs:

variables = [k for k in outputs.keys()]
...
collections.namedtuple(..., variables) # collections.namedtuple checks
# the variable names are valid python identifiers

So it is sensible to add similar validation to the inputs, which must also by valid python types.

eapolinario
eapolinario previously approved these changes Jun 27, 2024
@@ -70,6 +71,8 @@ def __init__(
self._inputs: Union[Dict[str, Tuple[Type, Any]], Dict[str, Type]] = {} # type: ignore
if inputs:
for k, v in inputs.items():
if not k.isidentifier():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I was about to mention isidentifier. 😄

with pytest.raises(ValueError, match=r"Input must be valid Python identifier:"):
_ = Interface({"my.input": int}, {})

with pytest.raises(ValueError, match=r"Type names and field names must be valid identifiers:"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is odd here, but it is the standard error message from collections.namedtuple see: https://github.com/flyteorg/flytekit/pull/2538/files#r1657915913 https://github.com/python/cpython/blob/main/Lib/collections/__init__.py#L401-L402

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
@eapolinario
Copy link
Collaborator

@ddl-rliu , can you fix the lint error in CI? You can run make lint from the root to repro it locally.

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
@ddl-rliu ddl-rliu force-pushed the rliu.validate_interface_var_names branch from 8c476c9 to 8240ea2 Compare June 27, 2024 23:37
Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
@ddl-rliu
Copy link
Contributor Author

@eapolinario done!

Copy link

codecov bot commented Jun 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.40%. Comparing base (647b071) to head (fd6a5d7).
Report is 3 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (647b071) and HEAD (fd6a5d7). Click for more details.

HEAD has 13 uploads less than BASE | Flag | BASE (647b071) | HEAD (fd6a5d7) | |------|------|------| ||15|2|
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2538       +/-   ##
===========================================
- Coverage   91.30%   72.40%   -18.91%     
===========================================
  Files          78      182      +104     
  Lines        3968    18576    +14608     
  Branches        0     3659     +3659     
===========================================
+ Hits         3623    13450     +9827     
- Misses        345     4479     +4134     
- Partials        0      647      +647     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eapolinario eapolinario merged commit 2abf095 into flyteorg:master Jun 28, 2024
46 of 48 checks passed
bgedik pushed a commit to bgedik/flytekit that referenced this pull request Jul 3, 2024
* Validate interface variable names

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

* Remove unused import

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

* Fix lint error

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

---------

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
Signed-off-by: bugra.gedik <bugra.gedik@predera.ai>
fiedlerNr9 pushed a commit that referenced this pull request Jul 25, 2024
* Validate interface variable names

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

* Remove unused import

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

* Fix lint error

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

---------

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
Signed-off-by: Jan Fiedler <jan@union.ai>
mao3267 pushed a commit to mao3267/flytekit that referenced this pull request Jul 29, 2024
* Validate interface variable names

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

* Remove unused import

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

* Fix lint error

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>

---------

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
Signed-off-by: mao3267 <chenvincent610@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants