kedro airflow create produces very long task ids when using unnamed nodes #397

Closed
astrojuanlu opened this issue Oct 17, 2023 · 2 comments

@astrojuanlu
Member

Description

As per title.

Context

TBC

Steps to Reproduce

# conf/airflow/catalog.yml
active_modelling_pipeline.regressor:
  filepath: data/06_models/regressor_active.pkl
  type: pickle.PickleDataSet
  versioned: true
candidate_modelling_pipeline.regressor:
  filepath: data/06_models/regressor_candidate.pkl
  type: pickle.PickleDataSet
  versioned: true
companies:
  filepath: data/01_raw/companies.csv
  type: pandas.CSVDataSet
model_input_table:
  filepath: data/03_primary/model_input_table.pq
  type: pandas.ParquetDataSet
preprocessed_companies:
  filepath: data/02_intermediate/preprocessed_companies.pq
  type: pandas.ParquetDataSet
preprocessed_shuttles:
  filepath: data/02_intermediate/preprocessed_shuttles.pq
  type: pandas.ParquetDataSet
reviews:
  filepath: data/01_raw/reviews.csv
  type: pandas.CSVDataSet
shuttles:
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
  type: pandas.ExcelDataSet

Then running kedro airflow create --target-dir=dags/ --env=airflow produces tasks like these:

...
        "active-modelling-pipeline-evaluate-model-active-modelling-pipeline-regressor-active-modelling-pipeline-x-test-active-modelling-pipeline-y-test-none": KedroOperator(
            task_id="active-modelling-pipeline-evaluate-model-active-modelling-pipeline-regressor-active-modelling-pipeline-x-test-active-modelling-pipeline-y-test-none",
            package_name=package_name,
            pipeline_name=pipeline_name,
            node_name="active_modelling_pipeline.evaluate_model([active_modelling_pipeline.regressor,active_modelling_pipeline.X_test,active_modelling_pipeline.y_test]) -> None",
            project_path=project_path,
            env=env,
        ),
...

These then cannot be imported into Airflow:

filepath                                          | error                                                              
==================================================+====================================================================
/Users/juan_cano/airflow/dags/spaceflights_dag.py | Traceback (most recent call last):                                 
                                                  |   File                                                             
                                                  | "/Users/juan_cano/.micromamba/envs/airflow310/lib/python3.10/site-p
                                                  | ackages/airflow/models/baseoperator.py", line 805, in __init__     
                                                  |     validate_key(task_id)                                          
                                                  |   File                                                             
                                                  | "/Users/juan_cano/.micromamba/envs/airflow310/lib/python3.10/site-p
                                                  | ackages/airflow/utils/helpers.py", line 55, in validate_key        
                                                  |     raise AirflowException(f"The key has to be less than           
                                                  | {max_length} characters")                                          
                                                  | airflow.exceptions.AirflowException: The key has to be less than   
                                                  | 250 characters                                                     
                                                  |       
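A quick pre-flight check along these lines could flag offending ids before the DAG file is handed to Airflow. This is only a sketch: `find_overlong_task_ids` and the sample ids are hypothetical, the 250 limit is taken from the error message above, and treating exactly 250 characters as acceptable is an assumption.

```python
# Limit taken from the error message above; assumed to apply to every task_id.
MAX_KEY_LENGTH = 250


def find_overlong_task_ids(task_ids, max_length=MAX_KEY_LENGTH):
    """Return the task ids that are longer than max_length characters."""
    return [task_id for task_id in task_ids if len(task_id) > max_length]


# Hypothetical ids, for illustration only.
candidate_ids = ["evaluate-model-node", "x" * 300]

for task_id in find_overlong_task_ids(candidate_ids):
    print(f"too long ({len(task_id)} chars): {task_id[:40]}...")
```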

Your Environment

(TBC)

  • Kedro version used (pip show kedro or kedro -V):
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow):
  • Python version used (python -V):
  • Operating system and version:
astrojuanlu changed the title from "kedro airflow create produces very long task ids" to "kedro airflow create produces very long task ids when using namespaced pipelines" on Oct 17, 2023
@sbrugman
Contributor

sbrugman commented Dec 20, 2023

Note that this is only the case when the nodes have no explicit name, and node.name defaults to the signature.
Either way, annoying behaviour, but at least specifying the name is a workable solution.
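To illustrate why an explicit name helps, the sketch below compares the task id derived from the default signature-style name with one derived from a short explicit name. `approx_task_id` is a hypothetical stand-in for whatever slugification the plugin applies (it happens to reproduce the task id shown in the generated DAG above), and `evaluate_model_node` is an assumed name, the kind one would pass as `node(..., name="evaluate_model_node")` in the pipeline definition.

```python
import re


def approx_task_id(node_name: str) -> str:
    """Rough approximation of turning a node name into a task id (assumption):
    lowercase, then collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", node_name.lower()).strip("-")


# Default name copied from the generated DAG above: the node's full signature.
signature_name = (
    "active_modelling_pipeline.evaluate_model("
    "[active_modelling_pipeline.regressor,"
    "active_modelling_pipeline.X_test,"
    "active_modelling_pipeline.y_test]) -> None"
)

# Hypothetical explicit name within the same namespace.
explicit_name = "active_modelling_pipeline.evaluate_model_node"

signature_id = approx_task_id(signature_name)
explicit_id = approx_task_id(explicit_name)

print(len(signature_id), signature_id)
print(len(explicit_id), explicit_id)
```

With a signature-style name the id grows with every input and output dataset, each carrying its namespace prefix, whereas an explicit name keeps it at a fixed, short length.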

@astrojuanlu
Member Author

Interesting, thanks a lot. Opened an issue to track that: kedro-org/kedro#3575

I guess this is a feature rather than a bug then. I'm closing.

astrojuanlu closed this as not planned on Jan 30, 2024
astrojuanlu changed the title from "kedro airflow create produces very long task ids when using namespaced pipelines" to "kedro airflow create produces very long task ids when using unnamed nodes" on Jan 30, 2024