Skip to content

Conversation

@ambika-garg
Copy link
Contributor

@ambika-garg ambika-garg commented Oct 20, 2023

This PR adds a custom operator AzureSynapsePipelineOperator to the Microsoft provider. This operator simplifies the execution of Azure Synapse pipelines, allowing Airflow users to trigger Azure Synapse Pipelines directly from their DAGs..

Usage example:

import datetime
from airflow import DAG
from airflow.providers.microsoft.azure.operators.synapse import AzureSynapsePipelineOperator

with DAG(
    dag_id="synapse_analytics_dag",
    schedule_interval=None,
    start_date=datetime.datetime(2021, 1, 1),
) as dag:
    run_synapse_pipeline = AzureSynapseRunPipelineOperator(
        azure_synapse_conn_id="azure_synapse_connection",
        task_id="run_synapse_pipeline",
        pipeline_name="synapse_pipeline_name",
        azure_synapse_workspace_dev_endpoint="azure_synapse_workspace_dev_endpoint"
    )

    run_synapse_pipeline
```

@ambika-garg ambika-garg force-pushed the feature-AzureSynapseRunPipelineOperator branch from ecfcd8d to 928fdef Compare October 23, 2023 19:22
CONTRIBUTING.rst Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you use Windows. As I could see there is LF (common posix line ending character) changes to CRLF (windows).

You need to configure your git client: https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings?platform=windows

In additional it might required to make additional changes into your editor, Airflow uses .editorconfig configuration, so if your editor support it (e.g. PyCharm support it out of the box), you could set it up by follow instructions: https://editorconfig.org/

In addition it might required to run Static Checks against your changes, see Instruction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, after configuring git client to handle line endings, I am unable to start Breeze, it is exists with the error 127.
Refer to screenshot.
image

@ambika-garg ambika-garg force-pushed the feature-AzureSynapseRunPipelineOperator branch 2 times, most recently from 3243f2b to d9b1d96 Compare November 5, 2023 17:49
@ambika-garg ambika-garg force-pushed the feature-AzureSynapseRunPipelineOperator branch 3 times, most recently from 84fe0b8 to d50e553 Compare November 10, 2023 17:48
@ambika-garg ambika-garg force-pushed the feature-AzureSynapseRunPipelineOperator branch from 10a03a1 to 30f6a3d Compare November 12, 2023 21:01
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for a new file?
can't the hook be in the existed synapse.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My motivation is to keep code modular and easy to maintain. The synapse.py file deals with SparkClient and its functions, where as the synapse_pipeline.py file deals with ArtifactsClient and its functions. So both classes have different dependencies.
Although I had 2 options:

  1. Include every function of AzureSynapsePipelineHook in AzureSynapseHook class, But that would add on lots of conditional statements, difficulty in dependency management and separate naming conventions for functions with same functionality but different parameters and return values like(get_conn()) function, further leading to confusions when code is extended.
  2. Make Separate classes for each type of service, where each class has its own set of dependencies and responsibility. Also, this approach provides consistent naming convention for functions within each class, improving code readability.

I went with 2nd approach. That's why I made a new file. Also, I think making new hook class with separate dependencies is easier way to code extension .

Please let me know your opinion on this, since you have diverse background on development.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Airflow we use mixed governance model for providers which means that the company behind the provider have shared responsibility over it. Sadly in case of Microsoft the Azure team is not involved (Unlike Google and Aws) as a result we have no way of knowing what is Azure stand point of this subject.
In that case my point of view is to "fallback" to the common practice of how we do things in other providers - you can see for example the effort we did in Amazon/Aws #20139 to have organized files where the motivation is ease of find.

So in the case in front of us and given the information you shared I still think we should use synapse.py.

I welcome Azure team to be more involved in the project should they decide to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the #20139 , understood the general protocols followed in Airflow. So, I'll be moving AzureSynapsePipelineHook class in synapse.py. Also, I'll discuss internally if Azure team can involve more in Airflow project.

@ambika-garg ambika-garg force-pushed the feature-AzureSynapseRunPipelineOperator branch 2 times, most recently from 0df0a8c to a9493f4 Compare November 13, 2023 14:26
@ambika-garg ambika-garg force-pushed the feature-AzureSynapseRunPipelineOperator branch 3 times, most recently from 10a03a1 to c644d8d Compare November 13, 2023 15:02
ambika-garg and others added 6 commits November 13, 2023 15:49
        * Add a hook to interact with Azure Synapse Analytics

        * Add a operator to trigger Synapse pipeline from DAG and operator link

        * Add unit tests for operator and hook

        * Update provider.yaml to support new operator, operator link and hook

        * Update provider_dependencies to install azure-synapse-artifacts
       * Add Mock Synapse Workspace URL

       * Set Default wait_for_termination to False
    * Fix all imports for the class

    * Remove the file from provider.yaml

    * Delete the synapse_pipeline.py file
@ambika-garg ambika-garg force-pushed the feature-AzureSynapseRunPipelineOperator branch from a599393 to 24d3626 Compare November 13, 2023 20:53
@Taragolis Taragolis merged commit 5983506 into apache:main Nov 16, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Nov 16, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools area:providers area:system-tests changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:microsoft-azure Azure-related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants