
Setup mlflow before KedroContext #292

Closed
stephanecollot opened this issue Mar 10, 2022 · 5 comments

@stephanecollot

Hello,

I have a custom KedroContext (where I initialise a Spark session) and I would like to log things to mlflow at that moment.
But if I log things there, the parameters from my mlflow.yml are not taken into account.

I tried calling the following in my KedroContext:

        mlflow_config = get_mlflow_config()
        mlflow_config.setup()

But I got the following error:
RuntimeError: There is no active Kedro session

@stephanecollot
Author

If you have any idea for a workaround, feel free to share it!

@Galileo-Galilei
Owner

Hi @stephanecollot,

Sorry for the delayed response. This is actually a tricky question and I don't have a clear-cut solution, but I can guide you towards a workaround that may suit your use case.

A bit of history

In earlier kedro versions (e.g. 0.15.X), the KedroContext was the only place where you could add custom code to interact with kedro during execution (e.g. when you launch kedro run). This was not convenient because:

  • it created cluttered and hard-to-maintain custom ProjectContext classes
  • different pieces of logic were hard to compose: if I created a MlflowContext and you a SparkContext, one would have to inherit from the other to combine the two behaviours. With more than two pieces of custom logic, this became completely intractable in practice.

With kedro==0.16.X, kedro introduced hooks. Hooks make it easy to compose and distribute different pieces of custom code; in return, we lost the ability to inject code anywhere during the run: custom code can only be attached at predefined places (especially before / after pipeline and node execution).

In general, it is no longer recommended to add custom logic inside the context, BUT the kedro documentation still presents the context as the recommended place to initialise a spark session. I think this is no longer justified: the reason they do it there instead of in a hook is that they need to access a spark.yml config file, which is hard to retrieve inside hooks, even though there are solutions for this from kedro>=0.17.X.

On execution order

When launching kedro run, the KedroSession is instantiated first, and during its instantiation the ProjectContext is instantiated. This explains why you will never be able to retrieve the configuration (and "setup" mlflow) from inside the context: the session simply does not exist at that moment, and is obviously not active yet.
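
Schematically, here is a rough sketch of that order using kedro 0.17's public API (my_project is a placeholder package name):

# Rough sketch of what "kedro run" does under the hood (kedro 0.17.X)
from kedro.framework.session import KedroSession

with KedroSession.create(package_name="my_project") as session:
    # The ProjectContext is instantiated while the session is being set up,
    # so code in ProjectContext.__init__ runs before the session is active:
    # calling get_mlflow_config().setup() there raises
    # "RuntimeError: There is no active Kedro session".
    session.run()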

Potential solutions

Solution 1: Keep your custom context and log in mlflow in a hook

Create a custom hook:

from typing import Any, Dict

import mlflow
from kedro.framework.hooks import hook_impl


class MyMlflowHook:
    """Namespace for grouping all model-tracking hooks with MLflow together."""

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        """Hook implementation to start an MLflow run
        with the same run_id as the Kedro pipeline run.
        """
        mlflow.log_xxx("<whatever>")  # placeholder: any mlflow.log_param / log_metric / set_tag call

You can register it in your project's settings.py, for instance:
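
A minimal sketch (assuming the hook class lives in a hypothetical my_project/hooks.py):

# settings.py
from my_project.hooks import MyMlflowHook  # adjust the import path to your project

HOOKS = (MyMlflowHook(),)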

This will be triggered after the ProjectContext initialisation, but it may feel uncomfortable to navigate between the context and the hook to log what you need.

Solution 2: Move everything inside a hook

from typing import Any, Dict

import mlflow
from kedro.framework.hooks import hook_impl
from kedro.framework.session import get_current_session
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkMlflowHook:
    """Namespace for grouping the spark setup and MLflow tracking hooks together."""

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        """Hook implementation to initialise spark and then log to the MLflow run
        with the same run_id as the Kedro pipeline run.
        """
        # FIRST, get the session and recreate the context
        session = get_current_session()
        context = session.load_context()
        config_loader = context.config_loader  # a property, not a method

        # SECOND, initialise the spark session
        # Load the spark configuration in spark.yaml using the config loader
        parameters = config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session (the app name is derived from the project
        # folder here; adapt it to your package name)
        spark_session_conf = (
            SparkSession.builder.appName(context.project_path.name)
            .enableHiveSupport()
            .config(conf=spark_conf)
        )
        _spark_session = spark_session_conf.getOrCreate()
        _spark_session.sparkContext.setLogLevel("WARN")

        # THIRD, log to mlflow (no need to call "start_run" since the kedro-mlflow
        # hook has already been executed just before this one)
        mlflow.log_xxx("xxx")  # placeholder: any mlflow.log_param / log_metric / set_tag call

This implies recreating the context, but can you tell me if it suits your needs?

@stephanecollot
Author

Thanks for this very detailed and interesting answer.

To be more specific, I would like to log the spark configuration and the spark application id in MLflow.

I'm going to try your solution 2.
I'm using kedro==0.17.4
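
For reference, a minimal sketch of that logging step (assuming the SparkSession and the MLflow run set up in solution 2 are already active):

import mlflow
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the session created in the hook

# Log the spark application id as a tag and the explicitly set
# spark configuration entries as params
mlflow.set_tag("spark.applicationId", spark.sparkContext.applicationId)
mlflow.log_params(dict(spark.sparkContext.getConf().getAll()))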

@Galileo-Galilei
Owner

Hi @stephanecollot, did you manage to make it work? As stated above, this is not really a bug and there is nothing for me to fix, so I'll close this issue, but I can still help you achieve what you want.

@stephanecollot
Author

Hi,

Thanks a lot, it works like a charm!

Cheers
