Setup mlflow before KedroContext #292
If you have any idea for a workaround, feel free to share it!
Hi @stephanecollot, sorry for the response delay. Actually this is a tricky question and I don't have a clear solution, but I can guide you towards a workaround which may suit your use case.

### A bit of history

In earlier Kedro versions, custom logic (like initialising a Spark session) was added inside the `ProjectContext`. In general, it is no longer recommended to add custom logic inside the context, BUT the Kedro documentation still suggests it as the recommended way to initialise a Spark context. I think it is no longer the case; the reason why they do this instead of using a hook is that they need access to the config loader.

### On execution order

When launching a `kedro run`, the `ProjectContext` is initialised before the hooks are executed.

### Potential solutions

#### Solution 1: Keep your custom context and log to mlflow in a hook

Create a custom hook:

```python
from typing import Any, Dict

import mlflow
from kedro.framework.hooks import hook_impl


class MyMlflowHook:
    """Namespace for grouping all model-tracking hooks with MLflow together."""

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        """Hook implementation to start an MLflow run
        with the same run_id as the Kedro pipeline run.
        """
        mlflow.log_xxx("<whatever>")  # placeholder for whatever you want to log
```

You can register it in your project settings. This will be triggered after the `ProjectContext` initialisation, but it may feel uncomfortable to navigate between the context and the hook to log what you need.

#### Solution 2: Move everything inside a hook

```python
from typing import Any, Dict

import mlflow
from kedro.framework.hooks import hook_impl
from kedro.framework.session import get_current_session
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkMlflowHook:
    """Namespace for grouping all model-tracking hooks with MLflow together."""

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        """Hook implementation to start an MLflow run
        with the same run_id as the Kedro pipeline run.
        """
        # FIRST, get the session
        session = get_current_session()
        context = session.load_context()
        config_loader = context.config_loader  # a property, not a method

        # SECOND, initialise the spark session:
        # load the spark configuration from spark.yaml using the config loader
        parameters = config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        spark_session_conf = (
            SparkSession.builder.appName("<your_package_name>")
            .enableHiveSupport()
            .config(conf=spark_conf)
        )
        _spark_session = spark_session_conf.getOrCreate()
        _spark_session.sparkContext.setLogLevel("WARN")

        # THIRD, log to mlflow (no need to call "start_run" since the
        # kedro-mlflow hook has already been executed just before)
        mlflow.log_xxx("xxx")  # placeholder
```

This implies recreating the context, but can you tell me if it suits your needs?
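For completeness, the registration step ("You can register it in your project settings") would look roughly like this in a `kedro>=0.17` project layout; the module path `my_project.hooks` is an assumption, not something from this thread:

```python
# src/my_project/settings.py -- assumed kedro>=0.17 project layout
from my_project.hooks import SparkMlflowHook  # hypothetical module path

HOOKS = (SparkMlflowHook(),)
```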
Thanks for this very detailed and interesting answer. To be more specific, I would like to log to MLflow the Spark configuration and the Spark application id. I'm going to try your solution 2.
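Since the goal is to log the Spark configuration and application id, the mlflow side of the hook can stay very small. A minimal sketch, assuming the `(key, value)` pair shape that `SparkContext.getConf().getAll()` returns; the helper name `spark_params_for_mlflow` is mine, not from this thread:

```python
def spark_params_for_mlflow(conf_pairs, application_id):
    """Build a flat dict of parameters suitable for mlflow.log_params.

    conf_pairs: list of (key, value) tuples, as returned by
    SparkContext.getConf().getAll() (assumed shape).
    """
    params = {key: value for key, value in conf_pairs}
    params["spark_application_id"] = application_id
    return params


# Inside the hook, with an active MLflow run, this would become (assumption):
#   sc = _spark_session.sparkContext
#   mlflow.log_params(
#       spark_params_for_mlflow(sc.getConf().getAll(), sc.applicationId)
#   )
```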
Hi @stephanecollot, did you manage to make it work? As stated above, this is not really a bug and there is nothing I can fix on my side, so I'll close this issue, but I can still help you achieve what you want.
Hi, thank you a lot, it works like a charm! Cheers
Hello,
I have a custom KedroContext (where I initialise a Spark session), and I would like to log things to mlflow at that moment.
But if I log things there, my mlflow.yml parameters are not taken into account.
I tried to call in my KedroContext:
But I got the following error:

```
RuntimeError: There is no active Kedro session
```
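The error itself simply means `get_current_session()` was called while no session was registered as active: the context is constructed before the session registers itself, so a lookup made from inside the context's initialisation fails. A simplified, self-contained simulation of that behaviour (my own illustration, not kedro's actual implementation):

```python
# Simplified model (an assumption, not kedro source code) of the
# "There is no active Kedro session" error.
_active_sessions = []


def get_current_session():
    """Return the innermost active session, mimicking kedro's error message."""
    if not _active_sessions:
        raise RuntimeError("There is no active Kedro session")
    return _active_sessions[-1]


class FakeContext:
    def __init__(self):
        # The context is constructed before the session becomes active,
        # so this lookup raises RuntimeError.
        self.session = get_current_session()


try:
    FakeContext()
except RuntimeError as err:
    print(err)  # -> There is no active Kedro session
```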