Track the globals parameters used in the DataCatalog when using the TemplatedConfigLoader #253
Comments
Hello @nblumoe, I have had the very same use case for a while and have been thinking about how to make this possible, but it is quite hard for several reasons:
In a nutshell, I plan to address this in the future, but I have other priorities at the moment for release 0.8.0: I really want to improve model serving through the plugin, since it seems to be a more requested feature. I can't give an exact timeline, but I don't see this feature being implemented for several months.
Thanks for looking into this! Does your first bullet point indicate that kedro should be able to handle params in the catalog? I didn't have luck with this yet:
# parameters.yml
timestamp: 2021-10-13
# catalog.yml
# reduced to essential data, this is not a complete catalog entry
my_data:
  sql: select * from my_table where timestamp = ${params.timestamp}
Oh sorry, I thought you were already using this feature from Kedro. The object you are looking for is the
# globals.yml <- this is what you are looking for
timestamp: 2021-10-13
# catalog.yml
# reduced to essential data, this is not a complete catalog entry
my_data:
  sql: select * from my_table where timestamp = ${timestamp} # You have the right syntax
The problem for mlflow tracking is that I do not want to log your entire
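To illustrate the idea discussed here, the following is a minimal sketch of `${key}` substitution that also records which globals the catalog actually references, so that only those values would need to be logged. It is not Kedro's or kedro-mlflow's implementation: the names `PLACEHOLDER`, `render_catalog`, `used` and the `unused_key` entry are hypothetical, and it assumes the catalog and globals have already been loaded into plain Python dictionaries.

```python
import re
from typing import Any

# Matches ${name} placeholders. This pattern is an assumption for the sketch,
# not the pattern Kedro itself uses.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_catalog(config: Any, globals_dict: dict, used: dict) -> Any:
    """Recursively substitute ${key} placeholders and record each global that is used."""
    if isinstance(config, dict):
        return {k: render_catalog(v, globals_dict, used) for k, v in config.items()}
    if isinstance(config, list):
        return [render_catalog(v, globals_dict, used) for v in config]
    if isinstance(config, str):
        def substitute(match: "re.Match") -> str:
            key = match.group(1)
            if key not in globals_dict:
                return match.group(0)  # leave unknown placeholders untouched
            used[key] = globals_dict[key]  # remember that this global was referenced
            return str(globals_dict[key])
        return PLACEHOLDER.sub(substitute, config)
    return config  # numbers, booleans, None are returned unchanged


# Hypothetical, already-parsed contents of globals.yml and catalog.yml
globals_dict = {"timestamp": "2021-10-13", "unused_key": "ignored"}
catalog = {"my_data": {"sql": "select * from my_table where timestamp = ${timestamp}"}}

used_globals: dict = {}
rendered = render_catalog(catalog, globals_dict, used_globals)
print(rendered["my_data"]["sql"])
print(used_globals)
```

In this sketch `used_globals` ends up containing only `timestamp`, so a tracking hook could log that subset instead of every entry in the globals file.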
Some good news: after some trial and error, I think I have found a way to make it work. However, to avoid migration costs, I will only implement this feature after
I will implement this feature, but only after Kedro moves to
Hello,
Problem
Therefore: We can't track a global as explained in the FAQ on TemplatedConfigLoader unless it's an input parameter of a node (which is not always the case).
Use-case
I set a dataset selector in
Current solution
I had to hack a bit:
Desired behaviour
It would be great if I could somehow set extra parameters (e.g. the ones set by globals that control
Example: Define a
Alternatively: track all parameters in the catalog even if not used in a node?
Hi @kalofolias, sorry for the late reply. I'd be really happy to make it work, because this annoys me too. I just did not find a way to do it properly. A Tracking all the parameters does not seem to be the right default, but maybe I should add the possibility to "opt in" to this solution in case someone really wants it, since we have no other solution for now.
Current state:
Description
Allow parameters to be used in the catalog and track them to MLflow.
Context
Some data sources might be parameterised (e.g. via SQL SELECT * FROM my_data WHERE date = <DATE-PARAM>) and this should get tracked to MLflow too.
Possible Implementation
Instead of just checking for params usage on Nodes, kedro-mlflow would also need to track params being used elsewhere. Could it just track all params, independently of where they are used?
I am not sure if kedro even allows such parameterised data sources in the catalog, so this might require an upstream change in kedro first.
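As a rough illustration of the "track all params" option, the sketch below flattens a nested parameters dictionary into dotted keys, which is the flat key/value shape that parameter logging usually expects. The function name `flatten_params` and the `model` entries are hypothetical, and this is not how kedro-mlflow currently decides what to log.

```python
def flatten_params(params: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested parameters into a single-level dict with dotted keys."""
    flat = {}
    for key, value in params.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the key prefix along
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical parameters; only `timestamp` comes from the discussion above
all_params = {"timestamp": "2021-10-13", "model": {"alpha": 0.1, "max_depth": 3}}
flat_params = flatten_params(all_params)
print(flat_params)
```

A dictionary of this shape could then be handed to `mlflow.log_params`, which accepts a flat mapping of names to values. Whether logging everything is the right default is exactly the open question in this issue.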