Add ability to Specify a "root" for DataCatalog.from_config
#2965
Comments
To note, even if using a |
DataCatalog.from_config
xref #1934 |
Cross-referencing a user question; pinging the magic bot to convert this into a linen link...
|
In our internal discussions it was noted by @yetudada that, regardless of what name we give to this parameter, we would still be making
@yetudada pointed out that Intake has a
Personally I don't think it would be productive to rehash the discussion we already had, since we explored and evaluated a number of options and no new ideas have been presented. It might be the case that no perfect, clean solution exists.
On the other hand, my initial concern that the list of hardcoded key names in the session is still not transparent (#2965 (comment)) hasn't been addressed yet. As a reminder, it is
kedro/kedro/framework/context/context.py
Lines 91 to 92 in bbed0f1
So, to make |
Another comment from @Yetunde: #2819 (comment)
|
Forgive me for rehashing as I try to wrap my head around this, but if I understand things correctly, it seems like the core of this issue is that the code for loading the data catalog doesn't have access to the project configuration, or at least sometimes it will not? And this is a deliberate design decision because the goal is to decouple the catalog instantiation from the runtime?
It seems this is most often going to be an issue with
I can't say I fully understand what's going on under the hood, but generally |
Hi @jasonmhite, thanks for chiming in:
that's correct. We're trying to solve for a set of requirements that is very difficult to satisfy, namely:
|
@astrojuanlu Is it possible to push it into the Dataset implementation? You could extend the base class to add a member that tries to resolve the project root directory if it is available from the context, and otherwise resolves to the cwd. Then it becomes the responsibility of the implementation to actually use that base directory. It wouldn't fix the issue automatically, but at least it would then be possible to update a dataset implementation to handle things correctly, and it wouldn't break anything. |
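A rough sketch of the idea in the comment above, for illustration only. The class name `RootAwareDataset`, the `project_root` argument and the `base_dir` / `resolved_path` members are all hypothetical; none of them are part of Kedro's dataset API.

```python
from __future__ import annotations

from pathlib import Path


class RootAwareDataset:
    """Hypothetical base class; NOT part of Kedro's real API."""

    def __init__(self, filepath: str, project_root: str | Path | None = None):
        self._filepath = Path(filepath)
        # In the proposal this would come from the context when one exists.
        self._project_root = Path(project_root) if project_root is not None else None

    @property
    def base_dir(self) -> Path:
        # Fall back to the current working directory when no root is known.
        if self._project_root is not None:
            return self._project_root
        return Path.cwd()

    def resolved_path(self) -> Path:
        # Absolute paths are left alone; relative ones are anchored at base_dir.
        if self._filepath.is_absolute():
            return self._filepath
        return self.base_dir / self._filepath
```

Each concrete dataset would still have to call something like `resolved_path()` itself, which is the implicit convention the idea depends on.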
We discussed this again today with @yetudada, @idanov and @merelcht. This is how the error manifests:
@jasonmhite we considered your idea, but we thought that we'd rely on all datasets conforming to an implicit convention, which might introduce confusion the moment someone doesn't implement that extra parameter for the datasets.
Historical perspective: the paths are converted to absolute because of https://jira.quantumblack.com/browse/KED-1796 (internal link)
More relevant issues: #412
We decided to try to improve the error message in this case, and if we do add a parameter then give it a name that relates to the Current Working Directory. Note that this error doesn't even appear if |
Decision turned into #3248, closing this ticket. |
Description
Related #2924 (comment)
Context
Currently, using context.catalog (framework) will do some magic path conversion. When users use DataCatalog alone, they don't have the ability to define where the "root" directory is. This contradicts kedro suggesting notebooks/ and data: if a notebook is inside notebooks, you would need to use ../ inside the catalog.yml. This is bad because it prevents the catalog.yml from being reused and assumes that all notebooks have to be in the same folder (preventing multiple users from sharing one catalog).
Requirements
This should be:
Consider:
Possible Implementation
See the detailed discussion.
The new argument could be named data_source, root, etc. (undecided; this should be taken into consideration when implemented).
#2924 (comment)
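A minimal sketch of what such a "root" argument might do, under stated assumptions. The helper `resolve_catalog_paths` and its `root` parameter are hypothetical and not part of Kedro; the sketch only rewrites a plain config dictionary and does not instantiate any dataset.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def resolve_catalog_paths(
    config: dict[str, dict[str, Any]], root: str | Path
) -> dict[str, dict[str, Any]]:
    """Return a copy of `config` with relative `filepath` entries anchored at `root`.

    Hypothetical helper for illustration; not Kedro's implementation.
    """
    root = Path(root)
    resolved: dict[str, dict[str, Any]] = {}
    for name, entry in config.items():
        entry = dict(entry)  # shallow copy so the caller's config is untouched
        filepath = entry.get("filepath")
        is_local_relative = (
            isinstance(filepath, str)
            and "://" not in filepath  # leave URLs such as s3://... alone
            and not Path(filepath).is_absolute()
        )
        if is_local_relative:
            entry["filepath"] = (root / filepath).as_posix()
        resolved[name] = entry
    return resolved
```

With this, a notebook under notebooks/ and a script at the project root could share one catalog.yml without ../ prefixes: each would pass its own root and hand the rewritten dictionary to DataCatalog.from_config.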
Possible Alternatives