Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Configuration Management] Allow config in config or config in code. #3807

Closed
sheldontsen-qb opened this issue Apr 12, 2024 · 4 comments
Closed
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@sheldontsen-qb
Copy link
Contributor

sheldontsen-qb commented Apr 12, 2024

Description

Was just thinking that it would be great to let users pick between config in code vs config in yaml. What I mean by this is:

input_obj = KedroDataset(...) 
output_obj = KedroDataSet(...)

@dataclass
class MyParam:
   a = ...
   b = ...
my_param = MyParam()

Pipeline(
   [
      Node(func, ["input", "params:my_param", "output") # currently string, how it works right now
      Node(func, [input_obj, my_param], output_obj) # actual objects
   ]
)

It's not too many lines of code changed (I monkeypatched a few files just to check), but would also add another dimension of flexibility on how to use kedro. Giving option back to users to decide how they would like to manage their configuration.

Context

Right now I cannot use dataclasses to manage my configuration, which is a reasonable pattern to want to use. I've seen preferences of fully leveraging IDE, vs the mindset that there is clear separation between parameters vs code. I believe kedro should offer this type of flexibility to end users while still prescribing a preferred default.

Possible Implementation

I did monkeypatch a few files and got a small prototype working awhile back. Basically where kedro does a catalog.load for the string. For starters, we can allow the following:

    for name in node.inputs:
        hook_manager.hook.before_dataset_loaded(dataset_name=name, node=node)
                
        if isinstance(name, string):
           inputs[name] = catalog.load(name)   # original catalog.load(value) logic then pass to the node
        elif isinstance(name, AbstractDataset)
          inputs[str(name)] = name.load()  # call the dataset load method
        elif isinstance(name, dataclass)
          inputs[str(name)] = name            # pass directly to the node
        elif isinstance(name, Any)
          inputs[str(name)] = name            # tbh no difference from top unless we want to do something custom
        else:
           raise Exception("only string, AbstractDataSet, dataclass allowed for now?") # not sure if we want limiting or not

Changes can be made here:

inputs[name] = catalog.load(name)

Note the last elif basically means anything goes through, so I can define any object in any function used by a node and it cleanly gets passed through. Limiting to dataclasses also seems incomplete.

To make it flexible you could move this mapping of object to load_func to settings.py so users can always handle this themselves instead of putting if..elif in the kedro codebase.

Possible Alternatives

N/A

@sheldontsen-qb sheldontsen-qb added the Issue: Feature Request New feature or improvement to existing feature label Apr 12, 2024
@datajoely
Copy link
Contributor

Prototype from @limdauto a few years ago along the same lines: https://web.archive.org/web/20210921071139/https://kedrozerotohero.com/experiments/define-data-catalog-using-python

@datajoely
Copy link
Contributor

@benhorsburgh also had a similar idea about marshaling / unmarshaling parameters with Pydatntic

@sheldontsen-qb
Copy link
Contributor Author

Yes, this could also at the same time be used to unpack parameters into a kedro node so users have a way to avoid writing:


def my_func(df, p):
    p1 = p.get("p1")
    ...

@astrojuanlu
Copy link
Member

Thanks for opening this @sheldontsen-qb ! This has come up a few times so it's good to start collecting some use cases. I'm moving this to Discussions to also link it to a similar one we had not so long ago #3788

@kedro-org kedro-org locked and limited conversation to collaborators Apr 12, 2024
@astrojuanlu astrojuanlu converted this issue into discussion #3808 Apr 12, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

No branches or pull requests

3 participants