VideoDataSet with GeneratorVideo save is broken #232

Open
@daniel-falk

Description

I wasn't sure whether to file this issue here or in the Kedro core repo, since the VideoDataSet broke after a change in Kedro core.

In short, the video dataset has multiple backends that can be saved. One of them is GeneratorVideo, which is an Iterable.

In commit fcf3ab4a9 "Enable the usage of generator functions in nodes (#2161)" by @idanov the runner functionality was changed to handle generator datasets.

Snippet from _run_node_sequential in kedro/runner/runner.py:

    items: Iterable = outputs.items()
    # if all outputs are iterators, then the node is a generator node
    if all(isinstance(d, Iterator) for d in outputs.values()):
        # Python dictionaries are ordered so we are sure
        # the keys and the chunk streams are in the same order
        # [a, b, c]
        keys = list(outputs.keys())
        # [Iterator[chunk_a], Iterator[chunk_b], Iterator[chunk_c]]
        streams = list(outputs.values())
        # zip an endless cycle of the keys
        # with an interleaved iterator of the streams
        # [(a, chunk_a), (b, chunk_b), ...] until all outputs complete
        items = zip(it.cycle(keys), interleave(*streams))

    for name, data in items:
        hook_manager.hook.before_dataset_saved(dataset_name=name, data=data, node=node)
        catalog.save(name, data)
        hook_manager.hook.after_dataset_saved(dataset_name=name, data=data, node=node)
    return node

Context

I have not yet had time to dive into the code and see what is actually happening, or how we should handle these situations. The effect now is that the runner pulls the first frame from the video and tries to save that single frame to the VideoDataSet, which is not possible.
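To make the effect concrete, here is a minimal sketch that mimics the branching in the runner snippet above without importing Kedro. `FakeGeneratorVideo`, `interleave`, and `simulate_save_loop` are hypothetical stand-ins written for this illustration. The sketch assumes that the real GeneratorVideo satisfies `collections.abc.Iterator` (that is, it defines both `__iter__` and `__next__`), which the reported behaviour suggests but which I have not verified.

```python
import itertools as it
from collections.abc import Iterator


class FakeGeneratorVideo:
    """Hypothetical stand-in for GeneratorVideo: wraps a generator of frames."""

    def __init__(self, frames):
        self._gen = frames

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)


def interleave(*streams):
    # Local stand-in for the interleave helper used by the runner:
    # take one item from each stream in turn, stop at the shortest stream.
    return it.chain.from_iterable(zip(*streams))


def simulate_save_loop(outputs):
    """Mimic the runner's branching and record what would reach catalog.save."""
    saved = []
    items = outputs.items()
    # Same check as in the snippet: any Iterator output marks a generator node.
    if all(isinstance(d, Iterator) for d in outputs.values()):
        keys = list(outputs.keys())
        streams = list(outputs.values())
        items = zip(it.cycle(keys), interleave(*streams))
    for name, data in items:
        saved.append((name, data))  # stands in for catalog.save(name, data)
    return saved


video = FakeGeneratorVideo(f"frame_{i}" for i in range(3))
saved = simulate_save_loop({"video": video})
print(saved)
```

Under these assumptions the dataset never receives the video object itself. It receives one frame per save call, which matches the failure described above.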

The quick and dirty way to fix it would be to remove the __iter__ method from the GeneratorVideo class and implement some special logic in the VideoDataSet class to iterate it correctly. This would, however, not be very nice from a user perspective, since iter(video) would then fall back to indexed iteration of the generator.
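The fallback mentioned here is Python's legacy sequence protocol: when a class defines `__getitem__` but not `__iter__`, `iter()` calls `__getitem__` with 0, 1, 2, ... until an `IndexError` is raised. The sketch below shows that behaviour with a hypothetical `IndexOnlyVideo` class. It is backed by a list to keep the example short, whereas a generator-backed video would have to serve those indices in order.

```python
from collections.abc import Iterable, Iterator


class IndexOnlyVideo:
    """Hypothetical video class with __getitem__ but no __iter__/__next__."""

    def __init__(self, frames):
        self._frames = list(frames)

    def __getitem__(self, index):
        # The IndexError raised by the list is what ends indexed iteration.
        return self._frames[index]


video = IndexOnlyVideo(["frame_0", "frame_1", "frame_2"])

# Neither ABC matches, so the runner's Iterator check would not trigger.
is_iterator = isinstance(video, Iterator)
is_iterable = isinstance(video, Iterable)

# iter() still works, but through repeated __getitem__ calls.
frames = list(iter(video))
print(is_iterator, is_iterable, frames)
```

Such an object would pass through the runner as a single output, at the cost of the less natural iteration described above.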

Steps to Reproduce

  1. Create a GeneratorVideo
  2. Create a VideoDataSet
  3. Call save on the video dataset with the generator video

Labels

bug, help wanted
