Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Array not CoreArray (round 2) #128

Merged
merged 16 commits into from
Sep 21, 2022
Merged

Conversation

TomNicholas
Copy link
Member

Follow on from #124.

I'm trying to type the return values so that mypy can verify for me that the correct type is being returned.

I've successfully typed the array creation functions module, but when I got to the ops module I became confused as to how the class design was supposed to work.

You have

# core/array.py
class CoreArray
    def rechunk(self: ?, chunks) -> ?:
        from cubed.core.ops import rechunk

        return rechunk(self, chunks)


# array_api/array_object.py
def rechunk(x: ?, chunks, ...) -> ?:
    return ...

How do you want this to work? Your private subclass is defining public API here. Typing wise this requires that we make rechunk pass types through unchanged, so CoreArray.rechunk returns CoreArray and Array.rechunk returns Array. This seems not helpful for ensuring that .rechunk as seen by the user always returns an Array.

I've used rechunk as an example here but I think you have a similar typing problem for many of the functions in ops.py, and one of the map_* functions returning a CoreArray is ultimately what's causing from_array to return a CoreArray (i.e. the specific bug I reported in #123).

Is there some obvious way of rewriting this that would ensure we get the correct type back in the public API?

@TomNicholas
Copy link
Member Author

TomNicholas commented Sep 13, 2022

Is there some obvious way of rewriting this that would ensure we get the correct type back in the public API?

Could maybe just have

ChunkedArray = TypeVar("ChunkedArray", bound=CoreArray)  # either Array or CoreArray

def rechunk(x: ChunkedArray, chunks, ...) -> ChunkedArray:
    return type(x)(name, target, spec, plan)

but it doesn't guarantee that we get back Array.

@tomwhite
Copy link
Member

Is the problem trying to satisfy mypy? It seems that the runtime types are all Array, as expected.

I wonder if it would help to move the rechunk method from CoreArray to Array?

@TomNicholas
Copy link
Member Author

TomNicholas commented Sep 20, 2022

Is the problem trying to satisfy mypy?

The problem was that I was unconvinced that this

It seems that the runtime types are all Array, as expected.

was actually true. So I was thinking about how to design the classes in such a way that this was always true. Adding type hints was just to help me work out if this was true or not.

However I've just realised I was being thrown off by the fact that Array.__repr__ misleadingly prints "CoreArray" 😅 (Now fixed.)

I think I'm now happier that I'm no longer going to randomly find my xarray object is wrapping a CoreArray.

I wonder if it would help to move the rechunk method from CoreArray to Array?

Maybe - rechunk is weird because it's the only function in ops.py that might accept and return CoreArray objects in addition to Array objects. All the rest* of the functions there are now type hinted to only accept and return Array objects, which is what we want in order to guarantee no surprises in public API. Now the bug is fixed this is less important though, and I think this can be merged as is.

*(except blockwise because dask's API choice is weird - see note in code)

Copy link
Member

@tomwhite tomwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the reprs - this would definitely be confusing! (I had tried some isinstance tests to convince myself the right types were being returned before.)

This almost ready to go in - just a few naming things in the comments.

cubed/array_api/creation_functions.py Outdated Show resolved Hide resolved
cubed/core/array.py Outdated Show resolved Hide resolved
cubed/tests/test_array_api.py Outdated Show resolved Hide resolved
@TomNicholas
Copy link
Member Author

Great - fixed all those things.

@tomwhite tomwhite merged commit 766cfd0 into cubed-dev:main Sep 21, 2022
@TomNicholas TomNicholas deleted the exposeArray2 branch September 21, 2022 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants