-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#6793: use pandas.api.types.pandas_dtype
instead of np.dtype
for some more places
#6794
Conversation
9569c46
to
3203fac
Compare
new_dtype = dtype | ||
new_dtype = pandas.api.types.pandas_dtype(dtype) | ||
if hasattr(dtype, "_is_materialized"): | ||
_ = dtype._materialize_categories() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case dtype
cannot be serialized by Dask.
new_dtype = pandas.api.types.pandas_dtype(dtype) | ||
if hasattr(dtype, "_is_materialized"): | ||
_ = dtype._materialize_categories() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe like this? pandas_dtype
returns exactly the same object if a categorical is passed
>>> cat = pd.CategoricalDtype([1, 2])
>>> pd.api.types.pandas_dtype(cat) is cat
True
new_dtype = pandas.api.types.pandas_dtype(dtype) | |
if hasattr(dtype, "_is_materialized"): | |
_ = dtype._materialize_categories() | |
if isinstance(dtype, LazyCategoricalProxy): | |
new_dtype = dtype | |
else: | |
new_dtype = pandas.api.types.pandas_dtype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In LazyCategoricalProxy
case, new_dtype
will remain with non-materialized categories, which will lead to a problem with Dask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we then add a dask specific branching here and label it as a hack? Otherwise, materializing lazy categories without a need, breaks the whole sense of them being lazy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, left a few comments
new_dtype = pandas.api.types.pandas_dtype(dtype) | ||
if hasattr(dtype, "_is_materialized"): | ||
_ = dtype._materialize_categories() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we then add a dask specific branching here and label it as a hack? Otherwise, materializing lazy categories without a need, breaks the whole sense of them being lazy
@dchigarev ready for review |
@dchigarev ping, I think all your comments have been answered. |
…some more places in Modin code Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
pandas.api.types.pandas_dtype
instead ofnp.dtype
in all Modin codebase to allow work with pandas dtypes #6793docs/development/architecture.rst
is up-to-date