[ENH] Add Support for pd.Index, pd.Series, range to array type #3237
Thanks for working on this @ChiaLingWeng! And for all your contributions recently! This will be a convenient addition. I am not sure where the best place for this logic would be, and don't have time to dive deeper myself right now, but others might have thoughts on this. Regarding the logic, I wonder if we can make it more general to support any iterable (anything that can be converted into a list). This would allow us to support other dataframe libraries as well as tuples, numpy arrays, etc., without special tests for each. I think every value that can be passed to altair is either a string (column names), a dictionary (alt.value, alt.condition, etc.), or a class (alt.Color, alt.X, etc.). So maybe we can just do something like:
if not isinstance(v, str | dict | type):  # type is for class
    list(v)  # this works with any iterable, which includes series, indexes, arrays, etc.
This throws an informative error too, pointing out that the object is not an iterable. There might be some grave oversight here because I haven't tried things out, but the general idea of making this solution more general would be helpful I think. |
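A minimal sketch of the two claims in this comment, using only the standard library: `list()` accepts any iterable, and a non-iterable fails with a `TypeError` that names the offending type. The sample values and variable names are illustrative and do not come from the PR.

```python
# list() accepts any iterable, so a single call covers many container types.
samples = {
    "tuple": (1, 2, 3),
    "range": range(3),
    "generator": (n * n for n in range(3)),
}
converted = {name: list(obj) for name, obj in samples.items()}

# A non-iterable raises TypeError, and the message names the offending type.
try:
    list(5)
except TypeError as exc:
    error_message = str(exc)

print(converted)
print(error_message)
```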
@joelostblom Thanks for your suggestion! |
Would
elif hasattr(v, '__iter__'):
    kwds[k] = list(v)
be sufficient? Or is this too loose? |
|
Good points @joelostblom and @binste! My first thought was that it was OK to have a
import altair as alt
import pandas as pd

data = {'x': ['a', 'b', 'c', 'd', 'e', 'f'], 'y': [5, 3, 8, 4, 6, 2]}
df = pd.DataFrame(data)

alt.Chart(df).mark_bar().encode(
    x=alt.X('x', sort=list('cdfabe')),  # `sort='cdfabe'` can become ['c', 'd', 'f', 'a', 'b', 'e']
    y='y'
)

alt.Chart(df).mark_bar().encode(
    x=alt.X('x', sort='y'),  # `sort='y'` should not become ['y']
    y='y'
)
Probably something you already realised, @joelostblom. |
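The string pitfall can be reproduced without Altair. The sketch below uses only the standard library to show that an unguarded iterable check cannot tell a field reference such as `'y'` apart from a real sequence; the variable names are illustrative.

```python
from collections.abc import Iterable

column_name = "y"          # meant as a reference to a field or channel
explicit_order = "cdfabe"  # a string that merely happens to spell out an order

# Strings pass both styles of iterable check discussed in this thread.
str_is_iterable = isinstance(column_name, Iterable) and hasattr(column_name, "__iter__")

# An unguarded list() call therefore splits both strings into characters.
naive = [list(column_name), list(explicit_order)]

print(str_is_iterable)
print(naive)
```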
Yes, exactly, that's the reason we need to avoid catching strings in this check. I would be in favor of developing our own types if there is a clear benefit over writing something like |
Not sure if this is strict enough, but from this discussion, I found pd.api.types.is_list_like() may be useful. |
Thanks @ChiaLingWeng! That seems to be the behaviour we are after here as well, but without depending on pandas itself. Maybe we can use the same implementation as adopted within pandas, but to me it's not really obvious how this function is implemented (I can only find this). @joelostblom, yes, I was thinking of defining these as public type definitions that users can also use before pushing their data into Altair. |
Nice find @ChiaLingWeng! I agree with @mattijn that if we could re-implement this without having to import pandas, that would be great (since it would help us make pandas an optional dependency for altair in the future). I believe the actual pandas implementation is here because it is a Cython function. It seems to me that they use a similar approach to what we discussed here; essentially checking if it is an
from typing import Iterable

if not isinstance(v, str | dict | type | bytes) and isinstance(v, Iterable):  # type is for class
    v = list(v)
We could also do a |
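A minimal, pandas-free sketch of that check, wrapped in a helper so it can be tried in isolation. The function name `as_list_if_iterable` and the `_PASS_THROUGH` tuple are hypothetical and not part of Altair. It passes a tuple of types to `isinstance` because the `str | dict` union form only works there from Python 3.10 onwards, and it imports `Iterable` from `collections.abc`, for which the `typing` name is a deprecated alias.

```python
from collections.abc import Iterable

# Hypothetical names for illustration; these are not part of Altair.
_PASS_THROUGH = (str, bytes, dict, type)


def as_list_if_iterable(value):
    """Return list(value) for iterables; leave str, bytes, dict and classes as-is."""
    if isinstance(value, Iterable) and not isinstance(value, _PASS_THROUGH):
        return list(value)
    return value


print(as_list_if_iterable(range(3)))      # converted to a list
print(as_list_if_iterable("y"))           # string is left untouched
print(as_list_if_iterable({"value": 1}))  # dict is left untouched
```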
@ChiaLingWeng I'm just checking in to see if you're interested in continuing to work on this PR. It doesn't have to be right now, but let us know if you're planning to take it up again in the future. |
@joelostblom It's been a while since I checked on this, but yes, I'm willing to work on this PR. Let me know if you have any suggestions. |
FYI this PR will be the main source of conflicts, so be sure to check it out to see how to handle |
Thanks @ChiaLingWeng! As @dangotbanned mentioned, altair no longer requires pandas, so we will need to make sure that the implementation here is not using any
from typing import Iterable

if isinstance(v, Iterable) and not isinstance(v, str | dict):
    v = list(v)
This means that all iterable objects that can be converted to list should be, except if a string (column name) or dict (alternative helper class definition) is passed. Implementing this and then running the test suite to see if anything fails would be the first step. After that, we can consider if we want to make this more specific and only do the conversions for keywords that accept iterables, such as |
|
Interesting, we would not only want it to work for data frame / series-like objects though, but any iterable, including tuples, ranges, arrays, etc. |
Apologies, I think the language I used wasn't super helpful.
for k, v in list(kwds.items()):
    if k not in (list(ignore) + ["shorthand"]):
        if isinstance(v, (pd.Series, pd.Index)):  # <--- this part
            kwds[k] = v.to_list()
        elif isinstance(v, range):
            kwds[k] = list(v)
    else:
        kwds.pop(k, None)
I also wrongly assumed the changes were to be made in altair/tools/schemapi/schemapi.py Lines 487 to 516 in 781c507
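For comparison, a pandas-free variant of this loop might look like the sketch below. It is only an illustration: `normalize_kwds` is a hypothetical stand-in and not the code in schemapi.py, and it assumes the `else` branch pairs with the outer `if`, so that ignored keys and `"shorthand"` are dropped.

```python
from collections.abc import Iterable


def normalize_kwds(kwds, ignore=()):
    """Hypothetical stand-in for the loop, without pandas-specific checks."""
    skipped = set(ignore) | {"shorthand"}
    result = {}
    for key, value in kwds.items():
        if key in skipped:
            # Assumption: ignored keys are dropped, like kwds.pop(k, None).
            continue
        # Convert any iterable except strings and dicts to a plain list.
        if isinstance(value, Iterable) and not isinstance(value, (str, dict)):
            value = list(value)
        result[key] = value
    return result


print(normalize_kwds({"sort": range(3), "field": "x", "shorthand": "x:Q"}))
```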
|
@ChiaLingWeng I think we have a good solution to this issue in #3501, so I will close this PR as superseded by that one. Thank you again for your work and contributions here. |
Tries to deal with #2808 and #2877.
Add
to convert to array type if the value is a pd.Series, pd.Index or range.
Not sure if it's the right approach to convert inside to_dict() rather than earlier (maybe in _todict?).
It works on some of the test cases mentioned: