-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#911 fix #945
#911 fix #945
Conversation
Thanks for taking this on @endremborza, it looks like my old PR was broken after the repository transferred ownership over to ydataai. Would you mind adding a unit test for this? I believe the current implementation will fail for the following: data = [1,2,3,4]
weights = [1,1,1,100]
weighted_median(data, weights)
-> TypeError: only integer scalar arrays can be converted to a scalar index The issue arises on this line
Because data is still technically a list and can't be indexed by a boolean array. I'm not particularly familiar with this function but according to the doc notes it's intended to support lists and arrays, if we are forced into coercion to accomplish that, might as well make it explicit and take advantage of numpy functionality everywhere. Something like if not isinstance(data, np.ndarray):
data = np.array(data)
if not isinstance(weights, np.ndarray):
weights = np.array(weights)
s_data, s_weights = map(np.sort, [data, weights])
midpoint = 0.5 * np.sum(s_weights)
if s_weights[-1] > midpoint:
w_median = data[weights == np.max(weights)][0]
else:
cs_weights = np.cumsum(s_weights)
idx = np.where(cs_weights <= midpoint)[0][-1]
if cs_weights[idx] == midpoint:
w_median = np.mean(s_data[idx : idx + 2])
else:
w_median = s_data[idx + 1]
return w_median The type signature should also be fixed, you could go with something like this from typing import Union
def weighted_median(data: Union[list, np.ndarray], weights: Union[list, np.ndarray]) -> Union[int, float]: I'm not 100% sure about the return signature but the base implementation certainly supports float returns for float Finally, this isn't an issue for you @endremborza but @sbrugman. Who is the primary contact / maintainer at ydataai? It doesn't look like the automated tests are running on PRs anymore based on this specific instance. |
and I'm quite positive that there is no way should a unit test still cover this one function with arrays? |
Doesn't the current PR propose to solve issue #911 by casting weights to np.array? Am I missing something? |
Yes, but never from a list, but from a |
Exactly! So either way you're casting right? List, pandasArray, whatever doesn't really matter. |
Alright, but for the What you're suggesting is more explicit though, and if that's more important I'm fine with that. As it usually is in OS, someone has to make a decision, but its not straightforward, who 🙂 |
Just a suggestion, the above solution is strictly more efficient than the PR in a few ways:
I'm just a contributor as well though :). Feel free to take it or leave it! |
Oh sorry, fair enough. I didn't acknowledge the rest of the changes only the 2 casting lines, as I didn't have the original function to compare it to. I just applied it |
@endremborza and @ieaves Thanks a ton for your help :) |
Seems fixed!
@endremborza this would be my suggestion |
Hello.
This should fix #911
sorry, as this is sort of a duplicate of #928 but I couldn't make commitlint work there, so I just opened a new one.
I modified a test a little, as it seems to me some versions of pandas secondary sort on index, and that is what messed up the tests on #928
also, pre-commit seems all kinds of broken, but I linted manually best I could