Summary
Add pandas-compatible sort and ascending arguments to Arkouda-backed value_counts implementations to align behavior with pandas.Series.value_counts.
Motivation
In pandas, value_counts sorts by frequency by default (sort=True, ascending=False).
Current Arkouda-backed implementations return results in server-dependent order, which:
- diverges from pandas expectations,
- makes results harder to reason about,
- forces tests to ignore ordering.
Adding explicit sorting controls improves API parity while preserving a path to efficient implementations.
Pandas reference behavior
Series.value_counts(
    normalize=False,
    sort=True,
    ascending=False,
    bins=None,
    dropna=True,
)
- sort=True: sort by count
- ascending=False: highest counts first
- sort=False: preserve order of appearance (after dropna)
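To make the three orderings concrete, here is a small stdlib-only sketch that imitates them with collections.Counter rather than calling pandas itself. The sample data and variable names are illustrative assumptions, and the sketch does not try to reproduce how pandas orders tied counts.

```python
from collections import Counter

# Illustrative data; None stands in for a missing value.
data = ["a", "b", "b", None, "c", "b", "a"]

# dropna=True: discard missing values before counting.
counts = Counter(x for x in data if x is not None)

# sort=False: Counter keeps keys in first-seen order.
first_seen = list(counts.items())

# sort=True, ascending=False: highest counts first.
descending = counts.most_common()

# sort=True, ascending=True: lowest counts first.
ascending = sorted(counts.items(), key=lambda kv: kv[1])

print(first_seen)
print(descending)
print(ascending)
```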
Proposed API
Add arguments to all Arkouda-backed arrays that implement value_counts:
def value_counts(
    self,
    dropna: bool = True,
    sort: bool = True,
    ascending: bool = False,
):
    ...
Semantics
- Default behavior matches pandas (sort=True, ascending=False)
- Sorting is performed client-side on the (typically small) (keys, counts) result
- When sort=False, document that ordering reflects server return order (not guaranteed to be first-seen order)
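The client-side step described above might look roughly like the following sketch. The helper name sort_value_counts is hypothetical, and plain Python lists stand in for whatever array types the server actually returns; a real implementation would presumably use an argsort-style operation on those arrays instead.

```python
from typing import Hashable, List, Sequence, Tuple


def sort_value_counts(
    keys: Sequence[Hashable],
    counts: Sequence[int],
    sort: bool = True,
    ascending: bool = False,
) -> Tuple[List[Hashable], List[int]]:
    """Hypothetical helper: reorder a (keys, counts) result by count."""
    if len(keys) != len(counts):
        raise ValueError("keys and counts must have the same length")
    if not sort:
        # Pass the server's return order through unchanged.
        return list(keys), list(counts)
    # Argsort by count; Python's sort is stable, so tied counts keep
    # their incoming relative order in both directions.
    order = sorted(
        range(len(counts)), key=lambda i: counts[i], reverse=not ascending
    )
    return [keys[i] for i in order], [counts[i] for i in order]


keys, counts = ["x", "y", "z"], [2, 5, 1]
print(sort_value_counts(keys, counts))
print(sort_value_counts(keys, counts, ascending=True))
print(sort_value_counts(keys, counts, sort=False))
```

Because the result holds one entry per distinct value, the cost of this step depends on the number of unique keys, not on the size of the original array.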
Scope
- ArkoudaArray.value_counts
- ArkoudaStringArray.value_counts
- ArkoudaCategoricalArray.value_counts
- Docstrings and user-facing documentation
- Unit tests
Testing
- Verify default ordering matches pandas for representative inputs
- Verify sort=False preserves unsorted output (order-insensitive comparison)
- Verify interaction with dropna
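The checks above could be built on two small helpers along these lines. Both function names are hypothetical, and the inputs are plain lists standing in for the keys and counts of a value_counts result; the sketch assumes keys are hashable and unique.

```python
from typing import Hashable, Sequence


def same_counts(
    actual_keys: Sequence[Hashable],
    actual_counts: Sequence[int],
    expected_keys: Sequence[Hashable],
    expected_counts: Sequence[int],
) -> bool:
    """Order-insensitive comparison, suitable for the sort=False case."""
    actual = dict(zip(actual_keys, actual_counts))
    expected = dict(zip(expected_keys, expected_counts))
    return actual == expected


def is_sorted_by_count(counts: Sequence[int], ascending: bool = False) -> bool:
    """Check that counts are monotonic in the requested direction."""
    pairs = list(zip(counts, counts[1:]))
    if ascending:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


# Same pairs in a different order still compare equal.
print(same_counts(["a", "b"], [2, 3], ["b", "a"], [3, 2]))
# Default ordering: counts must be non-increasing.
print(is_sorted_by_count([5, 2, 2, 1]))
```

Checking monotonic counts instead of an exact key sequence keeps the sorted-case test independent of how ties are ordered.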
Non-goals
- No requirement to implement stable "first-seen" ordering when sort=False
- No server-side sorting changes in this ticket
Acceptance criteria
- API matches pandas defaults
- Tests cover sorted and unsorted cases
- Documentation clearly states ordering guarantees