
Add sort and ascending arguments to value_counts #5340

@ajpotts

Summary

Add pandas-compatible sort and ascending arguments to Arkouda-backed value_counts implementations to align behavior with pandas.Series.value_counts.

Motivation

In pandas, value_counts sorts by frequency by default (sort=True, ascending=False).
Current Arkouda-backed implementations return results in server-dependent order, which:

  • diverges from pandas expectations,
  • makes results harder to reason about,
  • forces tests to ignore ordering.

Adding explicit sorting controls improves API parity with pandas while leaving room for efficient implementations, since callers that do not need ordering can pass sort=False.

Pandas reference behavior

Series.value_counts(
    normalize=False,
    sort=True,
    ascending=False,
    bins=None,
    dropna=True,
)
  • sort=True: sort by count
  • ascending=False: highest counts first
  • sort=False: preserve order of appearance (after dropna)
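The rules above can be modeled in a few lines of plain Python, which is handy as a reference point when reviewing the Arkouda changes. This is a sketch, not pandas itself: the helper name reference_value_counts is hypothetical, None stands in for a missing value, and the result is a list of (key, count) pairs rather than a Series. The bullets above do not specify tie order among equal counts; this sketch keeps tied keys in first-seen order.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def reference_value_counts(
    values: Iterable[Hashable],
    sort: bool = True,
    ascending: bool = False,
    dropna: bool = True,
) -> List[Tuple[Hashable, int]]:
    """Hypothetical plain-Python model of pandas.Series.value_counts ordering.

    Assumes None marks a missing value and returns (key, count) pairs.
    """
    counts: Dict[Hashable, int] = {}
    for v in values:
        if dropna and v is None:
            continue  # dropna=True: missing values are not counted
        # dicts keep insertion order, so keys stay in first-seen order
        counts[v] = counts.get(v, 0) + 1

    pairs = list(counts.items())
    if sort:
        # sorted() stays stable with reverse=True, so tied counts keep
        # first-seen order (a choice of this sketch, not a pandas guarantee)
        pairs = sorted(pairs, key=lambda kv: kv[1], reverse=not ascending)
    return pairs


data = ["a", "b", None, "c", "c", "a", "c", "c", None, "a"]
print(reference_value_counts(data))                  # [('c', 4), ('a', 3), ('b', 1)]
print(reference_value_counts(data, ascending=True))  # [('b', 1), ('a', 3), ('c', 4)]
print(reference_value_counts(data, sort=False))      # [('a', 3), ('b', 1), ('c', 4)]
print(reference_value_counts(data, dropna=False))    # [('c', 4), ('a', 3), (None, 2), ('b', 1)]
```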

Proposed API

Add these arguments to every Arkouda-backed array that implements value_counts:

def value_counts(
    self,
    dropna: bool = True,
    sort: bool = True,
    ascending: bool = False,
):
    ...

Semantics

  • Default behavior matches pandas (sort=True, ascending=False)
  • Sorting is performed client-side on the (keys, counts) result, which is typically small (one entry per distinct value)
  • When sort=False, ordering reflects the server's return order and is not guaranteed to be first-seen order (unlike pandas); the documentation must state this
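A client-side sort over the (keys, counts) result can be sketched as follows, assuming both arrays have already been brought to the client as NumPy arrays (how they are fetched from Arkouda is outside this sketch). The function name order_value_counts is hypothetical. For descending order it runs a stable argsort on the negated counts, so keys with equal counts stay in server return order; that tie policy is a choice of this sketch, not something the issue specifies.

```python
import numpy as np


def order_value_counts(keys, counts, sort=True, ascending=False):
    """Hypothetical helper: reorder a (keys, counts) result on the client.

    Assumes keys and counts are equal-length NumPy arrays in server return
    order, with counts[i] the frequency of keys[i].
    """
    keys = np.asarray(keys)
    counts = np.asarray(counts, dtype=np.int64)
    if not sort:
        # sort=False: return the server order untouched (no first-seen guarantee)
        return keys, counts
    # Negating the counts yields descending order from an ascending sort;
    # kind="stable" keeps tied counts in server return order.
    sort_key = counts if ascending else -counts
    order = np.argsort(sort_key, kind="stable")
    return keys[order], counts[order]


keys = np.array(["b", "c", "a"])
counts = np.array([1, 4, 3])
print(order_value_counts(keys, counts))                  # keys c, a, b; counts 4, 3, 1
print(order_value_counts(keys, counts, ascending=True))  # keys b, a, c; counts 1, 3, 4
```

Because the sort touches only one entry per distinct value, its cost depends on the number of distinct keys, not on the length of the original array.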

Scope

  • ArkoudaArray.value_counts
  • ArkoudaStringArray.value_counts
  • ArkoudaCategoricalArray.value_counts
  • Docstrings and user-facing documentation
  • Unit tests

Testing

  • Verify default ordering matches pandas for representative inputs
  • Verify that sort=False returns the same (key, count) pairs as pandas, using an order-insensitive comparison
  • Verify that sort and ascending interact correctly with both dropna=True and dropna=False
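For the testing items above, a comparison helper along these lines could avoid over-constraining the tests. It is a sketch with a hypothetical name, assert_value_counts_match, and it assumes both results have been converted to lists of (key, count) pairs, for example from pandas via list(pd.Series(data).value_counts().items()). It always compares the pairs as a multiset; when sort=True it also checks that counts are monotonic in the requested direction, without comparing tied keys positionally, since tie order is not pinned down in this issue.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Pair = Tuple[Hashable, int]


def assert_value_counts_match(
    actual: Sequence[Pair],
    expected: Sequence[Pair],
    sort: bool = True,
    ascending: bool = False,
) -> None:
    """Hypothetical test helper for value_counts results given as (key, count) pairs."""
    # Order-insensitive check: the same (key, count) pairs on both sides.
    assert Counter(actual) == Counter(expected), "different (key, count) pairs"
    if sort:
        counts = [c for _, c in actual]
        # Counts must be monotonic in the requested direction; tied keys are
        # not compared positionally because their order is unspecified.
        assert counts == sorted(counts, reverse=not ascending), "counts out of order"


expected = [("c", 4), ("a", 3), ("b", 1)]  # e.g. converted from a pandas result
assert_value_counts_match([("c", 4), ("a", 3), ("b", 1)], expected)              # sorted
assert_value_counts_match([("b", 1), ("c", 4), ("a", 3)], expected, sort=False)  # unsorted
```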

Non-goals

  • No requirement to implement stable "first-seen" ordering when sort=False
  • No server-side sorting changes in this ticket

Acceptance criteria

  • API matches pandas defaults
  • Tests cover sorted and unsorted cases
  • Documentation clearly states ordering guarantees
