Skip to content

list.sort enhancement proposal: Adaptivity for binarysort #138946

@dg-pb

Description

@dg-pb

Feature or enhancement

Proposal:

Note:

  1. This is POC that this has a observable impact.
  2. This is subject to further calibration and optimizations, but high level concept is dicusable.
  3. Will issue PR shortly.

Rationale

Galloping provides adaptivity when merging runs.
However, underlying binarysort always does O(nlogn).

Concept

The concept is simple:

  1. binarysort optionally does adaptive routine.
    1. It has a mechanism to switch it off during the run and go to simple binarysort
    2. It returns 1 if it has completed full data with binary routine and 0 otherwise
  2. timsort calls binarysort
    1. If it returns 1, then use it again next time
    2. If it returns 0, then use binarysort without adaptivity next time
    3. increase number of simple binarysort runs before next attempt of adaptive run with every 0 returned

High level results

  • A0 is current
  • A1 is with adaptivity
  • P means performance / runtime
  • C means comparison count
  • Aggregate numbers for all datasets combined are in the title
  1. Plain integers and floats
Image
  1. Integers and floats wrapped in list, so comparisons are __lt__ calls, thus more expensive
Image

So the benefit is higher when comparisons cost more.
But there is some benefit for optimized comparisons as well.

Worst case

  • Adaptive binarysort does ~20% more comparisons than simple binarysort for random data.
  • However, it detects that it is not paying off after about 30 iterations.

Thus the worst case would be random data up to size 30, which would be 20% slower.
Size 60 would only be 10% slower.
Size 120 would only be 5% slower.
And so on...

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/sorting-adaptivity-improvement/103700

Linked PRs

Metadata

Metadata

Assignees

Labels

interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usagetype-featureA feature request or enhancement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions