[SPARK-49718][PS] Switch Scatter plot to sampled data
### What changes were proposed in this pull request?
Switch `Scatter` plot to sampled data

### Why are the changes needed?
When the data distribution is correlated with row order, the first n rows are not representative of the whole dataset.

For example:
```
import pandas as pd
import numpy as np
import pyspark.pandas as ps

# ps.set_option("plotting.max_rows", 10000)
np.random.seed(123)

pdf = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD')).sort_values("A")
psdf = ps.DataFrame(pdf)

psdf.plot.scatter(x='B', y='A')
```

all 10k datapoints:
![image](https://github.com/user-attachments/assets/72cf7e97-ad10-41e0-a8a6-351747d5285f)

before (first 1k datapoints):
![image](https://github.com/user-attachments/assets/1ed50d2c-7772-4579-a84c-6062542d9367)

after (sampled 1k datapoints):
![image](https://github.com/user-attachments/assets/6c684cba-4119-4c38-8228-2bedcdeb9e59)
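
The same effect can be checked numerically, without plotting. The following is a minimal sketch that uses only the Python standard library rather than pandas-on-Spark. The names `head` and `sampled` are illustrative, and the uniform random sample only stands in for whatever sampling the plotting code performs.

```python
import random
import statistics

random.seed(123)

# 10,000 standard-normal values, sorted ascending (mirrors sort_values("A")).
values = sorted(random.gauss(0.0, 1.0) for _ in range(10_000))

# "Before": the first 1,000 rows of the sorted data.
head = values[:1000]

# "After": 1,000 rows drawn uniformly at random, without replacement.
sampled = random.sample(values, 1000)

full_mean = statistics.mean(values)
head_mean = statistics.mean(head)
sampled_mean = statistics.mean(sampled)

print(f"all rows:     mean = {full_mean:+.3f}")
print(f"first 1k:     mean = {head_mean:+.3f}")
print(f"sampled 1k:   mean = {sampled_mean:+.3f}")
```

Because the data is sorted, the first 1,000 rows contain only the smallest values, so their mean sits far below the mean of the full data, while the sampled mean stays close to it.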

### Does this PR introduce _any_ user-facing change?
Yes. Scatter plots are now drawn from sampled rows instead of the first n rows.

### How was this patch tested?
CI and manual testing.

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#48164 from zhengruifeng/ps_scatter_sampling.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
zhengruifeng authored and dongjoon-hyun committed Sep 19, 2024
1 parent 92cad2a commit 6d1815e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion python/pyspark/pandas/plot/core.py
@@ -479,7 +479,7 @@ class PandasOnSparkPlotAccessor(PandasObject):
 "pie": TopNPlotBase().get_top_n,
 "bar": TopNPlotBase().get_top_n,
 "barh": TopNPlotBase().get_top_n,
-"scatter": TopNPlotBase().get_top_n,
+"scatter": SampledPlotBase().get_sampled,
 "area": SampledPlotBase().get_sampled,
 "line": SampledPlotBase().get_sampled,
 }
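
The change above only swaps which function the `"scatter"` key points to. The following is a minimal sketch of how a kind-to-function table of this shape can dispatch. The functions `get_top_n` and `get_sampled` here are simplified stand-ins written for illustration, not the PySpark implementations, and `MAX_ROWS`, `PREPARE_BY_KIND`, and `prepare` are hypothetical names.

```python
import random

MAX_ROWS = 1000  # hypothetical stand-in for the "plotting.max_rows" option


def get_top_n(rows):
    """Keep the first MAX_ROWS rows in their existing order."""
    return rows[:MAX_ROWS]


def get_sampled(rows, seed=0):
    """Keep MAX_ROWS rows chosen uniformly at random (simplified stand-in)."""
    if len(rows) <= MAX_ROWS:
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, MAX_ROWS)


# Hypothetical table mapping a plot kind to its data-preparation function.
PREPARE_BY_KIND = {
    "bar": get_top_n,
    "scatter": get_sampled,  # pointed at get_top_n before this commit
    "line": get_sampled,
}


def prepare(kind, rows):
    """Look up the preparation function for this plot kind and apply it."""
    return PREPARE_BY_KIND[kind](rows)


rows = list(range(10_000))
bar_rows = prepare("bar", rows)
scatter_rows = prepare("scatter", rows)
print(max(bar_rows), max(scatter_rows))
```

With ordered input, the top-n path never sees anything past the first `MAX_ROWS` rows, while the sampled path draws from the whole range.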