
Reduce amount of data for DataFrames by sampling #21011

Open
uprokevin opened this issue Jun 10, 2023 · 4 comments

Comments

@uprokevin

uprokevin commented Jun 10, 2023

In the Variable Explorer side panel, clicking on a large dataframe or a large dictionary makes the panel and Spyder freeze.

Suggested workaround:

    nmax = 50000
    # On click, visualize a sample instead of the full dataframe:
    dfbig.sample(n=min(len(dfbig), nmax), replace=False)

Suppose len(dfbig) is 1 million: the visualizer then samples only nmax = 50000 rows, and Spyder does not crash.

Same idea for lists:

    listbig[:nmax]

Reference:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html
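As a minimal sketch of the workaround described above (the helper name `sample_for_view` and the cap value are illustrative, not part of Spyder's API):

```python
import numpy as np
import pandas as pd

NMAX = 50_000  # cap on rows sent to the viewer, per the suggestion above


def sample_for_view(df: pd.DataFrame, nmax: int = NMAX) -> pd.DataFrame:
    """Return at most `nmax` rows, sampled without replacement.

    Small frames are passed through unchanged, so only oversized
    dataframes pay the sampling cost.
    """
    if len(df) <= nmax:
        return df
    return df.sample(n=nmax, replace=False)


# Example: a 1-million-row frame is reduced to 50,000 rows.
big = pd.DataFrame({"x": np.arange(1_000_000)})
view = sample_for_view(big)
print(len(view))  # 50000
```

Sampling without replacement keeps every displayed row unique, so patterns and malformed columns remain visible in the subsample.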

Thanks !

@ccordoba12
Member

Hey @uprokevin, thanks for reporting. Could you post a video or animated gif that shows Spyder freezing after opening a big dataframe or dictionary?

I just tested with a one million row/single column dataframe, and Spyder didn't freeze for me.

@uprokevin
Author

Does it handle visualization of 10 million rows with 560 string columns?

I believe sub-sampling is a simple and efficient way to reduce the visualization load.

@ccordoba12
Member

Does it handle visualization of 10 million rows with 560 string columns?

That depends on the amount of memory available in your computer, not on Spyder. That's because we need to make a copy of the dataframe in the IPython console kernel to send and display it in Spyder (which runs in a different process).

I believe sub-sampling is a simple and efficient way to reduce the visualization load.

Sure, this is a good idea too. Thanks for the suggestion, I didn't know about it. We'll try to implement it in Spyder 6.

@ccordoba12 ccordoba12 changed the title Feature: Variable Explorer : reduce amount of data visualized for DataFrame by sampling Reduce amount of data for DataFrames by sampling Jun 13, 2023
@ccordoba12 ccordoba12 modified the milestones: v6.0.1, v6.0alpha3 Jun 13, 2023
@uprokevin
Author

Thanks for considering it.
I think visualizing a 1-million-row table does not make much sense for a human.
At most 100,000 rows would handle most visualization use cases (i.e. finding patterns or spotting wrong columns)
and reduce the memory footprint a lot.

@ccordoba12 ccordoba12 modified the milestones: v6.0alphaX, v6.1.0 Nov 16, 2023