
Spyder freezes when large MaskedArrays are in memory #2748

Closed
durack1 opened this issue Oct 8, 2015 · 33 comments

@durack1

durack1 commented Oct 8, 2015

I've experienced a long-standing issue with Spyder that dates back to Spyder 2.1.13 (and likely before) and is still present in the current 3.0.0b1. It is likely linked to Spyder's interaction with the cdms2 module, which uses numpy.ma for its backend array manipulation.

After loading a large (~1 GB, shape [50, 50, 300, 360]) masked array into memory (from a netCDF file, using cdms2.open), the console becomes unresponsive, only intermittently registering inputs. This continues until the array is reduced in size, e.g. var = var[0:1,:,:,:]

The issue disappears when using pure numpy, so var = numpy.ma.ones([50,50,300,360]) doesn't yield the same unresponsiveness.

The issue is not related to resource limitations on the machine (RedHat 6.7, 128 GB RAM, Xeon E5-2643 quad-core).

As it's related to Spyder's interaction with the cdms2 module, I'm not sure of the best way to get to the bottom of this issue.. Following #1958 and #1968 I have turned off the auto-refresh feature of the Variable Explorer, but this doesn't appear to solve the problem.

What further information is required to attempt to get to the bottom of this issue? For completeness:
Spyder version: 3.0.0-b1
Python version: 2.7.10
Qt version: 4.8.4
PyQt4 version: 4.11.3
numpy version: 1.9.0
uv-cdat/cdms2 version: 2.4.0-rc1

@dnadeau4 pinging you here..

@Nodd
Contributor

Nodd commented Oct 8, 2015

Did you try completely disabling the Variable Explorer (i.e. closing the pane)? There is nothing special about cdms2 in particular in Spyder.

@durack1
Author

durack1 commented Oct 8, 2015

@Nodd good call.. Closing the Variable Explorer speeds things up considerably.. So my issue really is the Variable Explorer slowing things down.. Is this noted somewhere in another open issue?

It's a pity this issue exists, because I do like having access to a visual of the variables (and their dimensions and types) that are currently in memory..

@ccordoba12
Member

@durack1, thanks for taking the time to open this issue and letting us know about this problem.

This is a very interesting use case to improve Spyder responsiveness with big data. We have done a lot of work lately to better handle big DataFrames and NumPy arrays when they are opened.

However, the problem here could lie in the fact that we're not optimizing how the array is represented in the Variable Explorer. What I mean is that in the Value column we just put the full repr of the array (i.e. what's printed when you call print(array)) instead of something simpler (like its first 10 or 20 elements).
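A minimal sketch of the kind of truncation meant here (the helper name is hypothetical, not Spyder's actual code): build the Value string from only a small prefix of the array rather than its full repr.

```python
import numpy as np

def short_value(arr, max_elems=10):
    """Build a short display string from at most max_elems elements."""
    flat = arr.ravel()[:max_elems]          # only format a small prefix
    suffix = ", ..." if arr.size > max_elems else ""
    return "[" + ", ".join(str(x) for x in flat) + suffix + "]"

a = np.arange(1000000)
print(short_value(a, 5))  # [0, 1, 2, 3, 4, ...]
```

Because only the sliced elements are ever formatted, the cost stays constant no matter how large the array is.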

But to test this hypothesis, I need you to give some sample code (using cdms2) I can test on my side to fix this problem :-)

@durack1
Author

durack1 commented Oct 9, 2015

@ccordoba12 no problem, the easiest example would be to just load a variable from a netCDF file using cdms2 while you have a Variable Explorer pane open in Spyder.

So the code:

import cdms2 as cdm
fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
var = fileHandle('so')
fileHandle.close()

Should reproduce the lagginess that I've experienced - the variable will require ~1 GB of available memory to load.. You can grab the file above from here (it's 355MB)

I've noticed what appears to be a rogue Python process at 100% CPU while the lagginess is occurring, so it's trying to do something.. but what, I have no idea..

@goanpeca
Member

goanpeca commented Oct 9, 2015

@ccordoba12 we should use that file for future tests! 😉

@durack1
Author

durack1 commented Oct 9, 2015

@goanpeca @ccordoba12 if you would like an even larger test file, I'd be more than happy to provide this!

@durack1
Author

durack1 commented Oct 16, 2015

@ccordoba12 @goanpeca let me know if you have any trouble getting access to cdms2 and installing it along with netcdf4.. From memory, the file above uses deflation, so netCDF will need to be built against the zlib libraries too..

@goanpeca
Member

@durack1 thanks for the heads up :-)

@ccordoba12
Member

@durack1, could you also upload a smaller nc file? I mean one that doesn't cause Spyder to freeze? That would be really helpful too :-)

@durack1
Author

durack1 commented Oct 18, 2015

@ccordoba12 it'd have to be pretty small.. I've been experiencing this freezing issue even with smaller matrices.. I'm currently on travel but will drop a smaller file on the webserver when I'm back in a week or so..

@ccordoba12
Member

Ok, no problem. I'm working on other things right now, but I hope to address this issue for beta2.

@durack1
Author

durack1 commented Oct 19, 2015

@ccordoba12 great - what is the timeline for beta2? I'm not holding things up if I get this data to you next week, am I?

@ccordoba12
Member

Don't worry, we are two or three weeks away from it :-)


@durack1
Author

durack1 commented Oct 31, 2015

@ccordoba12 apologies for the delay.. Here is a much smaller file that doesn't appear to trigger the lagginess issues. In the example below I have re-enabled the Variable Explorer pane in Spyder 3.0.0b1 and it seems to work fine - it loads two variables from a netcdf file using cdms2.

So the code:

import cdms2 as cdm
fileHandle = cdm.open('DurackandWijffels_GlobalOceanSurfaceChanges_1950-2000.nc')
saltChange = fileHandle('salinity_change')
thetaoChange = fileHandle('thetao_change')

You can grab the file above from here (it's 607KB)

@ccordoba12 ccordoba12 modified the milestones: v2.3.8, v3.0 Nov 16, 2015
@ccordoba12
Member

@durack1, please give us a smaller file than the one you uploaded first (called AusCOM1-0.Salt.Omon.so.00011231-00501231.nc).

That file seems to require 10 GB of RAM (not 1 - I tested it on my virtual machines! :-), so I can't use it for testing.

@ccordoba12 ccordoba12 modified the milestones: v3.0, v2.3.8 Nov 22, 2015
@ccordoba12 ccordoba12 changed the title Spyder freezes when large matrices are in memory Spyder freezes when large MaskedArrays are in memory Nov 22, 2015
@durack1
Author

durack1 commented Nov 24, 2015

@ccordoba12 the file should be fine - you could load a smaller subset using:

>>> import resource
>>> import cdms2 as cdm
>>> import numpy as np

>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB
>>> fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB
>>> var = fileHandle('so',time=slice(0,1))
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB
>>> fileHandle.close()
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB

Obviously, changing the number of timesteps loaded by altering your indices (e.g. var = fileHandle('so',time=slice(0,10))) will then load progressively larger matrices into memory. The single-time-slice example above should need just ~0.2 GB or so.. For the larger matrix, you'll need ~1 GB:

>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> var = fileHandle('so',time=slice(0,10))
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.706 GB
>>> fileHandle.close()
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.706 GB

For the full matrix I only need ~4GB:

>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> var = fileHandle('so')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 3.239 GB
>>> fileHandle.close()
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 3.239 GB
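The resource.getrusage readings above measure process RSS; as a cross-check, the raw in-memory footprint of a masked array can be estimated directly from its data and mask buffers. A sketch with numpy.ma standing in for a cdms2 variable (which is backed by numpy.ma), sized like one time slice of the 'so' variable:

```python
import numpy as np
import numpy.ma as ma

# stand-in for one time slice of the 'so' variable: shape (1, 50, 300, 360)
var = ma.ones((1, 50, 300, 360), dtype=np.float32)
var[0, 0, 0, 0] = ma.masked        # force a full boolean mask to be allocated

data_bytes = var.data.nbytes       # 4 bytes per float32 element
mask_bytes = var.mask.nbytes if var.mask is not ma.nomask else 0
print("approx %.3f GB" % ((data_bytes + mask_bytes) / 1e9))  # approx 0.027 GB
```

The raw buffers for a single slice are small (~27 MB); the larger RSS figures above additionally include the interpreter, cdms2's metadata, and any intermediate copies made while decompressing and reading the file.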

@Nodd
Contributor

Nodd commented Nov 24, 2015

I wonder if we should only show the repr of objects smaller than, say, 1 MB, and keep a whitelist of types that we know have no problems (like pandas DataFrames). It would avoid freezing or crashing Spyder because of unknown big objects.

@ccordoba12
Member

@Nodd, it's a good idea. But how do we reliably determine an object's memory footprint?

@ccordoba12
Member

Sorry, the problem isn't its memory footprint but the size of its repr. How could we account for that?

@durack1
Author

durack1 commented Nov 24, 2015

@ccordoba12 @Nodd in the case of a numpy array, why not use numpyArray.shape?

@ccordoba12
Member

@durack1, that just gives its size :-) repr is the string printed when you do something like:

>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])

Most numpy reprs are very efficient, but some (like record arrays and, it seems, masked arrays) are not :-)

@durack1
Author

durack1 commented Nov 24, 2015

@ccordoba12 but such shape info can then be used to query a small subset of the array, using the very efficient indexing syntax. For the example above, where var.shape = (50, 50, 300, 360), you could define a more targeted (and much smaller) temporary matrix on which repr can then be run.
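A sketch of that idea, with a scaled-down numpy.ma array standing in for the cdms2 variable: slice a tiny corner first, then repr only the slice.

```python
import numpy.ma as ma

# scaled-down stand-in for the (50, 50, 300, 360) variable
var = ma.ones((50, 50, 30, 36))
preview = repr(var[:2, :2, :2, :2])   # repr touches only 16 elements
print(preview.splitlines()[0])
```

Slicing returns a view, so building the preview never formats (or copies) the full array.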

@Nodd
Contributor

Nodd commented Nov 24, 2015

We can guess that the bigger an object is, the bigger its repr could be. If we know that a type has a simple repr, we can always show it.

My proposal isn't perfect; it's just a workaround to avoid recurring problems with the Variable Explorer and big objects.

As for the size, Python has a sizeof equivalent (sys.getsizeof). It may fail for some types, but again it's better than nothing.
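Combining the two ideas in a sketch (the names and the 1 MB threshold are illustrative, not Spyder's actual code). Note that sys.getsizeof reports only the container for a numpy array, so the buffer size via nbytes has to be added:

```python
import sys
import numpy as np

def approx_size(obj):
    """Rough in-memory size in bytes: array buffer + object header, or getsizeof."""
    if isinstance(obj, np.ndarray):
        return obj.nbytes + sys.getsizeof(obj)
    return sys.getsizeof(obj)

MAX_REPR_BYTES = 1 << 20               # the ~1 MB threshold proposed above

value = np.ones(500000)                # ~4 MB of float64 data
display = repr(value) if approx_size(value) < MAX_REPR_BYTES else "<large ndarray>"
print(display)  # <large ndarray>
```

A whitelist of known-cheap types could then bypass the threshold check entirely.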

@ccordoba12
Member

I discovered that this problem was generated because the repr of masked arrays is terribly inefficient.

So the fix is to use, as the Value of masked arrays (i.e. what appears in the fourth column of the Variable Explorer), a simple string (just "Masked array") instead of the variable's repr.

And then the problem goes away :-)
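A sketch of what that fix amounts to (the function name is hypothetical, not Spyder's actual code): special-case MaskedArray before falling back to repr.

```python
import numpy as np
import numpy.ma as ma

def value_for_explorer(obj):
    """Display string for the Value column; skip repr for masked arrays."""
    if isinstance(obj, ma.MaskedArray):
        return "Masked array"          # constant-time, regardless of array size
    return repr(obj)

print(value_for_explorer(ma.ones((2, 2))))   # Masked array
print(value_for_explorer(np.array([1, 2])))  # array([1, 2])
```

Since MaskedArray subclasses ndarray, this check has to come before any generic ndarray handling.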

@durack1
Author

durack1 commented Dec 9, 2015

@ccordoba12 great, looking forward to testing this in beta2.. @goanpeca any chance code-folding will also find itself into beta2?

@goanpeca
Member

goanpeca commented Dec 9, 2015

@durack1, not yet sorry :-(
I think it will be for beta3

@durack1
Author

durack1 commented Dec 9, 2015

@goanpeca no problem, looking forward to testing code-folding in beta3 then!

@Nodd
Contributor

Nodd commented Dec 9, 2015

@ccordoba12 Maybe it would be worth opening a bug report for numpy ?

@ccordoba12
Member

@Nodd, sure. Could you do it, please?

@goanpeca
Member

goanpeca commented Dec 9, 2015

@Nodd could you do it, please :-), and add a link to this issue :p

@Nodd
Contributor

Nodd commented Dec 10, 2015

Yeah I'll do it, but first I'll have to check what "the repr of masked arrays is terribly inefficient." means.

@ccordoba12
Member

@Nodd, it means that when you run

repr(ma)

where ma is a masked array, it takes a long time to return the result (at least for the arrays provided by @durack1).
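A quick sketch to measure this; the thread above reports that on numpy 1.9 the masked repr took far longer than the plain one, though timings will vary by numpy version and array contents.

```python
import time
import numpy as np
import numpy.ma as ma

plain = np.ones(1000000)
masked = ma.masked_greater(np.random.rand(1000000), 0.5)

for name, arr in [("plain", plain), ("masked", masked)]:
    t0 = time.time()
    repr(arr)
    print("%s repr took %.4f s" % (name, time.time() - t0))
```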

@durack1
Author

durack1 commented Dec 10, 2015

@Nodd those files are available through the links above, here (large 3D matrices, ~GB) and here (smaller 2D matrices, ~KB)
