Spyder freezes when large MaskedArrays are in memory #2748
Comments
Did you try to completely disable the Variable Explorer (i.e. close the pane)? There is nothing special about …
@Nodd good call.. Closing the variable explorer starts to speed things up considerably.. Ok so my issue appears to be with the variable explorer really slowing things down.. Is this noted somewhere in another open issue? It's a pity this issue exists, because I do like having access to a visual of the variables (and their dimensions and types) that are currently in memory..
@durack1, thanks for taking the time to open this issue and letting us know about this problem. This is a very interesting use case to improve Spyder responsiveness with big data. We have done a lot of work lately to better handle big DataFrames and NumPy arrays when they are opened. However, the problem here could lie in the fact that we're not optimizing how the array is represented in the Variable Explorer. What I mean is that in the Value column we are just putting the full repr of the array. But to test this hypothesis, I need you to give me some sample code (using `cdms2`).
@ccordoba12 no problem, the easiest example would be to just load a variable from a netcdf file using `cdms2`. So the code:

```python
import cdms2 as cdm

fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
var = fileHandle('so')
fileHandle.close()
```

should reproduce the lagginess that I've experienced - the variable will require ~1GB of available memory to load.. You can grab the file above from here (it's 355MB). I've noted what would appear to be a rogue …
@ccordoba12 we should use that file for future tests! 😉
@goanpeca @ccordoba12 if you would like an even larger test file, I'd be more than happy to provide this!
@ccordoba12 @goanpeca let me know if you have any trouble getting access to …
@durack1 thanks for the heads up :-)
@durack1, could you also upload a smaller file?
@ccordoba12 it'd have to be pretty small.. I've been experiencing this freezing issue even with smaller matrices.. I'm currently on travel but will drop a smaller file on the webserver when I'm back in a week or so..
Ok, no problem. I'm working on other things right now, but I hope to address this issue for beta2. |
@ccordoba12 great - what is the timeline for beta2? I'm not holding things up if I get this data to you next week, am I?
Don't worry, we are two or three weeks away from it :-)
@ccordoba12 apologies for the delay.. Here is a much smaller file that doesn't appear to trigger the lagginess issues. In the example below I have re-enabled the … So the code:

```python
import cdms2 as cdm

fileHandle = cdm.open('DurackandWijffels_GlobalOceanSurfaceChanges_1950-2000.nc')
saltChange = fileHandle('salinity_change')
thetaoChange = fileHandle('thetao_change')
```

You can grab the file above from here (it's 607KB)
@durack1, please give us a smaller file than the one you uploaded first (called `AusCOM1-0.Salt.Omon.so.00011231-00501231.nc`). That file seems to require 10 gigs of RAM (not 1, I tested it on my virtual machines! :-), so I can't use it for testing.
@ccordoba12 the file should be fine - you could load a smaller subset using:

```python
>>> import resource
>>> import cdms2 as cdm
>>> import numpy as np
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB
>>> fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB
>>> var = fileHandle('so',time=slice(0,1))
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB
>>> fileHandle.close()
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.155 GB
```

Obviously then changing the number of timesteps loaded by altering your indexes (e.g. `time=slice(0,10)`) changes the memory required:

```python
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> var = fileHandle('so',time=slice(0,10))
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.706 GB
>>> fileHandle.close()
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.706 GB
```

For the full matrix I only need ~4GB:

```python
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> fileHandle = cdm.open('AusCOM1-0.Salt.Omon.so.00011231-00501231.nc')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 0.156 GB
>>> var = fileHandle('so')
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 3.239 GB
>>> fileHandle.close()
>>> print 'Max mem: %05.3f GB' % (np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
Max mem: 3.239 GB
```
I wonder if we should show the repr only of objects smaller than, say, 1 MB, and keep a whitelist of types that we know have no problems (like pandas DataFrames). It would avoid freezing or crashing Spyder because of unknown big objects.
@Nodd, it's a good idea. But how can we reliably determine an object's memory usage?
Sorry, the problem is not its memory but the size of its repr. How could we account for that?
@ccordoba12 @Nodd in the case of a …
@durack1, that just gives its size :-) repr is the string printed when you do something like:

```python
>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
```

Most numpy reprs are very efficient, but some (like record arrays and, it seems, masked arrays) are not :-)
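The cost asymmetry is straightforward to measure. A minimal sketch, assuming NumPy is available (the `time_repr` helper and the scaled-down array shape are illustrative, not from this thread):

```python
import time

import numpy as np

def time_repr(obj):
    """Return the seconds spent computing repr(obj)."""
    start = time.perf_counter()
    repr(obj)
    return time.perf_counter() - start

# Scaled-down version of the [50,50,300,360] shape from the report.
plain = np.ones((50, 50, 30, 36), dtype=np.float32)
masked = np.ma.masked_greater(plain, 0.5)  # same data as a MaskedArray

# On the NumPy versions discussed in this thread, repr() of the masked
# array could be dramatically slower than repr() of the plain array,
# which is what froze the Variable Explorer while it built its Value column.
print('plain:  %.4f s' % time_repr(plain))
print('masked: %.4f s' % time_repr(masked))
```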
@ccordoba12 but such …
We can guess that the bigger an object is, the bigger its repr could be. If we know that a type has a simple repr, we can force it to always be shown. My proposal isn't perfect; it's just a workaround to avoid recurring problems with the variable explorer and big objects. As for the size, I vaguely recall that Python has a …
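The standard-library helper being recalled here is probably `sys.getsizeof`. A rough sketch of the proposed workaround (the 1 MB cutoff, the `REPR_SAFE` whitelist, and the `safe_value` name are all hypothetical, not Spyder code; note also that `sys.getsizeof` is shallow, so it undercounts containers):

```python
import sys

SIZE_LIMIT = 1024 * 1024                  # ~1 MB cutoff, as suggested above
REPR_SAFE = {int, float, complex, bool}   # whitelist of cheap-repr types

def safe_value(obj):
    """Return repr(obj) only when it is likely cheap to compute."""
    if type(obj) in REPR_SAFE:
        return repr(obj)
    if sys.getsizeof(obj) > SIZE_LIMIT:
        # Too big to risk an arbitrarily expensive repr; show a short label.
        return '<%s object>' % type(obj).__name__
    return repr(obj)

print(safe_value(42))                         # -> 42
print(safe_value(bytearray(2 * SIZE_LIMIT)))  # -> <bytearray object>
```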
I discovered that this problem was generated because the repr of masked arrays is terribly inefficient. So the fix is to show a simple string (i.e. just "Masked array") as the Value of masked arrays (i.e. what appears in the fourth column of the Variable Explorer), instead of the variable's repr. And then the problem goes away :-)
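The fix described above amounts to special-casing `numpy.ma.MaskedArray` when building the Value column. A simplified sketch of the idea (`display_value` is an illustrative name, not the actual Spyder function):

```python
import numpy as np

def display_value(obj):
    """Text for the Variable Explorer's Value column."""
    if isinstance(obj, np.ma.MaskedArray):
        # A fixed label costs O(1) regardless of array size, while
        # repr() on a big MaskedArray could take many seconds.
        return 'Masked array'
    return repr(obj)

arr = np.ma.masked_invalid(np.zeros((4, 4)))
print(display_value(arr))        # -> Masked array
print(display_value([1, 2, 3]))  # -> [1, 2, 3]
```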
@ccordoba12 great, looking forward to testing this in …
@durack1, not yet sorry :-(
@goanpeca no problem, looking forward to testing …
@ccordoba12 Maybe it would be worth opening a bug report for numpy?
@Nodd, sure. Could you do it, please?
@Nodd could you do it, please :-), and add a link to this issue :p
Yeah I'll do it, but first I'll have to check what "the repr of masked arrays is terribly inefficient" means.
@Nodd those files are available through the links above here (large 3D matrices ~GB) and here (smaller 2D matrices ~KB)
I've experienced a long-standing issue with Spyder that dates back to Spyder 2.1.13 (and likely before) and is still present in the current 3.0.0b1. It is likely linked to Spyder interactions with the `cdms2` module, which uses `numpy.ma` for its backend array manipulation.

After loading a ~large [50,50,300,360] ~1GB matrix into memory (from a netcdf file, using `cdms2.open`), the responsiveness of the console disappears.. And it intermittently responds to inputs. This continues to occur until the matrix is reduced in size, i.e. `var = var[0:1,:,:,:]`

The issue disappears when using pure `numpy`, so `var = numpy.ma.ones([50,50,300,360])` doesn't yield the same lack of responsiveness.

The issue is not related to resource limitations on the machine (redhat 6.7, 128GB, Xeon E5-2643 quad core).

As it's related to Spyder interactions with the `cdms2` module, I'm not sure of the best way to get to the bottom of this issue.. Following #1958 and #1968 I have adjusted the auto refresh feature of the variable explorer (turned this off) but this doesn't appear to solve the problem.

What further information is required to attempt to get to the bottom of this issue? For completeness:
Spyder version: 3.0.0-b1
Python version: 2.7.10
Qt version: 4.8.4
PyQT4 version: 4.11.3
numpy version: 1.9.0
uv-cdat/cdms2 version: 2.4.0-rc1
@dnadeau4 pinging you here..