
During handling of the above exception, another exception occurred #87

Open
sdlmw opened this issue Aug 29, 2017 · 16 comments

Comments


sdlmw commented Aug 29, 2017

[Screenshots of the error output: OOM when allocating tensor with shape [12360, 17191]]
Does anyone know what is causing this?
My system environment: Ubuntu 16.04 x86_64; NVIDIA driver version: 384.66; nvcc version: 8.0; cuDNN: 6.0.
I don't know why this happens. Thanks in advance.


ljch2018 commented Aug 31, 2017

OOM when allocating tensor with shape...

OOM: out of memory, which means your system does not have enough memory to run TensorFlow.


sdlmw commented Aug 31, 2017

@luosmart OK, thank you. But I am running with 8 GB of RAM, and at most 2 GB is used before it ends. The file I processed is about 15 MB. Could TensorFlow itself be causing the exception to be thrown?


ljch2018 commented Sep 1, 2017

@sdlmw Please check how much GPU memory is left on your system; other programs may be using too much GPU memory. Try this command:

nvidia-smi 


sdlmw commented Sep 1, 2017

@luosmart Yes, I have checked. Apart from what the system uses, everything else is free for TensorFlow, about 1940 MB.


ljch2018 commented Sep 1, 2017

@sdlmw shape [12360, 17191]: aren't the width and height of that tensor too large? Have you checked the format of the input data?


sdlmw commented Sep 1, 2017

@luosmart Yeah, that is the file size. The two files are 12360 KB and 17191 KB.


ljch2018 commented Sep 1, 2017

@sdlmw 12360 * 17191 = 212480760 elements, which is too much for the GPU. Why not use a batch mechanism?
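For illustration, here is a minimal sketch (TensorFlow 1.x style, not this project's actual code) of what a batch mechanism looks like: instead of pushing one 12360 x 17191 tensor onto the GPU, the rows are fed a few hundred at a time. The file name, placeholder, model op, and batch size are all illustrative assumptions.

```python
# Minimal batching sketch (assumptions: the data fits in host RAM as a
# NumPy array, and the model reads its input from a placeholder `x`).
import numpy as np
import tensorflow as tf

data = np.load("big_matrix.npy")                 # hypothetical file, shape ~ (12360, 17191)

x = tf.placeholder(tf.float32, shape=[None, data.shape[1]])
y = tf.reduce_sum(x, axis=1)                     # stand-in for the real model

batch_size = 256                                 # small enough to fit in ~2 GB of GPU memory
with tf.Session() as sess:
    for start in range(0, data.shape[0], batch_size):
        batch = data[start:start + batch_size]   # only this slice goes to the GPU
        sess.run(y, feed_dict={x: batch})
```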


sdlmw commented Sep 1, 2017

@luosmart Are you Chinese? I am sorry, I have not worked with a batch mechanism before.


sdlmw commented Sep 1, 2017

@luosmart I am really sorry. I have only just started working with Linux and was then given this kind of work, so there is a lot I do not understand. In that case, things are clear to me now: my mistake was not reading the error output carefully.


ljch2018 commented Sep 1, 2017

@sdlmw Check the data you are reading in: the input you are feeding is an extremely large matrix, so it is bound to OOM. Also, if you have a large input to process, you can feed it in batches, building small tensors, so that what you hand to the model does not OOM.


sdlmw commented Sep 1, 2017

@luosmart Yes, since yesterday I have found where the problem is, and using a fairly powerful GPU from 美亚云 it now works. With your explanation of the matrix, I understand what my current work involves. Thanks again.


sdlmw commented Sep 1, 2017

@luosmart
But at the same time I have another question: can the memory the GPU uses be switched over to system RAM?


ljch2018 commented Sep 1, 2017

@sdlmw

Can the memory the GPU uses be switched over to system RAM?

Do you mean using system memory in place of GPU memory? That is not possible: GPU memory and system memory are two separate pieces of hardware and cannot be used interchangeably.
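(As a side note not raised in the thread, and only as a hedged sketch: while the two memories cannot be merged, TensorFlow does let you pin an op to the CPU with tf.device, so its tensors are allocated in system RAM instead of GPU memory, at a large speed cost. The shapes below are illustrative.)

```python
# Hedged sketch: pinning a large tensor to the CPU so it is allocated in
# system RAM rather than GPU memory (much slower, but avoids the GPU OOM).
import tensorflow as tf

with tf.device("/cpu:0"):                        # host placement -> system RAM
    big = tf.random_normal([12360, 17191])       # roughly the shape from the error
    total = tf.reduce_sum(big)

with tf.Session() as sess:
    print(sess.run(total))
```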


sdlmw commented Sep 1, 2017

@luosmart OK, thank you. Then does TensorFlow offer any other way to process data on the order of tens of millions of records in one pass? If not, the results we get may not meet our requirements, and batch processing may even waste a lot of time.


ljch2018 commented Sep 1, 2017

@sdlmw Batch processing is the standard approach used by all deep learning models, and it can certainly handle data on the order of tens of millions of records; there is no doubt about that.

Batch processing may even waste a lot of time

Done correctly, batch processing does not waste a lot of time at all; it is roughly on par with processing everything in one pass.
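As a rough sketch of why batching scales to that size (assuming TensorFlow 1.4+ where tf.data is in core; the file name, row count, and feature width are made up): the full dataset stays on disk or in host RAM, and only one small batch at a time ever becomes a tensor on the GPU.

```python
# Sketch: streaming tens of millions of rows in small batches with tf.data.
# The on-disk file, its shape, and the batch size are illustrative assumptions.
import numpy as np
import tensorflow as tf

features = np.memmap("features.dat", dtype=np.float32,
                     mode="r", shape=(10000000, 128))     # hypothetical on-disk matrix

dataset = tf.data.Dataset.from_generator(
    lambda: (row for row in features),                    # yields one row at a time
    output_types=tf.float32,
    output_shapes=[128])
dataset = dataset.batch(512).prefetch(2)                  # small tensors, overlapped with compute

batch = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    first = sess.run(batch)                               # shape (512, 128), never (10000000, 128)
```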


sdlmw commented Sep 1, 2017

@luosmart OK, thank you. I will make time to study deep learning; I am still a complete beginner at it. Thank you very much.
