
large memory used when infer #11185

Closed
tensor-tang opened this issue Jun 5, 2018 · 6 comments
Assignees
Labels
预测 (Prediction; originally named Inference, includes C-API inference issues, etc.)

Comments

@tensor-tang
Contributor

This is an issue with the NLP online service.

When running inference, memory usage stays at about 6 GB, which is definitely larger than actually needed.

[image: memory usage screenshot, resident memory around 6 GB]

tensor-tang added the 预测 (Prediction, formerly Inference) label Jun 5, 2018
tensor-tang self-assigned this Jun 5, 2018
tensor-tang changed the title from "Larger memory used when infer" to "large memory used when infer" Jun 5, 2018
@ChinaLiuHao

I am hitting this situation too. In addition, when I run inference in a multi-threaded way with "export OPENBLAS_NUM_THREADS=1", the program may end with an "Aborted" error!

@tensor-tang
Contributor Author

@ChinaLiuHao
As an addition, the "Aborted" error is encountered randomly; it does not always appear.

@luotao1
Contributor

luotao1 commented Jun 5, 2018

The OCR CRNN_CTC service also shows large memory usage:
[image: memory usage screenshot]

@tensor-tang
Contributor Author

// Allocate a new maximum sized block
size_t index = 0;
void* p = system_allocator_->Alloc(&index, max_chunk_size_);

This should be the reason:
Paddle allocates the max chunk size on the first allocation.
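
To make that behavior concrete, here is a minimal standalone sketch (not Paddle's actual BuddyAllocator, just an illustration of "reserve the whole max chunk on the first request"):

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Illustrative pool: grabs one maximum-sized chunk from the system on the
// very first Alloc() call and then hands out pieces of it.
class FirstTouchPool {
 public:
  explicit FirstTouchPool(size_t max_chunk_size)
      : max_chunk_size_(max_chunk_size) {}

  void* Alloc(size_t size) {
    if (pool_ == nullptr) {
      // Even a tiny first request reserves the whole max chunk up front.
      pool_ = std::malloc(max_chunk_size_);
    }
    if (pool_ == nullptr || used_ + size > max_chunk_size_) return nullptr;
    void* p = static_cast<char*>(pool_) + used_;
    used_ += size;
    return p;
  }

 private:
  void* pool_ = nullptr;
  size_t used_ = 0;
  size_t max_chunk_size_;
};

int main() {
  FirstTouchPool pool(6UL << 30);  // pretend the configured max chunk is 6 GB
  void* p = pool.Alloc(16);        // a 16-byte request still grabs the full chunk
  std::printf("first allocation at %p\n", p);
  return 0;
}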

@tensor-tang
Contributor Author

After debugging, we found there is a flag to choose how much memory to use for the first allocation. By default it uses about 3.2% (1/32) of your total memory.
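
As a rough sanity check (the host's RAM size is not stated in this issue, so assume roughly 192 GB): 192 GB * 1.0 (default fraction) / 32 = 6 GB, which matches the roughly 6 GB of resident memory shown in the screenshot above.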

usage:

your_app --fraction_of_cpu_memory_to_use=0.1  # would use 3.2% * 0.1 of total memory

The trace through the code is like this:

DEFINE_double(fraction_of_cpu_memory_to_use, 1,
              "Default use 100% of CPU memory for PaddlePaddle,"
              "reserve the rest for page tables, etc");

size_t CpuMaxAllocSize() {
  // For distributed systems, it requires configuring and limiting
  // the fraction of memory to use.
  return FLAGS_fraction_of_cpu_memory_to_use * CpuTotalPhysicalMemory();
}

size_t CpuMaxChunkSize() {
  // Allow to allocate the maximum chunk size is roughly 3% of CPU memory.
  return CpuMaxAllocSize() / 32;
}

if (a == nullptr) {
  a = new detail::BuddyAllocator(new detail::CPUAllocator,
                                 platform::CpuMinChunkSize(),
                                 platform::CpuMaxChunkSize());
}
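
As a usage note, fraction_of_cpu_memory_to_use is an ordinary gflags flag, so --fraction_of_cpu_memory_to_use=0.1 only takes effect if the command line is parsed by gflags before the first allocation happens. A minimal sketch, assuming the inference app parses the flags itself (Paddle's own initialization may already do this for you):

#include <gflags/gflags.h>

// Declared here so this file can see the flag that Paddle defines elsewhere.
DECLARE_double(fraction_of_cpu_memory_to_use);

int main(int argc, char* argv[]) {
  // Parse --fraction_of_cpu_memory_to_use (and other flags) so the value is
  // set before the first allocation triggers CpuMaxChunkSize().
  gflags::ParseCommandLineFlags(&argc, &argv, true);
  // ... build and run the inference program ...
  return 0;
}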

@tensor-tang
Contributor Author

tensor-tang commented Jun 5, 2018

@ChinaLiuHao
About the "Abort" issue, we can open another issue to discuss it. Thanks.
