
large memory used when infer #11185

Closed
tensor-tang opened this issue Jun 5, 2018 · 6 comments
Assignees
Labels
预测 (Prediction; originally named Inference, includes C-API inference issues, etc.)

Comments

@tensor-tang
Contributor

This is an issue with the NLP online service.

When running inference, memory usage stays at about 6 GB, which is definitely larger than actually needed.

[image: memory usage screenshot, resident memory around 6 GB]

tensor-tang added the 预测 (Prediction, formerly Inference) label Jun 5, 2018
tensor-tang self-assigned this Jun 5, 2018
tensor-tang changed the title from "Larger memory used when infer" to "large memory used when infer" Jun 5, 2018
@ChinaLiuHao

I am hitting this situation too. In addition, when I run inference in a multi-threaded way with "export OPENBLAS_NUM_THREADS=1", the program may end with an "Aborted" error!

@tensor-tang
Contributor Author

@ChinaLiuHao
As an addition, the "Aborted" error is encountered randomly; it does not always appear.

@luotao1
Contributor

luotao1 commented Jun 5, 2018

The OCR CRNN_CTC service also shows large memory usage:
[image: memory usage screenshot]

@tensor-tang
Contributor Author

// Allocate a new maximum sized block
size_t index = 0;
void* p = system_allocator_->Alloc(&index, max_chunk_size_);

This should be the reason:
Paddle allocates the max chunk size on the first allocation.
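
To make that behavior concrete, here is a minimal standalone sketch (not Paddle's actual BuddyAllocator, just an illustration of "reserve the whole max chunk on the first request"):

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Illustrative pool: grabs one maximum-sized chunk from the system on the
// very first Alloc() call and then hands out pieces of it.
class FirstTouchPool {
 public:
  explicit FirstTouchPool(size_t max_chunk_size)
      : max_chunk_size_(max_chunk_size) {}

  void* Alloc(size_t size) {
    if (pool_ == nullptr) {
      // Even a tiny first request reserves the whole max chunk up front.
      pool_ = std::malloc(max_chunk_size_);
    }
    if (pool_ == nullptr || used_ + size > max_chunk_size_) return nullptr;
    void* p = static_cast<char*>(pool_) + used_;
    used_ += size;
    return p;
  }

 private:
  void* pool_ = nullptr;
  size_t used_ = 0;
  size_t max_chunk_size_;
};

int main() {
  FirstTouchPool pool(6UL << 30);  // pretend the configured max chunk is 6 GB
  void* p = pool.Alloc(16);        // a 16-byte request still grabs the full chunk
  std::printf("first allocation at %p\n", p);
  return 0;
}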

@tensor-tang
Contributor Author

After debugging, we found there is a flag to choose how much memory to use for the first allocation. By default it uses about 3.2% (1/32) of your total memory.
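
As a rough sanity check (the host's RAM size is not stated in this issue, so assume roughly 192 GB): 192 GB * 1.0 (default fraction) / 32 = 6 GB, which matches the roughly 6 GB of resident memory shown in the screenshot above.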

usage:

your_app --fraction_of_cpu_memory_to_use=0.1  # would use 3.2% * 0.1 of total memory

The trace through the code is like this:

DEFINE_double(fraction_of_cpu_memory_to_use, 1,
              "Default use 100% of CPU memory for PaddlePaddle,"
              "reserve the rest for page tables, etc");

size_t CpuMaxAllocSize() {
  // For distributed systems, it requires configuring and limiting
  // the fraction of memory to use.
  return FLAGS_fraction_of_cpu_memory_to_use * CpuTotalPhysicalMemory();
}

size_t CpuMaxChunkSize() {
  // Allow to allocate the maximum chunk size is roughly 3% of CPU memory.
  return CpuMaxAllocSize() / 32;
}

if (a == nullptr) {
  a = new detail::BuddyAllocator(new detail::CPUAllocator,
                                 platform::CpuMinChunkSize(),
                                 platform::CpuMaxChunkSize());
}
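
As a usage note, fraction_of_cpu_memory_to_use is an ordinary gflags flag, so --fraction_of_cpu_memory_to_use=0.1 only takes effect if the command line is parsed by gflags before the first allocation happens. A minimal sketch, assuming the inference app parses the flags itself (Paddle's own initialization may already do this for you):

#include <gflags/gflags.h>

// Declared here so this file can see the flag that Paddle defines elsewhere.
DECLARE_double(fraction_of_cpu_memory_to_use);

int main(int argc, char* argv[]) {
  // Parse --fraction_of_cpu_memory_to_use (and other flags) so the value is
  // set before the first allocation triggers CpuMaxChunkSize().
  gflags::ParseCommandLineFlags(&argc, &argv, true);
  // ... build and run the inference program ...
  return 0;
}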

@tensor-tang
Contributor Author

tensor-tang commented Jun 5, 2018

@ChinaLiuHao
About the "Abort" issue, we can open another issue to discuss it. Thanks.
