Error `malloc(): unaligned tcache chunk detected` always occurs after the TensorRT server handles a certain number of requests #587

### System Info

- Ubuntu 20.04
- NVIDIA A100

### Who can help?

@kaiyux

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)

### Reproduction

1. Start the container:
   `docker run -itd --gpus=all --shm-size=1g -p8000:8000 -p8001:8001 -p8002:8002 -v /share/datasets:/share/datasets nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
2. Get the code (version 0.11.0):
   `git clone https://github.com/NVIDIA/TensorRT-LLM.git`
   `git clone https://github.com/triton-inference-server/tensorrtllm_backend.git`
3. Send serving inference requests to the server using aiohttp.
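The client code used for step 3 was not included. Below is a minimal sketch of a load generator for reproducing the issue, assuming the model is deployed as the `ensemble` model from tensorrtllm_backend, served on HTTP port 8000, and accepts the `text_input`/`max_tokens`/`bad_words`/`stop_words` fields used in the backend's generate examples. It swaps aiohttp for the Python standard library (`urllib.request` run in worker threads), and `GENERATE_URL`, `build_payload`, `post_json`, `run_load`, and `http_send` are hypothetical names.

```python
import asyncio
import json
import urllib.request
from typing import Awaitable, Callable, Dict

# Assumption: "ensemble" model on Triton's HTTP port 8000 (mapped by docker run above).
GENERATE_URL = "http://localhost:8000/v2/models/ensemble/generate"


def build_payload(prompt: str, max_tokens: int = 20) -> dict:
    """Request body shaped like the tensorrtllm_backend generate examples (assumed)."""
    return {"text_input": prompt, "max_tokens": max_tokens, "bad_words": "", "stop_words": ""}


def post_json(url: str, body: dict, timeout: float = 60.0) -> int:
    """Blocking JSON POST; returns the HTTP status (urlopen raises on 4xx/5xx)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # drain the body so the connection is released
        return resp.status


async def run_load(
    n_requests: int,
    concurrency: int,
    send: Callable[[int], Awaitable[bool]],
) -> Dict[str, int]:
    """Call `send` n_requests times with at most `concurrency` calls in flight."""
    sem = asyncio.Semaphore(concurrency)
    counts = {"ok": 0, "failed": 0}

    async def one(i: int) -> None:
        async with sem:
            try:
                ok = await send(i)
            except Exception:  # count connection errors, timeouts, HTTP errors as failures
                ok = False
            counts["ok" if ok else "failed"] += 1

    await asyncio.gather(*(one(i) for i in range(n_requests)))
    return counts


async def http_send(i: int) -> bool:
    """Send one request, running the blocking call in a worker thread."""
    payload = build_payload(f"Request {i}: What is machine learning?")
    status = await asyncio.to_thread(post_json, GENERATE_URL, payload)
    return status == 200
```

For example, `asyncio.run(run_load(5000, 64, http_send))` sends roughly the number of requests after which the crash was reported and returns the success and failure counts. Passing the sender in as a parameter keeps the concurrency logic testable without a running server. Note that `asyncio.to_thread` uses the event loop's default thread pool, so the effective concurrency is also bounded by that pool's size.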

### Expected behavior

All requests are processed successfully, with no errors.

### Actual behavior

After the server has handled a large number of inference requests (for example, about 5,000), it aborts with:
**malloc(): unaligned tcache chunk detected**
**Signal (6) received.**
<img width="937" alt="Screenshot 2024-08-27 11 56 31" src="https://github.com/user-attachments/assets/292bf969-d79b-4370-8be5-66f1d323a6fe">
The error occurs both under continuous load and when requests arrive intermittently (for example, spread over a day).

When I sent 8,000 inference requests in a single test, the server logged:
**pinned_memory_manager.cc:170] "failed to allocate pinned system memory, falling back to non-pinned system memory**
I eventually set **cuda-memory-pool-byte-size** to 512M and **pinned-memory-pool-byte-size** to 512M, which solved this problem. However, these two parameters are not exposed in **scripts/launch_triton_server.py**, so I would like to know why this problem occurs and whether there is another way to solve it.
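Because the launch script does not expose these options, here is a hedged sketch of the equivalent arguments when invoking `tritonserver` directly. Both flags take sizes in bytes, and `--cuda-memory-pool-byte-size` takes `<GPU id>:<bytes>` and can be repeated per GPU. It assumes "512M" means 512 MiB (512 × 1024 × 1024 = 536,870,912 bytes); `build_tritonserver_args` and the model repository path are hypothetical.

```python
from typing import List, Sequence

MIB = 1024 * 1024  # assumption: "512M" is interpreted as 512 MiB


def build_tritonserver_args(
    model_repo: str,
    gpu_ids: Sequence[int],
    cuda_pool_mib: int = 512,
    pinned_pool_mib: int = 512,
) -> List[str]:
    """Hypothetical helper: tritonserver argv with explicit memory-pool sizes."""
    args = [
        "tritonserver",
        f"--model-repository={model_repo}",
        # Pinned (page-locked) host memory pool, a single value in bytes.
        f"--pinned-memory-pool-byte-size={pinned_pool_mib * MIB}",
    ]
    # The CUDA memory pool is sized per device as "<GPU id>:<bytes>".
    for gpu in gpu_ids:
        args.append(f"--cuda-memory-pool-byte-size={gpu}:{cuda_pool_mib * MIB}")
    return args
```

For a single A100, `" ".join(build_tritonserver_args("/path/to/triton_model_repo", [0]))` yields a command line that can be compared against the one the launch script runs.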

When I call the server with high concurrency, it aborts with:
**malloc_consolidate(): unaligned fastbin chunk detected**
**Signal (6) received.**
![image](https://github.com/user-attachments/assets/e41fb6d8-b4a3-47a2-b157-e5c806861ad4)

I hope you can help me solve these problems. Thanks very much!

### Additional notes

I suspect this happens because the server does not fully release memory after each inference completes.
