Current Behavior
I am testing Memray's capabilities to identify memory leaks in Python programs that use C extensions. I have a program that uses a library with a known memory leak originating in its extension code, and I am running Memray in several configurations to find the one that shows the leak most prominently.
When I run my program in memray --native mode for 1 hour, my capture file is about 2GB without aggregated mode, or 6MB in aggregated mode (yay for this feature!).
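For context, a rough sketch of the commands involved (the script name and output path are placeholders for my actual pipeline):

  memray run --native -o profile.bin my_pipeline.py              # ~2GB capture file
  memray run --native --aggregate -o profile.bin my_pipeline.py  # ~6MB capture file
  memray table profile.bin                                       # produces the table report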
When I create a table report, it has about 25k rows, yet many of the rows are duplicates. I converted the HTML report to a CSV file and then did further aggregation to remove duplicates by adding together allocations that happened in the same place.
This reduced the number of rows to ~200; when I also removed the Thread_ID column and aggregated the entries again, I ended up with only 50 rows and meaningful information about my leak.
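A minimal sketch of that post-processing, assuming the column names from the CSV sample below (the file name is hypothetical):

  import pandas as pd

  # Table report exported to CSV (hypothetical file name)
  df = pd.read_csv("memray_table.csv")

  # Merge rows that share thread, allocator, and location,
  # summing sizes and allocation counts (~25k rows -> ~200)
  by_thread = (
      df.groupby(["Thread_ID", "Allocator", "Location"], as_index=False)
        .agg({"Size": "sum", "Allocations": "sum"})
  )

  # Drop Thread_ID and aggregate again (~200 rows -> ~50)
  by_location = (
      df.groupby(["Allocator", "Location"], as_index=False)
        .agg({"Size": "sum", "Allocations": "sum"})
  )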
Here are sample entries from the table that demonstrate the duplication:
Thread_ID,Size,Allocator,Allocations,Location
0x1,192,malloc,1,operator new(unsigned long) at <unknown>:0
0x1,56,malloc,1,operator new(unsigned long) at <unknown>:0
...
0x1,944,malloc,1,_PyMem_RawMalloc at Objects/obmalloc.c:99
0x1,944,malloc,1,_PyMem_RawMalloc at Objects/obmalloc.c:99
...
0x25 (fn_api_status_handler),328,realloc,1,upb_Arena_InitSlow at <unknown>:0
0x38 (Thread-30 (_run)),328,realloc,1,upb_Arena_InitSlow at <unknown>:0
0x1c (Thread-6 (_run)),328,realloc,1,upb_Arena_InitSlow at <unknown>:0
...
0x25 (fn_api_status_handler),71,malloc,1,<unknown> at <unknown>:0
0x25 (fn_api_status_handler),87,malloc,1,<unknown> at <unknown>:0
...
In my case the application creates several ephemeral threads, so the thread ID info is not meaningful; it would be nice to have an option to exclude Thread_ID from the report. But even that aside, it seems that allocations happening at the same location should be added together, summing their Size and Allocations instead. For example, the two identical _PyMem_RawMalloc rows above would collapse into a single row with Size 1888 and Allocations 2.
I have also tried using --trace-python-allocators, which didn't have a meaningful impact on this behavior.
I also verified that duplicate entries appear in the table report when not using --native mode. The duplication still happens, but at a smaller ratio: there, post-processing resulted in only a 3x reduction in the number of rows.
Expected Behavior
In the table report, there should be only one row for each (Thread_ID, Size, Allocator, Location) tuple.
Bonus: provide an option to exclude Thread_ID from the report.
Steps To Reproduce
My setup is somewhat involved: I am profiling a data processing pipeline, also discussed in #852.
I am happy to attach the collected profiles if that helps. If a repro is required for a meaningful investigation, let me know.
Memray Version
1.19.1
Python Version
3.10
Operating System
Linux
Anything else?
No response