Implemented a more space efficient string<->integer map. #9113

Merged

facebook-github-bot merged 1 commit into pytorch:main from redmercury:export-D69472841

Mar 11, 2025

Contributor

redmercury commented Mar 10, 2025

Summary:
While investigating memory consumption, I noticed that the tiktoken loader was allocating 16 MB of memory, mostly in two std::unordered_maps. These two maps (the Encoder and Decoder types in Tiktoken) are inverses of each other, so the std::string objects they contain are duplicated between them.

The allocation of a node in each map is 40 bytes (on aarch64 Android):

* 2x doubly linked list pointers at 8 bytes each
* 1 std::uint64_t (8 bytes)
* 1 std::string (12 bytes, std::strings contain an internal buffer for small strings).

Each node actually occupies 48 bytes of usable memory, because the allocator aligns allocations to 16-byte boundaries.
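The rounding step can be checked with a few lines of C++. This is only a sketch of the arithmetic: the 40-byte node size and 16-byte alignment come from the description above, while `round_up`, `two_map_node_bytes`, and any entry count passed to them are hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Round `size` up to the next multiple of `alignment`.
// `alignment` is assumed to be a power of two.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Figures quoted in the description above.
constexpr std::size_t kNodeBytes = 40;
constexpr std::size_t kAlignment = 16;

// Node bytes consumed by `entries` nodes in each of the two maps.
// Bucket arrays and heap-allocated long strings are not counted.
constexpr std::size_t two_map_node_bytes(std::size_t entries) {
  return 2 * entries * round_up(kNodeBytes, kAlignment);
}

static_assert(round_up(kNodeBytes, kAlignment) == 48,
              "a 40-byte node occupies 48 bytes after alignment");
```

For a hypothetical table of 100,000 entries, `two_map_node_bytes(100000)` evaluates to 9,600,000 bytes for the nodes alone.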

This implementation of the string/integer map has several features:

* Sharing of the data payload between two hash indices.
* Variable-sized integers, variable-sized string length fields, and best-fit allocation of string data. That is to say, the data payload elements are variable-sized.
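To make the two features above concrete, here is a minimal sketch of one way a shared payload with two hash indices could look. It is not the StringIntegerMap from this PR: the class name `CompactStringIntMap`, its methods, the LEB128-style varint encoding, the linear-probing indices, and the 32-bit offsets are all assumptions made for illustration, and best-fit allocation is not modelled (records are simply appended).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical sketch, NOT the StringIntegerMap implemented in this PR.
class CompactStringIntMap {
 public:
  // Each index gets 2 * capacity + 1 slots, so probing always finds a gap.
  explicit CompactStringIntMap(std::size_t capacity)
      : capacity_(capacity),
        by_string_(2 * capacity + 1, kEmpty),
        by_integer_(2 * capacity + 1, kEmpty) {}

  // Appends one record [varint value][varint length][bytes] and stores its
  // offset in both indices. Duplicate keys are not detected in this sketch.
  void insert(std::string_view text, std::uint64_t value) {
    assert(count_ < capacity_);
    const auto offset = static_cast<std::uint32_t>(payload_.size());
    put_varint(value);
    put_varint(text.size());
    payload_.insert(payload_.end(), text.begin(), text.end());
    place(by_string_, std::hash<std::string_view>{}(text), offset);
    place(by_integer_, std::hash<std::uint64_t>{}(value), offset);
    ++count_;
  }

  std::optional<std::uint64_t> find_value(std::string_view text) const {
    const std::size_t n = by_string_.size();
    for (std::size_t i = std::hash<std::string_view>{}(text) % n;
         by_string_[i] != kEmpty; i = (i + 1) % n) {
      const Record r = read(by_string_[i]);
      if (r.text == text) return r.value;
    }
    return std::nullopt;
  }

  // The returned view points into the shared payload and is invalidated by
  // a later insert().
  std::optional<std::string_view> find_text(std::uint64_t value) const {
    const std::size_t n = by_integer_.size();
    for (std::size_t i = std::hash<std::uint64_t>{}(value) % n;
         by_integer_[i] != kEmpty; i = (i + 1) % n) {
      const Record r = read(by_integer_[i]);
      if (r.value == value) return r.text;
    }
    return std::nullopt;
  }

  std::size_t payload_bytes() const { return payload_.size(); }

 private:
  struct Record {
    std::uint64_t value;
    std::string_view text;
  };
  static constexpr std::uint32_t kEmpty = 0xFFFFFFFFu;

  // Linear probing: take the first free slot at or after hash % size.
  static void place(std::vector<std::uint32_t>& table, std::size_t hash,
                    std::uint32_t offset) {
    std::size_t i = hash % table.size();
    while (table[i] != kEmpty) i = (i + 1) % table.size();
    table[i] = offset;
  }

  // 7 bits per byte, high bit set on every byte except the last.
  void put_varint(std::uint64_t v) {
    while (v >= 0x80) {
      payload_.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
      v >>= 7;
    }
    payload_.push_back(static_cast<std::uint8_t>(v));
  }

  std::uint64_t get_varint(std::size_t& pos) const {
    std::uint64_t v = 0;
    for (int shift = 0;; shift += 7) {
      const std::uint8_t byte = payload_[pos++];
      v |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) return v;
    }
  }

  Record read(std::uint32_t offset) const {
    std::size_t pos = offset;
    const std::uint64_t value = get_varint(pos);
    const auto length = static_cast<std::size_t>(get_varint(pos));
    const char* base = reinterpret_cast<const char*>(payload_.data());
    return {value, std::string_view(base + pos, length)};
  }

  std::size_t capacity_;
  std::size_t count_ = 0;
  std::vector<std::uint8_t> payload_;     // shared by both indices
  std::vector<std::uint32_t> by_string_;  // offsets, hashed by string
  std::vector<std::uint32_t> by_integer_; // offsets, hashed by integer
};
```

In this sketch each string is stored exactly once, and each index entry costs a 4-byte offset rather than a separately allocated node, which is the kind of saving the shared-payload design is after. Small integers and short strings also take fewer payload bytes because both the value and the length field are variable-sized.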

The unit tests track the memory allocated by the old std::unordered_map approach and the new StringIntegerMap approach, showing a ~6x reduction in allocated memory:

string integer map size = 2623343
unordered map size = 16078928

String lookups were about 1.5x faster (4722 us vs. 7128 us), while integer lookups were about the same (529 us vs. 536 us):

Benchmark                                                      Time             CPU   Iterations
------------------------------------------------------------------------------------------------
BM_FindStringIntegerMapString/iterations:100                4722 us         4722 us          100
BM_FindStringIntegerMapInteger/iterations:100                529 us          529 us          100
BM_FindStringIntegerMapStringOptional/iterations:100        4714 us         4713 us          100
BM_FindStringIntegerMapIntegerOptional/iterations:100        537 us          536 us          100
BM_FindStdUnorderedMapString/iterations:100                 7128 us         7127 us          100
BM_FindStdUnorderedMapInteger/iterations:100                 536 us          536 us          100
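
The ratios implied by the figures above can be worked out directly. The snippet below is only a sketch of that arithmetic: the numbers are copied from the test and benchmark output in this description, and the constant and function names are made up for illustration.

```cpp
#include <cassert>

// Figures copied from the unit test and benchmark output above.
constexpr double kUnorderedMapBytes = 16078928.0;
constexpr double kStringIntegerMapBytes = 2623343.0;
constexpr double kUnorderedMapStringUs = 7128.0;
constexpr double kStringIntegerMapStringUs = 4722.0;
constexpr double kUnorderedMapIntegerUs = 536.0;
constexpr double kStringIntegerMapIntegerUs = 529.0;

// How many times smaller (or faster) the candidate is than the baseline.
constexpr double ratio(double baseline, double candidate) {
  return baseline / candidate;
}

// Roughly 6.13: memory allocated by the old maps vs. the new map.
constexpr double kMemoryReduction =
    ratio(kUnorderedMapBytes, kStringIntegerMapBytes);
// Roughly 1.51: string lookup time, old vs. new.
constexpr double kStringSpeedup =
    ratio(kUnorderedMapStringUs, kStringIntegerMapStringUs);
// Roughly 1.01: integer lookup time, old vs. new.
constexpr double kIntegerSpeedup =
    ratio(kUnorderedMapIntegerUs, kStringIntegerMapIntegerUs);
```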

Differential Revision: D69472841

redmercury requested review from jackzhxng, iseeyuan and larryliu0820 as code owners

March 10, 2025 23:48

pytorch-bot bot commented Mar 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9113

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label

Contributor

facebook-github-bot commented Mar 10, 2025

This pull request was exported from Phabricator. Differential Revision: D69472841

facebook-github-bot added the fb-exported label

redmercury force-pushed the export-D69472841 branch from 455cc0e to e285f7d Compare

March 10, 2025 23:52

Contributor Author

redmercury commented Mar 10, 2025

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the topic: not user facing label

redmercury force-pushed the export-D69472841 branch from e285f7d to a64f28c Compare

March 10, 2025 23:59

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

a64f28c

Reviewed By: swolchok

Differential Revision: D69472841

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

587ead4

Reviewed By: swolchok

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from a64f28c to 587ead4 Compare

March 11, 2025 00:02

larryliu0820 approved these changes

View reviewed changes

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

7b97eb4

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from 587ead4 to 7b97eb4 Compare

March 11, 2025 00:31

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

d03ae90

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from 7b97eb4 to d03ae90 Compare

March 11, 2025 00:36

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

4f90485

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from d03ae90 to 4f90485 Compare

March 11, 2025 00:47

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

3d0c4cb

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from 4f90485 to 3d0c4cb Compare

March 11, 2025 00:51

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

52a64b8

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from 3d0c4cb to 52a64b8 Compare

March 11, 2025 03:13

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

46a9872

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from 52a64b8 to 46a9872 Compare

March 11, 2025 03:26

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

cd5d9bb

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from 46a9872 to cd5d9bb Compare

March 11, 2025 03:30

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

b857dd3

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from cd5d9bb to b857dd3 Compare

March 11, 2025 16:21

redmercury pushed a commit to redmercury/executorch that referenced this pull request


          Implemented a more space efficient string<->integer map. (pytorch#9113)

239aced

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841

redmercury force-pushed the export-D69472841 branch from b857dd3 to 239aced

March 11, 2025 16:28

redmercury pushed a commit to redmercury/executorch that referenced this pull request


Implemented a more space efficient string<->integer map. (pytorch#9113)

cebe11f

redmercury force-pushed the export-D69472841 branch from 239aced to cebe11f

March 11, 2025 16:54

redmercury pushed a commit to redmercury/executorch that referenced this pull request


Implemented a more space efficient string<->integer map. (pytorch#9113)

redmercury force-pushed the export-D69472841 branch from cebe11f to 1471650

March 11, 2025 16:59


Implemented a more space efficient string<->integer map. (pytorch#9113)

1818d89

redmercury force-pushed the export-D69472841 branch from 1471650 to 1818d89

March 11, 2025 19:33

redmercury requested a review from swolchok as a code owner

March 11, 2025 19:33

facebook-github-bot merged commit 94dca7a into pytorch:main

5 of 6 checks passed

Reviewers

larryliu0820 approved these changes

jackzhxng: awaiting requested review (code owner)

iseeyuan: awaiting requested review (code owner)

swolchok: awaiting requested review (code owner)

Labels

CLA Signed, fb-exported, topic: not user facing