Implemented a more space efficient string<->integer map. #9113

Merged
merged 1 commit into from
Mar 11, 2025

redmercury
Contributor

Summary:
While investigating memory consumption, I noticed that the tiktoken loader was allocating 16MB of memory, mostly spread across two std::unordered_maps. These two maps (the Encoder and Decoder types) in Tiktoken are inverses of each other, so every std::string they contain is stored twice, once in each map.

Each node in either map takes 40 bytes (on aarch64 Android):

  • 2 doubly linked list pointers, 8 bytes each
  • 1 std::uint64_t (8 bytes)
  • 1 std::string (12 bytes; std::string contains an internal buffer for small strings)

Each node actually consumes 48 bytes of usable memory, because the allocator aligns allocations to 16-byte boundaries.
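A minimal sketch of the per-node arithmetic. It assumes (this is not stated in the summary) that the 36 bytes of listed fields are first padded to 8-byte alignment, giving 40, and that the allocator then rounds each request up to a multiple of 16. `round_up` and the constant names are hypothetical helpers for illustration:

```cpp
#include <cstddef>

// Rounds `size` up to the next multiple of `alignment` (must be a power of two).
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Field sizes as listed in the summary (aarch64 Android).
constexpr std::size_t kListPointers = 2 * 8;
constexpr std::size_t kInteger = 8;
constexpr std::size_t kString = 12;

// Assumption: 16 + 8 + 12 = 36 bytes of fields, padded to 8-byte alignment -> 40.
constexpr std::size_t kNodeSize = round_up(kListPointers + kInteger + kString, 8);
// The allocator hands out 16-byte-aligned blocks -> 48 usable bytes per node.
constexpr std::size_t kAllocatedPerNode = round_up(kNodeSize, 16);

static_assert(kNodeSize == 40);
static_assert(kAllocatedPerNode == 48);
```

Under these assumptions, the last 8 bytes of every 48-byte block are pure allocator padding, and that cost is paid once per node in each of the two maps.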

The new StringIntegerMap implementation has two main features:

  • A single data payload shared between two hash indices.
  • Variable-sized integers, variable-sized string length fields, and best-fit allocation of string data; in other words, the data payload elements are variable-sized.
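To make the two features concrete, here is a minimal sketch of the general technique, not the PR's StringIntegerMap. Each entry is written once into a byte payload as a LEB128 varint integer, a varint string length, and the raw string bytes; two open-addressing tables of 32-bit offsets (one hashed by string, one by integer) both point into that same payload. All names here are hypothetical, it assumes strings and integers are each unique (as in a tokenizer vocabulary), and strings are simply packed back to back rather than placed by a best-fit allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Appends `v` as an unsigned LEB128 varint: 7 payload bits per byte,
// high bit set on every byte except the last.
inline void put_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>(v | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

// Decodes a varint starting at `p` and advances `p` past it.
inline std::uint64_t get_varint(const std::uint8_t*& p) {
  std::uint64_t v = 0;
  for (int shift = 0;; shift += 7) {
    std::uint8_t b = *p++;
    v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return v;
  }
}

// Hypothetical compact bidirectional string<->integer map.
class CompactStringIntegerMap {
 public:
  explicit CompactStringIntegerMap(
      const std::vector<std::pair<std::string, std::uint64_t>>& entries) {
    // Power-of-two capacity, load factor <= 0.5: probing always hits an empty slot.
    std::size_t cap = 1;
    while (cap < entries.size() * 2) cap <<= 1;
    by_string_.assign(cap, kEmpty);
    by_integer_.assign(cap, kEmpty);
    for (const auto& [s, v] : entries) {
      assert(payload_.size() < kEmpty && "offsets must fit in 32 bits");
      const auto off = static_cast<std::uint32_t>(payload_.size());
      // Entry layout: varint(value) | varint(length) | string bytes.
      put_varint(payload_, v);
      put_varint(payload_, s.size());
      payload_.insert(payload_.end(), s.begin(), s.end());
      // Both indices store the same offset, so each string is stored once.
      insert(by_string_, std::hash<std::string_view>{}(s), off);
      insert(by_integer_, std::hash<std::uint64_t>{}(v), off);
    }
  }

  std::optional<std::uint64_t> find_integer(std::string_view s) const {
    const std::size_t mask = by_string_.size() - 1;
    for (std::size_t i = std::hash<std::string_view>{}(s) & mask;; i = (i + 1) & mask) {
      if (by_string_[i] == kEmpty) return std::nullopt;
      auto [value, str] = decode(by_string_[i]);
      if (str == s) return value;
    }
  }

  // The returned view points into the payload and lives as long as the map.
  std::optional<std::string_view> find_string(std::uint64_t v) const {
    const std::size_t mask = by_integer_.size() - 1;
    for (std::size_t i = std::hash<std::uint64_t>{}(v) & mask;; i = (i + 1) & mask) {
      if (by_integer_[i] == kEmpty) return std::nullopt;
      auto [value, str] = decode(by_integer_[i]);
      if (value == v) return str;
    }
  }

 private:
  static constexpr std::uint32_t kEmpty = UINT32_MAX;

  // Linear probing: store `off` in the first empty slot at or after `hash`.
  static void insert(std::vector<std::uint32_t>& table, std::size_t hash,
                     std::uint32_t off) {
    const std::size_t mask = table.size() - 1;
    std::size_t i = hash & mask;
    while (table[i] != kEmpty) i = (i + 1) & mask;
    table[i] = off;
  }

  std::pair<std::uint64_t, std::string_view> decode(std::uint32_t off) const {
    const std::uint8_t* p = payload_.data() + off;
    const std::uint64_t value = get_varint(p);
    const std::uint64_t len = get_varint(p);
    return {value, std::string_view(reinterpret_cast<const char*>(p), len)};
  }

  std::vector<std::uint8_t> payload_;      // shared variable-sized entries
  std::vector<std::uint32_t> by_string_;   // offsets hashed by string
  std::vector<std::uint32_t> by_integer_;  // offsets hashed by integer
};
```

Each probe decodes the entry at a candidate offset and compares it with the key, so both lookup directions share one copy of every string; the trade-off is a short varint decode per probe instead of reading fixed-size fields.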

The unit tests track the memory allocated by the old std::unordered_map approach and by the new StringIntegerMap, showing a ~6x reduction in allocated memory:

string integer map size = 2623343
unordered map size = 16078928
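The ~6x figure follows directly from the two totals above. This small sketch assumes both totals are byte counts (consistent with the 16MB mentioned earlier); the constant and function names are illustrative only:

```cpp
#include <cstdint>

// Totals printed by the unit test, assumed to be in bytes.
constexpr std::int64_t kStringIntegerMapBytes = 2623343;
constexpr std::int64_t kUnorderedMapBytes = 16078928;

// How many times smaller the new map's allocation is (about 6.13).
constexpr double reduction_factor() {
  return static_cast<double>(kUnorderedMapBytes) /
         static_cast<double>(kStringIntegerMapBytes);
}

// Absolute saving in bytes (about 12.8 MiB).
constexpr std::int64_t bytes_saved() {
  return kUnorderedMapBytes - kStringIntegerMapBytes;
}
```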

String lookups were significantly faster (about 1.5x: 4722 us vs. 7128 us), while integer lookups were about the same (529 us vs. 536 us):

Benchmark                                                      Time             CPU   Iterations
------------------------------------------------------------------------------------------------
BM_FindStringIntegerMapString/iterations:100                4722 us         4722 us          100
BM_FindStringIntegerMapInteger/iterations:100                529 us          529 us          100
BM_FindStringIntegerMapStringOptional/iterations:100        4714 us         4713 us          100
BM_FindStringIntegerMapIntegerOptional/iterations:100        537 us          536 us          100
BM_FindStdUnorderedMapString/iterations:100                 7128 us         7127 us          100
BM_FindStdUnorderedMapInteger/iterations:100                 536 us          536 us          100

Differential Revision: D69472841

pytorch-bot bot commented Mar 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9113

@facebook-github-bot added the CLA Signed label Mar 10, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D69472841

@redmercury
Contributor Author

@pytorchbot label "topic: not user facing"

redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 10, 2025
Reviewed By: swolchok

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Pull Request resolved: pytorch#9113
Reviewed By: swolchok

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Pull Request resolved: pytorch#9113
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Pull Request resolved: pytorch#9113
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Pull Request resolved: pytorch#9113
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
redmercury pushed a commit to redmercury/executorch that referenced this pull request Mar 11, 2025
Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D69472841

facebook-github-bot merged commit 94dca7a into pytorch:main on Mar 11, 2025 (5 of 6 checks passed).
Labels: CLA Signed, fb-exported, topic: not user facing