Implemented a more space efficient string<->integer map. #9113
This pull request was exported from Phabricator. Differential Revision: D69472841
@pytorchbot label "topic: not user facing"
Summary:

While investigating memory consumption, I noticed that the tiktoken loader was allocating 16MB of memory, mainly distributed over two std::unordered_maps. These two maps (the Encoder and Decoder types) in Tiktoken are inverses of each other, and the std::string objects contained therein are clones of each other.

The allocation of a node in each map is 40 bytes (on aarch64 Android):

* 2x doubly linked list pointers at 8 bytes each
* 1 std::uint64_t (8 bytes)
* 1 std::string (12 bytes, std::strings contain an internal buffer for small strings)

Each node actually allocates 48 bytes of usable memory, as the allocator aligns the allocations to 16 byte boundaries.

This implementation of the string/integer map has several features:

* Sharing of the data payload between two hash indices.
* Variable sized integers, variable sized string length fields and best fit allocation of string data. That is to say, the data payload elements are variable sized.

The implemented unit tests track the memory allocated by the old std::unordered_map method and by the new StringIntegerMap method, showing a ~6x reduction in the memory allocated:

```
string integer map size = 2623343
unordered map size = 16078928
```

String lookups were roughly 1.5x faster (4722 us vs. 7128 us), while integer lookups took about the same time:

```
------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations
------------------------------------------------------------------------------------------------
BM_FindStringIntegerMapString/iterations:100                4722 us         4722 us          100
BM_FindStringIntegerMapInteger/iterations:100                529 us          529 us          100
BM_FindStringIntegerMapStringOptional/iterations:100        4714 us         4713 us          100
BM_FindStringIntegerMapIntegerOptional/iterations:100        537 us          536 us          100
BM_FindStdUnorderedMapString/iterations:100                 7128 us         7127 us          100
BM_FindStdUnorderedMapInteger/iterations:100                 536 us          536 us          100
```

Reviewed By: swolchok, larryliu0820

Differential Revision: D69472841
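To make the "shared payload, two hash indices" idea concrete, here is a minimal sketch. It is not the StringIntegerMap from this PR. The class name `TinyBiMap`, its methods, the LEB128-style varint encoding, and the linear-probing tables of 32-bit offsets are all assumptions chosen for illustration. Each string is stored once in a byte buffer next to its variable sized integer and length field, and both indices point into that buffer. Keys and values are assumed to be unique, and the payload is assumed to stay under 4 GiB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical illustration only; names and layout are not taken from the PR.
class TinyBiMap {
 public:
  explicit TinyBiMap(
      const std::vector<std::pair<std::string, std::uint64_t>>& items) {
    // Power-of-two bucket count at <= 50% load, so probing always terminates.
    std::size_t buckets = 8;
    while (buckets < items.size() * 2) buckets *= 2;
    by_string_.assign(buckets, kEmpty);
    by_integer_.assign(buckets, kEmpty);
    for (const auto& [text, value] : items) {
      // One payload record per pair: varint(value), varint(length), bytes.
      const auto offset = static_cast<std::uint32_t>(payload_.size());
      AppendVarint(value);
      AppendVarint(text.size());
      payload_.insert(payload_.end(), text.begin(), text.end());
      // Both indices store only the 4-byte offset of the shared record.
      Insert(by_string_, std::hash<std::string_view>{}(text), offset);
      Insert(by_integer_, std::hash<std::uint64_t>{}(value), offset);
    }
  }

  std::optional<std::uint64_t> FindInteger(std::string_view text) const {
    const std::size_t mask = by_string_.size() - 1;
    for (std::size_t i = std::hash<std::string_view>{}(text) & mask;
         by_string_[i] != kEmpty; i = (i + 1) & mask) {
      const Entry e = Decode(by_string_[i]);
      if (e.text == text) return e.value;
    }
    return std::nullopt;
  }

  // The returned view points into the payload and is valid while *this lives.
  std::optional<std::string_view> FindString(std::uint64_t value) const {
    const std::size_t mask = by_integer_.size() - 1;
    for (std::size_t i = std::hash<std::uint64_t>{}(value) & mask;
         by_integer_[i] != kEmpty; i = (i + 1) & mask) {
      const Entry e = Decode(by_integer_[i]);
      if (e.value == value) return e.text;
    }
    return std::nullopt;
  }

  std::size_t PayloadBytes() const { return payload_.size(); }

 private:
  struct Entry {
    std::uint64_t value;
    std::string_view text;
  };
  static constexpr std::uint32_t kEmpty = UINT32_MAX;

  // Linear probing insert into an open-addressing table of offsets.
  static void Insert(std::vector<std::uint32_t>& table, std::size_t hash,
                     std::uint32_t offset) {
    const std::size_t mask = table.size() - 1;
    std::size_t i = hash & mask;
    while (table[i] != kEmpty) i = (i + 1) & mask;
    table[i] = offset;
  }

  // 7 bits per byte, high bit set on every byte except the last.
  void AppendVarint(std::uint64_t v) {
    while (v >= 0x80) {
      payload_.push_back(static_cast<std::uint8_t>(v | 0x80));
      v >>= 7;
    }
    payload_.push_back(static_cast<std::uint8_t>(v));
  }

  std::uint64_t ReadVarint(std::size_t& pos) const {
    std::uint64_t v = 0;
    for (int shift = 0;; shift += 7) {
      const std::uint8_t byte = payload_[pos++];
      v |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) return v;
    }
  }

  Entry Decode(std::uint32_t offset) const {
    std::size_t pos = offset;
    const std::uint64_t value = ReadVarint(pos);
    const auto length = static_cast<std::size_t>(ReadVarint(pos));
    const char* data = reinterpret_cast<const char*>(payload_.data() + pos);
    return {value, std::string_view(data, length)};
  }

  std::vector<std::uint8_t> payload_;     // shared by both indices
  std::vector<std::uint32_t> by_string_;  // string -> payload offset
  std::vector<std::uint32_t> by_integer_; // integer -> payload offset
};
```

In this sketch a pair such as `("hello", 1)` costs 7 payload bytes plus one 4-byte slot in each index, instead of two separately allocated map nodes that each hold their own copy of the string. Lookups go through `FindInteger("hello")` and `FindString(1)`, and both return an empty optional on a miss.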