Make server resource file checksumming code memory-efficient #3924
Problem description and proposed solution
At the moment, the server resource file checksum calculation code reads the entire file into a memory buffer, which is handed over to an MD5 hasher. The file is then copied to the HTTP cache directory by writing the buffer contents to a file in that directory. This algorithm is simple and works well for smaller resource files, but it has O(n) space complexity, meaning it may cause heap or address space exhaustion for bigger files, such as IMG archives, textures and sounds, and for resources with many files. This is a realistic concern, especially since #3203: in fact, Discord user Ben reported an out-of-memory server crash stemming from this code when trying to add big IMG archives to a resource.
These changes give the algorithm constant space complexity by not reading the entire file into a buffer and using the CChecksum::GenerateChecksumFromFile method instead. Because no buffer with the file contents is available any longer, FileSize is used to compute the resource file size hint, and FileCopy is used to copy the file to the HTTP cache directory.
Remarks
The new code should be much more scalable, but it has a higher constant overhead due to the additional file operations for fetching the file size and copying the file. Luckily, opening a file to get its size is a fairly cheap operation in most sane environments, and reading it again to make a copy should hit the OS disk cache in RAM, making it also fairly cheap. All in all, I couldn't measure a negative performance impact for the common case of small resource files on a local filesystem. (Of course, I could indeed measure a favorable difference in memory usage during resource load.)
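As a concrete illustration of the constant-memory approach, here is a minimal sketch of streaming a file through an incremental hasher in fixed-size chunks. The ToyHasher below is a hypothetical stand-in (a simple FNV-1a accumulator), not MTA's actual CChecksum/MD5 implementation; the point is the update pattern, which keeps memory usage at O(1) regardless of file size.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

// Stand-in for an incremental hasher. The real code would use an MD5
// context (e.g. CChecksum), which supports the same Update/Final pattern.
struct ToyHasher {
    std::uint64_t state = 1469598103934665603ull;  // FNV-1a offset basis
    void Update(const unsigned char* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            state ^= data[i];
            state *= 1099511628211ull;  // FNV-1a prime
        }
    }
    std::uint64_t Final() const { return state; }
};

// Streams the file through the hasher in fixed-size chunks, so memory
// usage stays constant no matter how big the file is.
std::uint64_t ChecksumFromFile(const char* path) {
    std::ifstream file(path, std::ios::binary);
    ToyHasher hasher;
    std::vector<unsigned char> chunk(64 * 1024);  // 64 KiB scratch buffer
    while (file) {
        file.read(reinterpret_cast<char*>(chunk.data()),
                  static_cast<std::streamsize>(chunk.size()));
        std::streamsize got = file.gcount();
        if (got > 0)
            hasher.Update(chunk.data(), static_cast<std::size_t>(got));
    }
    return hasher.Final();
}
```

Since only one chunk is resident at a time, the buffer size trades syscall count against memory and can stay small even for multi-gigabyte IMG archives.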
In my view, a bigger potential concern would be the possibility of TOCTOU bugs introduced by the repeated operations over the same file: an external thread could race against the server thread to swap the contents of the file being checksummed after the checksum is computed, but before the file is copied to the HTTP cache directory. After giving this concern some thought, I concluded that, fortunately, it's not a problem in this case, because clients would detect a checksum mismatch after downloading the file, refusing to deal with it and thus sidestepping any security concerns. (Conceptually, this is similar to what would already happen if the HTTP cache directory gets corrupted.)
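The argument above can be replayed sequentially in a small sketch. Everything here is hypothetical and illustrative: the file names, WriteFile/ReadFile helpers, and the toy FNV-1a Checksum are stand-ins for the real MD5-based machinery. The race is simulated by swapping the file between the hash step and the copy step; the client-side verification still catches it.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Tiny stand-in checksum over a file's bytes (the real code uses MD5).
std::uint64_t Checksum(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::uint64_t h = 1469598103934665603ull;
    char c;
    while (file.get(c)) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;
    }
    return h;
}

void WriteFile(const std::string& path, const std::string& data) {
    std::ofstream(path, std::ios::binary) << data;
}

std::string ReadFile(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::stringstream ss;
    ss << file.rdbuf();
    return ss.str();
}

// Replays the TOCTOU sequence: hash, swap, copy. The swapped bytes end up
// in the cache, but the advertised checksum no longer matches them.
bool ClientDetectsSwap() {
    WriteFile("resource.dat", "original contents");
    std::uint64_t advertised = Checksum("resource.dat");    // server hashes the file
    WriteFile("resource.dat", "swapped contents");          // racing writer replaces it
    WriteFile("http-cache.dat", ReadFile("resource.dat"));  // server copies to the cache
    bool mismatch = Checksum("http-cache.dat") != advertised;
    std::remove("resource.dat");
    std::remove("http-cache.dat");
    return mismatch;  // true: the client would refuse the downloaded file
}
```

The swap is never silently accepted; the worst outcome is a failed download, which matches the "corrupted cache directory" failure mode that already exists.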
While implementing these changes, I've also simplified a needlessly verbose if-else chain in the client code related to resource file checksum checks, and renamed the CResourceFile field m_uiFileSize to m_uiFileSizeHint to emphasize that, due to TOCTOU races and/or HTTP cache directory poisoning, it may not always be accurate.