Skip to content

Conversation

@philnik777
Copy link
Contributor

@philnik777 philnik777 commented Oct 27, 2025

Benchmark         old       new    Difference    % Difference
-----------  --------  --------  ------------  --------------
bm_read       2468.45    736.27      -1732.18         -70.17%

@philnik777 philnik777 force-pushed the optimize_fstream_get branch 2 times, most recently from 1c20a74 to b3e6abe Compare October 29, 2025 13:55
@philnik777 philnik777 marked this pull request as ready for review October 30, 2025 09:32
@philnik777 philnik777 requested a review from a team as a code owner October 30, 2025 09:32
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Oct 30, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2025

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes
Benchmark         old       new    Difference    % Difference
-----------  --------  --------  ------------  --------------
bm_read       2468.45    736.27      -1732.18         -70.17%

Full diff: https://github.com/llvm/llvm-project/pull/165223.diff

5 Files Affected:

  • (modified) libcxx/docs/ReleaseNotes/22.rst (+2-2)
  • (modified) libcxx/include/fstream (+13)
  • (renamed) libcxx/test/benchmarks/streams/fstream.bench.cpp (+18)
  • (modified) libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp (+1-1)
  • (modified) libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp (+1-1)
diff --git a/libcxx/docs/ReleaseNotes/22.rst b/libcxx/docs/ReleaseNotes/22.rst
index 25d33a9c2eb50..b5376c31ae31c 100644
--- a/libcxx/docs/ReleaseNotes/22.rst
+++ b/libcxx/docs/ReleaseNotes/22.rst
@@ -64,8 +64,8 @@ Improvements and New Features
   by up to 2.5x
 - The performance of ``erase(iterator, iterator)`` in the unordered containers has been improved by up to 1.9x
 - The performance of ``map::insert_or_assign`` has been improved by up to 2x
-- ``ofstream::write`` has been optimized to pass through large strings to system calls directly instead of copying them
-  in chunks into a buffer.
+- ``ofstream::write`` and ``ifstream::read`` have been optimized to pass through large reads and writes to system calls
+  directly instead of copying them in chunks.
 - Multiple internal types have been refactored to use ``[[no_unique_address]]``, resulting in faster compile times and
   reduced debug information.
 
diff --git a/libcxx/include/fstream b/libcxx/include/fstream
index 1f88d134fe061..b07ca636094af 100644
--- a/libcxx/include/fstream
+++ b/libcxx/include/fstream
@@ -308,6 +308,19 @@ protected:
     return basic_streambuf<_CharT, _Traits>::xsputn(__str, __len);
   }
 
+  _LIBCPP_HIDE_FROM_ABI_VIRTUAL streamsize xsgetn(char_type* __str, streamsize __len) override {
+    if (__always_noconv_) {
+      const streamsize __n = std::min(this->egptr() - this->gptr(), __len);
+      if (__n != 0) {
+        traits_type::copy(__str, this->gptr(), __n);
+        this->__gbump_ptrdiff(__n);
+      }
+      if (__len - __n >= this->egptr() - this->eback())
+        return std::fread(__str + __n, sizeof(char_type), __len - __n, __file_);
+    }
+    return basic_streambuf<_CharT, _Traits>::xsgetn(__str, __len);
+  }
+
 private:
   char* __extbuf_;
   const char* __extbufnext_;
diff --git a/libcxx/test/benchmarks/streams/ofstream.bench.cpp b/libcxx/test/benchmarks/streams/fstream.bench.cpp
similarity index 62%
rename from libcxx/test/benchmarks/streams/ofstream.bench.cpp
rename to libcxx/test/benchmarks/streams/fstream.bench.cpp
index 60606a9d67e2f..3fef23506e67e 100644
--- a/libcxx/test/benchmarks/streams/ofstream.bench.cpp
+++ b/libcxx/test/benchmarks/streams/fstream.bench.cpp
@@ -22,4 +22,22 @@ static void bm_write(benchmark::State& state) {
 }
 BENCHMARK(bm_write);
 
+static void bm_read(benchmark::State& state) {
+  std::vector<char> buffer;
+  buffer.resize(16384);
+
+  std::ofstream gen_testfile("testfile");
+  gen_testfile.write(buffer.data(), buffer.size());
+
+  std::ifstream stream("testfile");
+  assert(stream);
+
+  for (auto _ : state) {
+    stream.read(buffer.data(), buffer.size());
+    benchmark::DoNotOptimize(buffer);
+    stream.seekg(0);
+  }
+}
+BENCHMARK(bm_read);
+
 BENCHMARK_MAIN();
diff --git a/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp b/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp
index 283adbc057d1e..30e7b66d42325 100644
--- a/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp
+++ b/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp
@@ -19,4 +19,4 @@
 
 std::basic_filebuf<char, std::char_traits<wchar_t> > f;
 // expected-error-re@*:* {{static assertion failed{{.*}}traits_type::char_type must be the same type as CharT}}
-// expected-error@*:* 10 {{only virtual member functions can be marked 'override'}}
+// expected-error@*:* 11 {{only virtual member functions can be marked 'override'}}
diff --git a/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp b/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp
index ba6f3c31d3e34..daafb36f9151a 100644
--- a/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp
+++ b/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp
@@ -21,7 +21,7 @@ std::basic_fstream<char, std::char_traits<wchar_t> > f;
 // expected-error-re@*:* {{static assertion failed{{.*}}traits_type::char_type must be the same type as CharT}}
 // expected-error-re@*:* {{static assertion failed{{.*}}traits_type::char_type must be the same type as CharT}}
 
-// expected-error@*:* 12 {{only virtual member functions can be marked 'override'}}
+// expected-error@*:* 13 {{only virtual member functions can be marked 'override'}}
 
 // FIXME: As of commit r324062 Clang incorrectly generates a diagnostic about mismatching
 // exception specifications for types which are already invalid for one reason or another.

@ldionne
Copy link
Member

ldionne commented Nov 7, 2025

/libcxx-bot benchmark libcxx/test/benchmarks/streams/fstream.bench.cpp

Benchmark results:
Benchmark      Baseline    Candidate    Difference    % Difference
-----------  ----------  -----------  ------------  --------------
bm_read         7880.16      2276.16      -5604.00         -71.12%
bm_write        1273.50      1287.83         14.33           1.13%
Geomean         3167.87      1712.11      -1455.76         -45.95%

@philnik777 philnik777 force-pushed the optimize_fstream_get branch from b3e6abe to fc3c07b Compare November 7, 2025 10:23
@philnik777 philnik777 enabled auto-merge (squash) November 7, 2025 10:24
@philnik777 philnik777 merged commit 9b114c5 into llvm:main Nov 7, 2025
15 of 16 checks passed
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
```
Benchmark         old       new    Difference    % Difference
-----------  --------  --------  ------------  --------------
bm_read       2468.45    736.27      -1732.18         -70.17%
```
this->__gbump_ptrdiff(__n);
}
if (__len - __n >= this->egptr() - this->eback())
return std::fread(__str + __n, sizeof(char_type), __len - __n, __file_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are seeing regressions with this commit. Applications failing with message like:
InvalidWordCount: WordCount exceeds remaining input stream size: expected size = 47564 bytes, remaining size = 544 bytes

Should we return total number of characters read not only the number of characters read by fread here?
Also should we return __n; after the if?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also seeing downstream failure in ACE C++ test suite Cxx11/27/9/1/5/t_xsgetn_1.C where it appears application is accessing invalid memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also seeing failures with this commit. Shall we revert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philnik777 Can you have a look? Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've got a patch that fixes the issue:

diff --git a/libcxx/include/fstream b/libcxx/include/fstream
index b07ca636094a..e97d378ae6ce 100644
--- a/libcxx/include/fstream
+++ b/libcxx/include/fstream
@@ -315,8 +315,11 @@ protected:
         traits_type::copy(__str, this->gptr(), __n);
         this->__gbump_ptrdiff(__n);
       }
-      if (__len - __n >= this->egptr() - this->eback())
-        return std::fread(__str + __n, sizeof(char_type), __len - __n, __file_);
+      const streamsize __remainder = __len - __n;
+
+      if (__remainder >= 0)
+        return std::fread(__str + __n, sizeof(char_type), __remainder, __file_) + __n;
+      return __n;
     }
     return basic_streambuf<_CharT, _Traits>::xsgetn(__str, __len);
   }

I'm planning to put up a PR tomorrow once I get a test written, but I won't be upset if someone beats me to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, coming up with a test has been difficult. The fix PR, which works for our unreduced example, is in #167779

michaelrj-google added a commit to michaelrj-google/llvm-project that referenced this pull request Nov 12, 2025
The optimized version of xsgetn for basic_filebuf added in llvm#165223 has
an issue where if the reads come from both the buffer and the
filesystem it returns the wrong number of characters. This patch should
address the issue.
Sterling-Augustine pushed a commit that referenced this pull request Nov 13, 2025
The optimized version of xsgetn for basic_filebuf added in #165223 has
an issue where if the reads come from both the buffer and the
filesystem it returns the wrong number of characters. This patch should
address the issue.
@rupprecht
Copy link
Collaborator

This also causes failures when reading from a default constructed std::ifstream, which AFAICT should be well defined to set the fail bit instead of crashing. See #168628.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants