-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[libc++] Optimize fstream::read #165223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++] Optimize fstream::read #165223
Conversation
1c20a74 to
b3e6abe
Compare
|
@llvm/pr-subscribers-libcxx Author: Nikolas Klauser (philnik777) ChangesFull diff: https://github.com/llvm/llvm-project/pull/165223.diff 5 Files Affected:
diff --git a/libcxx/docs/ReleaseNotes/22.rst b/libcxx/docs/ReleaseNotes/22.rst
index 25d33a9c2eb50..b5376c31ae31c 100644
--- a/libcxx/docs/ReleaseNotes/22.rst
+++ b/libcxx/docs/ReleaseNotes/22.rst
@@ -64,8 +64,8 @@ Improvements and New Features
by up to 2.5x
- The performance of ``erase(iterator, iterator)`` in the unordered containers has been improved by up to 1.9x
- The performance of ``map::insert_or_assign`` has been improved by up to 2x
-- ``ofstream::write`` has been optimized to pass through large strings to system calls directly instead of copying them
- in chunks into a buffer.
+- ``ofstream::write`` and ``ifstream::read`` have been optimized to pass through large reads and writes to system calls
+ directly instead of copying them in chunks.
- Multiple internal types have been refactored to use ``[[no_unique_address]]``, resulting in faster compile times and
reduced debug information.
diff --git a/libcxx/include/fstream b/libcxx/include/fstream
index 1f88d134fe061..b07ca636094af 100644
--- a/libcxx/include/fstream
+++ b/libcxx/include/fstream
@@ -308,6 +308,19 @@ protected:
return basic_streambuf<_CharT, _Traits>::xsputn(__str, __len);
}
+ _LIBCPP_HIDE_FROM_ABI_VIRTUAL streamsize xsgetn(char_type* __str, streamsize __len) override {
+ if (__always_noconv_) {
+ const streamsize __n = std::min(this->egptr() - this->gptr(), __len);
+ if (__n != 0) {
+ traits_type::copy(__str, this->gptr(), __n);
+ this->__gbump_ptrdiff(__n);
+ }
+ if (__len - __n >= this->egptr() - this->eback())
+ return std::fread(__str + __n, sizeof(char_type), __len - __n, __file_);
+ }
+ return basic_streambuf<_CharT, _Traits>::xsgetn(__str, __len);
+ }
+
private:
char* __extbuf_;
const char* __extbufnext_;
diff --git a/libcxx/test/benchmarks/streams/ofstream.bench.cpp b/libcxx/test/benchmarks/streams/fstream.bench.cpp
similarity index 62%
rename from libcxx/test/benchmarks/streams/ofstream.bench.cpp
rename to libcxx/test/benchmarks/streams/fstream.bench.cpp
index 60606a9d67e2f..3fef23506e67e 100644
--- a/libcxx/test/benchmarks/streams/ofstream.bench.cpp
+++ b/libcxx/test/benchmarks/streams/fstream.bench.cpp
@@ -22,4 +22,22 @@ static void bm_write(benchmark::State& state) {
}
BENCHMARK(bm_write);
+static void bm_read(benchmark::State& state) {
+ std::vector<char> buffer;
+ buffer.resize(16384);
+
+ std::ofstream gen_testfile("testfile");
+ gen_testfile.write(buffer.data(), buffer.size());
+
+ std::ifstream stream("testfile");
+ assert(stream);
+
+ for (auto _ : state) {
+ stream.read(buffer.data(), buffer.size());
+ benchmark::DoNotOptimize(buffer);
+ stream.seekg(0);
+ }
+}
+BENCHMARK(bm_read);
+
BENCHMARK_MAIN();
diff --git a/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp b/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp
index 283adbc057d1e..30e7b66d42325 100644
--- a/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp
+++ b/libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp
@@ -19,4 +19,4 @@
std::basic_filebuf<char, std::char_traits<wchar_t> > f;
// expected-error-re@*:* {{static assertion failed{{.*}}traits_type::char_type must be the same type as CharT}}
-// expected-error@*:* 10 {{only virtual member functions can be marked 'override'}}
+// expected-error@*:* 11 {{only virtual member functions can be marked 'override'}}
diff --git a/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp b/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp
index ba6f3c31d3e34..daafb36f9151a 100644
--- a/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp
+++ b/libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp
@@ -21,7 +21,7 @@ std::basic_fstream<char, std::char_traits<wchar_t> > f;
// expected-error-re@*:* {{static assertion failed{{.*}}traits_type::char_type must be the same type as CharT}}
// expected-error-re@*:* {{static assertion failed{{.*}}traits_type::char_type must be the same type as CharT}}
-// expected-error@*:* 12 {{only virtual member functions can be marked 'override'}}
+// expected-error@*:* 13 {{only virtual member functions can be marked 'override'}}
// FIXME: As of commit r324062 Clang incorrectly generates a diagnostic about mismatching
// exception specifications for types which are already invalid for one reason or another.
|
|
/libcxx-bot benchmark libcxx/test/benchmarks/streams/fstream.bench.cpp Benchmark results: |
b3e6abe to
fc3c07b
Compare
``` Benchmark old new Difference % Difference ----------- -------- -------- ------------ -------------- bm_read 2468.45 736.27 -1732.18 -70.17% ```
| this->__gbump_ptrdiff(__n); | ||
| } | ||
| if (__len - __n >= this->egptr() - this->eback()) | ||
| return std::fread(__str + __n, sizeof(char_type), __len - __n, __file_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are seeing regressions with this commit. Applications failing with message like:
InvalidWordCount: WordCount exceeds remaining input stream size: expected size = 47564 bytes, remaining size = 544 bytes
Should we return total number of characters read not only the number of characters read by fread here?
Also should we return __n; after the if?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also seeing downstream failure in ACE C++ test suite Cxx11/27/9/1/5/t_xsgetn_1.C where it appears application is accessing invalid memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also seeing failures with this commit. Shall we revert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philnik777 Can you have a look? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've got a patch that fixes the issue:
diff --git a/libcxx/include/fstream b/libcxx/include/fstream
index b07ca636094a..e97d378ae6ce 100644
--- a/libcxx/include/fstream
+++ b/libcxx/include/fstream
@@ -315,8 +315,11 @@ protected:
traits_type::copy(__str, this->gptr(), __n);
this->__gbump_ptrdiff(__n);
}
- if (__len - __n >= this->egptr() - this->eback())
- return std::fread(__str + __n, sizeof(char_type), __len - __n, __file_);
+ const streamsize __remainder = __len - __n;
+
+ if (__remainder >= 0)
+ return std::fread(__str + __n, sizeof(char_type), __remainder, __file_) + __n;
+ return __n;
}
return basic_streambuf<_CharT, _Traits>::xsgetn(__str, __len);
}I'm planning to put up a PR tomorrow once I get a test written, but I won't be upset if someone beats me to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, coming up with a test has been difficult. The fix PR, which works for our unreduced example, is in #167779
The optimized version of xsgetn for basic_filebuf added in llvm#165223 has an issue where if the reads come from both the buffer and the filesystem it returns the wrong number of characters. This patch should address the issue.
The optimized version of xsgetn for basic_filebuf added in #165223 has an issue where if the reads come from both the buffer and the filesystem it returns the wrong number of characters. This patch should address the issue.
|
This also causes failures when reading from a default constructed |
Uh oh!
There was an error while loading. Please reload this page.