Bug Report: CSVReader aborts with uncaught std::error_code from MmapParser::next() (Linux, large file, chunk boundary)
Summary
Constructing a reader from a file path (csv::CSVReader{file}) parses the file on a worker thread and reads it through mmap. If a mapping fails, MmapParser::next() throws a std::error_code, which crashes the process.
When iterating a large CSV file using csv::CSVReader on Linux, the process aborts with:
terminate called after throwing an instance of 'std::error_code'
This is triggered at the chunk boundary (the library reads ITERATION_CHUNK_SIZE = 10,000,000 bytes per chunk). In our dataset the crash happens at roughly 61,6xx rows (the first chunk ends at about row 61,640).
Two issues combine to make this user-unrecoverable:
- MmapParser::next() throws a raw std::error_code (if (error) throw error;) instead of a standard exception type (std::system_error).
- CSVReader runs parsing in a std::thread (CSVReader::read_csv), and exceptions escaping that thread invoke std::terminate (not catchable from user code surrounding the iteration).
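The first point can be illustrated in isolation. The sketch below is not library code: throw_raw_code, throw_system_error and which_handler are hypothetical helpers that only show which handler a typical user-side try/catch would reach for each kind of thrown object. It does not cover the second point (the worker thread).

```cpp
#include <cerrno>
#include <exception>
#include <string>
#include <system_error>

// Hypothetical stand-in: reports failure the way MmapParser::next() does today.
void throw_raw_code() {
    throw std::error_code(EINVAL, std::generic_category());
}

// Hypothetical stand-in: reports the same failure as a standard exception.
void throw_system_error() {
    throw std::system_error(std::error_code(EINVAL, std::generic_category()),
                            "mmap failed");
}

// Runs f and returns the name of the handler that caught what it threw.
template <typename F>
std::string which_handler(F&& f) {
    try {
        f();
    } catch (const std::exception&) {
        return "std::exception";  // std::system_error derives from std::exception
    } catch (...) {
        return "catch-all";       // std::error_code is not an exception type
    }
    return "none";
}
```

Here which_handler(throw_raw_code) ends up in the catch-all branch, while which_handler(throw_system_error) is caught as std::exception, so code that only catches std::exception would miss the raw std::error_code even without the thread problem.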
Observed Error Code
By installing a custom std::terminate handler, we can inspect the uncaught exception:
std::error_code
value = 22
message = "Invalid argument"
category = "system"
This corresponds to EINVAL on Linux, likely originating from the underlying mmap call (invalid offset/length).
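For reference, a small sketch of how such a code can be checked and printed in the same shape as above. The helper names describe and is_system_einval are hypothetical, and the message text comes from the platform's C library, so it may differ on other systems.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Formats an error_code in the value/message/category shape used in this report.
std::string describe(const std::error_code& ec) {
    return "value = " + std::to_string(ec.value())
         + ", message = \"" + ec.message()
         + "\", category = \"" + ec.category().name() + "\"";
}

// True when the code is EINVAL reported through the system category.
bool is_system_einval(const std::error_code& ec) {
    return ec.category() == std::system_category() && ec.value() == EINVAL;
}
```

Calling is_system_einval on the code captured in the terminate handler would confirm the EINVAL reading without relying on the numeric value 22.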
Environment
- CSV Parser: 2.3.0
- OS: Rocky Linux 9.6 (Linux)
- Compiler: GCC 15 (C++20), -pthread
- Test file: ~79,370 lines, ~13MB CSV (TA2601.csv)
Steps to Reproduce
- Build the repro:
g++ -std=c++20 -O2 -I path/to/csv-parser/single_include -pthread repro_csvreader_terminate_inspect.cpp -o repro
- Run it against a large CSV (our crash is deterministic on TA2601.csv):
./repro /path/to/TA2601.csv
Minimal Reproducer
File: repro_csvreader_terminate_inspect.cpp
#include "csv.hpp"
#include <cstdlib>
#include <exception>
#include <iostream>
#include <string>
#include <system_error>

static void install_terminate_handler() {
    std::set_terminate([] {
        std::cerr << "\n=== std::terminate called ===\n";
        if (auto eptr = std::current_exception()) {
            try {
                std::rethrow_exception(eptr);
            } catch (const std::error_code& ec) {
                std::cerr << "uncaught exception type: std::error_code\n";
                std::cerr << " value=" << ec.value()
                          << " message='" << ec.message()
                          << "' category='" << ec.category().name() << "'\n";
            } catch (const std::system_error& e) {
                std::cerr << "uncaught exception type: std::system_error\n";
                std::cerr << " what=" << e.what() << "\n";
                std::cerr << " code.value=" << e.code().value()
                          << " code.message='" << e.code().message()
                          << "' category='" << e.code().category().name() << "'\n";
            } catch (const std::exception& e) {
                std::cerr << "uncaught exception type: std::exception\n";
                std::cerr << " what=" << e.what() << "\n";
            } catch (...) {
                std::cerr << "uncaught exception type: (unknown)\n";
            }
        } else {
            std::cerr << "no current_exception()\n";
        }
        std::abort();
    });
}

int main(int argc, char** argv) {
    install_terminate_handler();
    std::string path = (argc > 1) ? argv[1] : std::string("large.csv");
    csv::CSVReader reader(path);
    std::size_t n = 0;
    for (const auto& row : reader) {
        (void)row;
        ++n;
        if ((n % 10000) == 0) {
            std::cerr << "rows=" << n << "\n";
        }
    }
    std::cerr << "done rows=" << n << "\n";
    return 0;
}
Root Cause in Code
In include/internal/basic_csv_parser.cpp (MmapParser::next()):
std::error_code error;
this->data_ptr->_data = std::make_shared<mio::basic_mmap_source<char>>(
    mio::make_mmap_source(this->_filename, this->mmap_pos, length, error));
this->mmap_pos += length;
if (error) throw error; // BUG: throws std::error_code directly
This also exists in:
- single_include/csv.hpp
- single_include_test/csv.hpp
Additionally, CSVReader::read_csv() runs on a std::thread and does not catch exceptions:
this->parser->set_output(*this->records);
this->parser->next(bytes); // exception here => std::terminate
Suggested Fix
1) Throw a proper exception type
Replace:
if (error) throw error;
With:
if (error) {
    throw std::system_error(error, "Memory mapping failed during CSV parsing");
}
Ideally include context (filename / offset / length) in the message, e.g.:
if (error) {
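    // mmap_pos was already advanced by length above, so subtract it
    // to report the offset of the mapping that actually failed.
    std::string msg = "mmap failed: file='" + this->_filename
        + "' offset=" + std::to_string(this->mmap_pos - length)
        + " length=" + std::to_string(length);
    throw std::system_error(error, msg);
}
2) Propagate worker-thread exceptions to the caller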
To make errors catchable by user code, catch all exceptions inside CSVReader::read_csv() and store them in a std::exception_ptr, then rethrow from read_row() / iterator increment.
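A minimal, self-contained sketch of that pattern follows. It is not the library's code: WorkerBackedReader, read_chunk, rethrow_if_failed, parse_chunk and error_value_seen_by_caller are hypothetical names, and the failure is simulated. In the real reader the worker and the consumer overlap, so the stored std::exception_ptr would need the same synchronization the record queue already uses; here join() provides it.

```cpp
#include <cerrno>
#include <exception>
#include <system_error>
#include <thread>

// Hypothetical stand-in for a reader that parses on a worker thread.
class WorkerBackedReader {
public:
    // Runs one chunk of work on a worker thread and captures any exception.
    void read_chunk(bool fail) {
        std::thread worker([this, fail] {
            try {
                parse_chunk(fail);
            } catch (...) {
                // Nothing escapes the thread, so std::terminate is not invoked.
                worker_error_ = std::current_exception();
            }
        });
        worker.join();  // join() orders the write above before later reads
    }

    // Called on the consumer side (e.g. from read_row() or operator++).
    void rethrow_if_failed() {
        if (worker_error_) {
            std::exception_ptr pending = worker_error_;
            worker_error_ = nullptr;  // report each failure once
            std::rethrow_exception(pending);
        }
    }

private:
    // Simulated parsing step that fails like a bad mmap call would.
    static void parse_chunk(bool fail) {
        if (fail) {
            throw std::system_error(
                std::error_code(EINVAL, std::system_category()),
                "mmap failed (simulated)");
        }
    }

    std::exception_ptr worker_error_;
};

// Returns the error value the caller observes, or 0 if nothing failed.
int error_value_seen_by_caller(bool fail) {
    WorkerBackedReader reader;
    reader.read_chunk(fail);
    try {
        reader.rethrow_if_failed();
    } catch (const std::system_error& e) {
        return e.code().value();  // now catchable around the iteration
    }
    return 0;
}
```

With both fixes combined, a caller that wraps the iteration in try/catch for std::system_error (or std::exception) would see the failure instead of an abort. Build with -pthread, as in the repro.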