Bug Report: CSVReader aborts with uncaught std::error_code from MmapParser::next() (Linux, large file, chunk boundary) #280

@xing-cg

Summary

Constructing a csv::CSVReader from a file path ("CSVReader{file}") parses the file on a worker thread and reads it through mmap. When a mapping fails, MmapParser::next() throws a raw std::error_code on that thread, which crashes the process.

When iterating a large CSV file using csv::CSVReader on Linux, the process aborts with:

terminate called after throwing an instance of 'std::error_code'

The abort occurs near a chunk boundary (the library reads ITERATION_CHUNK_SIZE = 10,000,000 bytes per chunk). In our dataset the crash happens at ~61,6xx rows; the first chunk ends at ~61,640 rows.

Two issues combine to make this user-unrecoverable:

  1. MmapParser::next() throws a raw std::error_code (if (error) throw error;) instead of a standard exception type (std::system_error).
  2. CSVReader runs parsing in a std::thread (CSVReader::read_csv), and exceptions escaping that thread invoke std::terminate (not catchable from user code surrounding the iteration).

Observed Error Code

By installing a custom std::terminate handler, we can inspect the uncaught exception:

  • std::error_code
  • value = 22
  • message = "Invalid argument"
  • category = "system"

This corresponds to EINVAL on Linux, likely originating from the underlying mmap call (invalid offset/length).
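As a quick check of that reading, the reported value and category can be rebuilt and compared against the portable error condition. This is only a sketch: the helper names are hypothetical, and it assumes the library's error_code uses std::system_category(), as the "system" category name above suggests.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: rebuilds the error_code printed by the terminate
// handler from its raw value (22) and the assumed "system" category.
inline std::error_code observed_error() {
    return std::error_code(22, std::system_category());
}

// True when the code is equivalent to the portable EINVAL condition.
inline bool is_invalid_argument(const std::error_code& ec) {
    return ec == std::errc::invalid_argument;
}
```

Comparing against std::errc::invalid_argument rather than the literal 22 keeps the check independent of the platform's errno numbering.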

Environment

  • CSV Parser: 2.3.0
  • OS: Rocky Linux 9.6 (Linux)
  • Compiler: GCC 15 (C++20), -pthread
  • Test file: ~79,370 lines, ~13MB CSV (TA2601.csv)

Steps to Reproduce

  1. Build the repro:
g++ -std=c++20 -O2 -I path/to/csv-parser/single_include -pthread repro_csvreader_terminate_inspect.cpp -o repro
  2. Run it against a large CSV (our crash is deterministic on TA2601.csv):
./repro /path/to/TA2601.csv

Minimal Reproducer

File: repro_csvreader_terminate_inspect.cpp

#include "csv.hpp"

#include <cstdlib>
#include <exception>
#include <iostream>
#include <string>
#include <system_error>

static void install_terminate_handler() {
    std::set_terminate([] {
        std::cerr << "\n=== std::terminate called ===\n";

        if (auto eptr = std::current_exception()) {
            try {
                std::rethrow_exception(eptr);
            } catch (const std::error_code& ec) {
                std::cerr << "uncaught exception type: std::error_code\n";
                std::cerr << "  value=" << ec.value()
                          << " message='" << ec.message()
                          << "' category='" << ec.category().name() << "'\n";
            } catch (const std::system_error& e) {
                std::cerr << "uncaught exception type: std::system_error\n";
                std::cerr << "  what=" << e.what() << "\n";
                std::cerr << "  code.value=" << e.code().value()
                          << " code.message='" << e.code().message()
                          << "' category='" << e.code().category().name() << "'\n";
            } catch (const std::exception& e) {
                std::cerr << "uncaught exception type: std::exception\n";
                std::cerr << "  what=" << e.what() << "\n";
            } catch (...) {
                std::cerr << "uncaught exception type: (unknown)\n";
            }
        } else {
            std::cerr << "no current_exception()\n";
        }

        std::abort();
    });
}

int main(int argc, char** argv) {
    install_terminate_handler();

    std::string path = (argc > 1) ? argv[1] : std::string("large.csv");
    csv::CSVReader reader(path);

    std::size_t n = 0;
    for (const auto& row : reader) {
        (void)row;
        ++n;
        if ((n % 10000) == 0) {
            std::cerr << "rows=" << n << "\n";
        }
    }

    std::cerr << "done rows=" << n << "\n";
    return 0;
}

Root Cause in Code

In include/internal/basic_csv_parser.cpp (MmapParser::next()):

std::error_code error;
this->data_ptr->_data = std::make_shared<mio::basic_mmap_source<char>>(
    mio::make_mmap_source(this->_filename, this->mmap_pos, length, error));
this->mmap_pos += length;
if (error) throw error;  // BUG: throws std::error_code directly

This also exists in:

  • single_include/csv.hpp
  • single_include_test/csv.hpp

Additionally, CSVReader::read_csv() runs on a std::thread and does not catch exceptions:

this->parser->set_output(*this->records);
this->parser->next(bytes); // exception here => std::terminate

Suggested Fix

1) Throw a proper exception type

Replace:

if (error) throw error;

With:

if (error) {
    throw std::system_error(error, "Memory mapping failed during CSV parsing");
}

Ideally include context (filename / offset / length) in the message, e.g.:

if (error) {
    std::string msg = "mmap failed: file='" + this->_filename
        + "' offset=" + std::to_string(this->mmap_pos)
        + " length=" + std::to_string(length);
    throw std::system_error(error, msg);
}
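To illustrate why this matters to callers, here is a small standalone sketch of the same message construction. It shows that the resulting exception is caught by an ordinary std::exception handler and still carries the original error code. The helper names, the std::size_t parameter types, and the sample filename/offset/length values are assumptions for illustration, not taken from the library.

```cpp
#include <cstddef>
#include <string>
#include <system_error>

// Hypothetical helper (not part of csv-parser): builds the exception the
// suggested fix would throw, with filename/offset/length in the message.
inline std::system_error make_mmap_error(const std::error_code& error,
                                         const std::string& filename,
                                         std::size_t offset,
                                         std::size_t length) {
    std::string msg = "mmap failed: file='" + filename
        + "' offset=" + std::to_string(offset)
        + " length=" + std::to_string(length);
    return std::system_error(error, msg);
}

// Simulates a failing call site with made-up values and returns the text
// that a generic std::exception handler would see.
inline std::string describe_failure() {
    try {
        throw make_mmap_error(
            std::make_error_code(std::errc::invalid_argument),
            "large.csv", 10000000, 10000000);
    } catch (const std::exception& e) {  // std::system_error derives from it
        return e.what();
    }
}
```

Handlers that need the numeric code can catch std::system_error instead and call code() on it.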

2) Propagate worker-thread exceptions to the caller

To make errors catchable by user code, catch all exceptions inside CSVReader::read_csv(), store them in a std::exception_ptr, and rethrow that pointer on the caller's thread from read_row() / iterator increment.
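A minimal sketch of that pattern, independent of the library: WorkerBackedReader, start(), and finish() are hypothetical stand-ins for CSVReader, read_csv(), and the point where read_row() would rethrow. The real change would need to fit the library's existing threading and queue logic.

```cpp
#include <exception>
#include <system_error>
#include <thread>

// Hypothetical stand-in for CSVReader; names are illustrative only.
class WorkerBackedReader {
public:
    // Runs `work` on a worker thread and captures anything it throws
    // instead of letting it escape the thread (which would call
    // std::terminate). Call once per finish().
    template <typename Work>
    void start(Work work) {
        worker_ = std::thread([this, work]() mutable {
            try {
                work();
            } catch (...) {
                error_ = std::current_exception();
            }
        });
    }

    // Joins the worker, then rethrows a captured exception on the
    // caller's thread, where a normal try/catch can handle it.
    void finish() {
        if (worker_.joinable()) worker_.join();
        if (error_) {
            std::exception_ptr e = error_;
            error_ = nullptr;
            std::rethrow_exception(e);
        }
    }

    ~WorkerBackedReader() {
        if (worker_.joinable()) worker_.join();
    }

private:
    std::thread worker_;
    std::exception_ptr error_;  // written by the worker, read only after join()
};

// Demonstrates that a failure on the worker thread reaches the caller.
inline bool caller_can_catch() {
    WorkerBackedReader reader;
    reader.start([] {
        throw std::system_error(
            std::make_error_code(std::errc::invalid_argument), "mmap failed");
    });
    try {
        reader.finish();
    } catch (const std::system_error& e) {
        return e.code() == std::errc::invalid_argument;
    }
    return false;
}
```

Reading error_ only after join() avoids a data race without extra locking. A reader that hands rows to the caller while the worker is still running would need a mutex or atomic flag around the stored pointer instead.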

TA2601.csv
