BinPrint: Simple C++ library to save (binary) data to files

This repository contains the code for a C++ library that may be used to efficiently save data from a program into output files. It specializes in the management of many independent output streams, all handled through a single object in the program.

Description

The library defines the BinPrint class, an object designed to take in a list of output files to open, and to save specified data to those files whenever prompted. A BinPrint makes use of buffers before flushing to file, and the size of these buffers may be modified by the developer. As of now, and for efficiency, this library only saves numerical data, and does so as streams of binary numbers (output files with extension .dat), which must therefore be read by specialized functions (e.g. the R package readsim).

How to

Simply copy the header src/binprint.hpp and the associated source file src/binprint.cpp into your own project, make sure your C++ code and build setup can locate and include these files without a hitch, and you should be good to go. In this repository, we provide a minimal example setup showcasing how to use the library. Here we explain how to compile this minimal setup. For the implementation details of this minimal example, please refer to the source file src/MAIN.cpp.

Workflow

First, a BinPrint object must be instantiated with the names of the output files (without extension) to open as argument:

BinPrint p({"foo", "bar", "baz"});

Optionally, the size (in MB) of the buffers to fill before flushing may be specified as second argument. The default is 1 MB per output file.
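
To make the idea of filling a buffer before flushing more concrete, here is a minimal sketch of a buffered binary writer for a single output file. This is not the BinPrint implementation: the class BufferedWriter, its members and the helper doubles_per_mb() are hypothetical names, and the sketch assumes that 1 MB means 1024 x 1024 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical illustration only: this is NOT the actual BinPrint code.
class BufferedWriter {
public:
    // capacity: number of doubles held in memory before flushing to file
    BufferedWriter(const std::string& path, std::size_t capacity)
        : file_(path, std::ios::binary), capacity_(capacity) {
        assert(capacity_ > 0);
        buffer_.reserve(capacity_);
    }

    // Write whatever is left when the writer goes out of scope
    ~BufferedWriter() { flush(); }

    bool is_open() const { return file_.is_open(); }

    // Store a value; touch the disk only once the buffer is full
    void save(double x) {
        buffer_.push_back(x);
        if (buffer_.size() >= capacity_) flush();
    }

    // Write the raw bytes of all buffered values, then empty the buffer
    void flush() {
        if (buffer_.empty()) return;
        file_.write(reinterpret_cast<const char*>(buffer_.data()),
                    static_cast<std::streamsize>(buffer_.size() * sizeof(double)));
        file_.flush();
        buffer_.clear();
    }

    // Number of values currently waiting in memory
    std::size_t buffered() const { return buffer_.size(); }

private:
    std::ofstream file_;
    std::size_t capacity_;
    std::vector<double> buffer_;
};

// Convert a buffer size in MB into a number of doubles
// (assumption: 1 MB = 1024 * 1024 bytes)
std::size_t doubles_per_mb(double mb) {
    return static_cast<std::size_t>(mb * 1024.0 * 1024.0) / sizeof(double);
}
```

With 8-byte doubles, a 1 MB buffer in this sketch holds 131072 values before a write to disk occurs. A larger buffer means fewer writes at the cost of more memory per output file.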

The output files must then be opened, using:

p.open();

This will open the three output files: foo.dat, bar.dat, and baz.dat.

Alternatively, the developer may decide that the user can choose not to open all hardcoded files, but only a subset. Then, a text file containing the list of output files to open may be read before open() is called. Such a file may look like this:

foo
bar

Assuming that this file is called whattosave.txt and available in the working directory upon calling the program, it will be read provided that the following line is added just before opening the buffers to the output files:

p.read("whattosave.txt");

Once open() has been called, as many output file streams (i.e. buffers) are open as were requested (i.e. foo.dat, bar.dat and baz.dat if no subset was requested, or just foo.dat and bar.dat if the above whattosave.txt file was read). We can now save stuff.
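
As an illustration of the subset logic described above, the following sketch reads a list of names (one per line) and keeps only the hardcoded outputs that were requested. How BinPrint does this internally is not shown in this README; the functions read_requested() and select_outputs() are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read one output name per line, skipping blank lines
std::vector<std::string> read_requested(std::istream& in) {
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing carriage return left by Windows line endings
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) names.push_back(line);
    }
    return names;
}

// Convenience overload for text already held in memory
std::vector<std::string> read_requested(const std::string& text) {
    std::istringstream in(text);
    return read_requested(in);
}

// Keep only the hardcoded names that were also requested,
// preserving the order of the hardcoded list
std::vector<std::string> select_outputs(const std::vector<std::string>& hardcoded,
                                        const std::vector<std::string>& requested) {
    std::vector<std::string> kept;
    for (const auto& name : hardcoded) {
        if (std::find(requested.begin(), requested.end(), name) != requested.end())
            kept.push_back(name);
    }
    return kept;
}
```

To parse an actual file, a std::ifstream opened on whattosave.txt could be passed to the first overload. Names in the file that are not in the hardcoded list are silently ignored in this sketch; a real program might prefer to report them.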

The BinPrint class offers a single interface through which to dispatch values into the desired target output files. To save, simply use:

p.save("foo", 0.99);

This will save the scalar 0.99 (in binary) into foo.dat. At this point, for reading purposes - since humans typically lack the ability to read binary files - it is best practice to record the number of bytes taken by a single double-precision floating point number (i.e. a double) on the machine where the program is being run. Typically this will be 8 bytes on a 64-bit system, but it might be different. If the wrong number of bytes per value is given to the reading function (e.g. in R or in Python scripts supposed to decipher the saved data), the wrong values will be read in.
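
One simple way to record the size of a double is to write it to a small text file next to the binary output. The sketch below is only a suggestion and not part of BinPrint; the function names and the idea of a separate text file are assumptions.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: store the size in bytes of a double in a text file
bool record_double_size(const std::string& path) {
    std::ofstream out(path);
    if (!out) return false;
    out << sizeof(double) << '\n';
    out.flush();
    return static_cast<bool>(out);
}

// Read the recorded size back; returns 0 if the file cannot be read
std::size_t read_double_size(const std::string& path) {
    std::ifstream in(path);
    std::size_t n = 0;
    if (!(in >> n)) return 0;
    return n;
}
```

The recorded number can then be passed to whichever reading routine is used downstream, instead of hardcoding 8 there.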

That's pretty much it. Note that vectors of values can be saved by placing the save() function inside of a loop:

for (int i = 0; i < 10; ++i) 
    p.save("bar", static_cast<double>(i));

This means that it is important to also know how many values in total each output file is supposed to take (although that might be deduced from the size of the file and the number of bytes taken by each value, it may still be important for complicated setups, e.g. one number per individual per generation, with varying numbers of individuals every generation in an individual-based simulation).

Also note that static_cast<>() is used here to make explicit the conversion of the integer-typed i into a double, which is eventually what is being written to file bar.dat.
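
The relationship between file size and number of values can be sketched as follows, assuming the file holds nothing but raw doubles written on the same machine. The functions below are hypothetical helpers written with the standard library only; they are not part of BinPrint or readsim.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write values as raw doubles (stand-in for repeated calls to save())
bool write_values(const std::string& path, const std::vector<double>& values) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
    out.flush();
    return static_cast<bool>(out);
}

// Number of doubles in a binary file, deduced from its size in bytes;
// returns 0 if the file cannot be opened
std::size_t count_values(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return 0;
    const std::streamoff bytes = in.tellg();
    if (bytes < 0) return 0;
    return static_cast<std::size_t>(bytes) / sizeof(double);
}

// Read every double stored in the file
std::vector<double> read_values(const std::string& path) {
    std::vector<double> values(count_values(path));
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(double)));
    return values;
}
```

The total count alone does not say how the values are grouped (e.g. how many individuals there were in each generation), which is why such structure needs to be saved or known separately.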

At the end of the program, the printer (and all of the output files) may be closed:

p.close();

We highlight here that this library mostly shines when handling many possible output files, in cases where the user may request different output files to be saved in different runs, where the different outputs may have different units of observations (i.e. incurring a lot of possible data duplication if the output were saved, say, as a table in a CSV file), and when the data to save is possibly large (binary is much faster to write than text output). See the readsim package in R for a tool designed to read the type of data saved by this library.

About

This code is written in C++20. It was developed on Ubuntu Linux 24.04 LTS, making mostly use of Visual Studio Code 1.99.0 (C/C++ Extension Pack 1.3.1). CMake 3.28.3 was used as build system, with g++ 13.3.0 as compiler. GDB 15.0.50.20240403 was used for debugging. Tests (see here) were written with Boost.Test 1.87, itself retrieved with Git 2.43.0 and vcpkg 2025.04.09. Memory use was checked with Valgrind 3.22.0. Code coverage was analyzed with LCOV 2.0-1. Profiling was performed with gprof 2.42. (See the dev/ folder and this page for details about the checks performed.) During development, occasional use was also made of ChatGPT and GitHub Copilot.

Links

The present library has been used (in this form or slightly modified) in the following, non-exhaustive list of projects:

reschoice: evolutionary simulation of ecological specialization under informed resource choice
brachypode: evolutionary simulation of adaptation to a changing climate in a facilitated system

The following R package may be used to read the data saved by this type of implementation:

readsim: read and combine binary simulation data in R

Permissions

This code is licensed under the MIT license. See license file for details. This code comes with no guarantee whatsoever.

