kushnertodd/cpp-berkeley-db-framework

berkeley-db-cpp-framework

C++ framework to improve Berkeley DB developer efficiency and reduce programming errors.

Background

Oracle Berkeley DB provides highly efficient storage, management, and retrieval of large data sets.

The Berkeley DB C++ Framework (the Framework) is a set of C++ classes and functions created to hide some of the complexity of using Berkeley DB. Berkeley DB is simpler and requires fewer resources than common relational database management systems (RDBMS) such as Oracle Database, Microsoft SQL Server, or MySQL. As a result, it is much faster and easier to use. It is an embedded database, called as a library from programs, at the cost of not having an interactive facility such as Structured Query Language (SQL) that makes databases available to non-programmers.

The purpose of databases

A significant feature of databases is the use of indexes, which greatly speed up data access. Assuming each data item is identified by a key, such as an account, order, part, or Social Security number, an index allows a given item to be found in time that is almost independent of the amount of data stored. The earliest data processing systems used indexed databases, and RDBMS incorporate indexes to be more efficient. Ironically, though, because of the extensive features they provide, RDBMS are still substantially slower than the earlier, simpler indexed databases.

RDBMS provide flexible data access, which is valuable while exploring and developing database applications, and this makes them among the most popular databases. However, once applications are put into production, as data processing or web data access applications, their patterns of data access are generally fixed and the flexibility of an RDBMS is not required. A much simpler indexed database can then be substituted to greatly improve performance, at the cost of reimplementing the application to use the new database.

Berkeley DB is an example of such a simpler indexed database system. Because it requires programming, changes take more time, but for commercial applications that time is repaid by the costs saved, such as charges from a cloud service like AWS. As a result, it is suited to technical IT personnel rather than general users.

Berkeley DB advantages

RDBMS provide other capabilities, such as allowing users to access a database concurrently, ensuring data is always valid (for example, that an account referenced in an order always exists in the system), and ensuring that databases recover properly when systems fail. Berkeley DB also provides those capabilities, along with much higher efficiency. While Berkeley DB is simpler, because it is used through programming there is necessarily some complexity in using it. The Framework was created to simplify some of those programming tasks, making applications that use Berkeley DB easier to develop and maintain.

Berkeley DB resources

The rest of this description is intended for programmers. Before continuing, a basic understanding of Berkeley DB and C++ is needed. The following resources should be used to understand how Berkeley DB works and for an introduction to how it is programmed. While it is not essential to know every detail of how the original libraries work in order to use this framework, it is necessary to know the primary C++ classes to understand how the framework works 'under the hood'.

Installing the necessary Berkeley DB components

The Framework requires that several software packages be installed. Instructions for installing them are provided in this document. These include:

  • The Oracle Berkeley DB library
  • The json-c JSON library
  • The cmake tool for building C++ applications

The current system is designed to be built and run under Linux, such as Ubuntu, either natively or under Windows Subsystem for Linux (WSL2) on Windows. After release, support for building and running under native Windows using Visual Studio will be added. Git is recommended for version control of the application you develop.

Using the Berkeley DB C++ Framework

Sample application

A simple sample database application using the Framework is included. It is based on the example application in the basic Berkeley DB software distribution, described in Getting Started with Berkeley DB for C++. Read that reference before reading this description of the Framework version of the application: it introduces the basic library and allows a comparison of how easily the application is implemented with the basic library and with the Framework.

Berkeley DB Framework components

The Framework is a set of C++ template classes and functions that simplify Berkeley DB programming and remove much boilerplate code, making programs more compact and less error-prone. The principal template classes and functions are these.

Data Transfer Object (DTO) classes

The programmer determines the data stored in the database, so they write a set of DTO classes whose objects hold that data in a C++ format. To enable the Framework, they also write functions to serialize the data into, and deserialize it from, Berkeley DB records. Berkeley DB imposes no format on the data, so the serialization and deserialization functions written by the programmer convert the data into the binary form stored in database records. Defining these classes enables the rest of the template classes and functions. If the DTO classes are C++ Plain Old Data (POD) objects, serialization and deserialization may be as simple as copying the in-memory data structures to and from Berkeley DB records. For DTO classes containing variable-length character or numeric data, the Framework provides functions to simplify serialization and deserialization. Examples of coded DTO classes are provided.
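The following is a minimal sketch of a DTO, not one of the Framework's own examples; `Vendor_dto` and its helper functions are hypothetical. The fixed-size field is copied directly, and the variable-length string is written as a length prefix followed by its characters so that it can be decoded again.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical DTO: one fixed-size field and one variable-length field.
struct Vendor_dto {
  std::int32_t id = 0;
  std::string name;

  // Serialize into a flat byte buffer suitable for a database record.
  std::vector<char> serialize() const {
    std::vector<char> buf;
    append(buf, &id, sizeof id);
    // Variable-length data: write the length first, then the characters.
    std::uint32_t len = static_cast<std::uint32_t>(name.size());
    append(buf, &len, sizeof len);
    append(buf, name.data(), len);
    return buf;
  }

  // Rebuild the DTO from a buffer produced by serialize().
  // Returns false if the buffer is too short to be a valid record.
  bool deserialize(const std::vector<char>& buf) {
    std::size_t pos = 0;
    std::uint32_t len = 0;
    if (!read(buf, pos, &id, sizeof id) || !read(buf, pos, &len, sizeof len))
      return false;
    if (buf.size() - pos < len) return false;
    name.assign(buf.data() + pos, len);
    return true;
  }

 private:
  // Append n raw bytes from src to the end of buf.
  static void append(std::vector<char>& buf, const void* src, std::size_t n) {
    const char* p = static_cast<const char*>(src);
    buf.insert(buf.end(), p, p + n);
  }
  // Copy n bytes at pos into dst and advance pos; fail if not enough bytes remain.
  static bool read(const std::vector<char>& buf, std::size_t& pos, void* dst, std::size_t n) {
    if (buf.size() - pos < n) return false;
    std::memcpy(dst, buf.data() + pos, n);
    pos += n;
    return true;
  }
};
```

The integers are copied in host byte order, which is adequate for record data. Keys are different: Berkeley DB's default B-tree comparison is byte-wise, so little-endian integer keys will not sort numerically unless they are stored big-endian or a custom comparison function is configured.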

Data Access Object (DAO) template functions

Once DTO classes are defined, DAO template functions simplify Berkeley DB operations such as reading database records, saving data records, and selecting all records with unique keys or duplicate records sharing a primary or secondary database key. The example Getting Started application included in the Berkeley DB distribution is recoded as an example of using the Framework.
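A minimal sketch of the idea behind DAO template functions, assuming a DTO type that provides `serialize()` and `deserialize()`. The names `dao_save`, `dao_load`, `Record_store`, and `Part_dto` are hypothetical, and an in-memory `std::map` stands in for a Berkeley DB database (a real implementation would call `Db::put` and `Db::get`). Because the functions are templates, the same code serves every DTO type.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a Berkeley DB database: key -> record bytes.
using Record_store = std::map<std::string, std::vector<char>>;

// Generic save: any DTO type with serialize() can be stored.
template <typename Dto>
void dao_save(Record_store& db, const std::string& key, const Dto& dto) {
  db[key] = dto.serialize();
}

// Generic load: std::nullopt when the key is absent or the record
// cannot be decoded into the requested DTO type.
template <typename Dto>
std::optional<Dto> dao_load(const Record_store& db, const std::string& key) {
  auto it = db.find(key);
  if (it == db.end()) return std::nullopt;
  Dto dto;
  if (!dto.deserialize(it->second)) return std::nullopt;
  return dto;
}

// Minimal hypothetical DTO used to exercise the templates.
struct Part_dto {
  std::string description;
  std::vector<char> serialize() const {
    return std::vector<char>(description.begin(), description.end());
  }
  bool deserialize(const std::vector<char>& buf) {
    description.assign(buf.begin(), buf.end());
    return true;
  }
};
```

For example, `dao_save(db, "p1", Part_dto{"bolt"})` stores a part, and `dao_load<Part_dto>(db, "p1")` reads it back; the sketch needs C++17 for `std::optional`.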

Wrappers for Berkeley DB classes

Specific template classes are provided to simplify use of individual Berkeley DB C++ library classes. They provide simpler and more regular access to class functions, take DTOs as template types so the same Framework classes can be reused with different data collections, and use RAII to safely manage resources such as database handles, cursors, and allocated free store memory.

Dbt record management class wrapper

The Bdb_dbt class simplifies using the Berkeley DB C and C++ Dbt data type. Using DTO serialization and deserialization, it simplifies encoding and decoding the data contained in Dbt objects, and it manages free store memory to prevent memory leaks.
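A minimal sketch of the RAII idea behind a Dbt wrapper, not the actual Bdb_dbt class. `Owned_record` is hypothetical: it owns a heap buffer through `std::unique_ptr`, so the memory is released on every code path, and it is move-only so a buffer never has two owners. Its `data()` and `size()` are what a wrapper would hand to a `Dbt`; that step is left out so the sketch compiles without the Berkeley DB headers.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical owning record buffer: heap memory is freed automatically
// when the object goes out of scope, so no code path can leak it.
class Owned_record {
 public:
  // Copy n bytes from src into a newly allocated buffer.
  Owned_record(const void* src, std::size_t n)
      : buf_(std::make_unique<char[]>(n > 0 ? n : 1)), size_(n) {
    if (n > 0) std::memcpy(buf_.get(), src, n);
  }

  // Move-only: ownership transfers and the source is left empty.
  Owned_record(Owned_record&& other) noexcept
      : buf_(std::move(other.buf_)), size_(other.size_) {
    other.size_ = 0;
  }
  Owned_record& operator=(Owned_record&& other) noexcept {
    if (this != &other) {
      buf_ = std::move(other.buf_);
      size_ = other.size_;
      other.size_ = 0;
    }
    return *this;
  }
  Owned_record(const Owned_record&) = delete;
  Owned_record& operator=(const Owned_record&) = delete;

  // Pointer and length, e.g. for constructing a Dbt from this buffer.
  void* data() { return buf_.get(); }
  const void* data() const { return buf_.get(); }
  std::size_t size() const { return size_; }

  std::string as_string() const {
    return size_ ? std::string(buf_.get(), size_) : std::string();
  }

 private:
  std::unique_ptr<char[]> buf_;  // released by unique_ptr's destructor
  std::size_t size_;
};
```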

Db database handle wrapper

The Dbd_db class manages database handles for primary and secondary databases provided by the Berkeley DB C++ Db data type. From a programmer-supplied JSON file describing the primary and secondary database relationships, the Framework automatically creates database handles and associates secondary databases with their primary databases. The class ensures that database handles are properly acquired and released.
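To show what associating a primary and a secondary database accomplishes, here is a minimal in-memory sketch. `Associated_store` and `Key_extractor` are hypothetical; a `std::map` stands in for the primary database and a `std::multimap` for the secondary. As with Berkeley DB's `Db::associate`, every write to the primary calls the key extractor and updates the secondary index, so the program never maintains the secondary by hand.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified extractor: derives a secondary key from a
// primary key and its record.
using Key_extractor = std::function<std::string(const std::string& key, const std::string& data)>;

// In-memory stand-in for a primary database with one associated secondary.
class Associated_store {
 public:
  explicit Associated_store(Key_extractor extractor) : extractor_(std::move(extractor)) {}

  // Writing to the primary also updates the secondary index.
  void put(const std::string& key, const std::string& data) {
    erase_secondary(key);
    primary_[key] = data;
    secondary_.emplace(extractor_(key, data), key);
  }

  // First primary record whose secondary key matches, or nullptr.
  const std::string* get_by_secondary(const std::string& skey) const {
    auto it = secondary_.find(skey);
    if (it == secondary_.end()) return nullptr;
    return &primary_.at(it->second);
  }

  std::size_t secondary_count(const std::string& skey) const { return secondary_.count(skey); }

 private:
  // Remove the stale secondary entry when a primary record is overwritten.
  void erase_secondary(const std::string& key) {
    auto old = primary_.find(key);
    if (old == primary_.end()) return;
    auto range = secondary_.equal_range(extractor_(key, old->second));
    for (auto it = range.first; it != range.second; ++it) {
      if (it->second == key) {
        secondary_.erase(it);
        return;
      }
    }
  }

  Key_extractor extractor_;
  std::map<std::string, std::string> primary_;
  std::multimap<std::string, std::string> secondary_;  // secondary key -> primary key
};
```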

Dbc cursor handle wrapper

The Dbd_cursor class simplifies using cursors to select all records in a primary or secondary keyed database, and all duplicate primary or secondary database records with a given key. The class also ensures that cursors are properly acquired and released.
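A minimal sketch of selecting all duplicates for a key. `select_duplicates` and `collect_duplicates` are hypothetical helpers, and an ordered `std::multimap` stands in for a database opened with duplicate keys. With a real Berkeley DB cursor, the equivalent loop positions the cursor with `DB_SET` and steps with `DB_NEXT_DUP`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: call visit(value) for every record stored under key,
// in stored order, and return how many records were visited.
template <typename Value, typename Visitor>
std::size_t select_duplicates(const std::multimap<std::string, Value>& db,
                              const std::string& key, Visitor&& visit) {
  std::size_t count = 0;
  auto [first, last] = db.equal_range(key);
  for (auto it = first; it != last; ++it, ++count) visit(it->second);
  return count;
}

// Convenience wrapper that collects the duplicates into a vector.
template <typename Value>
std::vector<Value> collect_duplicates(const std::multimap<std::string, Value>& db,
                                      const std::string& key) {
  std::vector<Value> out;
  select_duplicates(db, key, [&out](const Value& v) { out.push_back(v); });
  return out;
}
```

Passing a visitor instead of always returning a container lets large result sets be processed one record at a time, which mirrors how a cursor reads them.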

Additional convenience classes

The Framework provides various additional classes to simplify other tasks encountered in creating Framework programs.

File system IO classes

The Bdb_file_io class simplifies access to operating system files by providing functions to obtain and properly release file handles, and to read and write file data through C++ streams.
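A minimal sketch of the RAII technique for file handles, not the actual Bdb_file_io interface. `File_handle`, `open_file`, `write_text_file`, and `read_text_file` are hypothetical; `std::unique_ptr` with a custom deleter guarantees that `fclose` runs when the handle goes out of scope, including on early returns.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <optional>
#include <string>

// Deleter that closes the FILE* when the owning unique_ptr is destroyed.
struct File_closer {
  void operator()(std::FILE* f) const {
    if (f) std::fclose(f);
  }
};
using File_handle = std::unique_ptr<std::FILE, File_closer>;

// Open a file; the returned handle is empty if fopen fails.
File_handle open_file(const std::string& path, const char* mode) {
  return File_handle(std::fopen(path.c_str(), mode));
}

// Write text to path; false if the file cannot be opened or fwrite
// reports a short write. The file is closed when f goes out of scope.
bool write_text_file(const std::string& path, const std::string& text) {
  File_handle f = open_file(path, "wb");
  if (!f) return false;
  return std::fwrite(text.data(), 1, text.size(), f.get()) == text.size();
}

// Read the whole file; std::nullopt if it cannot be opened.
std::optional<std::string> read_text_file(const std::string& path) {
  File_handle f = open_file(path, "rb");
  if (!f) return std::nullopt;
  std::string out;
  char buf[4096];
  std::size_t n;
  while ((n = std::fread(buf, 1, sizeof buf, f.get())) > 0) out.append(buf, n);
  return out;
}
```

Errors reported by `fclose` itself, such as a failed final flush, are ignored in this sketch. The standard C++ file streams (`std::ifstream`, `std::ofstream`) also close automatically and are an alternative for stream-based I/O.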

JSON utility class

The Framework assists in serializing and deserializing data in JSON format. The Bdb_json_utils class wraps the open source json-c library routines and provides simpler access to the library.

Utility classes

Utility classes such as Bdb_errors and Bdb_tokens are provided to simplify error tracking and string token processing.
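A minimal sketch of the two patterns, assuming nothing about the real Bdb_errors and Bdb_tokens interfaces; `Error_list`, `split_tokens`, and `parse_int` are hypothetical. Functions record errors rather than throwing, and callers check `has()` before continuing, the same style used with `errors.has()` in the example code that opens a database.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical error collector: functions append messages instead of throwing.
class Error_list {
 public:
  void add(const std::string& where, const std::string& message) {
    messages_.push_back(where + ": " + message);
  }
  bool has() const { return !messages_.empty(); }
  const std::vector<std::string>& messages() const { return messages_; }

 private:
  std::vector<std::string> messages_;
};

// Hypothetical tokenizer: split line on delim, keeping empty fields
// so positional records such as "id|name||price" stay aligned.
std::vector<std::string> split_tokens(const std::string& line, char delim) {
  std::vector<std::string> tokens;
  std::string token;
  std::istringstream in(line);
  while (std::getline(in, token, delim)) tokens.push_back(token);
  // getline does not report the empty field after a trailing delimiter.
  if (!line.empty() && line.back() == delim) tokens.push_back("");
  return tokens;
}

// Parse a whole string as an int, recording an error instead of throwing.
int parse_int(const std::string& text, Error_list& errors) {
  try {
    std::size_t used = 0;
    int value = std::stoi(text, &used);
    if (used == text.size()) return value;
  } catch (const std::exception&) {
    // Fall through and record the error below.
  }
  errors.add("parse_int", "invalid integer '" + text + "'");
  return 0;
}
```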

The Framework code

The Framework was developed for Linux using the cmake build utility. An early version worked with Visual Studio on Windows, and the Framework could presumably be adapted readily to the Windows version of the Berkeley DB library.

The Framework is distributed from a GitHub repository, currently under the MIT license. The distribution can be cloned with Git from https://github.com/kushnertodd/berkeley-db-cpp-framework. Installation instructions, which require the Berkeley DB and json-c distributions, are in the INSTALL.db markdown file. Build instructions are in BUILD.db.

JetBrains CLion was used as the environment to develop the Framework and is a good option for development. Building the cmake docs target generates Doxygen documentation in HTML format for the low-level classes. Additional developers are encouraged to join as collaborators.

Summary

The Framework classes work together to give the programmer an intuitive interface to the Berkeley DB C++ library. They work with the classes in the Berkeley DB distribution, which may be used alongside the Framework classes as needed, though the need to use them directly should be reduced or eliminated. Various Berkeley DB functions, such as managing locks and threads, and certain specific operations, such as deleting records, are omitted because they were not needed for the initial application for which the Framework was developed, but classes to support them can be added in a manner consistent with the rest of the Framework.

Appendix - Format of databases definition JSON file

The databases JSON definition file lets the Framework open the databases automatically. The outer level of the JSON defines an array of primary databases, and each primary database may optionally have an array of secondary databases. The Framework associates the secondary databases with their primary database automatically.

  • The primary and secondary database descriptors have a "class_name" field with a fixed value, which verifies that the JSON is a valid definition file.
  • Each database has a "name", used to access the database descriptor as shown below, and a "filename". The file is opened in the directory specified by the "db_home" (-d) command line argument, as seen in the Getting Started application example.
  • Primary databases have an optional "duplicates" field that indicates whether duplicate keys are allowed.
  • Secondary databases always permit duplicate keys. They require a "key_extractor" field that names a secondary database key extractor callback routine. The programmer must write the following function, which looks up the actual key extractor function by name. See the example application:

```
Example_key_extractor::key_extractor_fct(const char *key_extractor_name)
```
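A minimal sketch of a name-to-function lookup for key extractors. The simplified `Key_extractor_fn` signature and the free function `key_extractor_fct` are hypothetical stand-ins for the example application's `Example_key_extractor::key_extractor_fct`; a real Berkeley DB secondary callback has the form `int (*)(Db *, const Dbt *, const Dbt *, Dbt *)`. Here `get_item_name`, the extractor named in the JSON example, assumes the item name is the text before the first `|` in the record.

```cpp
#include <map>
#include <string>

// Hypothetical simplified extractor signature; plain strings replace
// Db/Dbt so the sketch compiles without db_cxx.h.
using Key_extractor_fn = std::string (*)(const std::string& primary_key,
                                         const std::string& record);

// Assumed record layout: item name, then '|', then other fields.
std::string get_item_name(const std::string&, const std::string& record) {
  return record.substr(0, record.find('|'));
}

// Look up an extractor by the name given in the "key_extractor" field.
// Returns nullptr for an unknown name so the caller can report an error.
Key_extractor_fn key_extractor_fct(const char* key_extractor_name) {
  static const std::map<std::string, Key_extractor_fn> registry{
      {"get_item_name", &get_item_name},
  };
  if (key_extractor_name == nullptr) return nullptr;
  auto it = registry.find(key_extractor_name);
  return it == registry.end() ? nullptr : it->second;
}
```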

Example JSON database definition file:

```json
{
  "class_name": "Bdb_databases_config",
  "primary_databases": [
    {
      "class_name": "Primary_database_config",
      "name": "vendor",
      "filename": "vendor.db",
      "duplicates": "true"
    },
    {
      "class_name": "Primary_database_config",
      "name": "inventory",
      "filename": "inventory.db",
      "secondary_databases": [
        {
          "class_name": "Secondary_database_config",
          "name": "item_name",
          "filename": "item_name.sdb",
          "key_extractor": "get_item_name"
        }
      ]
    }
  ]
}
```
### Example code to open a database
This sample code opens a primary database.
The primary database constructor takes the secondary database key extractor
as an argument, which it uses when it automatically associates the primary
database with its secondary databases.
See the example main program for more examples, including opening
a secondary database. The database constructor opens the database,
and the destructor closes it:
[excxx_example_database_read.cpp](src/apps/excxx_example_database_read.cpp)
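```cpp
    // bdb_databases_config, errors, and db_home are set up earlier in the example program (not shown)
    // Select the "inventory" descriptor read from the JSON database definition file
    Primary_database_config inventory_primary_database_config;
    bdb_databases_config.select("inventory", inventory_primary_database_config, errors);
    if (!errors.has()) {
      // Key extractor the constructor uses when associating the secondary databases
      std::unique_ptr<Bdb_key_extractor> example_key_extractor = std::make_unique<Example_key_extractor>();
      // The constructor opens the database in db_home; the destructor closes it
      Primary_database inventory_db(inventory_primary_database_config, example_key_extractor.get(), db_home, errors);
      ...
    }
```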
