Craig Minihan edited this page Jul 18, 2016 · 8 revisions

libstringintern is a C++11 library for interning strings. It is thread-safe and lock-free and should run happily on 32-bit and 64-bit Intel, PPC and ARM systems.

API

libstringintern has no static state, so any number of StringIntern instances may be created. For example:

    StringIntern intern1;
    StringIntern intern2;

Adding a string is very easy:

    StringIntern intern;
    auto ref = intern.Add("Hello world");

References are 32-bit values which allow libstringintern to locate a string once it has been interned. Your application should not infer any significance from a reference's value. To recover a string value from a reference:

    StringIntern intern;
    auto ref = intern.Add("Hello world");
    std::cout << intern.ToString(ref) << std::endl;

References are comparable and copyable, and a reference evaluates to false if the string wasn't interned:

    auto ref = intern.Add("Hello world");
    if (!ref) { ... }

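libstringintern can also store the C++11 Unicode string types, such as std::u32string:

    auto ref = intern.Add(U"Hello Unicode");

    std::u32string str;
    intern.ToString(ref, str);

    // std::cout has no operator<< for std::u32string, so convert to UTF-8 first
    // (needs <codecvt> and <locale>).
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> utf8;
    std::cout << utf8.to_bytes(str) << std::endl;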

Design

The call interface to libstringintern is very simple. However, the internals are much more complex. They are designed to be thread-safe and lock-free for maximum performance. The main design elements are:

  • Hash - a high quality hash value of the string - used to determine string equality
  • Page - strings are grouped into pages based on their length
  • Nursery - pages are created in the nursery; strings are added to pages here
  • Catalog - pages are grouped in the catalog by length
  • Archive - the archive contains all pages ordered sequentially from page 0 to page N
  • Reference - a 32-bit value which encodes the page where the string is located in memory
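
To make the Reference element above more concrete, here is a minimal sketch of how a 32-bit value could encode a page and a position within that page. The bit layout, the `sketch` namespace and the names `Ref`, `MakeRef`, `PageOf` and `IndexOf` are all assumptions made for illustration; they are not taken from libstringintern, whose real encoding may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, NOT libstringintern's actual encoding:
// the high 16 bits hold (page number + 1) and the low 16 bits hold the
// slot index within that page. Storing page + 1 keeps the value 0 free
// to mean "no string", so a reference can be tested like a bool.
namespace sketch {

constexpr std::uint32_t kIndexBits = 16;
constexpr std::uint32_t kIndexMask = (1u << kIndexBits) - 1;            // 0xFFFF
constexpr std::uint32_t kPageMax   = (1u << (32 - kIndexBits)) - 2;     // 65534

struct Ref {
    std::uint32_t value;

    // A zero value means the string was not interned.
    explicit operator bool() const { return value != 0; }
};

// References compare by value, so comparing two of them is a single
// integer comparison rather than a string comparison.
inline bool operator==(Ref a, Ref b) { return a.value == b.value; }
inline bool operator!=(Ref a, Ref b) { return a.value != b.value; }

// Pack a page number and a slot index into one 32-bit reference.
inline Ref MakeRef(std::uint32_t page, std::uint32_t index) {
    assert(page <= kPageMax);    // page + 1 must fit in the high 16 bits
    assert(index <= kIndexMask); // index must fit in the low 16 bits
    return Ref{((page + 1) << kIndexBits) | index};
}

// Recover the page number from a valid (non-zero) reference.
inline std::uint32_t PageOf(Ref r) { return (r.value >> kIndexBits) - 1; }

// Recover the slot index from a valid (non-zero) reference.
inline std::uint32_t IndexOf(Ref r) { return r.value & kIndexMask; }

}  // namespace sketch
```

With this layout, `sketch::MakeRef(3, 42)` yields a reference from which `PageOf` returns 3 and `IndexOf` returns 42, while `sketch::Ref{0}` tests as false. Because pages are kept sequentially in the archive, a page number recovered this way could be used directly as an index into it.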