GitHub - neolit123/cfg2: A simplistic configuration parser for INI like syntax in C

cfg2 0.99.0 README
a simplistic configuration parser for INI like syntax in C

last revisited:
	21.02.2016
author:
	lubomir i. ivanov (neolit123 at gmail)

================================================================================
INTRODUCTION:

cfg2 is a simple library for parsing and writing a configuration file format
very similar to the INI file format. it's not related to any projects such as
libcfg*. it can be used for very fast processing of application configurations
and even translations.

public domain code - see LICENSE!

cfg2 format example:
--------------------------------------------------------------------------------
[section1]
key1=value1
key2=value2
; a comment line
key3="value with some special \n characters"
key4="some value \
on multiple lines"

[section2]
; integer value
key5=-12345678
; floating point value
key6=12345678.312
; hex value
key7=0xfafafafa
; key/value with whitespace
"    key8  "   =   "   value8  "
; HelloWorld as a case-insensitive hex value (needs extra parsing)
key9=48656c6c6f576f726c64
--------------------------------------------------------------------------------

the above pretty much covers what the library can be used for.

format notes:
* the required file format is UTF-8, LF! CRLF and CR are pretty much untested.
* since 0.15.0 the supported format structure is close to the INI format
specified here:
    https://en.wikipedia.org/wiki/INI_file
* comment lines still can be defined starting with the ';' or '#' characters
or custom characters using cfg_t's 'comment_char[1/2]' fields
* the parser does not detect duplicate keys, i.e. every duplicate is stored as
a separate entry and memory is allocated for it. to separate duplicates use
sections!
* the parser is not very strict, thus not many errors will be thrown if things
go wrong, just warnings if cfg_t's 'verbose' field is more than 0
* \x??[??] sequences are not supported as they take way too much space.
use direct hex strings (e.g. key9) and parse them explicitly with
cfg_hex_to_char().
* the parser, file i/o and hashing functions have a 32-bit limit.
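
as an illustration of the hex string note above, here is a minimal sketch of
decoding a value such as key9 into bytes. it does not use cfg_hex_to_char(),
whose exact signature is documented in "cfg2.h"; hex_digit() and hex_decode()
are hypothetical helper names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical helper, not part of cfg2: value of one hex digit, or -1 */
int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* hypothetical helper: decode a hex string such as "48656c6c6f" into out.
 * out must hold strlen(hex) / 2 + 1 bytes. returns the number of decoded
 * bytes, or -1 on odd length or on a non-hex character. */
long hex_decode(const char *hex, char *out)
{
    size_t i, len = strlen(hex);

    if (len % 2 != 0)
        return -1;
    for (i = 0; i < len; i += 2) {
        int hi = hex_digit(hex[i]);
        int lo = hex_digit(hex[i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        out[i / 2] = (char)((hi << 4) | lo);
    }
    out[len / 2] = '\0';
    return (long)(len / 2);
}
```

calling hex_decode() on the key9 value from the format example fills the output
buffer with "HelloWorld". upper and lower case digits are both accepted.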
================================================================================
WHY WRITE A NEW CONFIG LIBRARY?:

while there are many different configuration formats and parser libraries
out there, the answer is "for fun", but perhaps also for the performance,
portability and overhead benefits. this library is a minimal and very fast
alternative.

================================================================================
HOW IT WORKS:

* STRUCTURE

arrays of objects (not object pointers) are used instead of linked lists for
performance reasons.

all sections are allocated into cfg_section_t objects and each section object
has a list of cfg_entry_t objects which have the key / value pairs.
everything is contiguous memory so cache misses when doing linear search are
very unlikely.

there is always at least one section, which is called the "root" section.
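
the layout described above can be sketched roughly as follows. the struct and
function names (layout_entry, layout_section, ...) are hypothetical and only
illustrate the "array of objects" idea; the real cfg_section_t and cfg_entry_t
definitions are in "cfg2.h" and will differ.

```c
#include <assert.h>
#include <stdlib.h>

/* hypothetical layout for illustration; the real types live in cfg2.h */
typedef struct {
    const char *key;
    const char *value;
} layout_entry;

typedef struct {
    const char *name;
    layout_entry *entries; /* one contiguous block of objects, not pointers */
    size_t nentries;
} layout_section;

/* allocate a section with room for n entries in a single block */
int layout_section_init(layout_section *s, const char *name, size_t n)
{
    s->name = name;
    s->nentries = n;
    s->entries = n ? calloc(n, sizeof(*s->entries)) : NULL;
    return (n == 0 || s->entries != NULL) ? 0 : -1;
}

/* release the entry block and reset the section */
void layout_section_free(layout_section *s)
{
    free(s->entries);
    s->entries = NULL;
    s->nentries = 0;
}
```

because the entries are stored by value, entries[i] and entries[i + 1] are
adjacent in memory, which is what makes a linear walk cache friendly.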

* PARSING:

once an input buffer is passed to the parser the first stage is to convert it
into a "raw" buffer, which is readable by the second stage, while counting
how many sections and entries per section there are.
in this stage the parser uses a trick with special characters to separate
sections, keys and values into the "raw" buffer.

the second stage is to parse the strings from the "raw" buffer into the library
object memory, while allocating the cfg_section_t and cfg_entry_t objects for
each section. the parsing is a two stage operation for the sake of not
calling realloc() multiple times on the output buffer. this is a very good
optimization for any size of the input buffer and parser implementation.

realloc() is only used to count the number of sections in a uint32 buffer,
which grows in power-of-two increments.
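
a much simplified sketch of the counting idea behind the first stage is shown
below. it is not the library's parser: it only counts totals, ignores quoting,
escapes and multi-line values, and does not build the "raw" buffer.
pass1_counts and pass1_count() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical result type for the counting pass */
typedef struct {
    size_t sections; /* includes the implicit root section */
    size_t entries;  /* total number of key=value lines */
} pass1_counts;

/* walk the buffer once and only count, without allocating anything.
 * simplified: a line starting with '[' is a section, a line starting with
 * ';' or '#' is a comment, any other line containing '=' is an entry. */
pass1_counts pass1_count(const char *buf)
{
    pass1_counts c = { 1, 0 }; /* the root section always exists */
    const char *p = buf;

    while (*p) {
        const char *line = p;
        int has_eq = 0;

        while (*p && *p != '\n') {
            if (*p == '=')
                has_eq = 1;
            p++;
        }
        if (*line == '[')
            c.sections++;
        else if (*line != ';' && *line != '#' && has_eq)
            c.entries++;
        if (*p == '\n')
            p++;
    }
    return c;
}
```

with the counts known up front, a second pass can allocate the section and
entry arrays once at their final size instead of growing them with realloc().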

* ACCESS

linear search is used to traverse the list of entries in a section, but this
is blazing fast for millions of entries because of the contiguous memory block
usage. if you truly have way too many entries, just separate them into multiple
sections.

addition and deletion of entries is something at which arrays are supposedly
much worse than linked lists, yet performance in this library is quite good as
it takes milliseconds to move a list of millions of entries.
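
the two operations above can be sketched on a plain array like this. the names
(acc_entry, acc_find, acc_delete) are hypothetical, and strcmp() is only a
stand-in for whatever comparison the library really performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical entry type for illustration */
typedef struct {
    const char *key;
    const char *value;
} acc_entry;

/* linear search over a contiguous array; returns the index or -1 */
long acc_find(const acc_entry *e, size_t n, const char *key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(e[i].key, key) == 0)
            return (long)i;
    return -1;
}

/* delete entry idx by shifting the tail down one slot with a single
 * memmove(); returns the new entry count */
size_t acc_delete(acc_entry *e, size_t n, size_t idx)
{
    if (idx >= n)
        return n;
    memmove(&e[idx], &e[idx + 1], (n - idx - 1) * sizeof(*e));
    return n - 1;
}
```

deletion costs one block move of the tail, which is the array counterpart of
unlinking a node in a linked list.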

* WRITING

when writing the contents of the library objects the sections and entries
are serialized into strings one by one into a growing buffer.
mind that all formatting is lost in the process.

unlike parsing, no size estimation is done here, so realloc() is used for the
output buffer. it grows in power-of-two increments.
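
a minimal sketch of such a growing output buffer, assuming a starting capacity
of 16 bytes (an arbitrary value picked for this example). out_buf and
out_buf_append() are hypothetical names and not part of the cfg2 API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical output buffer for illustration */
typedef struct {
    char *data;
    size_t len; /* bytes used, excluding the terminating NUL */
    size_t cap; /* bytes allocated */
} out_buf;

/* append s, doubling the capacity until it fits; returns 0, or -1 if
 * realloc() fails (the old buffer is left intact in that case) */
int out_buf_append(out_buf *b, const char *s)
{
    size_t n = strlen(s);
    size_t need = b->len + n + 1;

    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        char *p;

        while (cap < need)
            cap *= 2;
        p = realloc(b->data, cap);
        if (!p)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n + 1); /* copies the NUL as well */
    b->len += n;
    return 0;
}
```

doubling keeps the number of realloc() calls logarithmic in the final output
size, at the cost of some unused space at the end of the buffer.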

* CACHE

another present optimization is caching N values in a separate cache buffer and
always checking this buffer first, instead of a section buffer (which can be
huge). this technique optimizes performance when an entry is used multiple
times. the cache buffer can be resized dynamically or disabled when its size
is set to zero.

ironically, cache misses are more likely to happen when reading entries from
the library cache because of the way an entry in the cache is checked for
its parent section. however, the library cache is still a major performance
boost, if it has been set to a reasonable size.
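
the "check a small cache first" idea can be sketched as below. this is not the
library's implementation: the fixed size, the round-robin replacement and all
names (val_cache, cache_get, cache_put) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 4 /* hypothetical; the real cache size is adjustable */

typedef struct {
    const char *section;
    const char *key;
    const char *value;
} cache_slot;

typedef struct {
    cache_slot slot[CACHE_SIZE];
    size_t used; /* number of valid slots */
    size_t next; /* slot to overwrite next (round robin) */
} val_cache;

/* look up (section, key) in the cache; NULL on a miss */
const char *cache_get(const val_cache *c, const char *section, const char *key)
{
    size_t i;

    for (i = 0; i < c->used; i++)
        if (strcmp(c->slot[i].key, key) == 0 &&
            strcmp(c->slot[i].section, section) == 0)
            return c->slot[i].value;
    return NULL;
}

/* remember a value found by the slow path, overwriting the oldest slot */
void cache_put(val_cache *c, const char *section, const char *key,
               const char *value)
{
    cache_slot *s = &c->slot[c->next];

    s->section = section;
    s->key = key;
    s->value = value;
    c->next = (c->next + 1) % CACHE_SIZE;
    if (c->used < CACHE_SIZE)
        c->used++;
}
```

a lookup would call cache_get() first and fall back to the linear search in
the section only on a miss, followed by cache_put(). note how every slot also
has to be compared against its section, which is the extra check mentioned
above.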

================================================================================
PERFORMANCE:

the library should be pretty fast even without optimization flags. from
some outdated tests, it takes around 1.4 seconds to parse a 20MB text file
with ~550,000 keys and ~80,000 sections on a modern dual core processor.

================================================================================
API:

the API is pretty simple, but mind that it might change here and there until
a "major" version is released. it uses a naming scheme for functions
similar to GLib and the usage logic is quite common as well:
	create a library object
	init/allocate object
	work with object
	free object

for more specific information for each API function take a look at the comments
in "cfg2.h". also a better usage example can be found in "test.c".

small usage example without error checking:
--------------------------------------------------------------------------------
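#include <stdio.h>
#include "cfg2.h"

int main(void)
{
	cfg_char *v;
	cfg_t *st; /* library object */

	st = cfg_alloc(); /* init the library */
	/* some_file is a placeholder for the path of the file to parse */
	cfg_file_parse(st, some_file); /* parse a file */
	v = cfg_value_get(st, "section1", "key1"); /* get a key's value */
	puts(v); /* value1 */
	cfg_free(st);
	return 0;
}
--------------------------------------------------------------------------------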

================================================================================
COMPILATION:

you need GNU Make and either a GCC/MinGW or an MSVC toolchain.

--------------------------------------------------------------------------------
GCC
--------------------------------------------------------------------------------

to compile everything write:
make

to create just the library files write:
make lib

to compile the test write:
make test

to run the test write:
make run

NOTES:
- the Makefile has been tested on Win32 and Linux
- ./lib will contain both dynamic and static library builds (e.g. dll.a, .a)
- ./test will contain a test(.exe) executable which is based on test.c
- to link against the static library define CFG_LIB_STATIC
- by default the library builds in DEBUG mode (e.g. gcc -g);
to build in release mode use:
make RELEASE=1

--------------------------------------------------------------------------------
MSVC
--------------------------------------------------------------------------------

specify the custom Makefile:
make -f Makefile.msvc
make -f Makefile.msvc lib
...

NOTES:
- the same Makefile rules as the GCC version apply
- MSVC building has only been tested with the 1998 and 2003 toolchains

================================================================================