log-surgeon: A performant log parsing library

log-surgeon is a library for high-performance parsing of unstructured text logs. It allows users to parse and extract information from the vast amount of unstructured logs generated by today's open-source software.

Some of the library's features include:

  • Parsing and extracting variable values like the log event's log-level and any other user-specified variables, no matter where they appear in each log event.
  • Parsing by using regular expressions for each variable type rather than regular expressions for an entire log event.
  • Lower latency and better memory efficiency than popular regex engines.
  • Parsing multi-line log events (delimited by timestamps).

Note that log-surgeon is not a generic regex engine and does impose some constraints on how log events can be parsed.
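
To make the idea of per-variable rules concrete, here is a minimal sketch that uses the standard library's std::regex rather than log-surgeon itself. The helper names extract_variables and example_rules are hypothetical and exist only for this illustration; the two patterns are std::regex equivalents of the timestamp and loglevel rules from the example schema. The sketch does not model the library's tokenization, constraints, or performance characteristics.

```cpp
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not log-surgeon's API): applies one regex per variable
// type to a single line and collects every match, wherever it appears.
std::map<std::string, std::vector<std::string>> extract_variables(
        std::string const& line,
        std::vector<std::pair<std::string, std::regex>> const& rules
) {
    std::map<std::string, std::vector<std::string>> matches;
    std::sregex_iterator const end;
    for (auto const& rule : rules) {
        for (std::sregex_iterator it{line.begin(), line.end(), rule.second}; it != end; ++it) {
            matches[rule.first].push_back(it->str());
        }
    }
    return matches;
}

// Hypothetical helper: std::regex (ECMAScript) equivalents of the timestamp
// and loglevel rules from the example schema.
std::vector<std::pair<std::string, std::regex>> example_rules() {
    return {
            {"timestamp", std::regex{R"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}-\d{4})"}},
            {"loglevel", std::regex{R"(INFO|DEBUG|WARN|ERROR)"}},
    };
}
```

Calling extract_variables(line, example_rules()) on a log line returns a map from each variable name to the substrings that matched its rule. No pattern has to describe the whole log event, which is the property the feature list above refers to.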

Motivating example

Let's say we want to parse and inspect multi-line log events like this:

2023-02-23T18:10:14-0500 DEBUG task_123 crashed. Dumping stacktrace:
#0  0x000000000040110e in bar () at example.cpp:6
#1  0x000000000040111d in bar () at example.cpp:10
#2  0x0000000000401129 in main () at example.cpp:15

Using the example schema file which includes these rules:

timestamp:\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\-\d{4}
...
loglevel:INFO|DEBUG|WARN|ERROR

We can parse and inspect the events as follows:

// Define a reader to read from your data source
Reader reader{/* <Omitted> */};

// Instantiate the parser
ReaderParser parser{"examples/schema.txt"};
parser.reset_and_set_reader(reader);

// Get the loglevel variable's ID
optional<uint32_t> loglevel_id{parser.get_variable_id("loglevel")};
// <Omitted validation of loglevel_id>

while (false == parser.done()) {
    if (ErrorCode err{parser.parse_next_event()}; ErrorCode::Success != err) {
        throw runtime_error("Parsing Failed");
    }

    // Get a view of the event that was just parsed
    LogEventView const& event = parser.get_log_parser().get_log_event_view();

    // Get and print the timestamp
    Token* timestamp{event.get_timestamp()};
    if (nullptr != timestamp) {
        cout << "timestamp: " << timestamp->to_string_view() << endl;
    }

    // Get and print the log-level
    auto const& loglevels = event.get_variables(*loglevel_id);
    if (false == loglevels.empty()) {
        // In case there are multiple matches, just get the first one
        cout << "loglevel: " << loglevels[0]->to_string_view() << endl;
    }

    // Other analysis...

    // Print the entire event
    cout << event.to_string() << endl;
}

For advanced uses, log-surgeon also has a BufferParser that reads directly from a buffer.

Building and installing

Requirements:

  • CMake >= 3.22.1
  • GCC >= 10 or Clang >= 7
  • Catch2 >= 3.8.1
  • fmt >= 8.0.1
  • GSL >= 4.0.0
  • Task >= 3.38
  • uv >= 0.7.10

To build and install the project to $HOME/.local:

task log-surgeon:install-release INSTALL_PREFIX="$HOME/.local"

Or to only build the project:

task log-surgeon:build-release

To build the debug version:

task log-surgeon:build-debug

Examples

The examples directory contains programs demonstrating usage of the library. See examples/README.md for information on building and running the examples.

Documentation

Documentation site

The project includes a documentation site that's useful for exploring functionality and test coverage. In particular, it documents all unit tests, with additional detail for API-level tests.

To generate and view the files:

  • Run task docs:site.
  • Open build/docs/html/index.html in your preferred browser.

To host the site locally and view it:

  • Run task docs:serve.
  • Open the URL output by the task in your preferred browser.

Testing

To build and run all unit tests:

task test:run-debug

When generating targets, the build respects the CMake variable BUILD_TESTING unless it is overridden by setting log_surgeon_BUILD_TESTING to false. By default, if log-surgeon is built as a top-level project, BUILD_TESTING is set to true and the unit tests are built.

Linting

Before submitting a PR, ensure you've run our linting tools and either fixed any violations or suppressed the corresponding warnings.

Running the linters

To report all errors, run:

task lint:check

To automatically fix any supported format or linting errors, run:

task lint:fix

Providing feedback

You can use GitHub issues to report a bug or request a feature.

Join us on Zulip to chat with developers and other community members.

Known issues

The following are issues we're aware of and working on:

  • Schema rules must use ASCII characters. UTF-8 support is planned for a future release.
  • Timestamps must appear at the start of the message to be handled differently from other variable values and to support multi-line log events.
  • A variable pattern has no way to match text around a variable without that text also becoming part of the variable.
    • Support for submatch extraction will be coming in a future release.
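
The timestamp constraint above is easier to see with a small sketch of how a leading timestamp can delimit multi-line events. This is an illustration written with std::regex under stated assumptions, not log-surgeon's implementation: the helper name split_into_events is hypothetical, and the pattern is a std::regex equivalent of the example schema's timestamp rule, anchored to the start of the line.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of log-surgeon): groups raw lines into log
// events. A line that begins with a timestamp starts a new event; any other
// line is appended to the current event as a continuation line.
std::vector<std::string> split_into_events(std::string const& text) {
    // Timestamp pattern anchored to the start of the line
    static std::regex const timestamp_at_start{
            R"(^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}-\d{4})"};

    std::vector<std::string> events;
    std::istringstream stream{text};
    std::string line;
    while (std::getline(stream, line)) {
        bool const starts_event{std::regex_search(line, timestamp_at_start)};
        if (starts_event || events.empty()) {
            events.push_back(line);
        } else {
            events.back() += '\n';
            events.back() += line;
        }
    }
    return events;
}
```

In this sketch, a timestamp that appears in the middle of a line does not start a new event, so stack-trace lines such as those in the motivating example stay attached to the event whose first line carries the timestamp.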
