Skip to content
/ clp Public
forked from y-scope/clp

Compressed Log Processor (CLP) is a free tool capable of compressing text logs and searching the compressed logs without decompression.

License

Notifications You must be signed in to change notification settings

junhaoliao/clp

 
 

Repository files navigation

CLP

Open bug reports Open feature requests CLP on Zulip

YScope's Compressed Log Processor (CLP) compresses your logs, and allows you to search the compressed logs without decompression. CLP supports both JSON logs and unstructured (i.e., free text) logs. It also supports real-time log compression within several logging libraries. CLP also includes purpose-built web interfaces for searching and viewing the compressed logs. To learn more about it, read our 2021 paper about handling unstructured logs and our 2024 paper on extending it to JSON logs.

Benchmarks

CLP Benchmark on JSON Logs CLP Benchmark on Unstructured Logs

The figures above show CLP's compression and search performance compared to other tools. We separate the experiments between JSON and unstructured logs because (1) some tools can only handle one type of logs, and (2) tools that can handle both types often have different designs for each type (such as CLP).

Compression ratio is measured as the average across a variety of log datasets. Some of these datasets can be found here. Search performance is measured using queries on the MongoDB logs (for JSON) and the Hadoop logs (for unstructured logs). Note that CLP uses an index-less design, so for a fair comparison, we disabled MongoDB and PostgreSQL's indexes; If we left them enabled, MongoDB and PostgreSQL's compression ratio would be worse. We didn't disable indexing for Elasticsearch or Splunk since these tools are fundamentally index-based (i.e., logs cannot be searched without indexes). More details about our experimental methodology can be found in the 2021 paper and the 2024 paper.

System Overview

CLP systems overview

CLP provides an end-to-end log management pipeline consisting of compression, search, analytics, and viewing. The figure above shows the CLP ecosystem architecture. It consists of the following features:

  • Compression and Search: CLP compresses logs into archives, which can be searched and analyzed in a web UI. The input can either be raw logs or CLP's compressed IR (intermediate representation) produced by CLP's logging libraries.

  • Real-time Compression with CLP Logging Libraries: CLP provides logging libraries for Python and Java (Log4j and Logback). The logging libraries compress logs in real-time, so only compressed logs are written to disk or transmitted over the network. The compressed logs use CLP's intermediate representation (IR) format which achieves a higher compression ratio than general purpose compressors like Zstandard. Compressing IR into archives can further double the compression ratio and enable global search, but this requires more memory usage as it needs to buffer enough logs. More details on IR versus archives can be found in this Uber Engineering Blog.

  • Log Viewer: the compressed IR can be viewed in a web-based log viewer. Compared to viewing the logs in an editor, CLP's log viewer supports advanced features like filtering logs based on log level verbosity (e.g., only displaying logs with log level equal or higher than ERROR). These features are possible because CLP's logging libraries parse the logs before compressing them into IR.

  • IR Analytics Libraries: we also provide a Python library and a Go library that can analyze compressed IR.

  • Log parser: CLP also includes a custom pushdown-automata-based log parser that is 3x faster than state-of-the-art regular expression engines like RE2. The log parser is available as a library that can be used by other applications.

Getting Started

You can download a release package which includes support for distributed compression and search. Or, to quickly try CLP's core compression and search, you can use a prebuilt container.

We also have guides for building the package and CLP core from source.

For some logs you can use to test CLP, check out our open-source datasets.

Docs

You can find our docs online or view the source in docs/src.

Providing Feedback

You can use GitHub issues to report a bug or request a feature.

Join us on Zulip to chat with developers and other community members.

Next Steps

This is our open-source release which we will be constantly updating with bug fixes, features, etc. If you would like a feature or want to report a bug, please file an issue and we'll be happy to engage.

About

Compressed Log Processor (CLP) is a free tool capable of compressing text logs and searching the compressed logs without decompression.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Languages

  • C++ 84.4%
  • Python 7.7%
  • JavaScript 4.2%
  • CMake 2.7%
  • Shell 0.8%
  • SCSS 0.1%
  • Other 0.1%