lflog

Query log files with SQL using DataFusion and regex pattern macros.

Features

  • 🔍 SQL Queries - Query log files using familiar SQL syntax via DataFusion
  • 🧩 Pattern Macros - Use intuitive macros like {{timestamp:datetime("%Y-%m-%d")}} instead of raw regex
  • 📊 Type Inference - Automatic schema generation with proper types (Int32, Float64, String); see the sketch after this list
  • ⚡ Fast - Leverages DataFusion's optimized query engine with parallel processing
  • 📁 Glob Patterns - Query multiple files at once with patterns like logs/*.log
  • 🏷️ Metadata Columns - Access file path (__FILE__) and raw log lines (__RAW__)
  • 📝 Config Profiles - Define reusable log profiles in TOML config files
  • 💻 Interactive REPL - Query logs interactively with command history
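
To make the type-inference point concrete: number macros become Int32 columns, float macros Float64, and everything else String. A minimal sketch of the resulting schema in Arrow terms, using the arrow crate (the exact schema lflog builds internally is an assumption), for the nginx pattern shown later in this README:

use arrow::datatypes::{DataType, Field, Schema};

fn main() {
    // Hypothetical inferred schema for the pattern:
    //   {{ip:ip}} - - [{{time:any}}] "{{method:var_name}} {{path:any}}" {{status:number}} {{bytes:number}}
    // number macros map to Int32; all other macros map to strings (Utf8).
    let schema = Schema::new(vec![
        Field::new("ip", DataType::Utf8, true),
        Field::new("time", DataType::Utf8, true),
        Field::new("method", DataType::Utf8, true),
        Field::new("path", DataType::Utf8, true),
        Field::new("status", DataType::Int32, true),
        Field::new("bytes", DataType::Int32, true),
    ]);
    println!("{schema:#?}");
}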

Why lflog?

Comparison: Count errors by log level

lflog:

lflog access.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{msg:any}}$' \
  --query "SELECT level, COUNT(*) FROM log GROUP BY level"

awk:

awk -F'[][]' '{print $4}' access.log | sort | uniq -c | sort -rn
# Or with proper parsing:
awk 'match($0, /\[[^\]]+\] \[([^\]]+)\]/, m) {count[m[1]]++}
     END {for (l in count) print l, count[l]}' access.log

DuckDB:

SELECT
    regexp_extract(line, '\[[^\]]+\] \[([^\]]+)\]', 1) AS level,
    COUNT(*) AS count
FROM read_csv('access.log', columns={'line': 'VARCHAR'},
              header=false, delim=E'\x1F')
GROUP BY level;

Key Advantages

| Feature | lflog | awk/grep | DuckDB |
| --- | --- | --- | --- |
| Pattern syntax | {{level:var_name}} | Raw regex | Raw regex |
| Named fields | ✅ Built-in | ❌ Manual indexing | ❌ regexp_extract() per field |
| SQL queries | ✅ Full SQL | ❌ Not available | ✅ Full SQL |
| Type inference | ✅ Automatic | ❌ All strings | ❌ Manual |
| Multi-file glob | ✅ 'logs/*.log' | ⚠️ Shell expansion | ✅ Supported |
| Source tracking | ✅ __FILE__ column | ❌ Manual | ❌ Manual |
| Aggregations | ✅ SQL GROUP BY | ⚠️ Complex piping | ✅ SQL GROUP BY |
| Joins | ✅ Supported | ❌ Not available | ✅ Supported |

Run the comparison demo: ./examples/duckdb_comparison.sh

Run the complex analysis demo: ./examples/complex_analysis_demo.sh (showcases multi-source analysis, security log inspection, and advanced SQL queries)

Installation

cargo build --release

The compiled binary is written to target/release/lflog.

CLI Usage

lflog <log_file> [OPTIONS]

Options

| Option | Description |
| --- | --- |
| -c, --config <path> | Config file (default: ~/.config/lflog/config.toml or LFLOG_CONFIG env) |
| -p, --profile <name> | Use profile from config |
| --pattern <regex> | Inline pattern (overrides profile) |
| -t, --table <name> | Table name for SQL (default: log) |
| -q, --query <sql> | Execute SQL query (omit for interactive mode) |
| -f, --add-file-path | Add __FILE__ column with source file path |
| -r, --add-raw | Add __RAW__ column with raw log line |
| -n, --num-threads <N> | Number of threads (default: 8, or LFLOGTHREADS env) |

Examples

# Query with inline pattern
lflog loghub/Apache/Apache_2k.log \
  --pattern '^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$' \
  --query "SELECT * FROM log WHERE level = 'error' LIMIT 10"

# Query multiple files with glob pattern
lflog 'logs/*.log' \
  --pattern '{{ts:datetime}} [{{level:var_name}}] {{msg:any}}' \
  --query "SELECT * FROM log"

# Include file path and raw line in results
lflog 'logs/*.log' --pattern '...' \
  --add-file-path --add-raw \
  --query 'SELECT level, "__FILE__", "__RAW__" FROM log'

# Query with config profile
lflog /var/log/apache.log --profile apache --query "SELECT * FROM log LIMIT 5"

# Interactive REPL mode
lflog server.log --pattern '{{ts:datetime}} [{{level:var_name}}] {{msg:any}}'
> SELECT * FROM log WHERE level = 'error'
> SELECT level, COUNT(*) FROM log GROUP BY level
> .exit

Demos (Loghub)

lflog includes a comprehensive set of demos using the Loghub dataset collection. These demos showcase how to query 16 different types of system logs (Android, Apache, Hadoop, HDFS, Linux, Spark, etc.).

To run a demo:

# 1. Go to the demo scripts directory
cd examples/loghub_demos/scripts

# 2. Run the demo for a specific dataset (e.g., Apache)
./run_demo.sh apache

See examples/loghub_demos/README.md for the full list of available datasets and more details.

Config File

Create ~/.config/lflog/config.toml:

# Global custom macros
[[custom_macros]]
name = "timestamp"
pattern = '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
type_hint = "DateTime"

# Apache log profile
[[profiles]]
name = "apache"
description = "Apache error log format"
pattern = '^\[{{time:datetime("%a %b %d %H:%M:%S %Y")}}\] \[{{level:var_name}}\] {{message:any}}$'

# Nginx access log profile
[[profiles]]
name = "nginx"
pattern = '{{ip:ip}} - - \[{{time:any}}\] "{{method:var_name}} {{path:any}}" {{status:number}} {{bytes:number}}'

Pattern Macros

| Macro | Description | Type |
| --- | --- | --- |
| {{field:number}} | Integer (digits) | Int32 |
| {{field:float}} | Floating-point number | Float64 |
| {{field:string}} | Non-greedy string | String |
| {{field:any}} | Non-greedy match-all | String |
| {{field:var_name}} | Identifier ([A-Za-z_][A-Za-z0-9_]*) | String |
| {{field:datetime("%fmt")}} | Datetime with strftime format | String |
| {{field:enum(a,b,c)}} | One of the listed values | String |
| {{field:uuid}} | UUID format | String |
| {{field:ip}} | IPv4 address | String |

You can also use raw regex with named capture groups:

^(?P<ip>\d+\.\d+\.\d+\.\d+) - (?P<method>\w+)
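
Each macro expands to a named capture group, so a macro pattern and its raw-regex form are interchangeable. A minimal sketch of that equivalence with the regex crate, assuming the expansions listed in the table above (the exact regex lflog generates may differ):

use regex::Regex;

fn main() {
    // Macro form:    ^\[{{time:any}}\] \[{{level:var_name}}\] {{msg:any}}$
    // Expanded form, per the macro table ({{...:any}} -> non-greedy match-all,
    // {{...:var_name}} -> [A-Za-z_][A-Za-z0-9_]*):
    let re = Regex::new(
        r"^\[(?P<time>.*?)\] \[(?P<level>[A-Za-z_][A-Za-z0-9_]*)\] (?P<msg>.*)$",
    )
    .unwrap();

    let line = "[Sun Dec 04 04:47:44 2005] [error] mod_jk child workerEnv in error state 6";
    let caps = re.captures(line).expect("line should match");

    // Each named group becomes a column in the `log` table.
    assert_eq!(&caps["level"], "error");
    println!("time={} level={} msg={}", &caps["time"], &caps["level"], &caps["msg"]);
}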

Metadata Columns

When enabled, lflog adds special metadata columns to your query results:

| Column | Flag | Description |
| --- | --- | --- |
| __FILE__ | -f, --add-file-path | Absolute path of the source log file |
| __RAW__ | -r, --add-raw | The original, unparsed log line |

These are useful when querying multiple files or when you need to see the original log line alongside parsed fields:

# Find errors across all log files with their source
lflog 'logs/*.log' --pattern '...' --add-file-path \
  --query 'SELECT "__FILE__", level, message FROM log WHERE level = '\''error'\'''

Note: Use double quotes around __FILE__ and __RAW__ in SQL to preserve case.

Library Usage

use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // With inline pattern
    let lflog = LfLog::new();
    
    lflog.register(
        QueryOptions::new("access.log")
            .with_pattern(r#"^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$"#)
    )?;
    
    lflog.query_and_show("SELECT * FROM log WHERE level = 'error'").await?;
    Ok(())
}

With glob patterns and metadata columns:

use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let lflog = LfLog::new();
    
    // Query multiple files with metadata columns
    lflog.register(
        QueryOptions::new("logs/*.log")  // Glob pattern
            .with_pattern(r#"^\[{{time:any}}\] \[{{level:var_name}}\] {{message:any}}$"#)
            .with_add_file_path(true)    // Add __FILE__ column
            .with_add_raw(true)          // Add __RAW__ column
            .with_num_threads(Some(4))   // Use 4 threads
    )?;
    
    lflog.query_and_show(r#"SELECT level, "__FILE__" FROM log WHERE level = 'error'"#).await?;
    Ok(())
}

Or with config profiles:

use lflog::{LfLog, QueryOptions};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let lflog = LfLog::from_config("~/.config/lflog/config.toml")?;
    
    lflog.register(
        QueryOptions::new("/var/log/apache.log")
            .with_profile("apache")
    )?;
    
    let df = lflog.query("SELECT level, COUNT(*) FROM log GROUP BY level").await?;
    df.show().await?;
    Ok(())
}

Project Structure

src/
├── lib.rs              # Public API
├── app.rs              # LfLog application struct
├── types.rs            # FieldType enum
├── scanner.rs          # Pattern matching
├── macros/             # Macro expansion
│   ├── parser.rs       # Config & macro parsing
│   └── expander.rs     # Macro to regex expansion
├── datafusion/         # DataFusion integration
│   ├── builder.rs
│   ├── provider.rs
│   └── exec.rs
└── bin/
    ├── lflog.rs        # Main CLI
    └── lf_run.rs       # Simple runner (deprecated)

Performance

lflog is designed for high performance, leveraging zero-copy parsing and DataFusion's vectorized execution engine.

Benchmarks

Parsing an Apache error log (168MB, 2 million lines):

| Query | Time | Throughput |
| --- | --- | --- |
| SELECT count(*) FROM log WHERE level = 'error' | ~450ms | ~370 MB/s (4.4M lines/s) |
| SELECT count(*) FROM log WHERE message LIKE '%error%' | ~450ms | ~370 MB/s |

Tested on Linux with single-threaded execution (168 MB / 0.45 s ≈ 370 MB/s, i.e. roughly 4.4M lines/s).

Optimizations

  • Zero-Copy Parsing: Parses log lines directly from memory-mapped files without intermediate String allocations.
  • Pre-calculated Regex Indices: Resolves capture group indices once at startup, avoiding repeated string lookups in the hot loop.
  • Parallel Execution: Automatically partitions files for parallel processing (configurable via LFLOGTHREADS).
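
A minimal sketch of the pre-calculated-indices idea (not lflog's actual internals): resolve each named capture group's numeric index once at startup, then use positional lookups in the per-line hot loop instead of name-based lookups:

use regex::Regex;

fn main() {
    let re = Regex::new(r"^\[(?P<time>.*?)\] \[(?P<level>\w+)\] (?P<msg>.*)$").unwrap();

    // Done once at startup: map each column name to its capture-group index.
    let idx = |name: &str| {
        re.capture_names()
            .position(|n| n == Some(name))
            .expect("named group exists")
    };
    let (time_i, level_i, msg_i) = (idx("time"), idx("level"), idx("msg"));

    // Hot loop: caps.get(i) skips the name-to-index lookup on every field,
    // and the &str slices borrow from the input line (no allocation).
    for line in ["[t1] [error] boom", "[t2] [notice] ok"] {
        if let Some(caps) = re.captures(line) {
            let time = caps.get(time_i).map_or("", |m| m.as_str());
            let level = caps.get(level_i).map_or("", |m| m.as_str());
            let msg = caps.get(msg_i).map_or("", |m| m.as_str());
            println!("{time} {level} {msg}");
        }
    }
}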

License

MIT
