zbx-load-testing

A high-performance Zabbix load testing tool designed to simulate thousands of Zabbix agents sending metrics to Zabbix proxies/servers. This tool helps you stress-test your Zabbix infrastructure and identify performance bottlenecks.

Features

  • Massive Scale: Simulate 50,000+ hosts simultaneously
  • Active Agent Simulation: Simulates Zabbix agent (active) checks, sending metrics via the trapper protocol
  • LLD Support: Simulates Low-Level Discovery (LLD) rules with configurable intervals
  • Multi-Proxy Support: Distribute load across multiple Zabbix proxies
  • Real-time TUI: Terminal User Interface with live statistics and graphs
  • Performance Profiling: Built-in debug and profiling tools for troubleshooting
  • Multi-Instance Support: Distribute load testing across multiple servers
  • Smart Buffering: Configurable metric buffering with jitter to avoid thundering herd
  • Trigger Simulation: Simulate hosts with firing triggers
  • Self-Throttling: Automatically adjust load based on Zabbix server health

Architecture

zbx-load-testing → Zabbix Proxy(s) → Zabbix Server → PostgreSQL

The tool simulates Zabbix agents by:

  1. Creating hosts in Zabbix via API
  2. Sending metrics using the Zabbix sender protocol (trapper)
  3. Optionally sending LLD data to trigger auto-discovery
  4. Monitoring Zabbix health and adjusting load dynamically

Installation

Prerequisites

  • Go 1.21 or later
  • Access to a Zabbix server API
  • Network access to Zabbix proxies/server on port 10051

Build from Source

# Clone the repository
git clone <repository-url>
cd zbx-load-testing

# Build the binary
go build -o zbx-load-testing ./cmd/zbx-load-testing

# Or use the justfile
just build

Configuration

Create a config.yaml file (see the config.yaml shipped with the repository for a complete example):

# Zabbix API Connection
zabbix_api:
  url: "http://localhost:8088/api_jsonrpc.php"
  user: "Admin"
  password: "your-password"
  timeout_sec: 120

# Zabbix Sender Configuration
zabbix_sender:
  proxies:
    - name: "proxy1"
      ip: "192.168.1.10"
      port: 10051
    - name: "proxy2"
      ip: "192.168.1.11"
      port: 10051
  port: 10051
  buffer_size: 1000
  buffer_send_sec: 5
  buffer_jitter_percent: 0.2  # 20% jitter to spread load

# Test Configuration
test_run:
  name: "LoadTest-01"
  hosts_to_create: 10000
  hosts_to_simulate: 10000
  speed_multiplier: 1  # 1 = normal speed, 2 = twice as fast
  randomize_start_time: true
  staggering_window_sec: 3600  # Spread LLD over 1 hour
  disable_lld: false
  templates:
    - "Linux by Zabbix agent active"

# Self-Throttling
throttling:
  enabled: true
  check_interval_sec: 60
  queue_threshold: 10
  history_syncer_busy_pct: 80

# Trigger Simulation
triggers:
  firing_percentage: 5  # 5% of hosts will have firing triggers
  state_change_interval_sec: 300  # Change state every 5 minutes

Multi-Instance Configuration

To distribute load across multiple servers:

test_run:
  hosts_to_create: 30000
  hosts_to_simulate: 30000
  instance_id: 0        # Server 1: instance 0
  total_instances: 3    # Total of 3 servers

Hosts are assigned round-robin: instance 0 simulates hosts 0, 3, 6, 9, …; instance 1 simulates hosts 1, 4, 7, 10, …; and so on.

Usage

Setup Phase

Create hosts in Zabbix:

./zbx-load-testing setup

This will:

  • Create host groups
  • Create hosts based on templates
  • Assign hosts to proxies
  • Configure items and discovery rules

Run Phase

Start the load test:

./zbx-load-testing run

The TUI will display:

  • Load generator statistics (hosts, NVPS, LLD/sec)
  • Zabbix server health (queue, busy %, NVPS)
  • Sender response times (min/max/avg/percentiles)
  • Real-time NVPS graph
  • Connections per second graph
  • Event logs

TUI Keyboard Controls

  • q - Quit the application
  • +/- - Adjust NVPS graph time scale
  • h - Time jump (adds 1 hour when randomize_start_time is enabled)
  • d - Toggle debug mode (see below)
  • p - Toggle profiling mode (see below)

Debug Mode

When you observe issues with metric sending (for example, via tcpdump), enable debug mode:

Press d to enable debug mode

This will:

  • Create/append to debug.log file
  • Log all sender operations at DEBUG level
  • Include details about:
    • Metrics being sent (count, host, proxy)
    • Buffer flush operations
    • Send durations and responses
    • Timer resets and scheduling

Example debug output:

time=2025-10-27T15:30:45Z level=DEBUG msg="Flushing metrics" count=150 host=LoadTest-0001 proxy=proxy1
time=2025-10-27T15:30:45Z level=DEBUG msg="Sending metrics" count=150 proxy=proxy1 proxy_addr=192.168.1.10:10051
time=2025-10-27T15:30:45Z level=DEBUG msg="Metrics sent successfully" duration=45ms responseInfo="processed: 150; failed: 0; total: 150; seconds spent: 0.045" proxy=proxy1 proxy_addr=192.168.1.10:10051

Press d again to disable debug mode

Debug logs help identify:

  • Which hosts/proxies are affected
  • If buffers are flushing
  • Network send latency
  • Protocol-level errors

Profiling Mode

When metrics stop sending and you need deeper analysis of goroutines and blocking:

Press p to enable profiling

This enables:

  • Goroutine blocking profile collection
  • Block profile rate tracking

Press p again to capture profile snapshot

This creates three files:

1. goroutine.prof - Goroutine Profile

Binary profile showing all goroutines and their call stacks.

Analyze with:

# Interactive analysis
go tool pprof goroutine.prof

# Common commands inside pprof:
# - top       : Show top goroutines
# - list      : Show source code
# - web       : Generate graph (requires graphviz)
# - traces    : Show all stack traces

# Quick text output
go tool pprof -text goroutine.prof

# Generate SVG graph
go tool pprof -svg goroutine.prof > goroutines.svg

Look for:

  • High goroutine counts (potential leaks)
  • Goroutines stuck in chan send or chan receive
  • Blocked on network I/O

2. block.prof - Block Profile

Shows where goroutines are blocking on synchronization primitives.

Analyze with:

# Interactive analysis
go tool pprof block.prof

# Show blocking events
go tool pprof -text block.prof

# Generate graph of blocking
go tool pprof -svg block.prof > blocking.svg

Look for:

  • Mutex contention
  • Channel blocking (full/empty channels)
  • High blocking duration

3. profile-snapshot.txt - Human-Readable Summary

Contains:

  • Total goroutine count
  • Buffer states for all hosts
  • Channel saturation levels

Example snapshot:

Profile Snapshot - 2025-10-27T15:30:45Z
===========================================

Total Goroutines: 10523

Buffered Senders Status (10000 total):
Host                           Proxy           Buf Size    Buf Cap   Chan Len   Chan Cap
--------------------------------------------------------------------------------------------
LoadTest-0001                  proxy1                10       1000         25       1000
LoadTest-0002                  proxy1               150       1000        500       1000
LoadTest-0003                  proxy2                 0       1000       1000       1000  <- FULL CHANNEL!
...

What to look for:

  • Full channels (Chan Len == Chan Cap): Metrics not being consumed, likely network I/O blocking
  • Large buffer sizes: Metrics accumulating but not flushing
  • Unexpectedly high goroutine count: Potential goroutine leak

Debugging "Not Sending Metrics" Issues

Workflow:

  1. Monitor with tcpdump to detect when metrics stop:

    tcpdump -i any -n port 10051
  2. When you see the issue, press d to enable debug mode

  3. Wait 10-30 seconds to collect logs

  4. Press p to enable profiling

  5. Wait a few seconds to collect blocking data

  6. Press p again to capture the profile snapshot

  7. Analyze the files:

    # Check debug logs for sending activity
    tail -f debug.log
    
    # Check buffer states
    cat profile-snapshot.txt
    
    # Analyze goroutines
    go tool pprof goroutine.prof
    
    # Check for blocking
    go tool pprof block.prof

Common findings:

  • Network I/O blocking: Goroutines stuck in network send, visible in goroutine profile
  • Channel saturation: Full metric channels in snapshot, indicates buffering issues
  • Mutex contention: High blocking on mutexes in block profile
  • Proxy connectivity: Debug logs show connection errors to specific proxies

Cleanup Phase

Remove all created hosts:

./zbx-load-testing cleanup

Debug Mode (Console)

For non-interactive debugging:

# Run with debug output to console
./zbx-load-testing run --debug

# Adjust verbosity (0=warn, 1=info, 2=debug, 3=trace)
./zbx-load-testing run --debug -v 3

Understanding the Metrics

NVPS (New Values Per Second)

  • Calculated NVPS: Actual metrics sent by the tool
  • Theoretical NVPS: Expected metrics based on item intervals
  • Reported NVPS: What Zabbix server reports receiving
  • DB Synced NVPS: What Zabbix has written to database

Sender Stats

Shows response times for Zabbix sender protocol:

  • Min/Max/Avg: Response time range
  • Percentiles: P50, P75, P90, P95, P99, etc.

High percentiles indicate network or Zabbix server issues.

Zabbix Health

  • Queue (>1m): Items waiting more than 1 minute
  • Queue (>10m): Items waiting more than 10 minutes
  • Process Busy %: How busy each Zabbix process type is
    • High History Syncer busy % indicates database bottleneck
    • High Trapper busy % indicates receiving bottleneck

Performance Tips

Avoiding Thundering Herd

Set buffer_jitter_percent to spread metric sending:

zabbix_sender:
  buffer_send_sec: 5
  buffer_jitter_percent: 0.2  # Each host gets random 0-1s offset

Optimizing for High Load

  1. Disable LLD if not testing discovery:

    test_run:
      disable_lld: true
  2. Increase speed multiplier for faster item intervals:

    test_run:
      speed_multiplier: 2  # 2x faster than normal
  3. Tune buffer sizes:

    zabbix_sender:
      buffer_size: 5000      # Larger buffer
      buffer_send_sec: 10    # Flush less frequently
  4. Distribute across proxies:

    zabbix_sender:
      proxies:
        - name: "proxy1"
        - name: "proxy2"
        - name: "proxy3"

Memory Considerations

The generate_report option stores ALL sender delays and connection history in memory. For long-running tests with many hosts, this can consume significant memory.

Disable for long tests:

test_run:
  generate_report: false

Troubleshooting

"Error connecting to Zabbix API"

  • Check zabbix_api.url is correct
  • Verify credentials
  • Ensure network connectivity

"No proxies defined"

  • Add at least one proxy to zabbix_sender.proxies

"Hosts not created"

  • Check templates exist in Zabbix
  • Verify API user has permissions
  • Check logs for specific errors

Low NVPS / Metrics not arriving

  1. Enable debug mode (d key)
  2. Check debug.log for send errors
  3. Enable profiling (p key) to check for blocking
  4. Use tcpdump to verify network traffic
  5. Check Zabbix proxy/server logs

High memory usage

  • Disable generate_report
  • Reduce hosts_to_simulate
  • Check for goroutine leaks with profiling

Project Structure

zbx-load-testing/
├── cmd/zbx-load-testing/    # Main application
│   ├── main.go
│   ├── setup.go             # Host creation
│   ├── run.go               # Load test execution
│   └── cleanup.go           # Host removal
├── internal/
│   ├── config/              # Configuration management
│   ├── tui/                 # Terminal UI
│   └── zabbix/
│       ├── api/             # Zabbix API client
│       ├── sender.go        # Zabbix sender protocol
│       └── buffered_sender.go  # Buffered metric sending
└── config.yaml              # Configuration file

Contributing

When contributing, please:

  • Follow Go best practices
  • Add tests for new features
  • Update documentation
  • Use just build to build
  • Test with various load levels

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

This means you are free to:

  • Use the software for any purpose
  • Study and modify the source code
  • Share the software with others
  • Share your modifications

However, if you distribute modified versions, you must:

  • Make the modified source code available under GPL-3.0
  • Document the changes you made
  • Include the same license

See the LICENSE file for the full license text.

Credits

Developed for stress-testing and validating Zabbix infrastructure at scale.
