100 changes: 100 additions & 0 deletions .semaphore/semaphore.yml
@@ -275,6 +275,106 @@ blocks:
- python3 -m venv _venv && source _venv/bin/activate
- chmod u+r+x tools/source-package-verification.sh
- tools/source-package-verification.sh
- name: "Ducktape Performance Tests (Linux x64)"
dependencies: []
task:
agent:
machine:
type: s1-prod-ubuntu24-04-amd64-3
Member:
Out of curiosity, how is the machine type chosen?

Member Author:
"s1-prod-" is a production-grade system, which is used by other Semaphore jobs. IMO it's already time-tested for this repository, so we should keep using it.

env_vars:
- name: OS_NAME
value: linux
- name: ARCH
value: x64
- name: BENCHMARK_BOUNDS_CONFIG
value: tests/ducktape/benchmark_bounds.json
- name: BENCHMARK_ENVIRONMENT
value: ci
prologue:
commands:
- '[[ -z $DOCKERHUB_APIKEY ]] || docker login --username $DOCKERHUB_USER --password $DOCKERHUB_APIKEY'
jobs:
- name: Build and Tests
commands:
# Setup Python environment
- sem-version python 3.9
Contributor:
I guess using an older Python version will catch more perf issues, but I worry that newer versions might have behavioral differences we'd miss, e.g. some of the C bindings we use are deprecated in 3.13.

Contributor:
I'd be OK with leaving a TODO to matrix this across a couple of Python versions down the line. What do you think?

- python3 -m venv _venv && source _venv/bin/activate

# Install ducktape framework and additional dependencies
- pip install ducktape psutil

# Install existing test requirements
- pip install -r requirements/requirements-tests.txt

# Build and install confluent-kafka from source
- lib_dir=dest/runtimes/$OS_NAME-$ARCH/native
- tools/wheels/install-librdkafka.sh "${LIBRDKAFKA_VERSION#v}" dest
- export CFLAGS="$CFLAGS -I${PWD}/dest/build/native/include"
- export LDFLAGS="$LDFLAGS -L${PWD}/${lib_dir}"
- export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/$lib_dir"
- python3 -m pip install -e .

# Store project root for reliable navigation
- PROJECT_ROOT="${PWD}"

# Start Kafka cluster and Schema Registry using dedicated ducktape compose file (KRaft mode)
- cd "${PROJECT_ROOT}/tests/docker"
- docker-compose -f docker-compose.ducktape.yml up -d kafka schema-registry

# Debug: Check container status and logs
- echo "=== Container Status ==="
- docker-compose -f docker-compose.ducktape.yml ps
- echo "=== Kafka Logs ==="
- docker-compose -f docker-compose.ducktape.yml logs kafka | tail -50

# Wait for Kafka to be ready (using PLAINTEXT listener for external access)
- |
timeout 1800 bash -c '
counter=0
until docker-compose -f docker-compose.ducktape.yml exec -T kafka kafka-topics --bootstrap-server localhost:9092 --list >/dev/null 2>&1; do
echo "Waiting for Kafka... (attempt $((counter+1)))"

# Show logs every 4th attempt (every 20 seconds)
if [ $((counter % 4)) -eq 0 ] && [ $counter -gt 0 ]; then
echo "=== Recent Kafka Logs ==="
docker-compose -f docker-compose.ducktape.yml logs --tail=10 kafka
echo "=== Container Status ==="
docker-compose -f docker-compose.ducktape.yml ps kafka
fi

counter=$((counter+1))
sleep 5
done
'
- echo "Kafka cluster is ready!"

# Wait for Schema Registry to be ready
- echo "=== Waiting for Schema Registry ==="
- |
timeout 300 bash -c '
counter=0
until curl -f http://localhost:8081/subjects >/dev/null 2>&1; do
echo "Waiting for Schema Registry... (attempt $((counter+1)))"

# Show logs every 3rd attempt (every 15 seconds)
if [ $((counter % 3)) -eq 0 ] && [ $counter -gt 0 ]; then
echo "=== Recent Schema Registry Logs ==="
docker-compose -f docker-compose.ducktape.yml logs --tail=10 schema-registry
echo "=== Schema Registry Container Status ==="
docker-compose -f docker-compose.ducktape.yml ps schema-registry
fi

counter=$((counter+1))
sleep 5
done
'
- echo "Schema Registry is ready!"

# Run standard ducktape tests with CI bounds
- cd "${PROJECT_ROOT}" && PYTHONPATH="${PROJECT_ROOT}" python tests/ducktape/run_ducktape_test.py

# Cleanup
- cd "${PROJECT_ROOT}/tests/docker" && docker-compose -f docker-compose.ducktape.yml down -v || true
- name: "Packaging"
run:
when: "tag =~ '.*'"
36 changes: 36 additions & 0 deletions tests/docker/docker-compose.ducktape.yml
@@ -0,0 +1,36 @@
services:
kafka:
image: confluentinc/cp-kafka:latest
container_name: kafka-ducktape
ports:
- "9092:9092"
- "29092:29092"
environment:
KAFKA_NODE_ID: 1
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093,PLAINTEXT_HOST://0.0.0.0:29092
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_HOST://dockerhost:29092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
CLUSTER_ID: 4L6g3nShT-eMCtK--X86sw
Member:
For my knowledge, is this the cluster ID dedicated for Kafka testing? https://github.com/search?q=org%3Aconfluentinc%204L6g3nShT-eMCtK--X86sw&type=code

Member Author:
Yes. Each CI job spins up its own isolated Kafka container with this ID.


schema-registry:
image: confluentinc/cp-schema-registry:latest
container_name: schema-registry-ducktape
depends_on:
- kafka
ports:
- "8081:8081"
extra_hosts:
- "dockerhost:172.17.0.1"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: dockerhost:29092
SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
SCHEMA_REGISTRY_KAFKASTORE_TOPIC_REPLICATION_FACTOR: 1
SCHEMA_REGISTRY_DEBUG: 'true'
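
A brief note on the listener layout, since it is easy to trip over: host-side clients (the ducktape tests) reach the broker on the `PLAINTEXT` listener advertised as `localhost:9092`, while Schema Registry runs inside Docker and reaches the broker through the `PLAINTEXT_HOST` listener advertised as `dockerhost:29092`, which `extra_hosts` maps to the Docker bridge gateway `172.17.0.1`. A minimal sketch of checking both paths by hand, assuming the stack above is already running (these mirror the readiness probes the CI job runs):

```bash
# Host path: topic listing on the advertised localhost listener must succeed
docker-compose -f docker-compose.ducktape.yml exec -T kafka \
  kafka-topics --bootstrap-server localhost:9092 --list

# Container path: Schema Registry reaches Kafka via dockerhost:29092;
# a responding subjects endpoint implies that connection is healthy
curl -f http://localhost:8081/subjects
```
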
97 changes: 89 additions & 8 deletions tests/ducktape/README.md
@@ -1,24 +1,105 @@
# Ducktape Producer Tests

Ducktape-based producer tests for the Confluent Kafka Python client.
Ducktape-based producer tests for the Confluent Kafka Python client with comprehensive performance metrics.

## Prerequisites

- `pip install ducktape confluent-kafka`
- Kafka running on `localhost:9092`
- `pip install ducktape confluent-kafka psutil`
- Kafka running on `localhost:9092` (PLAINTEXT listener - ducktape tests use the simple port)
- Schema Registry running on `localhost:8081` (uses `host.docker.internal:29092` for Kafka connection)
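
One way to satisfy these prerequisites is the compose file added under `tests/docker` in this change; a minimal sketch, assuming Docker and docker-compose are available locally:

```bash
# Bring up a single-node KRaft broker and Schema Registry for the ducktape tests
cd tests/docker
docker-compose -f docker-compose.ducktape.yml up -d kafka schema-registry

# Tear down (and drop volumes) when finished
docker-compose -f docker-compose.ducktape.yml down -v
```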

## Running Tests

```bash
# Run all tests
# Run all tests with integrated performance metrics
./tests/ducktape/run_ducktape_test.py

# Run specific test
# Run specific test with metrics
./tests/ducktape/run_ducktape_test.py SimpleProducerTest.test_basic_produce
```

## Test Cases

- **test_basic_produce**: Basic message production with callbacks
- **test_produce_multiple_batches**: Parameterized tests (5, 10, 50 messages)
- **test_produce_with_compression**: Matrix tests (none, gzip, snappy)
- **test_basic_produce**: Basic message production with integrated metrics tracking
- **test_produce_multiple_batches**: Parameterized tests (2s, 5s, 10s durations) with metrics
- **test_produce_with_compression**: Matrix tests (none, gzip, snappy) with compression-aware metrics

## Integrated Performance Metrics Features

Every test automatically includes:

- **Latency Tracking**: P50, P95, P99 percentiles with real-time calculation
- **Per-Topic/Partition Metrics**: Detailed breakdown by topic and partition
- **Memory Monitoring**: Peak memory usage and growth tracking with psutil
- **Batch Efficiency**: Messages per poll and buffer utilization analysis
- **Throughput Validation**: Messages/sec and MB/sec with configurable bounds checking
- **Comprehensive Reporting**: Detailed performance reports with pass/fail validation
- **Automatic Bounds Validation**: Performance assertions against configurable thresholds
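
The latency bullets above boil down to percentile math over per-message delivery latencies. An illustrative sketch of that calculation (not the actual `benchmark_metrics` implementation), assuming latencies are recorded in milliseconds at delivery-callback time:

```python
import math

def latency_percentiles(latencies_ms):
    """Nearest-rank P50/P95/P99 over per-message delivery latencies (ms)."""
    if not latencies_ms:
        return {}
    ordered = sorted(latencies_ms)
    def pct(p):
        rank = math.ceil(p / 100.0 * len(ordered))  # nearest-rank percentile
        return ordered[max(0, rank - 1)]
    return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}

# Example: check the P95 against the CI bound max_p95_latency_ms = 1500.0
stats = latency_percentiles([12.0, 15.5, 18.2, 240.0, 9.1])
assert stats["p95"] <= 1500.0
```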

## Configuration

Performance bounds are loaded from an environment-based JSON config file. By default, it loads `benchmark_bounds.json`, but you can override this with the `BENCHMARK_BOUNDS_CONFIG` environment variable.

### Environment-Based Configuration

The bounds configuration supports different environments with different performance thresholds:

```json
{
"_comment": "Performance bounds for benchmark tests by environment",
"local": {
"_comment": "Default bounds for local development - more relaxed thresholds",
"min_throughput_msg_per_sec": 1000.0,
"max_p95_latency_ms": 2000.0,
"max_error_rate": 0.02,
"min_success_rate": 0.98,
"max_p99_latency_ms": 3000.0,
"max_memory_growth_mb": 800.0,
"max_buffer_full_rate": 0.05,
"min_messages_per_poll": 10.0
},
"ci": {
"_comment": "Stricter bounds for CI environment - production-like requirements",
"min_throughput_msg_per_sec": 1500.0,
"max_p95_latency_ms": 1500.0,
"max_error_rate": 0.01,
"min_success_rate": 0.99,
"max_p99_latency_ms": 2500.0,
"max_memory_growth_mb": 600.0,
"max_buffer_full_rate": 0.03,
"min_messages_per_poll": 15.0
},
"_default_environment": "local"
}
```

### Environment Selection

- **BENCHMARK_ENVIRONMENT**: Selects which environment bounds to use (`local`, `ci`, etc.)
- **Default**: Uses "local" environment if not specified
- **CI**: Automatically uses "ci" environment in CI pipelines

Usage:
```bash
# Use default environment (local)
./run_ducktape_test.py

# Explicitly use local environment
BENCHMARK_ENVIRONMENT=local ./run_ducktape_test.py

# Use CI environment with stricter bounds
BENCHMARK_ENVIRONMENT=ci ./run_ducktape_test.py

# Use different config file entirely
BENCHMARK_BOUNDS_CONFIG=custom_bounds.json ./run_ducktape_test.py
```

```python
from benchmark_metrics import MetricsBounds

# Loads from BENCHMARK_BOUNDS_CONFIG env var, or benchmark_bounds.json if not set
bounds = MetricsBounds()

# Or load from a specific config file
bounds = MetricsBounds.from_config_file("my_bounds.json")
```
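
For illustration, resolving the active bounds from the JSON above could look like the following standalone sketch; the real `MetricsBounds` class may differ, and `load_bounds` is a hypothetical helper:

```python
import json
import os

def load_bounds(path=None):
    """Resolve the bounds dict for the active environment from the JSON config."""
    path = path or os.environ.get("BENCHMARK_BOUNDS_CONFIG",
                                  "tests/ducktape/benchmark_bounds.json")
    with open(path) as f:
        config = json.load(f)
    # BENCHMARK_ENVIRONMENT wins; otherwise fall back to the file's default, then "local"
    env = os.environ.get("BENCHMARK_ENVIRONMENT",
                         config.get("_default_environment", "local"))
    return {k: v for k, v in config[env].items() if not k.startswith("_")}

bounds = load_bounds()
print(bounds["min_throughput_msg_per_sec"])
```
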
26 changes: 26 additions & 0 deletions tests/ducktape/benchmark_bounds.json
@@ -0,0 +1,26 @@
{
"_comment": "Performance bounds for benchmark tests by environment",
"local": {
"_comment": "Default bounds for local development - more relaxed thresholds",
"min_throughput_msg_per_sec": 1000.0,
"max_p95_latency_ms": 2000.0,
"max_error_rate": 0.02,
"min_success_rate": 0.98,
"max_p99_latency_ms": 3000.0,
"max_memory_growth_mb": 800.0,
"max_buffer_full_rate": 0.05,
"min_messages_per_poll": 10.0
},
"ci": {
"_comment": "Stricter bounds for CI environment - production-like requirements",
"min_throughput_msg_per_sec": 1500.0,
"max_p95_latency_ms": 1500.0,
"max_error_rate": 0.01,
"min_success_rate": 0.99,
"max_p99_latency_ms": 2500.0,
"max_memory_growth_mb": 600.0,
"max_buffer_full_rate": 0.03,
"min_messages_per_poll": 10.0
},
"_default_environment": "local"
}