Add performance test for Dragonfly SDK Proxy #1945

Open: bergwolf wants to merge 13 commits into dragonflyoss:master from bergwolf:perftest

@bergwolf
Member

Add a daily scheduled perf test for the Dragonfly SDK proxy. It also produces a perf-test image that can be reused to run the same test in different deployments.

Example run:
https://github.com/bergwolf/nydus/actions/runs/26274407769

bergwolf added 13 commits May 22, 2026 07:24
Introduce a complete performance testing framework for benchmarking Nydus image
loading through Dragonfly's SDK proxy mode. The harness builds a minimal container
image containing static nydusd, nydusctl, crane, and a Go workload that mounts
a Nydus image via FUSE and reads all files in parallel.

The GitHub Actions workflow orchestrates a full Dragonfly cluster (MySQL, Redis,
manager, scheduler, dfdaemon) and runs the benchmark against configurable Nydus
images. Results are captured in JSON format with assertions for successful
workload completion and non-zero bytes read.

Key components:
- Static musl-based perftest image with all required binaries
- Go workload for parallel file reading and metrics collection
- GitHub Actions workflow for CI integration
- Makefile target for local image building
- Configuration templates and documentation

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
…ists

Replace local artifact upload/download with direct push/pull to GHCR.
Compute a content-addressed tag from the commit SHA and lowercase repo owner,
then push the image once and reuse it across jobs.

Add a manifest check so the build step is skipped if the tag already exists,
saving time on re-runs or multiple workflows on the same commit.

Update permissions and login steps so both build and benchmark jobs can read
and write packages.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

Enable performance tests to pull Nydus images from private registries by injecting
registry credentials via the REGISTRY_AUTH secret.

The workflow now forwards the base64-encoded credentials to the perftest container,
and the entrypoint script creates a docker config.json for crane authentication.
Security options are also added to run the container with unconfined AppArmor and
seccomp profiles, giving it full privileges during testing.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Switch runtime base from Alpine to Ubuntu so a host-built nydusd can be
bind-mounted over /usr/local/bin/nydusd without musl/glibc conflicts.
Update entrypoint to validate and log the selected binary and version,
recording both in result.json and the printed summary.

Document the bind-mount workflow in README.md so developers can test
daemon changes without rebuilding the perftest image.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

… manifest

Add environment variables and config template support for the Dragonfly streaming prefetch feature, including
thread count, bandwidth limit, and retry settings. Also add a DIGEST_VALIDATE option for RAFS metadata
validation.

Create a comprehensive Kubernetes pod manifest (pod.yaml) that demonstrates all perftest configuration
options with proper Dragonfly service endpoints for cluster deployments.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

Add DIGEST_VALIDATE=true and STREAM_PREFETCH=true environment variables to the performance test
container configuration to enable additional validation and optimization features during testing.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

Add hostNetwork to the example pod so the benchmark container can reach external registries and
the Dragonfly SDK can discover its local IP address. This avoids CNI configurations that might
block outbound traffic or hide the pod’s routable address.

Install iproute2 in the image so tools like ip and ss are available for network diagnostics
inside the container.

Update documentation to explain when and why hostNetwork is needed and remind users to remove
it only if their pod network already provides a working default route.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

Add the busybox package to the performance test container to support additional shell utilities
needed for test scenarios and debugging.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

Add an ENABLE_DEFAULT_ROUTE_WORKAROUND option to automatically create a default IPv4 route when the pod has only
endpoint-specific routes. This fixes Dragonfly SDK local IP discovery in environments without a default route.

The workaround derives the default route from existing routes to the Dragonfly scheduler, proxy, or registry
endpoints. It requires the CAP_NET_ADMIN capability or privileged container mode to modify routing tables.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
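One way to picture the route derivation in this commit is the sketch below. It is an assumption-laden illustration, not the PR's entrypoint logic. It takes text in the format printed by `ip -4 route show`, looks for a host route to one of the given endpoint IPs, and reuses that route's gateway and device for a default route. The function name `deriveDefaultRoute`, the exact-match rule, and the sample addresses are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// deriveDefaultRoute (hypothetical) scans routing-table text and returns the
// gateway and device of the first route whose destination is one of the
// endpoints. ok is false if a default route already exists or no route matches.
func deriveDefaultRoute(routeTable string, endpoints []string) (gw, dev string, ok bool) {
	want := make(map[string]bool, len(endpoints))
	for _, e := range endpoints {
		want[e] = true
	}

	var candGW, candDev string
	for _, line := range strings.Split(routeTable, "\n") {
		f := strings.Fields(line)
		if len(f) == 0 {
			continue
		}
		if f[0] == "default" {
			// A default route is already present: nothing to do.
			return "", "", false
		}
		if candGW != "" || !want[f[0]] {
			continue
		}
		// Pick out the "via <gateway>" and "dev <device>" pairs.
		var g, d string
		for i := 1; i+1 < len(f); i++ {
			switch f[i] {
			case "via":
				g = f[i+1]
			case "dev":
				d = f[i+1]
			}
		}
		if g != "" && d != "" {
			candGW, candDev = g, d
		}
	}
	if candGW == "" {
		return "", "", false
	}
	return candGW, candDev, true
}

func main() {
	// Sample table with only endpoint-specific routes (illustrative values).
	table := "10.0.0.7 via 169.254.1.1 dev eth0\n10.0.0.8 via 169.254.1.1 dev eth0\n"
	if gw, dev, ok := deriveDefaultRoute(table, []string{"10.0.0.8"}); ok {
		// Applying this command requires CAP_NET_ADMIN.
		fmt.Printf("ip route add default via %s dev %s\n", gw, dev)
	}
}
```

The sketch only computes the command. Running it is left to the caller, since changing the routing table needs the elevated privileges the commit message mentions.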
Extend the performance test report with detailed cache hit ratio, backend I/O size distribution, average latency,
total fetched data, and network efficiency metrics. These additions help analyze nydusd blob cache effectiveness
and backend storage access patterns.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
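The PR does not spell out how the report derives these values, so the following is only a sketch under assumed definitions: hit ratio as cache hits divided by total cache lookups, and average latency as accumulated backend latency divided by the number of backend reads. The struct and field names are hypothetical and do not correspond to nydusd's actual metric names.

```go
package main

import "fmt"

// cacheStats holds raw counters (hypothetical field names).
type cacheStats struct {
	Hits                 uint64 // lookups served from the blob cache
	Total                uint64 // all cache lookups
	BackendReads         uint64 // read requests sent to the backend
	BackendLatencyMillis uint64 // accumulated backend latency
}

// hitRatio returns Hits/Total, or 0 when there were no lookups.
func hitRatio(s cacheStats) float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Hits) / float64(s.Total)
}

// avgLatencyMillis returns the mean backend latency, or 0 with no reads.
func avgLatencyMillis(s cacheStats) float64 {
	if s.BackendReads == 0 {
		return 0
	}
	return float64(s.BackendLatencyMillis) / float64(s.BackendReads)
}

func main() {
	// Made-up counters, for demonstration only.
	s := cacheStats{Hits: 75, Total: 100, BackendReads: 25, BackendLatencyMillis: 500}
	fmt.Printf("hit ratio: %.2f, avg backend latency: %.1f ms\n",
		hitRatio(s), avgLatencyMillis(s))
}
```

Guarding the zero-denominator cases keeps the report well-defined for runs that never touch the cache or the backend.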
Enhance the performance test output with comprehensive prefetch statistics and cache analysis. The new
metrics include prefetch data amount, average merge size, latency, bandwidth, and IO breakdown between
prefetch and on-demand reads.

Add cache entries count and backend error tracking to provide better visibility into system behavior
during performance testing.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

Add metrics tracking for the blob prefetcher to monitor streaming performance and
cache efficiency. The new instrumentation captures timing, data volume, and request counts.

Key changes:
- Record wall-clock start time when first blob begins streaming using SystemTime
- Track prefetch_data_amount by accumulating compressed chunk sizes as they are cached
- Increment prefetch_requests_count for each successful blob stream
- Set prefetch_end_time_secs after each blob completes streaming
- Add metrics() accessor to BlobCache trait to expose BlobcacheMetrics
- Ensure begin time is only recorded once per prefetch session using std::sync::Once

The metrics provide visibility into prefetch duration, throughput, and cache hit rates for
performance analysis and optimization.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Replace push and pull request triggers with a daily cron schedule at 00:40 UTC. This reduces unnecessary workflow runs
while maintaining regular performance testing.

The workflow will now run automatically once per day and can still be triggered manually via workflow_dispatch when
needed.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>

bergwolf requested review from a team as code owners May 22, 2026 09:25
bergwolf requested review from Fricounet, Zephyrcf, adam3q, changweige, liubin and power-more and removed request for a team May 22, 2026 09:25