Skip to content

Conversation

@tomassrnka
Copy link
Member

@tomassrnka tomassrnka commented Feb 10, 2026

Summary

Adds ARM64/aarch64 architecture support to the E2B infrastructure, enabling builds and sandbox execution on Apple Silicon and other ARM64 hosts (via Lima VM + nested KVM).

Changes by commit:

  1. Makefiles — Replace hardcoded GOARCH=amd64 and --platform linux/amd64 with $(shell go env GOARCH) across all 4 service Makefiles
  2. Go runtime detection — Disable SMT on ARM64 (not supported), use runtime.GOARCH for OCI image platform, add ARM64 fallback for CPU detection (gopsutil doesn't populate Family/Model on ARM)
  3. Provision script — Make chattr calls non-fatal (|| true) for busybox versions that lack it
  4. create-build — Arch-aware Firecracker and kernel download URLs (tries arm64/ subdirectory first, falls back to generic), E2B_BASE_IMAGE env var for base image override
  5. fetch-busybox — Makefile target to swap the embedded x86 busybox binary with the system's ARM64 busybox-static before compilation

Related PRs:

Test plan

  • Build orchestrator, envd, API, and client-proxy natively on ARM64 Linux
  • Run make fetch-busybox on ARM64 host to swap busybox binary
  • Template build succeeds with create-build on ARM64
  • Sandbox create/exec/delete works on ARM64 (Lima VM + KVM)
  • Verify uname -m in sandbox returns aarch64
  • Confirm no regression on x86_64 builds (all changes are backwards compatible)

🤖 Generated with Claude Code


Note

Medium Risk
Touches sandbox boot/runtime configuration and artifact download/path logic; mistakes can break VM startup or fetch the wrong binaries on either architecture, though changes are mostly gated by GOARCH with fallbacks.

Overview
Adds ARM64 PR coverage by introducing an ARM64 workflow that cross-compiles/vets all Go packages and runs a unit-test matrix on ARM runners, and wires it into pull-request.yml. Updates service Makefiles and orchestrator tooling to stop hardcoding amd64, using runtime.GOARCH/go env GOARCH for Docker build platforms, OCI image pulls, Firecracker binary paths, and Firecracker/kernel downloads (with arch-first and legacy fallbacks), plus ARM64-specific runtime behavior (disable SMT; tolerate missing CPU family/model) and a helper make fetch-busybox for ARM64 builds; template provisioning now ignores chattr failures to avoid aborting on unsupported busybox builds.

Written by Cursor Bugbot for commit 30ac631. This will update automatically on new commits. Configure here.

tomassrnka and others added 3 commits February 9, 2026 16:51
Replace hardcoded GOARCH=amd64 and --platform linux/amd64 with
$(shell go env GOARCH) across all service Makefiles. This enables
building on ARM64 (Apple Silicon) without manual overrides while
preserving existing amd64 behavior on x86_64 hosts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Disable SMT (Simultaneous Multi-Threading) on ARM64 since it's not
  supported by ARM processors
- Use runtime.GOARCH for OCI image platform instead of hardcoded "amd64"
- Add ARM64 fallback in CPU detection (gopsutil doesn't populate
  Family/Model fields from /proc/cpuinfo on ARM64)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Some busybox versions (e.g., busybox-static on ARM64 Ubuntu) lack
chattr support. Make the calls non-fatal so provisioning succeeds
regardless of busybox capabilities.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

// On arm64, try arch-specific URL first (e.g. .../vmlinux-6.1.102/arm64/vmlinux.bin)
if runtime.GOARCH == "arm64" {
archURL, _ := url.JoinPath("https://storage.googleapis.com/e2b-prod-public-builds/kernels/", version, "arm64", "vmlinux.bin")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error from url.JoinPath is silently ignored. If the URL construction fails (unlikely but possible with malformed version strings), the code will proceed with a zero-value URL which will fail at download time with a less clear error message.


// On arm64, try arch-specific URL first (e.g. .../v1.12.1/arm64/firecracker)
if runtime.GOARCH == "arm64" {
archURL := fmt.Sprintf("https://github.com/e2b-dev/fc-versions/releases/download/%s/arm64/firecracker", version)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version parameter used to construct download URLs is not validated or sanitized. If an attacker can control this value (e.g., via flags or environment variables), they could inject path traversal sequences or modify the URL to download malicious binaries from unintended locations.

if [ "$$ARCH" = "arm64" ]; then \
echo "Fetching arm64 busybox from system busybox-static..."; \
if [ -f /usr/bin/busybox ]; then \
cp /usr/bin/busybox ./internal/template/build/core/systeminit/busybox_1.36.1-2; \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Makefile copies /usr/bin/busybox without verifying its integrity or authenticity. An attacker who compromises the build host could replace this system binary with a malicious version that gets embedded into all ARM64 sandboxes.

)

func getBaseImage() string {
if img := os.Getenv("E2B_BASE_IMAGE"); img != "" {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The E2B_BASE_IMAGE environment variable allows overriding the base image without validation. This could be exploited to inject a malicious container image if an attacker can control environment variables during the build process.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 589a0596cb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 428 to 433
// On arm64, try arch-specific URL first (e.g. .../vmlinux-6.1.102/arm64/vmlinux.bin)
if runtime.GOARCH == "arm64" {
archURL, _ := url.JoinPath("https://storage.googleapis.com/e2b-prod-public-builds/kernels/", version, "arm64", "vmlinux.bin")
fmt.Printf("⬇ Downloading kernel %s (arm64)...\n", version)
if err := download(ctx, archURL, dstPath, 0o644); err == nil {
return nil
}
fmt.Printf(" arm64 kernel not found, trying generic URL...\n")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid falling back to amd64 kernel on any error

On arm64, setupKernel falls back to the generic URL for any download error. That means transient network failures (timeouts, DNS, TLS) against the arm64 URL will silently pull the generic (amd64) kernel instead, which will not boot on arm64 hosts. The fallback should be gated on a 404 (not found) or a clear “no arm64 build” condition; otherwise arm64 builds can fail later in a harder-to-diagnose way.

Useful? React with 👍 / 👎.

Comment on lines 456 to 474
// On arm64, try arch-specific URL first (e.g. .../v1.12.1/arm64/firecracker)
if runtime.GOARCH == "arm64" {
archURL := fmt.Sprintf("https://github.com/e2b-dev/fc-versions/releases/download/%s/arm64/firecracker", version)
fmt.Printf("⬇ Downloading Firecracker %s (arm64)...\n", version)
if err := download(ctx, archURL, dstPath, 0o755); err == nil {
return nil
}
fmt.Printf(" arm64 Firecracker not found, trying generic URL...\n")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid falling back to amd64 Firecracker on any error

The arm64 Firecracker download has the same unconditional fallback: any error when fetching the arm64 binary causes a retry against the generic (amd64) URL. On arm64 hosts, a transient network error will then download an amd64 binary that cannot execute. This should only fall back when the arm64 artifact is confirmed missing (e.g., 404), not for network/IO errors.

Useful? React with 👍 / 👎.

Comment on lines 128 to 139
@ARCH=$$(dpkg --print-architecture 2>/dev/null || echo "amd64"); \
if [ "$$ARCH" = "arm64" ]; then \
echo "Fetching arm64 busybox from system busybox-static..."; \
if [ -f /usr/bin/busybox ]; then \
cp /usr/bin/busybox ./internal/template/build/core/systeminit/busybox_1.36.1-2; \
echo "✓ Replaced embedded busybox with arm64 binary"; \
else \
echo "⚠ /usr/bin/busybox not found — install busybox-static (apt install busybox-static)"; \
exit 1; \
fi; \
else \
echo "✓ Using bundled amd64 busybox"; \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge fetch-busybox mis-detects arm64 on non-Debian

The new fetch-busybox target defaults to amd64 when dpkg is missing. On arm64 hosts that don’t have dpkg (e.g., Alpine, Arch, Fedora), this silently keeps the bundled amd64 busybox, leading to an ARM build with the wrong binary. Consider using uname -m/go env GOARCH as the fallback or erroring when architecture detection isn’t possible.

Useful? React with 👍 / 👎.

uses: ./.github/workflows/out-of-order-migrations.yml
unit-tests:
uses: ./.github/workflows/pr-tests.yml
arm64-tests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it make sense to do matrix inside pr-tests.yml so we don't duplicate the code?
don't we want to run it also for integration tests as those tests much more (feel free to do it in another PR as I suspect much more work there)

$(eval EXPECTED_MIGRATION_TIMESTAMP := $(expectedMigration))
@ docker buildx install # sets up the buildx as default docker builder (otherwise the command below won't work)
@ docker build --platform linux/amd64 --tag $(IMAGE_REGISTRY) --push --build-arg COMMIT_SHA="$(COMMIT_SHA)" --build-arg EXPECTED_MIGRATION_TIMESTAMP="$(EXPECTED_MIGRATION_TIMESTAMP)" -f ./Dockerfile ..
@ docker build --platform linux/$(shell go env GOARCH) --tag $(IMAGE_REGISTRY) --push --build-arg COMMIT_SHA="$(COMMIT_SHA)" --build-arg EXPECTED_MIGRATION_TIMESTAMP="$(EXPECTED_MIGRATION_TIMESTAMP)" -f ./Dockerfile ..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would read it from .env as devs won't be able to build from local to their remote clusters if the arch doesn't match

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe a separate build-and-upload-arm64? Or we go real crazy and rename the existing one to build-and-upload-amd64 and add:

.PHONY: build-and-upload
build-and-upload: build-and-upload-arm64 build-and-upload-amd64

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we should be able to build for architectures other than the local one. Not sure if putting ARCH into the env file is best, as you can have one env with both ARM64 and AMD64 pools.

defaultKernel = "vmlinux-6.1.102"
defaultFC = "v1.12.1_717921c"
proxyPort = 5007
defaultBaseImage = "e2bdev/base:latest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep current logic, we can just start building the image for arm64 as well


kernelURL, _ := url.JoinPath("https://storage.googleapis.com/e2b-prod-public-builds/kernels/", version, "vmlinux.bin")
// On arm64, try arch-specific URL first (e.g. .../vmlinux-6.1.102/arm64/vmlinux.bin)
if runtime.GOARCH == "arm64" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we discussed this doesn't have to make it arm specific now

}

// On arm64, try arch-specific URL first (e.g. .../v1.12.1/arm64/firecracker)
if runtime.GOARCH == "arm64" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we discussed this doesn't have to make it arm specific now

hugePages bool,
) error {
smt := true
smt := runtime.GOARCH != "arm64"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you link an article or docs why it has to be false for arm64? Also can you make arm64 a constant?

Comment on lines +28 to +37
// On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
// Provide fallback values so callers don't get an error.
if (family == "" || model == "") && runtime.GOARCH == "arm64" {
if family == "" {
family = "arm64"
}
if model == "" {
model = "0"
}
} else if family == "" || model == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make it cleaner

Suggested change
// On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
// Provide fallback values so callers don't get an error.
if (family == "" || model == "") && runtime.GOARCH == "arm64" {
if family == "" {
family = "arm64"
}
if model == "" {
model = "0"
}
} else if family == "" || model == "" {
// On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
// Provide fallback values so callers don't get an error.
if (runtime.GOARCH == "arm64") {
if family == "" {
family = "arm64"
}
if model == "" {
model = "0"
}
}
if family == "" || model == "" {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are these changes relevant?

-firecracker $(FIRECRACKER_VERSION)

.PHONY: fetch-busybox
fetch-busybox:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this good for? I don't see it used


// On arm64, try arch-specific URL first (e.g. .../v1.12.1/arm64/firecracker)
if runtime.GOARCH == "arm64" {
archURL := fmt.Sprintf("https://github.com/e2b-dev/fc-versions/releases/download/%s/arm64/firecracker", version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update to GCP as we don't want to be dependent on github

tomassrnka and others added 3 commits February 10, 2026 11:27
Standardize Firecracker binary paths to include architecture:
  {version}/{arch}/firecracker (e.g. v1.12.1/arm64/firecracker)

This aligns with the fc-kernels convention of {version}/{arch}/ and
prepares for multi-arch production deployments.

Changes:
- config.go: FirecrackerPath includes runtime.GOARCH in path
- create-build: download FC from GCS bucket (not GitHub releases),
  using {version}/{arch}/firecracker with legacy fallback on 404
- create-build: add errNotFound sentinel for reliable 404 detection,
  handle url.JoinPath errors explicitly
- script_builder_test.go: update expected paths for arch directory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The embedded busybox binary is x86_64. On ARM64 hosts, add a Makefile
target that extracts the correct busybox from the busybox-static
apt package, ensuring the binary comes from a trusted source rather
than copying an arbitrary system binary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a new PR workflow that:
- Cross-compiles all packages for arm64 on x86_64 runners (catches
  build issues without needing ARM64 hardware)
- Runs unit tests on ubuntu-24.04-arm runners for packages that
  don't require KVM (api, client-proxy, db, docker-reverse-proxy,
  shared)

Orchestrator and envd are excluded from ARM64 unit tests since they
require KVM, hugepages, NBD, and other kernel features only available
on self-hosted runners.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the architecture naming convention (amd64/arm64 vs x86_64/aarch64),
binary path layout, ARM64-specific behavior, and the fc-kernels naming
discrepancy that needs follow-up alignment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tomassrnka tomassrnka marked this pull request as draft February 10, 2026 19:41
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.


func (t Config) FirecrackerPath(config cfg.BuilderConfig) string {
return filepath.Join(config.FirecrackerVersionsDir, t.FirecrackerVersion, FirecrackerBinaryName)
return filepath.Join(config.FirecrackerVersionsDir, t.FirecrackerVersion, runtime.GOARCH, FirecrackerBinaryName)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firecracker path change breaks existing amd64 production deployments

High Severity

FirecrackerPath now unconditionally includes runtime.GOARCH in the path ({version}/amd64/firecracker on x86_64), but the production host init script (init-client.sh) copies firecracker binaries from GCS into /fc-versions/ using the existing flat layout ({version}/firecracker). After deploying this change, the orchestrator will fail to find the firecracker binary on all existing amd64 production nodes because no amd64/ subdirectory exists on disk. The create-build tool has a legacy-URL fallback but still saves to the new arch-prefixed path, so local dev works — but production infrastructure hasn't been updated to match.

Additional Locations (1)

Fix in Cursor Fix in Web

tomassrnka and others added 5 commits February 10, 2026 11:47
Add TARGET_ARCH environment variable that allows deploying for a
different architecture than the build host (e.g. deploying x86_64
sandboxes from an ARM64 Mac). When unset, falls back to runtime.GOARCH.

- Add TargetArch() helper in shared/pkg/utils
- Use TargetArch() for binary paths, OCI platform, and downloads
- Add BUILD_ARCH variable to all Makefiles (reads TARGET_ARCH)
- Keep runtime.GOARCH for hardware-dependent code (SMT, CPU detection)
- Document TARGET_ARCH in orchestrator README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FirecrackerPath() now checks if the arch-prefixed path
({version}/{arch}/firecracker) exists on disk before using it.
If not found, falls back to the legacy flat path
({version}/firecracker) for backward compatibility with existing
production nodes that haven't migrated to the new layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same pattern as FirecrackerPath — try arch-prefixed path
({version}/{arch}/vmlinux.bin) first, fall back to legacy flat
path ({version}/vmlinux.bin) for existing production nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make fetch-busybox a prerequisite of build-local so the correct
arch busybox binary is fetched automatically. On amd64 this is a
no-op (prints "Using bundled amd64 busybox").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tomassrnka
Copy link
Member Author

@cursoragent bugbot run

@cursor
Copy link

cursor bot commented Feb 10, 2026

Unable to authenticate your request. Please make sure to connect your GitHub account to Cursor. Go to Cursor

@tomassrnka
Copy link
Member Author

@cursoragent bugbot run

@tomassrnka
Copy link
Member Author

@claude review this draft PR

return arch
}

return runtime.GOARCH
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TARGET_ARCH is read at runtime but BUILD_ARCH is evaluated at Makefile parse time. If TARGET_ARCH is changed between builds without running make clean, the mismatch could cause binaries compiled for one arch to be used with paths expecting another arch (e.g. amd64 binary trying to load arm64 Firecracker/kernel).

// TargetArch returns the target architecture for builds and binary paths.
// If TARGET_ARCH is set, it is used; otherwise falls back to runtime.GOARCH.
// This allows cross-architecture deployment (e.g. deploying x86_64 from an ARM64 host).
func TargetArch() string {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No validation that TARGET_ARCH is a valid architecture (amd64 or arm64). Invalid values like "x86_64" or "aarch64" would silently create incorrect paths. Consider adding validation or documenting which values are acceptable.


.PHONY: fetch-busybox
fetch-busybox:
@ARCH=$$(dpkg --print-architecture 2>/dev/null || echo "amd64"); \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fetch-busybox target uses dpkg --print-architecture to detect the arch, but TargetArch() uses TARGET_ARCH env var. If a user sets TARGET_ARCH=arm64 on an amd64 host, this target will still use the amd64 busybox (since dpkg returns amd64), but the build will be for arm64, causing a binary mismatch.

return fmt.Errorf("mkdir firecracker dir: %w", err)
}

if _, err := os.Stat(dstPath); err == nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition: os.Stat checks if the file exists, then download() creates/writes it. If two create-build processes run concurrently for the same version, both could pass the stat check and attempt to write simultaneously, potentially corrupting the file. Consider using O_EXCL flag or file locking.

if err := download(ctx, archURL, dstPath, 0o644); err == nil {
return nil
} else if !errors.Is(err, errNotFound) {
return fmt.Errorf("failed to download arm64 kernel: %w", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For arm64, if the arch-specific download (line 424) fails with a non-404 error (network issue, permissions, etc.), the code returns immediately without trying the legacy fallback. This differs from the Firecracker logic below which falls through to legacy on any 404. Consider consistent fallback behavior.

hugePages bool,
) error {
smt := true
smt := runtime.GOARCH != "arm64"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMT is disabled based on runtime.GOARCH (compile-time host arch), not TARGET_ARCH. If cross-compiling (TARGET_ARCH=arm64 on amd64 host), the binary will incorrectly enable SMT for ARM64 VMs. Should check the actual target arch or defer this to runtime detection.


echo "Making configuration immutable"
$BUSYBOX chattr +i /etc/resolv.conf
$BUSYBOX chattr +i /etc/resolv.conf 2>/dev/null || true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silencing chattr failures could mask real issues beyond busybox compatibility. If chattr fails on a full-featured system due to filesystem type, permissions, or corrupted inodes, the script will continue silently. Consider checking if chattr exists first, or logging the failure reason.

tomassrnka and others added 9 commits February 10, 2026 15:01
Remove BUILD_ARCH variable from Makefiles. TARGET_ARCH is now
purely a runtime env var for path resolution and OCI platform.
Makefiles use $(shell go env GOARCH) directly — cross-compile
with GOARCH=arm64 make build-local.

This prevents stale binary mismatches when TARGET_ARCH changes
between builds without a clean rebuild.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Panic on invalid TARGET_ARCH values (e.g. "x86_64", "aarch64")
to prevent silent path misresolution. Only "amd64" and "arm64"
are accepted, matching the Go/Docker/Debian naming convention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Busybox runs inside the VM (target arch), not on the build host.
Use TARGET_ARCH if set, falling back to dpkg --print-architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both functions now follow the same pattern: try arch-specific URL
({version}/{arch}/binary) first for all architectures, fall back
to legacy flat URL ({version}/binary) only on 404. Non-404 errors
(network, permissions) are returned immediately in both cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
chattr is non-fatal (busybox on arm64 may lack it), but failures
are now logged with the actual error message so real issues
(wrong filesystem, permissions, corrupted inodes) are visible
in build output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SMT is a host CPU capability (ARM doesn't support it). The
orchestrator binary always runs on the same hardware as Firecracker,
so runtime.GOARCH correctly reflects the host. TARGET_ARCH is for
path resolution only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Align setupKernel to save to {version}/{arch}/vmlinux.bin like
  setupFC, so HostKernelPath finds the file without fallback
- Atomic downloads: write to .tmp file and rename on success to
  prevent partial files from being reused on next run
- Handle http.NewRequestWithContext error instead of discarding
- Improve verifyImagePlatform error to show actual vs expected arch
- Fix README defaults (vcpu=2, memory=1024, disk=1024)
- Add -race flag to ARM64 CI unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add unit tests for TargetArch():
- Defaults to runtime.GOARCH when TARGET_ARCH unset
- Respects valid overrides (amd64, arm64)
- Panics on invalid values (x86_64, aarch64, mips)

Add tests for FirecrackerPath/HostKernelPath fallback:
- Returns arch-prefixed path when {version}/{arch}/ dir exists
- Falls back to legacy flat path when only {version}/ exists
- Falls back to flat path when neither exists
- Prefers arch-prefixed over legacy when both exist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run orchestrator tests on the arm64 CI runner (ubuntu-24.04-arm)
to exercise ARM64-specific code paths: config path fallback,
SMT detection, CPU info, and TargetArch-dependent logic.

CGO is enabled by default on the native arm64 runner, which
satisfies both the race detector and orchestrator's CGO deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tomassrnka
Copy link
Member Author

@claude review this draft PR

} else if !errors.Is(err, errNotFound) {
return fmt.Errorf("failed to download kernel: %w", err)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In setupKernel and setupFC, when the arch-specific download fails with a non-404 error, the code returns immediately with the error. However, when downloading the legacy fallback URL fails, no error wrapping distinguishes it from the arch-specific failure. This could make debugging difficult when both URLs fail for different reasons (e.g., arch URL returns 403 Forbidden, legacy returns network error). Consider wrapping the legacy fallback error to indicate it's from the fallback attempt.


return download(ctx, kernelURL, dstPath, 0o644)
return download(ctx, legacyURL, dstPath, 0o644)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy fallback downloads to the arch-specific path (dstPath which includes arch subdirectory) but attempts to fetch from the legacy URL (no arch subdirectory). If the legacy URL succeeds, you'll have an x86_64 binary stored at /kernels/vmlinux-6.1.102/arm64/vmlinux.bin on ARM64 hosts - a silent arch mismatch. The download should either: (1) Use a different destination path for legacy downloads, or (2) Skip the fallback entirely when arch-specific paths exist, forcing users to upload proper arch-specific binaries.

}

if err := f.Close(); err != nil {
os.Remove(tmpPath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The os.Rename call can fail if tmpPath and path are on different filesystems (e.g., /tmp is tmpfs and destination is on disk). Since this is downloading to user-specified directories, consider checking if rename fails with EXDEV (cross-device link) and falling back to copy+remove, or document that the destination directory must be on the same filesystem as temp files.

cp "$$TMPDIR/bin/busybox" ./internal/template/build/core/systeminit/busybox_1.36.1-2 && \
rm -rf "$$TMPDIR" busybox-static_*.deb && \
echo "✓ Replaced embedded busybox with arm64 binary (from busybox-static package)" || \
{ rm -rf "$$TMPDIR" busybox-static_*.deb; echo "⚠ Failed to fetch busybox-static — install it: apt install busybox-static"; exit 1; }; \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fetch-busybox target runs apt-get download without sudo, which will fail for non-root users. Since build-local is a developer-facing target (not CI), this creates a silent failure where the build continues with the x86_64 busybox on ARM64 hosts. Either: (1) Require sudo for this target, (2) Check if apt-get/dpkg-deb are available before attempting, or (3) Make this a manual prerequisite step documented in README rather than a build-local dependency.


echo "Making configuration immutable"
$BUSYBOX chattr +i /etc/resolv.conf
if $BUSYBOX chattr +i /etc/resolv.conf 2>/tmp/chattr_err; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chattr error is redirected to /tmp/chattr_err but this file is never cleaned up. In a long-running VM or repeated builds, this could accumulate. Consider using stderr directly in the error message instead of writing to a temp file: $BUSYBOX chattr +i /etc/resolv.conf 2>&1 || echo "chattr +i failed (non-fatal): $?". This avoids filesystem pollution and potential information leakage across builds.

// TargetArch returns the target architecture for binary paths and OCI platform.
// If TARGET_ARCH is set to a valid value ("amd64" or "arm64"), it is used;
// otherwise falls back to runtime.GOARCH.
// Panics on invalid TARGET_ARCH values to prevent silent path misresolution.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TargetArch() calls os.Getenv on every invocation without caching. In hot paths (e.g., called for every sandbox creation), this could cause unnecessary overhead. Since TARGET_ARCH should not change during runtime, consider caching the result in a sync.Once or package-level variable initialized at startup. This also makes the panic-on-invalid behavior more predictable (fails at startup rather than mid-operation).

return arch
}

return runtime.GOARCH
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TargetArch() falls back to runtime.GOARCH without validation. If the binary is compiled for an unsupported architecture (e.g., mips, s390x), it will return invalid values that later cause path resolution failures. Consider validating runtime.GOARCH against validArchitectures and panicking with a clear message if the host architecture is unsupported, rather than letting it fail silently in downstream path operations.

Add infra-tests-arm64 runner entries for orchestrator and envd packages.
Uses matrix.runner to select the runner per job instead of hardcoding
infra-tests. The setup steps (hugepages, NBD, uffd) are arch-agnostic
and work identically on ARM64.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants