Commit bb50dc1

Merge branch 'main' into update-e2e-test-versions
2 parents f1209aa + a7d884d commit bb50dc1

81 files changed: +1864 -310 lines changed

CLAUDE.md

Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
# FoundationDB Kubernetes Operator - Claude Development Guide

## Project Overview

The FoundationDB Kubernetes Operator is a Kubernetes controller that manages FoundationDB clusters. It automates deployment, scaling, backup, restore, and maintenance operations for those clusters.

### Architecture Components

- **API Types** (`api/v1beta2/`): Custom Resource Definitions (CRDs) for FoundationDBCluster, FoundationDBBackup, FoundationDBRestore
- **Controllers** (`controllers/`): Reconciliation logic for cluster lifecycle management
- **Internal Packages** (`internal/`): Core business logic for coordination, replacements, maintenance, etc.
- **Public Packages** (`pkg/`): Reusable components like admin clients, pod managers, status checks
- **E2E Tests** (`e2e/`): Comprehensive end-to-end testing with chaos engineering

### Key Patterns

- **Controller-Runtime**: Built on the `sigs.k8s.io/controller-runtime` framework
- **Reconciliation Loops**: Event-driven state reconciliation
- **Mock-Based Testing**: Extensive mocking for unit tests
- **Chaos Engineering**: Production-like failure testing with chaos-mesh

## Development Environment Setup

### Prerequisites

```bash
# Go 1.24+ required
go version

# Install dependencies
make deps

# Install FoundationDB client package
# For macOS: Download from https://github.com/apple/foundationdb/releases
# For arm64 Mac: Make sure to install the arm64 package
```

### Local Development

```bash
# Clone repository
git clone https://github.com/FoundationDB/fdb-kubernetes-operator
cd fdb-kubernetes-operator

# Set up test certificates
config/test-certs/generate_secrets.bash

# Build and deploy operator (requires local K8s cluster)
make rebuild-operator

# Create test cluster
kubectl apply -k ./config/tests/base
```

## Build System & Tooling

### Primary Make Commands

| Command | Purpose |
|---------|---------|
| `make all` | Complete build pipeline: deps, generate, fmt, vet, test, build |
| `make test` | Run unit tests (Ginkgo with race detection if TEST_RACE_CONDITIONS=1) |
| `make lint` | Run golangci-lint with project rules |
| `make fmt` | Format code using golines + goimports + golangci-lint --fix |
| `make vet` | Run go vet static analysis |
| `make generate` | Generate deepcopy methods and CRDs |
| `make manifests` | Generate CRD YAML files |
| `make container-build` | Build Docker image |
| `make deploy` | Deploy operator to Kubernetes cluster |
| `make rebuild-operator` | Build, push (if remote), deploy, and bounce operator |

### Environment Variables

- `IMG`: Operator image name (default: `fdb-kubernetes-operator:latest`)
- `SIDECAR_IMG`: Sidecar image name
- `REMOTE_BUILD`: Set to 1 for remote builds (enables image push)
- `BUILD_PLATFORM`: Override build platform (e.g., `linux/amd64`)
- `TEST_RACE_CONDITIONS`: Set to 1 to enable race detection in tests
- `SKIP_TEST`: Set to 1 to skip tests in build
81+
## Testing Framework
82+
83+
### Unit Testing (Ginkgo v2 + Gomega)
84+
85+
```go
86+
// Example test structure from controllers/suite_test.go
87+
var _ = Describe("ControllerName", func() {
88+
BeforeEach(func() {
89+
// Setup test environment
90+
k8sClient = mockclient.NewMockClient()
91+
})
92+
93+
It("should reconcile successfully", func() {
94+
// Test implementation
95+
Expect(result).To(BeNil())
96+
})
97+
})
98+
```
99+
100+
### E2E Testing
101+
102+
- **Location**: `e2e/` directory with test packages
103+
- **Framework**: Ginkgo + Gomega + chaos-mesh for failure injection
104+
- **Types**: Upgrades, HA failures, stress testing, maintenance mode
105+
- **Run**: `make test` with e2e labels
106+
107+
### Mock Objects
108+
109+
- **Kubernetes Client**: `mockclient.MockClient`
110+
- **FDB Admin Client**: `mock.DatabaseClientProvider`
111+
- **Pod Client**: `mockpodclient.NewMockFdbPodClient`
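
The mocks listed above stand in for real clients behind an interface, so the code under test never talks to a cluster. The following is a minimal, dependency-free sketch of that idea. `AdminClient`, `fakeAdminClient`, and `excludeAll` are hypothetical names chosen for illustration; they are not the operator's actual types, whose interfaces are considerably larger.

```go
package main

import (
	"errors"
	"fmt"
)

// AdminClient is a hypothetical interface used only in this sketch.
type AdminClient interface {
	ExcludeProcess(address string) error
}

// fakeAdminClient records calls instead of talking to a database.
type fakeAdminClient struct {
	excluded []string
	failOn   string // address that should trigger an error, if any
}

func (f *fakeAdminClient) ExcludeProcess(address string) error {
	if address == f.failOn {
		return errors.New("simulated exclusion failure")
	}
	f.excluded = append(f.excluded, address)
	return nil
}

// excludeAll is the code under test. It depends only on the interface,
// so a test can pass in the fake.
func excludeAll(client AdminClient, addresses []string) error {
	for _, address := range addresses {
		if err := client.ExcludeProcess(address); err != nil {
			return fmt.Errorf("could not exclude %s: %w", address, err)
		}
	}
	return nil
}

func main() {
	fake := &fakeAdminClient{}
	err := excludeAll(fake, []string{"10.0.0.1:4501", "10.0.0.2:4501"})
	fmt.Println(len(fake.excluded), err)
}
```

Because the fake keeps its own state, a test can assert on both the returned error and the recorded calls.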

## Code Standards & Conventions

### Linting & Formatting

**golangci-lint Configuration** (`.golangci.yml`):
- **Enabled Linters**: errcheck, govet, staticcheck, revive, misspell, ineffassign, unused
- **Formatters**: gofmt with golines (120 char limit)
- **Dependency Guard**: Restricted import paths for clean architecture

**Formatting Tools**:
- `golines`: Line length formatting with gofmt base
- `goimports`: Import organization
- `golangci-lint run --fix`: Auto-fix issues

### Package Organization

```
├── api/v1beta2/   # CRD types and API definitions
├── controllers/   # Controller reconciliation logic
├── internal/      # Internal business logic packages
├── pkg/           # Reusable library packages
├── e2e/           # End-to-end tests
├── kubectl-fdb/   # kubectl plugin
├── fdbclient/     # FDB client utilities
└── setup/         # Operator setup and configuration
```

### Naming Conventions

- **Types**: PascalCase (e.g., `FoundationDBCluster`)
- **Functions**: PascalCase for exported, camelCase for internal
- **Constants**: PascalCase or UPPER_SNAKE_CASE for public constants
- **Files**: `snake_case.go`
- **Test Files**: `*_test.go` with corresponding `suite_test.go`

### Error Handling

```go
// Preferred error handling pattern
err := someOperation()
if err != nil {
	return reconcile.Result{}, fmt.Errorf("failed to perform operation: %w", err)
}

// Use structured errors from internal/error_helper.go
return internal.ReconcileResult{
	Message: "Operation failed",
	Err:     err,
}
```

## Development Workflow

### 1. Understanding Existing Patterns

Before implementing new features:

```bash
# Find similar implementations
grep -r "similar_pattern" controllers/
grep -r "ProcessGroup" api/v1beta2/

# Study existing controller structure
ls controllers/*_controller.go
```

### 2. Controller Development

**Controller Structure**:
```go
type FoundationDBClusterReconciler struct {
	client.Client
	Log                    logr.Logger
	Recorder               record.EventRecorder
	DatabaseClientProvider DatabaseClientProvider
	PodLifecycleManager    podmanager.PodLifecycleManager
}

func (r *FoundationDBClusterReconciler) Reconcile(ctx context.Context, request ctrl.Request) (ctrl.Result, error) {
	// Implementation
}
```

### 3. API Changes

```bash
# After modifying API types in api/v1beta2/
make clean all # Generate new CRD structs.
```

### 4. Testing Strategy

1. **Write unit tests first** using Ginkgo/Gomega
2. **Use mock clients** for Kubernetes operations
3. **Add e2e tests** for complex scenarios
4. **Run race detection**: `TEST_RACE_CONDITIONS=1 make test`
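
Race detection is only useful if shared state is actually exercised from several goroutines. As a minimal sketch of code that Go's race detector (`go test -race`) accepts, the example below guards a counter with a mutex. The `counter` type and `countConcurrently` function are hypothetical and unrelated to the operator's code.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type; the mutex makes concurrent increments safe.
type counter struct {
	mu    sync.Mutex
	value int
}

func (c *counter) increment() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value++
}

// countConcurrently starts n goroutines that each increment the counter once
// and returns the final value after all of them have finished.
func countConcurrently(n int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.increment()
		}()
	}
	wg.Wait()
	return c.value
}

func main() {
	fmt.Println(countConcurrently(100))
}
```

Removing the `Lock`/`Unlock` pair turns `c.value++` into an unsynchronized write from many goroutines, which is the kind of access the race detector is designed to report.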

### 5. Common Development Tasks

**Adding a new CRD field**:
1. Modify struct in `api/v1beta2/*_types.go`
2. Add validation tags and documentation
3. Run `make generate manifests`
4. Add tests in `api/v1beta2/*_test.go`
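
To make the first two steps concrete, here is a hedged sketch of what an optional field with a JSON tag and a controller-gen validation marker can look like. `ExampleSpec` and `ReplacementTimeoutSeconds` are hypothetical and not part of the operator's API; only the `+kubebuilder:validation:Minimum` marker and the `omitempty` tag option are real features.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExampleSpec is a hypothetical spec type used only in this sketch.
type ExampleSpec struct {
	// ReplacementTimeoutSeconds is a hypothetical optional field.
	// The marker below is read by controller-gen when it generates the CRD schema.
	// +kubebuilder:validation:Minimum=0
	ReplacementTimeoutSeconds *int `json:"replacementTimeoutSeconds,omitempty"`
}

// encode marshals the spec and returns an empty string on failure.
func encode(spec ExampleSpec) string {
	out, err := json.Marshal(spec)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	timeout := 30
	fmt.Println(encode(ExampleSpec{}))
	fmt.Println(encode(ExampleSpec{ReplacementTimeoutSeconds: &timeout}))
}
```

A pointer combined with `omitempty` lets the serialized form distinguish "not set" from an explicit zero, which is often what an optional CRD field needs.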

**Adding a new controller reconciler**:
1. Create new file in `controllers/`
2. Implement reconciliation logic
3. Add to `main.go` and controller setup
4. Write comprehensive tests

## Key Libraries & Dependencies

### Core Dependencies

- **controller-runtime** (`sigs.k8s.io/controller-runtime`): Kubernetes operator framework
- **client-go** (`k8s.io/client-go`): Kubernetes API client
- **FoundationDB Bindings** (`github.com/apple/foundationdb/bindings/go`): FDB Go client
- **logr** (`github.com/go-logr/logr`): Structured logging interface

### Testing Dependencies

- **Ginkgo v2** (`github.com/onsi/ginkgo/v2`): BDD testing framework
- **Gomega** (`github.com/onsi/gomega`): Assertion library
- **chaos-mesh**: Chaos engineering for e2e tests

### Development Tools

- **controller-gen**: CRD and deepcopy generation
- **kustomize**: Kubernetes configuration management
- **golangci-lint**: Go linting
- **golines + goimports**: Code formatting
- **goreleaser**: Binary releases

## Debugging & Troubleshooting

### Local Debugging

```bash
# View operator logs
kubectl logs -f -l app=fdb-kubernetes-operator-controller-manager --container=manager

# Check cluster status
kubectl get foundationdbcluster test-cluster -o yaml

# Access FDB CLI
kubectl fdb exec -it test-cluster -- fdbcli
```

### Common Issues

1. **Build Failures**: Ensure FoundationDB client is installed for your platform
2. **Test Failures**: Check mock setup and race conditions
3. **CRD Issues**: Regenerate with `make manifests` after API changes
4. **Image Issues**: Verify BUILD_PLATFORM matches your cluster architecture
269+
## Contributing Guidelines
270+
271+
### Before Opening PRs
272+
273+
1. **Run full test suite**: `make all`
274+
2. **Check formatting**: `make fmt`
275+
3. **Update documentation** if adding new features
276+
4. **Add/update tests** for new functionality
277+
5. **Follow existing patterns** in similar controllers
278+
279+
### Commit Messages
280+
281+
Follow conventional commit format:
282+
```
283+
feat(controller): add new reconciliation step for process replacement
284+
fix(api): correct validation for database configuration
285+
test(e2e): add chaos testing for network partitions
286+
```
287+
288+
### Pull Request Process
289+
290+
1. Create feature branch from `main`
291+
2. Implement changes following this guide
292+
3. Ensure all tests pass
293+
4. Update documentation if needed
294+
5. Reference any related GitHub issues
295+
296+
## Additional Resources
297+
298+
- [FoundationDB Documentation](https://apple.github.io/foundationdb/)
299+
- [controller-runtime Book](https://book.kubebuilder.io/)
300+
- [Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
301+
- [Community Forums](https://forums.foundationdb.org)

api/v1beta2/foundationdb_status.go

Lines changed: 3 additions & 0 deletions
@@ -198,6 +198,9 @@ type FoundationDBStatusProcessInfo struct {
 	// The time that the process has been up for.
 	UptimeSeconds float64 `json:"uptime_seconds,omitempty"`
 
+	// RunLoopBusy represents the busyness of the run loop of the fdbserver.
+	RunLoopBusy float64 `json:"run_loop_busy,omitempty"`
+
 	// Roles contains a slice of all roles of the process
 	Roles []FoundationDBStatusProcessRoleInfo `json:"roles,omitempty"`
 
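
To illustrate how the new field is populated, the sketch below decodes a small JSON document into a struct that carries the same two tags as the diff above. `processInfo` and `parseProcess` are hypothetical and mirror only the fields visible in this hunk, and the JSON payload is made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// processInfo mirrors only the two fields visible in the diff above; the real
// FoundationDBStatusProcessInfo struct has many more.
type processInfo struct {
	UptimeSeconds float64 `json:"uptime_seconds,omitempty"`
	RunLoopBusy   float64 `json:"run_loop_busy,omitempty"`
}

// parseProcess decodes a single process entry from raw JSON.
func parseProcess(raw string) (processInfo, error) {
	var info processInfo
	err := json.Unmarshal([]byte(raw), &info)
	return info, err
}

func main() {
	// The payload is invented for this example.
	info, err := parseProcess(`{"uptime_seconds": 120.5, "run_loop_busy": 0.25}`)
	fmt.Println(info.RunLoopBusy, info.UptimeSeconds, err)
}
```

If the key is absent from the input, the field keeps its zero value, so code reading `RunLoopBusy` cannot tell a missing key apart from an idle run loop.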

api/v1beta2/foundationdb_status_test.go

Lines changed: 16 additions & 8 deletions
@@ -119,7 +119,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				},
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.024864200000000003,
 	},
 	"eab0db1aa7aae81a50ca97e9814a1b7d": {
 		Address: ProcessAddress{
@@ -161,7 +162,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				ID: "dfd679875a386d06",
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.0080509299999999978,
 	},
 	"f483247d4d5f279ef02c549680cbde64": {
 		Address: ProcessAddress{
@@ -208,7 +210,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				},
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.0101502,
 	},
 	"f6e0f7fd80da429d20329ad95d793ca3": {
 		Address: ProcessAddress{
@@ -240,7 +243,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				ID: "cbeb915c6cceb4a9",
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.027578200000000001,
 	},
 	"f75644abdf1b06c803b5c3c124fdd0a0": {
 		Address: ProcessAddress{
@@ -264,7 +268,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				ID: "1f953018ad2e746f",
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.0075524900000000002,
 	},
 	"105bf6c041f8ec315d03e889c2746ecf": {
 		Address: ProcessAddress{
@@ -292,7 +297,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				KVStoreAvailableBytes: ptr.To[int64](84178214912),
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.0081051000000000005,
 	},
 	"78c1c84af4481f0df628d40358f0930a": {
 		Address: ProcessAddress{
@@ -320,7 +326,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				KVStoreAvailableBytes: ptr.To[int64](84178214912),
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.0088742700000000001,
 	},
 	"83084479b50c9c3a09b0286297be3796": {
 		Address: ProcessAddress{
@@ -348,7 +355,8 @@ var _ = Describe("FoundationDBStatus", func() {
 				KVStoreAvailableBytes: ptr.To[int64](84178202624),
 			},
 		},
-		Messages: []FoundationDBStatusProcessMessage{},
+		Messages:    []FoundationDBStatusProcessMessage{},
+		RunLoopBusy: 0.0089497099999999979,
 	},
 },
 Data: FoundationDBStatusDataStatistics{
