Commit 0c18dc4

Merge branch 'main' into tests/scale-1.1
2 parents 94e48af + 25c8ea4 commit 0c18dc4

6 files changed: 164 additions, 36 deletions

.github/workflows/ci.yml

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ jobs:
 uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

 - name: Setup Golang Environment
-uses: actions/setup-go@93397bea11091df50f3d7e59dc26a7711a8bcfbe # v4.1.0
+uses: actions/setup-go@0c52d547c9bc32b1aa3301fd7a9cb496313a4491 # v5.0.0
 with:
 go-version-file: go.mod

@@ -63,7 +63,7 @@ jobs:
 uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

 - name: Setup Golang Environment
-uses: actions/setup-go@93397bea11091df50f3d7e59dc26a7711a8bcfbe # v4.1.0
+uses: actions/setup-go@0c52d547c9bc32b1aa3301fd7a9cb496313a4491 # v5.0.0
 with:
 go-version-file: go.mod

@@ -105,7 +105,7 @@ jobs:
 fetch-depth: 0

 - name: Setup Golang Environment
-uses: actions/setup-go@93397bea11091df50f3d7e59dc26a7711a8bcfbe # v4.1.0
+uses: actions/setup-go@0c52d547c9bc32b1aa3301fd7a9cb496313a4491 # v5.0.0
 with:
 go-version-file: go.mod

.github/workflows/codeql-analysis.yml

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ jobs:
 # queries: security-extended,security-and-quality

 - name: Setup Golang Environment
-uses: actions/setup-go@93397bea11091df50f3d7e59dc26a7711a8bcfbe # v4.1.0
+uses: actions/setup-go@0c52d547c9bc32b1aa3301fd7a9cb496313a4491 # v5.0.0
 with:
 go-version-file: go.mod
 if: matrix.language == 'go'

.github/workflows/conformance.yml

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ jobs:
 uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

 - name: Setup Golang Environment
-uses: actions/setup-go@93397bea11091df50f3d7e59dc26a7711a8bcfbe # v4.1.0
+uses: actions/setup-go@0c52d547c9bc32b1aa3301fd7a9cb496313a4491 # v5.0.0
 with:
 go-version-file: go.mod

.github/workflows/lint.yml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ jobs:
 uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

 - name: Setup Golang Environment
-uses: actions/setup-go@93397bea11091df50f3d7e59dc26a7711a8bcfbe # v4.1.0
+uses: actions/setup-go@0c52d547c9bc32b1aa3301fd7a9cb496313a4491 # v5.0.0
 with:
 go-version-file: go.mod

tests/graceful-recovery/graceful-recovery.md

Lines changed: 19 additions & 30 deletions
@@ -34,18 +34,18 @@ Ensure that NGF can recover gracefully from container failures without any user
 3. Check out the latest tag (unless you are installing the edge version from the main branch).
 4. Go into `deploy/manifests/nginx-gateway.yaml` and change `runAsNonRoot` from `true` to `false`.
 This allows us to insert our ephemeral container as root which enables us to restart the nginx-gateway container.
-5. Follow the [installation instructions](https://github.com/nginxinc/nginx-gateway-fabric/blob/main/docs/installation.md)
+5. Follow the [installation instructions](https://github.com/nginxinc/nginx-gateway-fabric/blob/main/site/content/installation/installing-ngf/manifests.md)
 to deploy NGINX Gateway Fabric using manifests and expose it through a LoadBalancer Service.
 6. In a separate terminal track NGF logs.

 ```console
-kubectl -n nginx-gateway logs -f deploy/nginx-gateway
+kubectl -n nginx-gateway logs -f deploy/nginx-gateway -c nginx-gateway
 ```

 7. In a separate terminal track NGINX container logs.

 ```console
-kubectl -n nginx-gateway logs -f <NGF_POD> -c nginx
+kubectl -n nginx-gateway logs -f deploy/nginx-gateway -c nginx
 ```

 8. In a separate terminal Exec into the NGINX container inside the NGF pod.
@@ -56,9 +56,7 @@ to deploy NGINX Gateway Fabric using manifests and expose it through a LoadBalan

 9. In a different terminal, deploy the
 [https-termination example](https://github.com/nginxinc/nginx-gateway-fabric/tree/main/examples/https-termination).
-10. Inside the NGINX container, navigate to `/etc/nginx/conf.d` and check `http.conf` and `config-version.config` to see
-if the configuration and version were correctly updated.
-11. Send traffic through the example application and ensure it is working correctly.
+10. Send traffic through the example application and ensure it is working correctly.

 ### Run the tests

@@ -80,25 +78,22 @@ if the configuration and version were correctly updated.
 4. Check for errors in the NGF and NGINX container logs.
 5. When the nginx-gateway container is back up, ensure traffic flows through the example application correctly.
 6. Open up the NGF and NGINX container logs and check for errors.
-7. Inside the NGINX container, check that `http.conf` was not changed and `config-version.conf` had its version set to `2`.
-8. Send traffic through the example application and ensure it is working correctly.
-9. Check that NGF can still process changes of resources.
+7. Send traffic through the example application and ensure it is working correctly.
+8. Check that NGF can still process changes of resources.
 1. Delete the HTTPRoute resources.

 ```console
 kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
 ```

-2. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
-3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
-4. Apply the HTTPRoute resources.
+2. Send traffic through the example application using the updated resources and ensure traffic does not flow.
+3. Apply the HTTPRoute resources.

 ```console
 kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
 ```

-5. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
-6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
+4. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
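
Editorial note: the "process changes of resources" check in this hunk amounts to probing the example application and comparing the HTTP status before and after each `kubectl` step. The following is a minimal sketch, not part of the commit. `probe` and `traffic_flows` are hypothetical helpers, "traffic flows" is assumed to mean a `200` response, and the host `cafe.example.com`, the `/coffee` path, and the `GW_IP`/`GW_PORT` variables are illustrative assumptions rather than values taken from this diff.

```shell
#!/bin/sh
# Hypothetical helpers for the manual traffic checks; names are illustrative only.

# probe [curl args...] URL: print the HTTP status code of one request.
# curl reports 000 when no HTTP response was received at all.
probe() {
  curl --silent --insecure --output /dev/null --max-time 5 \
       --write-out '%{http_code}' "$@" || true
}

# traffic_flows CODE: succeed only when CODE is exactly 200
# (assumed meaning of "traffic flows" for this sketch).
traffic_flows() {
  [ "$1" = "200" ]
}

# Example (needs a live cluster, so it is left commented out):
#   code=$(probe --resolve "cafe.example.com:$GW_PORT:$GW_IP" \
#                "https://cafe.example.com:$GW_PORT/coffee")
#   if traffic_flows "$code"; then echo "traffic flows"; else echo "no traffic (status $code)"; fi
```

Running the example once after `kubectl delete` and once after `kubectl apply` would cover the two expectations in the list, under the assumptions stated above.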

 #### Restart NGINX container

@@ -113,24 +108,21 @@ if the configuration and version were correctly updated.

 4. When NGINX container is back up, ensure traffic flows through the example application correctly.
 5. Open up the NGINX container logs and check for errors.
-6. Exec back into the NGINX container and check that `http.conf` and `config-version.conf` were not changed.
-7. Check that NGF can still process changes of resources.
+6. Check that NGF can still process changes of resources.
 1. Delete the HTTPRoute resources.

 ```console
 kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
 ```

-2. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
-3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
-4. Apply the HTTPRoute resources.
+2. Send traffic through the example application using the updated resources and ensure traffic does not flow.
+3. Apply the HTTPRoute resources.

 ```console
 kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
 ```

-5. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
-6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
+4. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
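
Editorial note: the "check for errors" steps in this hunk can be partly automated by filtering captured logs. The following is a minimal sketch, not part of the commit. `find_log_errors` is a hypothetical helper. It assumes NGF writes JSON log entries with a `level` field and that NGINX tags each line with a bracketed severity; which severities count as errors for this test is also an assumption.

```shell
#!/bin/sh
# find_log_errors: read log lines on stdin and print only those that look like errors.
# Matches JSON entries with "level": "error" (with or without the space) and
# NGINX lines tagged [error], [crit], [alert], or [emerg].
find_log_errors() {
  grep -E '"level": ?"error"|\[(error|crit|alert|emerg)\]' || true
}

# Example (needs a live cluster, so it is left commented out):
#   kubectl -n nginx-gateway logs deploy/nginx-gateway -c nginx | find_log_errors
```

An empty result only means no line matched these patterns; the logs still deserve a manual read for warnings.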

 #### Restart Node with draining

@@ -156,26 +148,23 @@ if the configuration and version were correctly updated.
 docker restart kind-control-plane
 ```

-7. Open up both NGF and NGINX container logs and check for errors.
-8. Exec back into the NGINX container and check that `http.conf` and `config-version.conf` were not changed.
-9. Send traffic through the example application and ensure it is working correctly.
-10. Check that NGF can still process changes of resources.
+7. Check the logs of the old and new NGF and NGINX containers for errors.
+8. Send traffic through the example application and ensure it is working correctly.
+9. Check that NGF can still process changes of resources.
 1. Delete the HTTPRoute resources.

 ```console
 kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
 ```

-2. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
-3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
-4. Apply the HTTPRoute resources.
+2. Send traffic through the example application using the updated resources and ensure traffic does not flow.
+3. Apply the HTTPRoute resources.

 ```console
 kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
 ```

-5. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
-6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
+4. Send traffic through the example application using the updated resources and ensure traffic flows correctly.

 #### Restart Node without draining

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
+# Results for v1.1.0
+
+<!-- TOC -->
+- [Results for v1.1.0](#results-for-v110)
+- [Summary](#summary)
+- [Versions](#versions)
+- [Tests](#tests)
+- [Restart nginx-gateway container](#restart-nginx-gateway-container)
+- [Restart NGINX container](#restart-nginx-container)
+- [Restart Node with draining](#restart-node-with-draining)
+- [Restart Node without draining](#restart-node-without-draining)
+- [Future Improvements](#future-improvements)
+<!-- TOC -->
+
+
+## Summary
+
+- No new issues since 1.0.
+- One new error in the [Restart Node with draining](#restart-node-with-draining) test, but it is not actionable.
+
+## Versions
+
+NGF version:
+
+
+```text
+commit: d6bbdba28a0f9ae3f75864855b76b0fb34bee3e5
+date: 2023-12-05T18:43:51Z
+version: edge
+```
+
+with NGINX:
+
+```text
+nginx/1.25.3
+built by gcc 12.2.1 20220924 (Alpine 12.2.1_git20220924-r10)
+OS: Linux 5.15.49-linuxkit-pr
+```
+
+
+Kubernetes:
+
+```text
+Server Version: version.Info{Major:"1", Minor:"28",
+GitVersion:"v1.28.0",
+GitCommit:"855e7c48de7388eb330da0f8d9d2394ee818fb8d",
+GitTreeState:"clean", BuildDate:"2023-08-15T21:26:40Z",
+GoVersion:"go1.20.7", Compiler:"gc",
+Platform:"linux/arm64"}
+```
+
+## Tests
+
+### Restart nginx-gateway container
+
+No errors.
+
+### Restart NGINX container
+
+The NGF Pod was unable to recover after sending a SIGKILL signal to the NGINX master process.
+The following appeared in the NGINX logs:
+
+```text
+2023/12/05 22:18:45 [emerg] 116#116: bind() to unix:/var/run/nginx/nginx-config-version.sock failed (98: Address in use)
+2023/12/05 22:18:45 [emerg] 116#116: bind() to unix:/var/lib/nginx/nginx-502-server.sock failed (98: Address in use)
+2023/12/05 22:18:45 [emerg] 116#116: bind() to unix:/var/lib/nginx/nginx-500-server.sock failed (98: Address in use)
+2023/12/05 22:18:45 [emerg] 116#116: bind() to unix:/var/run/nginx/nginx-status.sock failed (98: Address in use)
+2023/12/05 22:18:45 [notice] 116#116: try again to bind() after 500ms
+```
+
+NGF cannot update NGINX after this and logs the following error:
+
+```text
+{
+"level": "error",
+"ts": "2023-12-05T22:19:53Z",
+"logger": "eventLoop.eventHandler",
+"msg": "Failed to update NGINX configuration",
+"batchID": 22,
+"error": "failed to reload NGINX: open /proc/19/task/19/children: no such file or directory",
+"stacktrace": "github.com/nginxinc/nginx-gateway-fabric/internal/mode/static.(*eventHandlerImpl).HandleEventBatch\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/mode/static/handler.go:116\ngithub.com/nginxinc/nginx-gateway-fabric/internal/framework/events.(*EventLoop).Start.func1.1\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/framework/events/loop.go:74"
+}
+```
+
+Known issue: https://github.com/nginxinc/nginx-gateway-fabric/issues/1108
+
+
+### Restart Node with draining
+
+Previous NGF container error:
+
+```json
+{
+"level": "error",
+"ts": "2023-12-05T21:43:31Z",
+"logger": "eventLoop.eventHandler",
+"msg": "Failed to update NGINX configuration",
+"batchID": 11,
+"error": "failed to reload NGINX: could not get expected config version 7: error getting client: Get \"http://config-version/version\": dial unix /var/run/nginx/nginx-config-version.sock: connect: no such file or directory",
+"stacktrace": "github.com/nginxinc/nginx-gateway-fabric/internal/mode/static.(*eventHandlerImpl).HandleEventBatch\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/mode/static/handler.go:116\ngithub.com/nginxinc/nginx-gateway-fabric/internal/framework/events.(*EventLoop).Start.func1.1\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/framework/events/loop.go:74"
+}
+```
+
+This error is likely due to NGINX terminating during a reload attempt and does not consistently occur on a node restart.
+
+No errors in previous NGINX container.
+No errors in new NGF/NGINX containers.
+
+### Restart Node without draining
+
+The NGF Pod was unable to recover the majority of times after running `docker restart kind-control-plane`.
+
+The following appeared in the NGINX logs:
+
+```text
+2023/12/05 21:53:51 [emerg] 29#29: bind() to unix:/var/run/nginx/nginx-status.sock failed (98: Address in use)
+2023/12/05 21:53:51 [notice] 29#29: try again to bind() after 500ms
+2023/12/05 21:53:51 [emerg] 29#29: bind() to unix:/var/run/nginx/nginx-status.sock failed (98: Address in use)
+2023/12/05 21:53:51 [notice] 29#29: try again to bind() after 500ms
+2023/12/05 21:53:51 [emerg] 29#29: bind() to unix:/var/run/nginx/nginx-status.sock failed (98: Address in use)
+2023/12/05 21:53:51 [notice] 29#29: try again to bind() after 500ms
+2023/12/05 21:53:51 [emerg] 29#29: bind() to unix:/var/run/nginx/nginx-status.sock failed (98: Address in use)
+2023/12/05 21:53:51 [notice] 29#29: try again to bind() after 500ms
+2023/12/05 21:53:51 [emerg] 29#29: bind() to unix:/var/run/nginx/nginx-status.sock failed (98: Address in use)
+2023/12/05 21:53:51 [notice] 29#29: try again to bind() after 500ms
+2023/12/05 21:53:51 [emerg] 29#29: still could not bind()
+```
+
+The following appeared in the NGF logs:
+
+```text
+failed to start control loop: cannot create nginx metrics collector: failed to get http://config-status/stub_status: Get "http://config-status/stub_status": dial unix /var/run/nginx/nginx-status.sock: connect: connection refused
+```
+
+Known issue: https://github.com/nginxinc/nginx-gateway-fabric/issues/1108
+
+## Future Improvements
+
+- None
