Objective 1: Identify and correct errors in the HealthCheck function #1

imkh · 2025-02-13T09:35:08Z

Originally created on 18 August 2024 at 18:24 GMT+2

This PR brings a number of quick changes to fix bugs that leads to erroneous results. Those changes are clearly separated in small reviewable commits, which would normally be opened as separate PRs, but are included here together in the context of this challenge.

1. Fix loop variable capture issue (`8ee6eb0`)

This commit fixes the common Go pitfall of loop variable capture. Before Go 1.22 (released on February 6, 2024), the current code would print the following:

$ go run main.go services.txt
Opening services.txt
2024/08/15 12:03:50 https://www.finconsgroup.com
2024/08/15 12:03:50 https://www.finconsgroup.com
2024/08/15 12:03:50 https://www.finconsgroup.com
2024/08/15 12:03:50 https://www.finconsgroup.com
2024/08/15 12:03:50 https://www.finconsgroup.com
2024/08/15 12:03:50 https://www.finconsgroup.com
Url: https://www.finconsgroup.com; Status: 200; Latency: 57ms
Url: https://www.finconsgroup.com; Status: 200; Latency: 73ms
Url: https://www.finconsgroup.com; Status: 200; Latency: 90ms
Url: https://www.finconsgroup.com; Status: 200; Latency: 91ms
Url: https://www.finconsgroup.com; Status: 200; Latency: 96ms
Url: https://www.finconsgroup.com; Status: 200; Latency: 103ms

Each iteration of the urls loop uses the same instance of the variable url, so the goroutines share that single variable, resulting in using the last value of the slice. This problem can be detected with go vet:

$ go vet      
# coding-challenge
# [coding-challenge]
./main.go:58:26: loop variable url captured by func literal
./main.go:63:18: loop variable url captured by func literal

To fix it, the recommended solution is to simply bind the loop variable to a new variable inside the loop, either by creating and assigning a new url or by passing a copy as an argument of the anonymous goroutine function.

This fix is not needed anymore since Go 1.22, but given that the current go.mod file specifies go 1.18, I elected to still include it in this PR.

2. Fix Results slice data race (`1e41f54` & `a1122f9`)

The code include a classic case of a data race, where the results slice is read and written by the multiple goroutines concurrently.

This problem can be detected with go run -race:

$ go run -race main.go services.txt
Opening services.txt
==================
WARNING: DATA RACE
Read at 0x00c0001142b8 by goroutine 12:
  main.HealthCheck.func1()
      /Users/mehdi/dev/tf1-coding-challenge-imkh/main.go:66 +0x199

Previous write at 0x00c0001142b8 by goroutine 11:
  main.HealthCheck.func1()
      /Users/mehdi/dev/tf1-coding-challenge-imkh/main.go:66 +0x285

Goroutine 12 (running) created at:
  main.HealthCheck()
      /Users/mehdi/dev/tf1-coding-challenge-imkh/main.go:54 +0xe5
  main.main()
      /Users/mehdi/dev/tf1-coding-challenge-imkh/main.go:37 +0x2c4

Goroutine 11 (finished) created at:
  main.HealthCheck()
      /Users/mehdi/dev/tf1-coding-challenge-imkh/main.go:54 +0xe5
  main.main()
      /Users/mehdi/dev/tf1-coding-challenge-imkh/main.go:37 +0x2c4
==================
Url: https://kubernetes.io; Status: 200; Latency: 78ms
Url: https://www.finconsgroup.com; Status: 200; Latency: 111ms
Url: https://stackoverflow.com; Status: 200; Latency: 151ms
Url: https://www.docker.com; Status: 403; Latency: 160ms
Url: https://www.google.com; Status: 200; Latency: 173ms
Url: https://go.dev; Status: 200; Latency: 208ms
Found 1 data race(s)
exit status 66

Multiple solutions exist:

Surrounding the results = append(results, result) with a sync.Mutex lock & unlock.
Using a channel.
Writing to a specific index of the results slice using the loop index, instead of appending.

Each of those solution have their pros and cons, but they don't make much difference for small test samples. The usual idiomatic approach in Go would probably be to use a channel, but it does require a bit more code. This is why I initially opted to use a mutex. The code is simple to read and understand, and we can clearly see that the goal is to prevent a data race.

One issue I quickly noticed is that the order of the urls input is not preserved in the results output. This made things more complicated when implementing the TestHealthCheck function, so I ended up going with the index write solution. I find that it's not necessarily the most explicit as to why it's done that way, so it's important to add a comment to explain the reasoning.

3. Close & drain the http.Get response body (`eb7f03e`)

The http.Get function documentation clearly states:

// When err is nil, resp always contains a non-nil resp.Body.
// Caller should close resp.Body when done reading from it.

The first step is to close the response body, even if we don't use it.

This problem can be detected with the third-party tool github.com/golangci/golangci-lint and the github.com/timakin/bodyclose linter:

$ golangci-lint run --disable-all -E bodyclose
main.go:58:25: response body must be closed (bodyclose)
                        resp, err := http.Get(url)

But that comment also technically says "when done reading from it". And the documentation of http.Response.Body states:

// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body. The default HTTP client's Transport may not
// reuse HTTP/1.x "keep-alive" TCP connections if the Body is
// not read to completion and closed.

Consequently, a second "optional" step is to drain it (meaning reading it to discard it). It's not strictly necessary in our use case, as our sample file of services will typically only have different URLs, so we don't need to worry about reusing TCP connections. Still, it's a good practice to do it, and we can simply read a limited amount of bytes to prevent potential performance issues.

Some resources:

4. Implement TestHealthCheck (`c453f46`)

The unit test is implemented by creating multiple mock servers, each for a status code of 201/308/404/500 + an invalid URL test case. The latency is based on the status code in milliseconds. Given that the order of the input is preserved, we cam then loop through the results to check that the values are what we expect.

Other potential improvements:

The http.Get method uses the DefaultClient of the http package. This client doesn't specify a timeout value, so a problematic server could cause the program to hang indefinitely. Creating a custom http.Client with a timeout value would fix this issue.
I noticed that the http.Get error handling doesn't set the Url field of the Result struct. I wasn't sure if it was on purpose or not since the error message usually includes the URL, so there's still feedback to the user.

…e results order

imkh added 5 commits February 13, 2025 10:34

Fix loop variable capture issue

ec24769

Fix Result slice data race with mutex

13b17f1

Close & drain the http.Get response body

2329891

Change data race fix from append with mutex to index write to preserv…

4b5ad97

…e results order

Implement TestHealthCheck

4fa688c

imkh changed the title ~~Fix loop variable capture issue~~ Objective 1: Identify and correct errors in the HealthCheck function Feb 13, 2025

imkh added the ready for review label Feb 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Objective 1: Identify and correct errors in the HealthCheck function #1

Objective 1: Identify and correct errors in the HealthCheck function #1

Uh oh!

imkh commented Feb 13, 2025

Uh oh!

Uh oh!

Objective 1: Identify and correct errors in the HealthCheck function #1

Are you sure you want to change the base?

Objective 1: Identify and correct errors in the HealthCheck function #1

Uh oh!

Conversation

imkh commented Feb 13, 2025

1. Fix loop variable capture issue (8ee6eb0)

2. Fix Results slice data race (1e41f54 & a1122f9)

3. Close & drain the http.Get response body (eb7f03e)

4. Implement TestHealthCheck (c453f46)

Other potential improvements:

Uh oh!

Uh oh!

1. Fix loop variable capture issue (`8ee6eb0`)

2. Fix Results slice data race (`1e41f54` & `a1122f9`)

3. Close & drain the http.Get response body (`eb7f03e`)

4. Implement TestHealthCheck (`c453f46`)