Objective 2: Propose a better solution #2
Originally created on 19 August 2024 at 00:23 GMT+2
The goal of this PR is to propose a better architecture for the current program: one that is more robust and capable of handling a file with millions of URLs.
1. Refactor the program to use a worker pool
With a file containing millions of URLs, the previous code would create a goroutine for each one, without any regard for system resources. A common pattern for this kind of use case is a worker pool: two channels for communication (one for the jobs to work through, and another for the results) combined with a fixed number of workers that run concurrently and pick up the URLs to process one by one, until they're all done.
This limits the number of goroutines to at most the number of concurrent workers (plus two or three more, for setup and for detecting when everything is done), which prevents making too many requests and using too much memory at once. Communication is done through channels to track the progress of the work, and a WaitGroup is once again used to wait for all the workers to finish.
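As a rough sketch of the pattern (the `checkURL` function and the `result` type here are placeholders for whatever per-URL work the program actually does, not the real code):

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// result is a placeholder for whatever the program reports per URL.
type result struct {
	url    string
	status int
	err    error
}

// checkURL stands in for the actual per-URL work.
func checkURL(url string) result {
	resp, err := http.Get(url)
	if err != nil {
		return result{url: url, err: err}
	}
	defer resp.Body.Close()
	return result{url: url, status: resp.StatusCode}
}

func main() {
	jobs := make(chan string)
	results := make(chan result)

	const numWorkers = 10 // fixed number of concurrent workers

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker picks URLs off the jobs channel one by one.
			for url := range jobs {
				results <- checkURL(url)
			}
		}()
	}

	// Feed the jobs channel, then close it so the workers can exit.
	go func() {
		for _, u := range []string{"https://example.com", "https://example.org"} {
			jobs <- u
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	for r := range results {
		fmt.Println(r.url, r.status, r.err)
	}
}
```

No matter how many URLs are fed into the jobs channel, only `numWorkers` goroutines ever perform requests at the same time.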
2. Remove the GetServices function and process URLs as soon as they're read
The file-reading step is moved directly into the worker pool logic. As soon as a URL is read, it's sent to the jobs channel so that it can be processed immediately.
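A minimal sketch of that idea, with a placeholder file name (the real program's identifiers may differ):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	f, err := os.Open("urls.txt") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	jobs := make(chan string)

	// Read the file line by line and hand each URL to the workers
	// as soon as it is read, without building a slice of all URLs first.
	go func() {
		defer close(jobs)
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			jobs <- scanner.Text()
		}
	}()

	// In the real program the workers range over jobs; here we just drain it.
	for url := range jobs {
		fmt.Println("would process", url)
	}
}
```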
3. Return results as soon as they're processed
Similarly, the second channel outputs results as soon as they're ready. This gives faster feedback to a user or to another program calling the new code.
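The behaviour can be illustrated in isolation with simulated work (the timings and values below are made up): results are consumed the moment a worker produces them, because the results channel is only closed after every worker has finished.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	results := make(chan string)
	var wg sync.WaitGroup

	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			time.Sleep(time.Duration(i) * 100 * time.Millisecond) // simulate work
			results <- fmt.Sprintf("result %d", i)
		}(i)
	}

	// Close results only after all workers are done, so the range below ends.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Each result is printed as soon as it is produced, not after everything finishes.
	for r := range results {
		fmt.Println(r)
	}
}
```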
4. Encapsulated WorkerPool struct to offer sensible defaults and configuration options
The `WorkerPool` struct contains the entire concurrency logic and publicly exposes a constructor that sets sensible defaults while allowing the main options to be customized. These options are also exposed as command-line flags (such as `-workers`), with their values passed to the `WorkerPool` constructor.
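A sketch of what such a struct could look like; the field names, constructor signature, default value, and option set are assumptions rather than the exact API in this PR:

```go
package main

import (
	"flag"
	"fmt"
	"sync"
)

// WorkerPool encapsulates the concurrency logic behind a small public API.
type WorkerPool struct {
	workers int
}

// NewWorkerPool applies a sensible default and lets callers override it.
func NewWorkerPool(workers int) *WorkerPool {
	if workers <= 0 {
		workers = 10 // assumed default
	}
	return &WorkerPool{workers: workers}
}

// Run processes every job from the channel with a fixed number of workers
// and returns a channel on which results appear as they are produced.
func (p *WorkerPool) Run(jobs <-chan string, process func(string) string) <-chan string {
	results := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < p.workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- process(j)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(results)
	}()
	return results
}

func main() {
	// The -workers flag feeds straight into the constructor.
	workers := flag.Int("workers", 10, "number of concurrent workers")
	flag.Parse()

	pool := NewWorkerPool(*workers)

	jobs := make(chan string)
	go func() {
		defer close(jobs)
		for _, u := range []string{"https://example.com", "https://example.org"} {
			jobs <- u
		}
	}()

	for r := range pool.Run(jobs, func(u string) string { return "checked " + u }) {
		fmt.Println(r)
	}
}
```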
Other potential improvements:

- Check `scanner.Err()` (see the sketch below). I wasn't able to trigger the error in my testing, but I've run into issues with it in the past with other programs, particularly when reading from stdin.
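For illustration, a minimal version of that check when reading from stdin could look like this:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	// bufio.Scanner swallows read errors until Err() is consulted; without this
	// check, an I/O failure or an overly long line would be silently ignored.
	if err := scanner.Err(); err != nil {
		log.Fatalf("reading input: %v", err)
	}
}
```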