[Sieve]: Draft approaches #3626

Merged
merged 23 commits into from
Feb 12, 2024
Changes from 21 commits
Commits
e6b5315
[Sieve]: Draft approaches
colinleach Feb 7, 2024
9dc62dd
fixes various typos and random gibberish
colinleach Feb 7, 2024
cbd28bf
Update introduction.md
colinleach Feb 7, 2024
fa9cacf
Update exercises/practice/sieve/.approaches/comprehensions/content.md
colinleach Feb 11, 2024
519d83b
Update exercises/practice/sieve/.approaches/comprehensions/content.md
colinleach Feb 11, 2024
32d5fc2
Update exercises/practice/sieve/.approaches/comprehensions/content.md
colinleach Feb 11, 2024
f2a054c
Update exercises/practice/sieve/.approaches/comprehensions/content.md
colinleach Feb 11, 2024
507ae33
Update exercises/practice/sieve/.approaches/nested-loops/content.md
colinleach Feb 11, 2024
6ac9677
Update exercises/practice/sieve/.approaches/comprehensions/content.md
colinleach Feb 11, 2024
64d256d
Update exercises/practice/sieve/.approaches/comprehensions/snippet.txt
colinleach Feb 11, 2024
14c281e
Update exercises/practice/sieve/.approaches/introduction.md
colinleach Feb 11, 2024
58e3e91
Update exercises/practice/sieve/.approaches/nested-loops/content.md
colinleach Feb 11, 2024
8586554
Update exercises/practice/sieve/.approaches/nested-loops/snippet.txt
colinleach Feb 11, 2024
6384360
Update exercises/practice/sieve/.approaches/comprehensions/content.md
colinleach Feb 11, 2024
a5fcfcb
Removed graph from content.md
colinleach Feb 11, 2024
7185ade
Delete timeit_bar_plot.svg
colinleach Feb 11, 2024
227ac81
removed space from content.md
colinleach Feb 12, 2024
f52900d
Update exercises/practice/sieve/.approaches/nested-loops/content.md
BethanyG Feb 12, 2024
1c13bfd
Update exercises/practice/sieve/.approaches/nested-loops/content.md
BethanyG Feb 12, 2024
6600f74
Update exercises/practice/sieve/.approaches/introduction.md
BethanyG Feb 12, 2024
26d0e08
Update exercises/practice/sieve/.approaches/introduction.md
BethanyG Feb 12, 2024
ec44cdf
Update exercises/practice/sieve/.approaches/introduction.md
BethanyG Feb 12, 2024
88497c2
Code Block Corrections
BethanyG Feb 12, 2024
35 changes: 35 additions & 0 deletions exercises/practice/sieve/.approaches/comprehensions/content.md
@@ -0,0 +1,35 @@
# Comprehensions

```python
def primes(number):
    prime = (item for item in range(2, number+1)
             if item not in (not_prime for item in range(2, number+1)
                             for not_prime in range(item*item, number+1, item)))
    return list(prime)
```

Many of the solutions to Sieve use `comprehensions` or `generator-expressions` at some point, but this page is about examples that put almost *everything* into a single, elaborate `generator-expression` or `comprehension`.

The above example uses a `generator-expression` to do all the calculation.

There are at least two problems with this:
- Readability is poor.
- Performance is exceptionally bad, making this the slowest solution tested, for all input sizes.

Notice the many `for` clauses in the generator.

This makes the code similar to [nested loops][nested-loops], and run time scales quadratically with the size of `number`.
In fact, when this code is compiled, it _compiles to nested loops_ that have the additional overhead of generator setup and tracking.
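
To make the nesting visible, here is a rough hand-expansion of the generator above into explicit loops. It is only a sketch of the equivalent logic, not the code the interpreter actually produces, and the name `primes_expanded` is made up for this illustration.

```python
def primes_expanded(number):
    """Explicit-loop rendering of the single-expression generator (illustrative only)."""
    prime = []
    for item in range(2, number + 1):
        # `item not in (...)` restarts the inner generator for every candidate,
        # so the two loops below are re-run from the beginning each time.
        found = False
        for base in range(2, number + 1):
            for not_prime in range(base * base, number + 1, base):
                if not_prime == item:
                    found = True
                    break
            if found:
                break
        if not found:
            prime.append(item)
    return prime


print(primes_expanded(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For a prime `item`, no match is ever found, so the inner loops run to completion before the candidate is accepted.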

```python
def primes(limit):
    return [number for number in range(2, limit + 1)
            if all(number % divisor != 0 for divisor in range(2, number))]
```

This second example using a `list-comprehension` with `all()` is certainly concise and _relatively_ readable, but the performance is again quite poor.

This is not quite a fully nested loop (_`all()` short-circuits as soon as one divisor test gives `False`_), but it is by no means "performant".
In this case, scaling with input size is intermediate between linear and quadratic, so not quite as bad as the first example.
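
The effect of the short-circuit can be checked by counting divisor tests instead of timing them. The sketch below is an illustration only: `count_divisor_checks` and `count_full_loop_checks` are hypothetical helpers, not part of any submitted solution.

```python
def count_divisor_checks(limit):
    """Count modulo tests performed when all() stops at the first divisor found."""
    checks = 0
    for number in range(2, limit + 1):
        for divisor in range(2, number):
            checks += 1
            if number % divisor == 0:
                break  # this is the point where all() would stop
    return checks


def count_full_loop_checks(limit):
    """Count modulo tests if every divisor were tried, with no short-circuit."""
    return sum(number - 2 for number in range(2, limit + 1))


print(count_divisor_checks(10))    # 15
print(count_full_loop_checks(10))  # 36
```

Composite numbers exit early, while primes still pay for the full inner scan.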


[nested-loops]: https://exercism.org/tracks/python/exercises/sieve/approaches/nested-loops
@@ -0,0 +1,3 @@
def primes(limit):
    return [number for number in range(2, limit + 1) if
            all(number % divisor != 0 for divisor in range(2, number))]
40 changes: 40 additions & 0 deletions exercises/practice/sieve/.approaches/config.json
@@ -0,0 +1,40 @@
{
"introduction": {
"authors": [
"colinleach",
"BethanyG"
]
},
"approaches": [
{
"uuid": "85752386-a3e0-4ba5-aca7-22f5909c8cb1",
"slug": "nested-loops",
"title": "Nested Loops",
"blurb": "Relatively clear solutions with explicit loops.",
"authors": [
"colinleach",
"BethanyG"
]
},
{
"uuid": "04701848-31bf-4799-8093-5d3542372a2d",
"slug": "set-operations",
"title": "Set Operations",
"blurb": "Performance enhancements with Python sets.",
"authors": [
"colinleach",
"BethanyG"
]
},
{
"uuid": "183c47e3-79b4-4afb-8dc4-0deaf094ce5b",
"slug": "comprehensions",
"title": "Comprehensions",
"blurb": "Ultra-concise code and its downsides.",
"authors": [
"colinleach",
"BethanyG"
]
}
]
}
80 changes: 80 additions & 0 deletions exercises/practice/sieve/.approaches/introduction.md
@@ -0,0 +1,80 @@
# Introduction

The key to this exercise is to keep track of:
- A list of numbers.
- Their status of possibly being prime.

## General Guidance

To solve this exercise, it is necessary to choose one or more appropriate data structures to store numbers and status, then decide the best way to scan through them.

There are many ways to implement the code, and the three broad approaches listed below are not sharply separated.

## Approach: Using nested loops

```python
def primes(number):
    not_prime = []
    prime = []

    for item in range(2, number+1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item*item, number+1, item):
                not_prime.append(element)

    return prime
```

The theme here is nested, explicit `for` loops to move through ranges, testing validity as we go.

For details and another example see [`nested-loops`][approaches-nested].

## Approach: Using set operations

```python
def primes(number):
    not_prime = set()
    primes = []

    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))

    return primes
```

In this group, the code uses the special features of the Python [`set`][sets] to improve efficiency.

For details and other examples see [`set-operations`][approaches-sets].

## Approach: Using complex or nested comprehensions

```python
def primes(limit):
    return [number for number in range(2, limit + 1) if
            all(number % divisor != 0 for divisor in range(2, number))]
```

- For bit arrays, there is the [`bitarray`][bitarray] package and [`bitstring.BitArray()`][bitstring].
- For arrays of booleans, we could use the NumPy package: `np.ones((number,), dtype=np.bool_)` will create a pre-dimensioned array of `True`.

It should be stressed that these will not work in the Exercism test runner, and are mentioned here only for completeness.


## Which Approach to Use?

This exercise is for learning, and is not directly relevant to production code.

The point is to find a solution which is correct, readable, and remains reasonably fast for larger input values.

The "set operations" example above is clean, readable, and in benchmarking was the fastest code tested.

Further details of performance testing are given in the [Performance article][article-performance].

[approaches-nested]: https://exercism.org/tracks/python/exercises/sieve/approaches/nested-loops
[approaches-sets]: https://exercism.org/tracks/python/exercises/sieve/approaches/set-operations
[approaches-comps]: https://exercism.org/tracks/python/exercises/sieve/approaches/comprehensions
[article-performance]: https://exercism.org/tracks/python/exercises/sieve/articles/performance
[sets]: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
[bitarray]: https://pypi.org/project/bitarray/
[bitstring]: https://bitstring.readthedocs.io/en/latest/
37 changes: 37 additions & 0 deletions exercises/practice/sieve/.approaches/nested-loops/content.md
@@ -0,0 +1,37 @@
# Nested Loops

```python
def primes(number):
    not_prime = []
    prime = []

    for item in range(2, number+1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item*item, number+1, item):
                not_prime.append(element)

    return prime
```

This is the type of code that many people might write as a first attempt.

It is very readable and passes the tests.

The clear disadvantage is that run time is quadratic in the input size: `O(n**2)`, so this approach scales poorly to large input values.

Part of the problem is the line `if item not in not_prime`, where `not_prime` is a list that may be long and unsorted.

In the worst case this operation searches the entire list, so run time is linear in list length: not ideal within a loop repeated many times.
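
The cost of a membership test against a list can be measured directly. The following is a minimal sketch using the standard-library `timeit` module; the container size, the repeat count and the variable names are arbitrary choices for illustration, and a `set` is included purely as a point of comparison.

```python
import timeit

values = list(range(10_000))
as_list = values
as_set = set(values)
missing = -1  # an absent value forces the list search to scan every element

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Exact timings depend on the machine, so no expected output is shown here.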

```python
def primes(number):
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]:
            continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]
```

Relatively few programmers would have predicted such a major difference just by looking at the code, so if performance matters we should always test, not guess.
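
One way to test rather than guess is a small timing harness. The sketch below copies the two examples from this page under the made-up names `primes_list_lookup` and `primes_flag_list`, and times them with `timeit`; the helper `time_both` and the chosen limits are assumptions for illustration, not the benchmark used for the Performance article.

```python
import timeit


def primes_list_lookup(number):
    """First example on this page: membership tests against a growing list."""
    not_prime = []
    prime = []
    for item in range(2, number + 1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item * item, number + 1, item):
                not_prime.append(element)
    return prime


def primes_flag_list(number):
    """Second example on this page: a list of booleans indexed by value."""
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]:
            continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]


def time_both(limit, repeats=3):
    """Return the best-of-`repeats` run time in seconds for each function."""
    results = {}
    for func in (primes_list_lookup, primes_flag_list):
        timer = timeit.Timer(lambda: func(limit))
        results[func.__name__] = min(timer.repeat(repeat=repeats, number=1))
    return results


for limit in (100, 1_000, 2_000):
    print(limit, time_both(limit))
```

Taking the minimum of several repeats reduces noise from other processes; the absolute numbers will vary between machines.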
8 changes: 8 additions & 0 deletions exercises/practice/sieve/.approaches/nested-loops/snippet.txt
@@ -0,0 +1,8 @@
def primes(number):
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]: continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]
65 changes: 65 additions & 0 deletions exercises/practice/sieve/.approaches/set-operations/content.md
@@ -0,0 +1,65 @@
# Set Operations

```python
def primes(number):
    not_prime = set()
    primes = []

    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))

    return primes
```

This is the fastest method so far tested, at all input sizes.

With only a single explicit loop, performance scales almost linearly: the sieve is O(n log log n) in theory, which is close to O(n) in practice.

A key step is the set `update()`.

Less commonly seen than `add()`, which takes a single element, `update()` takes any iterable of hashable values as its parameter and efficiently adds all the elements in a single operation.

In this case, the iterable is a range resolving to all multiples, up to the limit, of the prime we just found.
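
A minimal illustration of the difference between `add()` and `update()`, using a limit of 30 and the prime 3 as arbitrary example values:

```python
not_prime = set()

not_prime.add(4)                    # add() inserts exactly one element
not_prime.update(range(9, 31, 3))   # update() inserts every multiple of 3 from 9 to 30

print(sorted(not_prime))  # [4, 9, 12, 15, 18, 21, 24, 27, 30]
```

The `range` is consumed directly by `update()`, so no intermediate list is built.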

Primes are collected in a list, in ascending order, so there is no need for a separate sort operation at the end.


```python
def primes(number):
    numbers = set(item for item in range(2, number+1))

    not_prime = set(not_prime for item in range(2, number+1)
                    for not_prime in range(item**2, number+1, item))

    return sorted(list((numbers - not_prime)))
```

The second example builds the set of non-primes with a generator expression in place of an explicit loop, then uses set subtraction as the key step in the return statement.

The resulting set needs to be converted to a list then sorted, which adds some overhead, [scaling as O(n *log* n)][sort-performance].
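
Set subtraction followed by a sort can be seen in isolation with a few hand-picked values. This is only a sketch: the limit of 10 and the hard-coded set of composites are assumptions chosen to keep the example short.

```python
numbers = set(range(2, 11))      # candidates 2..10
not_prime = {4, 6, 8, 9, 10}     # composites in that range, written out by hand

remaining = numbers - not_prime  # set difference: elements of numbers not in not_prime
result = sorted(remaining)       # sorted() accepts any iterable and returns a list

print(result)  # [2, 3, 5, 7]
```

Because `sorted()` already returns a list, the result has the type the exercise requires.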

In performance testing, this code is about 4x slower than the first example, but still scales as O(n).

```python
def primes(number: int) -> list[int]:
    start = set(range(2, number + 1))
    return sorted(start - {m for n in start for m in range(2 * n, number + 1, n)})
```

The third example is quite similar to the second, just moving the comprehension into the return statement.

Performance is very similar between examples 2 and 3 at all input values.

## Sets: strengths and weaknesses

Sets offer two main benefits which can be useful in this exercise:
- Entries are guaranteed to be unique.
- Determining whether the set contains a given value is a fast, constant-time operation.

Less positively:
- The exercise specification requires a list to be returned, which may involve a conversion.
- Sets have no guaranteed ordering, so two of the above examples incur the time penalty of sorting a list at the end.
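
The uniqueness guarantee can be made concrete by recording crossed-off values in both a list and a set during the same sieve pass. This is an illustrative sketch with a limit of 30; the names `as_list` and `as_set` are made up for the comparison.

```python
limit = 30
as_list = []      # keeps every crossed-off value, duplicates included
as_set = set()    # keeps each crossed-off value once
primes_found = []

for num in range(2, limit + 1):
    if num not in as_set:
        primes_found.append(num)
        multiples = range(num * num, limit + 1, num)
        as_list.extend(multiples)   # values such as 12 and 30 are stored repeatedly
        as_set.update(multiples)    # repeated values collapse into one entry

print(len(as_list), len(as_set))  # 24 19
print(primes_found)               # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The set ends up holding exactly the composite numbers up to the limit, while the list also stores each repeat.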

[sort-performance]: https://en.wikipedia.org/wiki/Timsort
@@ -0,0 +1,8 @@
def primes(number):
    not_prime = set()
    primes = []
    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))
    return primes
14 changes: 14 additions & 0 deletions exercises/practice/sieve/.articles/config.json
@@ -0,0 +1,14 @@
{
"articles": [
{
"slug": "performance",
"uuid": "fdbee56a-b4db-4776-8aab-3f7788c612aa",
"title": "Performance deep dive",
"authors": [
"BethanyG",
"colinleach"
],
"blurb": "Results and analysis of timing tests for the various approaches."
}
]
}