[Sieve]: Draft approaches (#3626)
* [Sieve]: Draft approaches

* fixes various typos and random gibberish

* Update introduction.md

* Update exercises/practice/sieve/.approaches/comprehensions/content.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/comprehensions/content.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/comprehensions/content.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/comprehensions/content.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/nested-loops/content.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/comprehensions/content.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/comprehensions/snippet.txt

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/introduction.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/nested-loops/content.md

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/nested-loops/snippet.txt

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Update exercises/practice/sieve/.approaches/comprehensions/content.md

Does this add a spurious extra space after the link?

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>

* Removed graph from content.md

To save us forgetting it later.

* Delete timeit_bar_plot.svg

I didn't intend to commit this in the first place.

* removed space from content.md

* Update exercises/practice/sieve/.approaches/nested-loops/content.md

* Update exercises/practice/sieve/.approaches/nested-loops/content.md

* Update exercises/practice/sieve/.approaches/introduction.md

* Update exercises/practice/sieve/.approaches/introduction.md

* Update exercises/practice/sieve/.approaches/introduction.md

* Code Block Corrections

Somehow, the closing of the codeblocks got dropped.  Added them back in, along with final typo corrections.

---------

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>
colinleach and BethanyG authored Feb 12, 2024
1 parent 5dd5af1 commit 7e3a633
Showing 17 changed files with 2,173 additions and 0 deletions.
36 changes: 36 additions & 0 deletions exercises/practice/sieve/.approaches/comprehensions/content.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Comprehensions

```python
def primes(number):
    prime = (item for item in range(2, number+1)
             if item not in (not_prime for item in range(2, number+1)
                             for not_prime in range(item*item, number+1, item)))
    return list(prime)
```

Many of the solutions to Sieve use `comprehensions` or `generator-expressions` at some point, but this page is about examples that put almost *everything* into a single, elaborate `generator-expression` or `comprehension`.

The above example uses a `generator-expression` to do all the calculation.

There are at least two problems with this:
- Readability is poor.
- Performance is exceptionally bad, making this the slowest solution tested, for all input sizes.

Notice the many `for` clauses in the generator.

This makes the code similar to [nested loops][nested-loops]: the inner generator is rebuilt and scanned again for every candidate `item`, and run time scales quadratically with the size of `number`.
In fact, when this code is compiled, it _compiles to nested loops_ that have the additional overhead of generator setup and tracking.
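
As a rough illustration of that point, the generator expression above behaves much like the explicit loops below.
This is a hand-written sketch, not actual compiler output, and the name `primes_expanded` is hypothetical.

```python
def primes_expanded(number):
    """Hand-expanded equivalent of the single generator-expression (illustrative only)."""
    prime = []
    for item in range(2, number + 1):
        # The inner generator is rebuilt and scanned from the start for every item.
        is_composite = False
        for base in range(2, number + 1):
            for not_prime in range(base * base, number + 1, base):
                if not_prime == item:
                    is_composite = True
                    break
            if is_composite:
                break
        if not is_composite:
            prime.append(item)
    return prime
```

Written out this way, the three levels of looping that the one-line version hides become visible.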

```python
def primes(limit):
    return [number for number in range(2, limit + 1)
            if all(number % divisor != 0 for divisor in range(2, number))]
```

This second example using a `list-comprehension` with `all()` is certainly concise and _relatively_ readable, but the performance is again quite poor.

This is not quite a fully nested loop (_`all()` short-circuits and stops as soon as one divisor gives `False`_), but it is by no means "performant".
In this case, scaling with input size is intermediate between linear and quadratic, so not quite as bad as the first example.
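
To see the short-circuit at work, the sketch below counts how many divisors are tried before `all()` stops.
The helper `count_checks` is hypothetical and exists only for this illustration.

```python
def count_checks(number):
    """Return (is_prime, checks): how many divisors all() tried before stopping."""
    checks = 0

    def not_divisible(divisor):
        nonlocal checks
        checks += 1
        return number % divisor != 0

    is_prime = all(not_divisible(divisor) for divisor in range(2, number))
    return is_prime, checks


print(count_checks(97))  # prime: every divisor from 2 to 96 is tried
print(count_checks(96))  # even: all() stops at the first divisor, 2
```

Composite numbers with a small factor exit early, while primes pay the full cost, which is consistent with scaling that falls between linear and quadratic.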


[nested-loops]: https://exercism.org/tracks/python/exercises/sieve/approaches/nested-loops
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
def primes(limit):
    return [number for number in range(2, limit + 1) if
            all(number % divisor != 0 for divisor in range(2, number))]
40 changes: 40 additions & 0 deletions exercises/practice/sieve/.approaches/config.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
{
"introduction": {
"authors": [
"colinleach",
"BethanyG"
]
},
"approaches": [
{
"uuid": "85752386-a3e0-4ba5-aca7-22f5909c8cb1",
"slug": "nested-loops",
"title": "Nested Loops",
"blurb": "Relatively clear solutions with explicit loops.",
"authors": [
"colinleach",
"BethanyG"
]
},
{
"uuid": "04701848-31bf-4799-8093-5d3542372a2d",
"slug": "set-operations",
"title": "Set Operations",
"blurb": "Performance enhancements with Python sets.",
"authors": [
"colinleach",
"BethanyG"
]
},
{
"uuid": "183c47e3-79b4-4afb-8dc4-0deaf094ce5b",
"slug": "comprehensions",
"title": "Comprehensions",
"blurb": "Ultra-concise code and its downsides.",
"authors": [
"colinleach",
"BethanyG"
]
}
]
}
99 changes: 99 additions & 0 deletions exercises/practice/sieve/.approaches/introduction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Introduction

The key to this exercise is to keep track of:
- A list of numbers.
- Their status of possibly being prime.


## General Guidance

To solve this exercise, it is necessary to choose one or more appropriate data structures to store numbers and status, then decide the best way to scan through them.

There are many ways to implement the code, and the three broad approaches listed below are not sharply separated.


## Approach: Using nested loops

```python
def primes(number):
    not_prime = []
    prime = []

    for item in range(2, number+1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item*item, number+1, item):
                not_prime.append(element)

    return prime
```

The theme here is nested, explicit `for` loops to move through ranges, testing validity as we go.

For details and another example see [`nested-loops`][approaches-nested].


## Approach: Using set operations

```python
def primes(number):
    not_prime = set()
    primes = []

    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))

    return primes
```

In this group, the code uses the special features of the Python [`set`][sets] to improve efficiency.

For details and other examples see [`set-operations`][approaches-sets].


## Approach: Using complex or nested comprehensions


```python
def primes(limit):
    return [number for number in range(2, limit + 1) if
            all(number % divisor != 0 for divisor in range(2, number))]
```

Here, the emphasis is on implementing a solution in the minimum number of lines, even at the expense of readability or performance.

For details and another example see [`comprehensions`][approaches-comps].


## Using packages outside base Python


In statically typed languages, common approaches include bit arrays and arrays of booleans.

Neither of these is a natural fit for core Python, but there are external packages that could perhaps provide a better implementation:

- For bit arrays, there is the [`bitarray`][bitarray] package and [`bitstring.BitArray()`][bitstring].
- For arrays of booleans, we could use the NumPy package: `np.ones((number,), dtype=np.bool_)` will create a pre-dimensioned array of `True`.

It should be stressed that these will not work in the Exercism test runner, and are mentioned here only for completeness.

## Which Approach to Use?


This exercise is for learning, and is not directly relevant to production code.

The point is to find a solution which is correct, readable, and remains reasonably fast for larger input values.

The "set operations" example above is clean, readable, and in benchmarking was the fastest code tested.

Further details of performance testing are given in the [Performance article][article-performance].

[approaches-nested]: https://exercism.org/tracks/python/exercises/sieve/approaches/nested-loops
[approaches-sets]: https://exercism.org/tracks/python/exercises/sieve/approaches/set-operations
[approaches-comps]: https://exercism.org/tracks/python/exercises/sieve/approaches/comprehensions
[article-performance]: https://exercism.org/tracks/python/exercises/sieve/articles/performance
[sets]: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
[bitarray]: https://pypi.org/project/bitarray/
[bitstring]: https://bitstring.readthedocs.io/en/latest/
49 changes: 49 additions & 0 deletions exercises/practice/sieve/.approaches/nested-loops/content.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# Nested Loops


```python
def primes(number):
    not_prime = []
    prime = []

    for item in range(2, number+1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item*item, number+1, item):
                not_prime.append(element)

    return prime
```

This is the type of code that many people might write as a first attempt.

It is very readable and passes the tests.

The clear disadvantage is that run time is quadratic in the input size: `O(n**2)`, so this approach scales poorly to large input values.

Part of the problem is the line `if item not in not_prime`, where `not_prime` is a list that may be long and unsorted.

This operation requires searching the entire list, so run time is linear in list length: not ideal within a loop repeated many times.
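
The cost of that membership test can be checked directly.
The sketch below is an informal measurement, not a rigorous benchmark: it times `not in` on lists of two different lengths, using a value that is never present so the whole list is scanned. The function name `membership_time` and the chosen sizes are assumptions made for illustration.

```python
import timeit


def membership_time(length, repeats=200):
    """Time `value not in items` for an absent value, forcing a full scan."""
    items = list(range(length))
    return timeit.timeit(lambda: -1 not in items, number=repeats)


small = membership_time(1_000)
large = membership_time(100_000)
print(f"1,000 items: {small:.5f}s   100,000 items: {large:.5f}s")
```

Exact timings depend on the machine, but the longer list should take roughly proportionally longer to search.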

```python
def primes(number):
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]:
            continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]
```


At first sight, this second example looks quite similar to the first.

However, on testing it performs much better, scaling linearly with `number` rather than quadratically.

A key difference is that list entries are tested by index: `if not prime[index]`.

This is a constant-time operation independent of the list length.

Relatively few programmers would have predicted such a major difference just by looking at the code, so if performance matters we should always test, not guess.
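
One way to follow that advice is a quick `timeit` comparison of the two examples on this page.
This is a minimal sketch rather than the benchmarking setup used for the published results; the function names `primes_list_search` and `primes_flags`, the limit, and the repeat count are all chosen here for illustration.

```python
import timeit


def primes_list_search(number):
    """First example: membership tests against a growing list."""
    not_prime = []
    prime = []
    for item in range(2, number + 1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item * item, number + 1, item):
                not_prime.append(element)
    return prime


def primes_flags(number):
    """Second example: a list of booleans tested by index."""
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]:
            continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]


limit = 1_000
# Both versions must agree before their timings are worth comparing.
assert primes_list_search(limit) == primes_flags(limit)

for func in (primes_list_search, primes_flags):
    elapsed = timeit.timeit(lambda: func(limit), number=5)
    print(f"{func.__name__}: {elapsed:.4f}s")
```

Repeating the run with several values of `limit` shows how each version scales, which a single timing cannot.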
8 changes: 8 additions & 0 deletions exercises/practice/sieve/.approaches/nested-loops/snippet.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
def primes(number):
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]: continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]
69 changes: 69 additions & 0 deletions exercises/practice/sieve/.approaches/set-operations/content.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# Set Operations


```python
def primes(number):
    not_prime = set()
    primes = []

    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))

    return primes
```


This is the fastest method so far tested, at all input sizes.

With only a single explicit loop, run time in testing scales approximately linearly: `O(n)`.

A key step is the set `update()`.

Less commonly seen than `add()`, which takes a single element, `update()` takes any iterable of hashable values as its parameter and efficiently adds all the elements in a single operation.

In this case, the iterable is a `range` covering the multiples of the prime we just found, from its square up to the limit.

Primes are collected in a list, in ascending order, so there is no need for a separate sort operation at the end.
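
The difference between `add()` and `update()` can be seen in a small sketch.
The values of `number` and `num` below are arbitrary choices for illustration.

```python
number = 30
num = 3  # a prime that has just been found

# Adding multiples one at a time with add()
one_by_one = set()
for multiple in range(num * num, number + 1, num):
    one_by_one.add(multiple)

# Adding the same multiples in a single call with update()
all_at_once = set()
all_at_once.update(range(num * num, number + 1, num))

print(one_by_one == all_at_once)  # True
print(sorted(all_at_once))        # [9, 12, 15, 18, 21, 24, 27, 30]
```

Both forms build the same set; `update()` simply moves the loop out of Python-level code.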


```python
def primes(number):
    numbers = set(item for item in range(2, number+1))

    not_prime = set(not_prime for item in range(2, number+1)
                    for not_prime in range(item**2, number+1, item))

    return sorted(list(numbers - not_prime))
```

The second example builds its sets from generator expressions in place of an explicit loop, then uses set subtraction as the key feature in the return statement.

The resulting set needs to be converted to a list then sorted, which adds some overhead, [scaling as O(n *log* n)][sort-performance].

In performance testing, this code is about 4x slower than the first example, but the measured run time still scales approximately as `O(n)`.


```python
def primes(number: int) -> list[int]:
    start = set(range(2, number + 1))
    return sorted(start - {m for n in start for m in range(2 * n, number + 1, n)})
```

The third example is quite similar to the second, just moving the comprehension into the return statement.

Performance is very similar between examples 2 and 3 at all input values.


## Sets: strengths and weaknesses

Sets offer two main benefits which can be useful in this exercise:
- Entries are guaranteed to be unique.
- Determining whether the set contains a given value is a fast, constant-time operation.

Less positively:
- The exercise specification requires a list to be returned, which may involve a conversion.
- Sets have no guaranteed ordering, so two of the above examples incur the time penalty of sorting a list at the end.
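
The points above can be sketched in a few lines.
The sets and values here are made up for illustration and are not taken from the solutions on this page.

```python
composites = {4, 6, 8, 6, 4}       # duplicates collapse to three unique entries
print(len(composites))             # 3
print(6 in composites)             # True, a constant-time membership test

candidates = set(range(2, 11))
primes_unordered = candidates - {4, 6, 8, 9, 10}  # set subtraction, order not guaranteed
primes = sorted(primes_unordered)  # sorted() accepts any iterable and returns a list
print(primes)                      # [2, 3, 5, 7]
```

Since `sorted()` already returns a list, the sort also takes care of the conversion that the exercise requires.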

[sort-performance]: https://en.wikipedia.org/wiki/Timsort
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
def primes(number):
    not_prime = set()
    primes = []
    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))
    return primes
14 changes: 14 additions & 0 deletions exercises/practice/sieve/.articles/config.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"articles": [
{
"slug": "performance",
"uuid": "fdbee56a-b4db-4776-8aab-3f7788c612aa",
"title": "Performance deep dive",
"authors": [
"BethanyG",
"colinleach"
],
"blurb": "Results and analysis of timing tests for the various approaches."
}
]
}