feat: optimizing the prune function at the apriori_algorithm.py archive #12992

joaoneto9 · 2025-09-24T18:48:08Z

Describe your change:

Added an optimized version of the prune function using Counter to improve performance
when checking candidate itemsets for frequent items.

To demonstrate the inefficiency of the original algorithm, I benchmarked both versions
on itemset lists of gradually increasing size. The original has a complexity of
O(n * c * i), where n is the size of itemset, c is the number of candidates, and i is
the number of items in each candidate.

The new solution reduces the complexity to O(n + c * i). Previously, for every item of
every candidate, the algorithm scanned itemset once to check membership (O(n)) and again
to count occurrences (O(n)), repeating the same costly work each time.

To optimize this, I used an auxiliary dictionary (via Counter) where each key is an
item and its value is the number of occurrences in itemset. It is built once in O(n),
after which both the check and the count are O(1) lookups on average.
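
A minimal sketch of the two approaches described above, for illustration only. The function names `prune_baseline` and `prune_with_counter` are hypothetical and this is not necessarily the exact code in apriori_algorithm.py; it assumes the items in itemset are hashable.

```python
from collections import Counter


def prune_baseline(itemset: list, candidates: list, length: int) -> list:
    """Slower pattern: a membership scan and a count() scan per item."""
    pruned = []
    for candidate in candidates:
        keep = True
        for item in candidate:
            # Both `in` and `.count()` walk the whole list: O(n) each.
            if item not in itemset or itemset.count(item) < length - 1:
                keep = False
                break
        if keep:
            pruned.append(candidate)
    return pruned


def prune_with_counter(itemset: list, candidates: list, length: int) -> list:
    """Counter-based variant: count once, then dictionary lookups."""
    counts = Counter(itemset)  # single O(n) pass; items must be hashable
    pruned = []
    for candidate in candidates:
        # Same two conditions as the baseline, answered from the Counter.
        if all(
            item in counts and counts[item] >= length - 1 for item in candidate
        ):
            pruned.append(candidate)
    return pruned


itemset = ["X", "Y", "Z", "X", "Y"]
candidates = [["X", "Y"], ["X", "Z"], ["Y", "Z"], ["X", "W"]]
print(prune_with_counter(itemset, candidates, 2))
print(prune_with_counter(itemset, candidates, 3))
```

With `length=2` every candidate made only of items present in itemset is kept; with `length=3` each item must occur at least twice, so only `["X", "Y"]` survives.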

As a result, the performance improvement is significant, at the cost of a small amount of
additional memory, which is a worthwhile trade-off. This improvement can be observed by
comparing the execution of both algorithms (as shown in the attached graph).

Here is the graph comparing both functions:
pruneOptimized_prune_algoritm_results.pdf

Unit tests were also conducted on my local machine to ensure the consistency of results between the two methods, but they are not included in this PR.

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
Documentation change?

Checklist:


joaoneto9 · 2025-09-24T20:04:31Z

I hadn't realized that itemset could be a list of lists. Lists are not hashable, so they cannot be used as Counter keys; I switched to tuples, which are immutable and hashable. This change adds a slight overhead, since each item now has to be converted to a tuple before it can be looked up in the Counter. Nonetheless, there is a significant efficiency gain in the worst-case scenario, and I believe it will also improve performance in average cases. I have not yet tested those other scenarios or generated their corresponding graphs. Below is the graph reflecting the new modification.
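
A rough sketch of the tuple-key idea, under stated assumptions: the helper `to_key` and the function `prune_list_items` are hypothetical names used for illustration, not the code in this PR, and inner lists are assumed to contain hashable values such as strings.

```python
from collections import Counter


def to_key(item):
    """Return a hashable key: lists become tuples, anything else is unchanged."""
    return tuple(item) if isinstance(item, list) else item


def prune_list_items(itemset: list, candidates: list, length: int) -> list:
    # Counter([["a", "b"]]) would raise TypeError (unhashable type: 'list'),
    # so every item is converted to a key before counting.
    counts = Counter(to_key(item) for item in itemset)
    pruned = []
    for candidate in candidates:
        # The per-lookup conversion is the slight overhead mentioned above.
        keys = [to_key(item) for item in candidate]
        if all(key in counts and counts[key] >= length - 1 for key in keys):
            pruned.append(candidate)
    return pruned


itemset = [["a", "b"], ["a", "c"], ["a", "b"]]
candidates = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"]],
]
print(prune_list_items(itemset, candidates, 2))
```

Here the second candidate is dropped because `["b", "c"]` never appears in itemset. Converting only when the item is a list keeps the function usable for flat itemsets of strings as well.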

pruneOptimized_prune_algoritm_results.pdf

joaoneto9 and others added 2 commits September 24, 2025 15:19

feat: optimizing the prune function at the apriori_algorithm.py archive

def174d

[pre-commit.ci] auto fixes from pre-commit.com hooks

c2d0613

for more information, see https://pre-commit.ci

algorithms-keeper bot added the "tests are failing" label (Do not merge until tests pass) Sep 24, 2025

joaoneto9 added 2 commits September 24, 2025 15:51

fix: fixing the unsorted importing statment

839c43a

Merge branch 'master' of https://github.com/joaoneto9/Python

81a9d8d

algorithms-keeper bot added the "awaiting reviews" label (This PR is ready to be reviewed) Sep 24, 2025

pre-commit-ci bot and others added 3 commits September 24, 2025 18:54

[pre-commit.ci] auto fixes from pre-commit.com hooks

38e849b

for more information, see https://pre-commit.ci

fix: fixing the key structure to a tuple that can be an hashable stru…

789f76d

…cture

Merge branch 'master' of https://github.com/joaoneto9/Python

42fe4b6

algorithms-keeper bot removed tests are failing Do not merge until tests pass labels Sep 24, 2025

Merge branch 'master' into master

c88b71f
