feat: optimizing the prune function in apriori_algorithm.py #12992
Describe your change:
Added an optimized version of the `prune` function using `collections.Counter` to improve performance when checking candidate itemsets for frequent items.
As a test base, I used `itemset` lists of gradually increasing size to demonstrate the inefficiency of the original algorithm, which has a complexity of O(n * c * i), where n is the size of `itemset`, c is the number of candidates, and i is the number of items in each candidate. The new solution reduces the complexity to O(n + c * i). Previously, for every item of every candidate, the algorithm scanned `itemset` once to test membership (O(n)) and again to count that item's occurrences (O(n)), so the same costly linear scans were repeated up to c * i times.
To optimize this, I used an auxiliary dictionary (via `Counter`) in which each key is an item and its value is the number of occurrences in `itemset`. Building it takes a single O(n) pass, after which both the membership check and the count lookup run in average constant time, O(1).
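A minimal sketch of the `Counter`-based idea under the same assumed signature; `prune_optimized` is a hypothetical name, not necessarily the one used in the PR. Because items become dictionary keys, this version requires them to be hashable, which the list-based baseline does not.

```python
from collections import Counter


def prune_optimized(itemset: list, candidates: list, length: int) -> list:
    """Hypothetical Counter-based sketch: count every item once, then answer
    each membership/count query with an average O(1) hash lookup."""
    counts = Counter(itemset)  # single O(n) pass over itemset
    return [
        candidate
        for candidate in candidates
        # Counter returns 0 for missing keys without inserting them; the
        # explicit `in` test keeps the result identical to the baseline
        # even when length - 1 <= 0.
        if all(item in counts and counts[item] >= length - 1 for item in candidate)
    ]


print(prune_optimized(["a", "a", "b"], [["a"], ["b"], ["a", "b"]], 3))
```

Using `all()` with a generator preserves the early exit of the baseline's `break`: a candidate is rejected at its first failing item.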
As a result, the speedup grows with the size of `itemset`, at the cost of a small amount of extra memory for the dictionary (one entry per distinct item), which is a worthwhile trade-off. This improvement can be observed by comparing the execution times of both functions in the attached graph.
Here is the graph comparing both functions:
pruneOptimized_prune_algoritm_results.pdf
Unit tests were also run on my local machine to confirm that both methods return the same results, but they are not included in this PR.
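Since those tests are not part of this PR, here is one way the equivalence check and the timing comparison could be reproduced. It is a sketch only: `random_case`, `check_consistency`, the alphabet size, and the input sizes are illustrative assumptions, not the setup behind the attached graph, and both functions restate the assumed `prune(itemset, candidates, length)` semantics compactly.

```python
import random
import timeit
from collections import Counter


def prune(itemset, candidates, length):
    # Baseline (assumed semantics): list `in` and list.count() are O(n) scans.
    return [
        c for c in candidates
        if all(item in itemset and itemset.count(item) >= length - 1 for item in c)
    ]


def prune_optimized(itemset, candidates, length):
    # Counter-based variant: one O(n) pass, then average O(1) lookups.
    counts = Counter(itemset)
    return [
        c for c in candidates
        if all(item in counts and counts[item] >= length - 1 for item in c)
    ]


def random_case(n, c, i, alphabet=50, seed=0):
    """Hypothetical generator: n items and c candidates of i items each, drawn
    from a small alphabet so both repeats and misses occur."""
    rng = random.Random(seed)
    symbols = [f"item{k}" for k in range(alphabet)]
    itemset = [rng.choice(symbols) for _ in range(n)]
    candidates = [[rng.choice(symbols) for _ in range(i)] for _ in range(c)]
    return itemset, candidates


def check_consistency(trials=200):
    """Assert that both functions agree on many small random inputs."""
    for seed in range(trials):
        rng = random.Random(seed)
        n, c, i = rng.randint(0, 60), rng.randint(0, 20), rng.randint(0, 4)
        length = rng.randint(1, 5)
        itemset, candidates = random_case(n, c, i, alphabet=10, seed=seed)
        assert prune(itemset, candidates, length) == prune_optimized(
            itemset, candidates, length
        )


if __name__ == "__main__":
    check_consistency()
    # Grow itemset while keeping the candidates fixed, as described above.
    for n in (100, 1_000, 10_000):
        itemset, candidates = random_case(n, c=200, i=3)
        for fn in (prune, prune_optimized):
            t = timeit.timeit(lambda: fn(itemset, candidates, 3), number=5)
            print(f"{fn.__name__:>15} n={n:>6}: {t:.4f} s")
```

Absolute timings depend on the machine; the useful signal is how each function's time changes as `n` grows.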
Checklist: