[BUG] SAR needs to be modified due to a breaking change in scipy #1954

miguelgfierro · 2023-07-03T16:41:19Z

Description

With scipy 1.10.1, the item similarity matrix is a dense matrix:

print(type(model.item_similarity))
print(type(model.user_affinity))
print(type(model.item_similarity) == np.ndarray)
print(type(model.item_similarity) == scipy.sparse._csr.csr_matrix)
print(model.item_similarity.shape)
print(model.item_similarity)

<class 'numpy.ndarray'>
<class 'scipy.sparse._csr.csr_matrix'>
True
False
(1646, 1646)
[[1.         0.10650888 0.03076923 ... 0.         0.         0.        ]
 [0.10650888 1.         0.15104167 ... 0.         0.00729927 0.00729927]
 [0.03076923 0.15104167 1.         ... 0.         0.         0.01190476]
 ...
 [0.         0.         0.         ... 1.         0.         0.        ]
 [0.         0.00729927 0.         ... 0.         1.         0.        ]
 [0.         0.00729927 0.01190476 ... 0.         0.         1.        ]]

But with scipy 1.11.1 the item similarity matrix is sparse (note the empty shape in the output):

print(type(model.item_similarity))
print(type(model.user_affinity))
type(model.item_similarity) == np.ndarray
type(model.item_similarity) == scipy.sparse._csr.csr_matrix
print(model.item_similarity.shape)
<class 'numpy.ndarray'>
<class 'scipy.sparse._csr.csr_matrix'>
()

In which platform does it happen?

Related to #1951

How do we replicate the issue?

Expected behavior (i.e. solution)

Other Comments

We found that the issue is that, during a division in Jaccard, scipy changes the type of the result. We talked to the authors of scipy and they told us that they made a breaking change in 1.11.0 scipy/scipy#18796 (comment)
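
A minimal sketch for checking which container type a sparse-by-dense division returns on the installed scipy. The toy data and the `result_kind` helper are hypothetical, and the denominator is built densely here for simplicity, so this only approximates the Jaccard code in recommenders rather than reproducing it.

```python
import numpy as np
from scipy import sparse


def result_kind(x):
    """Label the container type of x (hypothetical helper for diagnosis)."""
    if sparse.issparse(x):
        return "sparse"
    if isinstance(x, np.matrix):  # check before ndarray: np.matrix subclasses it
        return "np.matrix"
    if isinstance(x, np.ndarray):
        return "ndarray"
    return type(x).__name__


# Toy co-occurrence counts (made-up data), stored as a sparse CSR matrix.
cooccurrence = sparse.csr_matrix(
    np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
)
diag = cooccurrence.diagonal()

# Jaccard-style denominator: count_i + count_j - cooccurrence_ij (dense ndarray).
denominator = diag[:, None] + diag[None, :] - cooccurrence.toarray()

# Sparse divided by dense: the type of the result depends on the scipy version.
raw = cooccurrence / denominator
print(scipy_kind := result_kind(raw))
```

Running this under both a pre-1.11 and a 1.11+ environment should show whether the result comes back dense or sparse, which is the difference reported above.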

anargyri · 2023-07-04T12:03:41Z

After talking with @loomlike yesterday we found a couple of issues:

This docstring needs to be updated: we return a sparse csr matrix, not a numpy array https://github.com/microsoft/recommenders/blob/787ae309ec78a9b2b1f58931931cb117affc4ea9/recommenders/models/sar/sar_singlenode.py#L190
These lines are inconsistent about the type of item_similarity: in some cases it will be sparse, in others a numpy array https://github.com/microsoft/recommenders/blob/787ae309ec78a9b2b1f58931931cb117affc4ea9/recommenders/models/sar/sar_singlenode.py#L294
Jaccard, for example, will return a numpy array https://github.com/microsoft/recommenders/blob/787ae309ec78a9b2b1f58931931cb117affc4ea9/recommenders/utils/python_utils.py#L65
This casting to numpy array is the source of the bug; the behaviour of the casting must have changed in the recent scipy releases.

We can remove the line that casts to numpy array and ensure that everything is a sparse csr matrix. That said, the cooccurrence and item_similarity matrices are not expected to exhibit high levels of sparsity in general (but this depends on the threshold as well).

I see this commit was responsible for the casting, @gramhagen do you recall what was the rationale behind it?

miguelgfierro · 2023-07-04T14:32:05Z

@anargyri, do you think you will have time to work on this issue?

anargyri · 2023-07-04T14:40:37Z

> @anargyri, do you think you will have time to work on this issue?

I can fix it, depending on how Scott responds.

anargyri · 2023-07-04T15:20:13Z

It looks like this was the original rationale behind using numpy array here #465
It's about efficiency of the multiplication when item_similarity is effectively dense.
So, how about I keep the type as numpy array but, instead of the cast, use csr_matrix.toarray()?

miguelgfierro · 2023-07-05T13:48:43Z

Yes, that could work. The issue, though, comes from the operation in Jaccard and the other similarity computations. With scipy <1.11 we get a dense matrix; with scipy >=1.11 we get a sparse one, which we then need to convert to dense.

anargyri · 2023-07-10T10:33:41Z

Yes, but csr_matrix.toarray() returns a dense array by definition, in all scipy versions.
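
A small sketch of that idea, assuming a hypothetical helper `to_dense_array` (not a function in recommenders): it returns a plain 2-D `numpy.ndarray` whether the input arrives sparse, as an `np.matrix`, or already as an array, so the caller does not depend on which type the division produced.

```python
import numpy as np
from scipy import sparse


def to_dense_array(x):
    """Return x as a plain numpy.ndarray (hypothetical helper)."""
    if sparse.issparse(x):
        # toarray() yields a dense ndarray for any scipy sparse container.
        return x.toarray()
    # np.asarray turns np.matrix into a base ndarray and leaves ndarrays as-is.
    return np.asarray(x)


expected = np.array([[1.0, 0.25],
                     [0.25, 1.0]])

from_sparse = to_dense_array(sparse.csr_matrix(expected))
from_matrix = to_dense_array(np.asmatrix(expected))
from_array = to_dense_array(expected)

for result in (from_sparse, from_matrix, from_array):
    print(type(result).__name__, result.shape)
```

The branch on `sparse.issparse` is what replaces the plain cast: the sparse case is densified explicitly instead of being wrapped by `np.array`.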

gramhagen · 2023-07-10T14:56:55Z

iirc that casting was done to speed up computation, so whatever method is needed to keep that should be fine.

miguelgfierro · 2024-04-08T14:39:31Z

What Andreas and I discussed:

1. Maybe replacing the x*y operation with x@y works https://docs.scipy.org/doc/scipy/reference/sparse.html#module-scipy.sparse
2. If 1 doesn't work, we will need a variable installation depending on the python version and a different computation
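
To illustrate option 1, here is a sketch with made-up data; the variable names mirror `user_affinity` and `item_similarity` from the issue, but this is not the SAR code itself. The motivation for `@` is that `*` means matrix product for scipy's sparse matrix classes but element-wise product for numpy arrays and scipy's sparse array classes, whereas `@` always means matrix product.

```python
import numpy as np
from scipy import sparse

# Sparse user-item affinity (2 users x 3 items), toy values.
user_affinity = sparse.csr_matrix(
    np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
)

# Dense item-item similarity (3 x 3), toy values.
item_similarity = np.array([[1.0, 0.5, 0.0],
                            [0.5, 1.0, 0.25],
                            [0.0, 0.25, 1.0]])

# Matrix product of a sparse matrix with a dense array.
scores = user_affinity @ item_similarity
print(scores)
```

Because the right-hand operand is dense, the product comes back as a dense array of shape (users, items).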

SimonYansenZhao · 2024-04-30T04:52:05Z

Fixed in PR #2083

miguelgfierro added the bug Something isn't working label Jul 3, 2023

miguelgfierro changed the title ~~[BUG] SAR is working weirdly with the latest scipy~~ [BUG] SAR needs to be modified due to a breaking change in spicy Jul 4, 2023

miguelgfierro mentioned this issue Jul 4, 2023

BUG: Different type when operating between scipy sparse matrix and a dense np matrix in Scipy 1.11.0 and 1.11.1 scipy/scipy#18796

Closed

miguelgfierro added a commit that referenced this issue Aug 19, 2023

Limit scipy <1.11.0 until #1954 is fixed

dee2fa1

miguelgfierro mentioned this issue Aug 19, 2023

Limit scipy <1.11.0 until #1954 is fixed #1971

Merged

4 tasks

miguelgfierro added a commit that referenced this issue Aug 29, 2023

Merge pull request #1971 from recommenders-team/miguelgfierro-patch-1

6eb15ab

Limit scipy <1.11.0 until #1954 is fixed

miguelgfierro mentioned this issue Aug 29, 2023

[BUG] Set scipy version back to use the latest one #1980

Closed

SimonYansenZhao mentioned this issue Sep 2, 2023

Add support for Python 3.10 and 3.11 #1937

Merged

4 tasks

SimonYansenZhao mentioned this issue Feb 22, 2024

[BUG] sar_movielens.ipynb - top_k = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True) error #2015

Closed

anargyri changed the title ~~[BUG] SAR needs to be modified due to a breaking change in spicy~~ [BUG] SAR needs to be modified due to a breaking change in scipy Apr 8, 2024

miguelgfierro mentioned this issue Apr 8, 2024

SAR sparse multiplcation modification due to a breaking change in scipy #2083

Merged

5 tasks

SimonYansenZhao closed this as completed Apr 30, 2024
