Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] SAR needs to be modified due to a breaking change in scipy #1954

Closed
miguelgfierro opened this issue Jul 3, 2023 · 9 comments
Closed
Labels
bug Something isn't working

Comments

@miguelgfierro
Copy link
Collaborator

miguelgfierro commented Jul 3, 2023

Description

With scipy 1.10.1, the item similarity matrix is a dense matrix

print(type(model.item_similarity))
print(type(model.user_affinity))
print(type(model.item_similarity) == np.ndarray)
print(type(model.item_similarity) == scipy.sparse._csr.csr_matrix)
print(model.item_similarity.shape)
print(model.item_similarity)

<class 'numpy.ndarray'>
<class 'scipy.sparse._csr.csr_matrix'>
True
False
(1646, 1646)
[[1.         0.10650888 0.03076923 ... 0.         0.         0.        ]
 [0.10650888 1.         0.15104167 ... 0.         0.00729927 0.00729927]
 [0.03076923 0.15104167 1.         ... 0.         0.         0.01190476]
 ...
 [0.         0.         0.         ... 1.         0.         0.        ]
 [0.         0.00729927 0.         ... 0.         1.         0.        ]
 [0.         0.00729927 0.01190476 ... 0.         0.         1.        ]]

but with scipy 1.11.1 the item similarity matrix is sparse

print(type(model.item_similarity))
print(type(model.user_affinity))
type(model.item_similarity) == np.ndarray
type(model.item_similarity) == scipy.sparse._csr.csr_matrix
print(model.item_similarity.shape)
<class 'numpy.ndarray'>
<class 'scipy.sparse._csr.csr_matrix'>
()

In which platform does it happen?

Related to #1951

How do we replicate the issue?

Expected behavior (i.e. solution)

Other Comments

We found that the issue was that during a division in Jaccard, scipy change the type. We talked to the authors of scipy and they told us that they did a breaking change in 1.11.0 scipy/scipy#18796 (comment)

@miguelgfierro miguelgfierro added the bug Something isn't working label Jul 3, 2023
@miguelgfierro miguelgfierro changed the title [BUG] SAR is working weirdly with the latest scipy [BUG] SAR needs to be modified due to a breaking change in spicy Jul 4, 2023
@anargyri
Copy link
Collaborator

anargyri commented Jul 4, 2023

After talking with @loomlike yesterday we found a couple of issues:

  1. This docstring needs to be updated, we return a sparse csr matrix, not a numpy array https://github.com/microsoft/recommenders/blob/787ae309ec78a9b2b1f58931931cb117affc4ea9/recommenders/models/sar/sar_singlenode.py#L190

  2. These lines are not consistent wrt. the type of item_similarity, in some it will be sparse in others numpy array https://github.com/microsoft/recommenders/blob/787ae309ec78a9b2b1f58931931cb117affc4ea9/recommenders/models/sar/sar_singlenode.py#L294

  3. Jaccard, for example, will return a numpy array https://github.com/microsoft/recommenders/blob/787ae309ec78a9b2b1f58931931cb117affc4ea9/recommenders/utils/python_utils.py#L65
    This casting to numpy array is the source of the bug, the behaviour of the casting must have changed during the recent scipy releases.

We can remove the line that casts to numpy array and ensure that everything is a sparse csr matrix. That said, the cooccurrence and item_similarity matrices are not expected to exhibit high levels of sparsity in general (but this depends on the threshold as well).

I see this commit was responsible for the casting, @gramhagen do you recall what was the rationale behind it?

@miguelgfierro
Copy link
Collaborator Author

@anargyri, do you think you will have time to work on this issue?

@anargyri
Copy link
Collaborator

anargyri commented Jul 4, 2023

@anargyri, do you think you will have time to work on this issue?

I can fix it, depending on how Scott responds.

@anargyri
Copy link
Collaborator

anargyri commented Jul 4, 2023

It looks like this was the original rationale behind using numpy array here #465
It's about efficiency of the multiplication when item_similarity is effectively dense.
So, how about I keep the type as numpy array but instead of the cast use csr_matrix.toarray() ?

@miguelgfierro
Copy link
Collaborator Author

yes, that could work. The issue though comes from the operation in the Jaccard and the other similarity matrices. In scpicy <1.10, we will get a dense matrix, in scipy >1.11 we will get a sparse, and then we will need to transform them to dense.

@anargyri
Copy link
Collaborator

Yes, but csr_matrix.toarray() returns a dense array by definition, in all scipy versions.

@gramhagen
Copy link
Collaborator

iirc that casting was done to speed up computation, so whatever method is needed to keep that should be fine.

miguelgfierro added a commit that referenced this issue Aug 29, 2023
@anargyri anargyri changed the title [BUG] SAR needs to be modified due to a breaking change in spicy [BUG] SAR needs to be modified due to a breaking change in scipy Apr 8, 2024
@miguelgfierro
Copy link
Collaborator Author

What Andreas and I discussed:

  1. Maybe replacing x*y operation with x@y works https://docs.scipy.org/doc/scipy/reference/sparse.html#module-scipy.sparse
  2. If 1 doesn't work, we will need a variable installation depending on the python version and a different computation

@SimonYansenZhao
Copy link
Collaborator

Fixed in PR #2083

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants