How to use item_exclude

Hi!

Thanks for a really impressive package.

In my world there are two common scenarios when building recommendation systems. You either want to recommend products that a customer has never liked (or bought) from your *whole* catalogue, or you want to recommend products from a subset of the catalogue, e.g. products that are discounted. Most implementations of collaborative filtering focus on the first scenario. My question is how to use the `items_exclude` argument to tackle the second scenario. This is somewhat related to a previous [issue](https://github.com/dselivanov/rsparse/issues/19).

For instance, say that 60 artists in the [`lastfm` dataset](http://ocelma.net/MusicRecommendationDataset/lastfm-360K.html) have albums on sale, and those are the only artists we want to recommend.

Example code from: http://dsnotes.com/post/2017-06-28-matrix-factorization-for-recommender-systems-part-2/
```{r}
set.seed(1)
library(data.table)
raw_data = fread("lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv",
                 showProgress = FALSE, encoding = "UTF-8",
                 quote = "")
setnames(raw_data, c("user_id", "artist_id", "artist_name", "number_plays"))

user_encoding <- raw_data[, .(uid = .GRP), keyby = user_id]

item_encoding = raw_data[, .(iid = .GRP, artist_name = artist_name[[1]]), keyby = artist_id]
```

Here I'll sample 60 artists "on sale" and create a table of items to exclude from the predictions. 

```{r}
on_sale <- sample(item_encoding$artist_name, 60)
items_exclude <- item_encoding[!(artist_name %in% on_sale)]
on_sale
 [1] "the bridge"                          "snippet"                            
 [3] "v.o.s."                              "the ullulators"                     
 [5] "藤井フミヤ"                          "erika jo"                           
 [7] "gore"                                "amaral"                             
 [9] "ceili rain"                          "schwarze puppen"                    
[11] "dan wheeler"                         "yuki suzuki"                        
[13] "krymplings"                          "olivia ruiz"                        
[15] "edgewater"                           "karl johan"                         
[17] "pamela z"                            "global spirit"                      
[19] "damien youth"                        "fires of babylon"                   
[21] "comic relief"                        "emmanuel horvilleur"                
[23] "sandra stephens"                     "cyclopede"                          
[25] "Михаил Боярский"                     "the great eastern"                  
[27] "radwimps"                            "papa austin with the great peso"    
[29] "phasen"                              "mari menari"                        
[31] "Холодне Сонце"                       "laura story"                        
[33] "mugwart"                             "errand boy"                         
[35] "erlend krauser"                      "göran fristorp"                     
[37] "mousse t & emma lanford"             "dj vlad & dirty harry"              
[39] "denim"                               "thomas leer & robert rental"        
[41] "the underdog project vs the sunclub" "sense club"                         
[43] "mary kiani"                          "ladies night"                       
[45] "tresk"                               "the peddlers"                       
[47] "quatuor ysaÿe"                       "brandhärd"                          
[49] "bittor aiape"                        "prince francis"                     
[51] "alex klaasen & martine sandifort"    "peppermint petty"                   
[53] "dave ramsey"                         "müşfik kenter"                      
[55] "shima & shikou duo"                  "jimmy j & cru-l-t"                  
[57] "ankarali yasemin"                    "marian opania"                      
[59] "madita"                              "zoltar"   
```
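
A quick sanity check on the exclusion list seems worthwhile at this point. The sketch below uses a made-up five-row catalogue (`toy_items`, `toy_on_sale` and `toy_exclude` are hypothetical names, not part of the lastfm data) to show that the exclusion set should be the complement of the on-sale set, and that matching by name can pick up more rows than expected when two ids share a name.

```r
# Hypothetical toy catalogue; two different ids share the name "gore".
toy_items = data.frame(
  iid = 1:5,
  artist_name = c("gore", "amaral", "denim", "gore", "tresk"),
  stringsAsFactors = FALSE
)

toy_on_sale = c("amaral", "tresk")

# Everything that is NOT on sale goes into the exclusion list.
toy_exclude = toy_items[!(toy_items$artist_name %in% toy_on_sale), ]

# The on-sale rows and the excluded rows should partition the catalogue.
n_kept = sum(toy_items$artist_name %in% toy_on_sale)
stopifnot(n_kept + nrow(toy_exclude) == nrow(toy_items))

# Names are not guaranteed to be unique, so check before matching by name.
dup_names = unique(toy_items$artist_name[duplicated(toy_items$artist_name)])
print(dup_names)
```

If the same check on `item_encoding$artist_name` reports duplicates, matching on `iid` or `artist_id` instead of the name would be the safer choice.
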

Below is some data manipulation to put the data into a sparse matrix.

```{r}
library(Matrix)
raw_data[, artist_name := NULL]
dt = user_encoding[raw_data, .(artist_id, uid, number_plays), on = .(user_id = user_id)]
dt = item_encoding[dt, .(iid, uid, number_plays), on = .(artist_id = artist_id)]
rm(raw_data)

X = sparseMatrix(i = dt$uid, j = dt$iid, x = dt$number_plays, 
                 dimnames = list(user_encoding$user_id, item_encoding$artist_name))
N_CV = 1000L
cv_uid = sample(nrow(user_encoding), N_CV)

X_train = X[-cv_uid, ]
X_cv = X[cv_uid, ]
rm(X)
```
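
For readers without the dataset at hand, here is a minimal sketch of the same construction on made-up data (all `toy_*` names and values are hypothetical). It only illustrates the layout: rows are users, columns are artists, and the column names carry the artist names.

```r
library(Matrix)

# Hypothetical toy interactions: 3 users, 4 artists, play counts as values.
toy_dt = data.frame(
  uid = c(1, 1, 2, 3, 3),
  iid = c(1, 3, 2, 3, 4),
  number_plays = c(10, 2, 5, 7, 1)
)
toy_users = c("u1", "u2", "u3")
toy_artists = c("gore", "amaral", "denim", "tresk")

toy_X = sparseMatrix(
  i = toy_dt$uid, j = toy_dt$iid, x = toy_dt$number_plays,
  dims = c(length(toy_users), length(toy_artists)),
  dimnames = list(toy_users, toy_artists)
)

# Rows are users, columns are artists.
print(dim(toy_X))
print(colnames(toy_X))
# Look up a single play count by user and artist name.
print(toy_X["u3", "denim"])
```
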

Here we fit the model. 

```{r}
make_confidence = function(x, alpha) {
  x_confidence = x
  stopifnot(inherits(x, "sparseMatrix"))
  x_confidence@x = 1 + alpha * x@x
  x_confidence
}
library(rsparse)
# `x_train` (lowercase) was undefined; `X_train` is the matrix created above.
model = WRMF$new(x_train = X_train, x_cv = X_cv, rank = 8, feedback = "implicit")
set.seed(1)
alpha = 0.01
X_train_conf = make_confidence(X_train, alpha)
# `X_cv_history` is never defined in this snippet, so the full `X_cv` is used here instead.
X_cv_conf = make_confidence(X_cv, alpha)
user_embeddings = model$fit_transform(X_train_conf, n_iter = 10L, n_threads = 8)
new_user_embeddings = model$transform(X_cv_conf)
```
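
To show what the confidence transform does, here is a small self-contained sketch on a made-up matrix. `to_confidence` is a hypothetical copy of `make_confidence()` above, repeated only so the block runs on its own; the play counts are invented.

```r
library(Matrix)

# Same transform as make_confidence(), repeated so the sketch is self-contained.
to_confidence = function(x, alpha) {
  stopifnot(inherits(x, "sparseMatrix"))
  x@x = 1 + alpha * x@x
  x
}

# Hypothetical 2 x 3 play-count matrix with three observed entries.
toy_plays = sparseMatrix(
  i = c(1, 2, 2), j = c(1, 2, 3), x = c(100, 10, 0.5),
  dims = c(2, 3)
)
toy_conf = to_confidence(toy_plays, alpha = 0.01)

# Observed entries become 1 + alpha * plays; unobserved entries stay implicit zeros.
print(toy_conf)
```
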
Now, I want to recommend *only* the artists that are on sale, so I pass the excluded artists to the `items_exclude` argument.

```{r}
new_user_1 = X_cv[1, , drop = FALSE]
new_user_predictions = model$predict(new_user_1, k = 60, items_exclude = items_exclude$artist_name)

head(data.frame(segmentid = t(attr(new_user_predictions, "ids"))))
  e9dc15dfabe0bdac615143623e1fe83ba4e2daa5
1                                    björk
2                   einstürzende neubauten
3                                     isis
4                          frédéric chopin
5                                sigur rós
6                                 동방신기
```

However, these recommendations are not among the artists on sale.

I suppose this would be clearer to me with a *vignette*, which I can see is on its way. In the meantime, how should one use the `items_exclude` argument?
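
To make the question concrete, this is the behaviour I would expect from an exclusion list, sketched in base R on made-up embeddings. It is not how rsparse implements `predict()`; all names and values (`user_emb`, `item_emb`, `toy_allowed`, and so on) are hypothetical.

```r
set.seed(42)

# Hypothetical toy factors: 1 user, 6 items, rank 3.
toy_item_names = c("a", "b", "c", "d", "e", "f")
user_emb = matrix(rnorm(3), nrow = 1)
item_emb = matrix(rnorm(6 * 3), nrow = 6, dimnames = list(toy_item_names, NULL))

# Score every item for the user.
scores = drop(user_emb %*% t(item_emb))
names(scores) = toy_item_names

# Only these items may be recommended; everything else is excluded.
toy_allowed = c("b", "d", "e")
toy_excluded = setdiff(toy_item_names, toy_allowed)

# Mask excluded items so they can never enter the top-k.
masked = scores
masked[toy_excluded] = -Inf

k = 2
top_k = names(sort(masked, decreasing = TRUE))[seq_len(k)]
print(top_k)
```

Whatever the scores turn out to be, `top_k` here can only contain items from `toy_allowed`, which is the outcome I was hoping for with the on-sale artists.
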

Furthermore, say we want as many recommendations as possible here, i.e. we set `k = 60`. Would that work for multiple users?
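
For the multi-user case, the following base R sketch shows the result I have in mind: one ranked row of allowed items per user, with `k` capped at the number of allowed items. Again this uses made-up embeddings and hypothetical names, and says nothing about what rsparse does internally.

```r
set.seed(7)

# Hypothetical toy factors: 4 users, 8 items, rank 3.
toy_item_names = paste0("item_", 1:8)
toy_user_names = paste0("user_", 1:4)
user_emb = matrix(rnorm(4 * 3), nrow = 4, dimnames = list(toy_user_names, NULL))
item_emb = matrix(rnorm(8 * 3), nrow = 8, dimnames = list(toy_item_names, NULL))

toy_allowed = c("item_2", "item_5", "item_7")

# Score only the allowed columns: users x allowed items.
scores = user_emb %*% t(item_emb[toy_allowed, , drop = FALSE])

# k cannot exceed the number of allowed items.
k = min(60, length(toy_allowed))

# One row of ranked item names per user.
top_k = t(apply(scores, 1, function(s) names(sort(s, decreasing = TRUE))[seq_len(k)]))
print(top_k)
```
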



