FEA: update
shanlei committed Oct 15, 2020
1 parent 4d4e528 commit 2c0540d
Showing 3 changed files with 43 additions and 49 deletions.
84 changes: 35 additions & 49 deletions README.md
@@ -7,12 +7,13 @@ This is a repository of public data sources for Recommender Systems (RS).
### Movies
- [MovieLens](https://grouplens.org/datasets/movielens/): GroupLens Research has collected and made available rating datasets from their movie web site.
- [Netflix](https://www.kaggle.com/netflix-inc/netflix-prize-data): This is the official data set used in the Netflix Prize competition.
- [douban](https://www.kaggle.com/utmhikari/doubanmovieshortcomments): Douban Movie is a Chinese website where users share comments and opinions about movies. This dataset contains more than 2 million short comments on 28 movies from the Douban Movie website.
- [Douban](https://www.kaggle.com/utmhikari/doubanmovieshortcomments): Douban Movie is a Chinese website where users share comments and opinions about movies. This dataset contains more than 2 million short comments on 28 movies from the Douban Movie website.

### Music
- [LastFM](https://grouplens.org/datasets/hetrec-2011/): This dataset contains social networking, tagging, and music artist listening information from a set of 2K users of the Last.fm online music system.
- [LFM-1b](http://www.cp.jku.at/datasets/LFM-1b/): This dataset contains more than one billion music listening events created by more than 120,000 users of Last.fm. Each listening event is characterized by artist, album, and track name, and includes a timestamp.
- [Yahoo Music](https://webscope.sandbox.yahoo.com/catalog.php?datatype=r): This dataset represents a snapshot of the Yahoo! Music community's preferences for various musical artists.

### Anime
- [Anime](https://www.kaggle.com/CooperUnion/anime-recommendations-database): This dataset contains user preference data from myanimelist.net. Each user can add anime to their completed list and give it a rating; this dataset is a compilation of those ratings.

@@ -24,12 +25,13 @@ This is a repository of public data sources for Recommender Systems (RS).
- [YOOCHOOSE](https://2015.recsyschallenge.com/challenge.html): This dataset was constructed by YOOCHOOSE GmbH to support participants in the RecSys Challenge 2015.
- [Retailrocket](https://www.kaggle.com/retailrocket/ecommerce-dataset): The data was collected from a real-world e-commerce website. It is raw data, i.e., without any content transformations; however, all values are hashed due to confidentiality concerns.
- [Ta Feng](https://www.kaggle.com/chiranjivdas09/ta-feng-grocery-dataset): This dataset contains transaction data from a Chinese grocery store, covering November 2000 to February 2001.
- [Amazon](http://jmcauley.ucsd.edu/data/amazon/):

### Advertising

* [Criteo](https://www.kaggle.com/c/criteo-display-ad-challenge/data): This dataset was collected from Criteo and consists of a portion of Criteo's traffic over a period of several days.
* [Avazu](https://www.kaggle.com/c/avazu-ctr-prediction/data): This dataset is used in the Avazu CTR prediction contest.
* [Ipinyou](https://pan.baidu.com/s/1kTwX2mF#list/path=%2F): This dataset was provided by iPinYou and contains all training datasets and leaderboard testing datasets from the three seasons of the iPinYou Global RTB (Real-Time Bidding) Bidding Algorithm Competition.
* [Ipinyou](http://contest.ipinyou.com): This dataset was provided by iPinYou and contains all training datasets and leaderboard testing datasets from the three seasons of the iPinYou Global RTB (Real-Time Bidding) Bidding Algorithm Competition.

### Check-in

@@ -42,11 +44,11 @@ This is a repository of public data sources for Recommender Systems (RS).

### Game

* [Steam](http://cseweb.ucsd.edu/~wckang/steam_reviews.json.gz): This dataset contains reviews and game information from Steam, covering 7,793,069 reviews, 2,567,538 users, and 32,135 games. In addition to the review text, each review also includes the user's play hours.
* [Steam](https://github.com/kang205/SASRec): This dataset contains reviews and game information from Steam, covering 7,793,069 reviews, 2,567,538 users, and 32,135 games. In addition to the review text, each review also includes the user's play hours.

### Pinterest

* [pinterest](https://github.com/hexiangnan/neural_collaborative_filtering/tree/master/Data): This dataset was originally constructed by the paper "Learning image and user features for recommendations in social networks" for evaluating content-based image recommendation, and was processed by the paper "Neural Collaborative Filtering".
* [Pinterest](https://github.com/hexiangnan/neural_collaborative_filtering/tree/master/Data): This dataset was originally constructed by the paper "Learning image and user features for recommendations in social networks" for evaluating content-based image recommendation, and was processed by the paper "Neural Collaborative Filtering".

### Website

@@ -65,51 +67,35 @@ This is a repository of public data sources for Recommender Systems (RS).
General recommendation datasets


| SN | Dataset | \#User | \#Item | \#Interaction | Sparsity | Interaction Type | TimeStamp | User Context | Item Context | Interaction Context | Reference \(Paper/Competition/Website\) |
|----|-----------------|-----------|-----------|--------------|----------|-------------------------------------|-----------|--------------|--------------|---------------------|-----------------------------------------------|
| 1 | ml\-100k | 943 | 1,682 | 100,000 | 93\.70% | Rating <br> \[1\-5\] |||| | The MovieLens Datasets: History and Context\. |
| 2 | ml\-1m | 6,040 | 3,952 | 1,000,209 | 95\.81% | Rating <br> \[1\-5\] |||| | The MovieLens Datasets: History and Context\. |
| 3 | ml\-10m | 69,878 | 10,681 | 10,000,054 | 98\.69% | Rating <br> \[0\.5\-5\] half\-stars || || | The MovieLens Datasets: History and Context\. |
| 4 | ml\-20m | 138,493 | 27,278 | 20,000,263 | 99\.47% | Rating <br> \[0\.5\-5\] half\-stars || || | The MovieLens Datasets: History and Context\. |
| 5 | Anime | 73,515 | 11,200 | 7,813,737 | 99\.05% | Rating <br> \[\-1, 1\-10\] | | || | Kaggle: https://www.kaggle.com/CooperUnion/anime-recommendations-database |
| 6 | Epinions | 116,260 | 41,269 | 188,478 | 99\.99% | Rating <br> \[1\-5\] || | || Leveraging Social Connections to Improve Personalized Ranking for Collaborative Filtering.|
| 7 | Yelp | 1,968,703 | 209,393 | 8,021,122 | 99\.99% | Rating <br> \[1\-5\] ||||| Yelp Dataset: https://www.yelp.com/dataset |
| 8 | FourSquare\_NYC | 1,083 | 38,333 | 227,428 | 99\.45% | Click || || | Kaggle: https://www.kaggle.com/chetanism/foursquare-nyc-and-tokyo-checkin-dataset |
| 9 | FourSquare\_TKY | 2,293 | 61,858 | 537,703 | 99\.62% | Click || || | Kaggle: https://www.kaggle.com/chetanism/foursquare-nyc-and-tokyo-checkin-dataset |
| 10 | Avazu | | | 40,428,967 | | Click <br> \[0, 1\] || | || Kaggle: https://www.kaggle.com/c/avazu-ctr-prediction/data |
| 11 | Netflix | 480,189 | 17,770 | 100,480,507 | 98\.82% | Rating <br> \[1\-5\] || | | | Kaggle: https://www.kaggle.com/netflix-inc/netflix-prize-data |
| 12 | Tmall-Buy | 885,759 | 1,144,124 | 9,348,756 | 99\.99% | Buy || | | | IJCAI16 Contest: https://tianchi.aliyun.com/dataset/dataDetail?dataId=53 |
| 13 | Tmall-Click | 626,041 | 2,200,291 | 35,179,371 | 99\.99% | Click || | | | IJCAI16 Contest: https://tianchi.aliyun.com/dataset/dataDetail?dataId=53 |
| 14 | Tmall-Buy-Sums | 885,759 | 1,144,124 | 7,592,214 | 99\.99% | Buy || | || IJCAI16 Contest: https://tianchi.aliyun.com/dataset/dataDetail?dataId=53 |
| 15 | Tmall-Click-Sums | 626,041 | 2,200,291 | 24,363,557 | 99\.99% | Click || | || IJCAI16 Contest: https://tianchi.aliyun.com/dataset/dataDetail?dataId=53 |
| 16 | Adult | | | 32,561 | | income>=50k <br> \[0, 1\] | | | || Adult Dataset: http://archive.ics.uci.edu/ml/datasets/Adult |
| 17 | Gowalla |107,092 | 1,280,969 | 3,981,334 | 99\.99% | Click || | || |
| 18 | LastFM | 1,892 | 17,632 | 92,834 | 99\.72% | Click | | | || HetRec 2011 Dataset: https://grouplens.org/datasets/hetrec-2011/ |
| 19 | DIGINETICA | 600,684 | 184,047 | 993,483 | 99.99% | Click || || | CIKM Cup 2016 Track 2: Personalized E-Commerce Search Challenge<br>https://competitions.codalab.org/competitions/11161 |
| 20 | lfm1b-artists |120,322 | 3,123,496 | 65,133,026 | 99\.98% | Click ||||| |
| 21 | lfm1b-albums |120,322 | 15,641,432 | 117,997,821 | 99\.99% | Click ||||||
| 22 | lfm1b-tracks |120,322 | 31,634,450 | 319,951,294 | 99\.99% | Click ||||| |
| 23 | criteo | | | 45,850,617 | | Click | | | || |
| 24 | Book-crossing | 105284 | 340557 | 1149780 | 99.99% | Rating<br>[0-10] | ||| | Improving Recommendation Lists Through Topic Diversification |
| 25 | steam | 2,567,538 | 32,135 | 7,793,069 | 99\.99% | Buy || ||| Self-Attentive Sequential Recommendation. |
| 26 | Yahoo Music | 1,948,882 | 98,211 | 11,557,943 | 99.99% | Rating<br>[0, 100] | | || | Yahoo Rating and Classification Data: https://webscope.sandbox.yahoo.com/catalog.php?datatype=r |
| 27 | YOOCHOOSE-Buys | 509,696 | 19,949 | 1,150,753 | 99.99% | Click || | || RecSys Challenge 2015: https://2015.recsyschallenge.com/challenge.html |
| 28 | YOOCHOOSE-Clicks | 9,249,729 | 52,739 | 33,003,944 | 99.99% | Click || | || RecSys Challenge 2015: https://2015.recsyschallenge.com/challenge.html |
| 29 | Pinterest | 55,187 | 9,911 | 1,445,622 | 99.74% | | | | | | Neural Collaborative Filtering |
| 30 | IPinyou-View | 12,930,288 | 131 | 15,354,629 | 99.09% | View | |||| iPinYou Global RTB Bidding Algorithm Competition: http://contest.ipinyou.com |
| 31 | IPinyou-Click | 11,597 | 118 | 12,683 | 99.07% | Click | |||| iPinYou Global RTB Bidding Algorithm Competition: http://contest.ipinyou.com |
| 32 | IPinyou-View-Sums | 12,930,288 | 131 | 14,697,046 | 99.13% | View | |||| iPinYou Global RTB Bidding Algorithm Competition: http://contest.ipinyou.com |
| 33 | IPinyou-Click-Sums | 11,597 | 118 | 11,615 | 99.15% | Click | |||| iPinYou Global RTB Bidding Algorithm Competition: http://contest.ipinyou.com |
| 34 | Phishing-website | | | 11,055 | | | | | || An Assessment of Features Related to Phishing Websites using an Automated Technique. |
| 35 | Retailrocket-View | 1,404,179 | 234,838 | 2,664,312 | 99.99% | Click || | | | Kaggle: https://www.kaggle.com/retailrocket/ecommerce-dataset |
| 36 | Retailrocket-Addtocart | 37,722 | 23,903 | 69,332 | 99.99% | Click || | | | Kaggle: https://www.kaggle.com/retailrocket/ecommerce-dataset |
| 37 | Retailrocket-Transaction | 11,719 | 12,025 | 22,457 | 99.98% | Click || | | | Kaggle: https://www.kaggle.com/retailrocket/ecommerce-dataset |
| 38 | Ta Feng | 32,266 | 23,812 | 817,741 | 99.89% | Click ||||| Kaggle: https://www.kaggle.com/chiranjivdas09/ta-feng-grocery-dataset |
| 39 | Jester | 73,421 | 101 | 4,136,360 | 44.22% | rating<br>[-10, 10] | | | | | [Eigentaste: A Constant Time Collaborative Filtering Algorithm](http://www.ieor.berkeley.edu/~goldberg/pubs/eigentaste.pdf) |
| 40 | douban | 738,701 | 28 | 2,125,056 | 89.73% | rating<br>[0,5] || | || Kaggle: https://www.kaggle.com/utmhikari/doubanmovieshortcomments |
| 41 | KDD2010-bridge-algebra2006_2007 | 1840 | 584946 | 2289726 | 99.79% | rating<br>[0,1] | | | || KDD Cup 2010: https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp |
| 42 | KDD2010-bridge-algebra2008_2009 | 3310 | 1259272 | 8918054 | 99.79% | rating<br/>[0,1] | | | || KDD Cup 2010: https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp |
| 43 | KDD2010-bridge-bridge-to-algebra2006_2007 | 1146 | 208232 | 3686871 | 98.46% | rating<br/>[0,1] | | | || KDD Cup 2010: https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp |
| SN | Dataset | \#User | \#Item | \#Interaction | Sparsity | Interaction Type | TimeStamp | User Context | Item Context | Interaction Context |
|----|-------------------|-----------|-----------|--------------|----------|----------------------------|-----------|--------------|--------------|---------------------|
| 1 | [MovieLens](https://github.com/RUCAIBox/RecommenderSystems-Datasets/dataset_info/Movielens) | \- | \- | \- | \- | Rating |||| |
| 2 | Anime | 73,515 | 11,200 | 7,813,737 | 99\.05% | Rating <br> \[\-1, 1\-10\] | | || |
| 3 | Epinions | 116,260 | 41,269 | 188,478 | 99\.99% | Rating <br> \[1\-5\] || | ||
| 4 | Yelp | 1,968,703 | 209,393 | 8,021,122 | 99\.99% | Rating <br> \[1\-5\] |||||
| 5 | Netflix | 480,189 | 17,770 | 100,480,507 | 98\.82% | Rating <br> \[1\-5\] || | | |
| 6 | Book\-crossing | 105,284 | 340,557 | 1,149,780 | 99\.99% | Rating <br> \[0\-10\] | ||| |
| 7 | Jester | 73,421 | 101 | 4,136,360 | 44\.22% | Rating <br> \[\-10, 10\] | | | | |
| 8 | Douban | 738,701 | 28 | 2,125,056 | 89\.73% | Rating <br> \[0,5\] || | ||
| 9 | Yahoo Music | 1,948,882 | 98,211 | 11,557,943 | 99\.99% | Rating <br> \[0, 100\] | | || |
| 10 | KDD2010 | \- | \- | \- | \- | Rating | | | ||
| 11 | Amazon | \- | \- | \- | \- | Rating || || |
| 12 | Pinterest | 55,187 | 9,911 | 1,445,622 | 99\.74% | \- | | | | |
| 13 | Gowalla | 107,092 | 1,280,969 | 3,981,334 | 99\.99% | Check-in || | ||
| 14 | LastFM | 1,892 | 17,632 | 92,834 | 99\.72% | Click | | | ||
| 15 | DIGINETICA | 600,684 | 184,047 | 993,483 | 99\.99% | Click || || |
| 16 | Steam | 2,567,538 | 32,135 | 7,793,069 | 99\.99% | Buy || |||
| 17 | Ta Feng | 32,266 | 23,812 | 817,741 | 99\.89% | Click |||||
| 18 | FourSquare | \- | \- | \- | \- | Check-in || || |
| 19 | Tmall | \- | \- | \- | \- | Click/Buy || | ||
| 20 | YOOCHOOSE | \- | \- | \- | \- | Click/Buy || | ||
| 21 | IPinyou | \- | \- | \- | \- | View/Click | ||||
| 22 | Retailrocket | \- | \- | \- | \- | View/Addtocart/Transaction || | | |
| 23 | LFM-1b | 120,322 | 3,123,496 | 65,133,026 | 99\.98% | Click |||||
| 24 | Criteo | \- | \- | 45,850,617 | \- | Click | | | ||
| 25 | Avazu | \- | \- | 40,428,967 | \- | Click <br> \[0, 1\] || | ||
| 26 | Phishing\-website | \- | \- | 11,055 | \- | | | | ||
| 27 | Adult | \- | \- | 32,561 | \- | income>=50k <br> \[0, 1\] | | | ||


KG-aware recommendation datasets
Empty file added dataset_info/IPinyou/README.md
Empty file.
8 changes: 8 additions & 0 deletions dataset_info/MovieLens/README.md
@@ -0,0 +1,8 @@
## MovieLens

| SN | Dataset | \#User | \#Item | \#Interaction | Sparsity | Interaction Type | TimeStamp | User Context | Item Context | Interaction Context |
|----|----------|---------|--------|--------------|----------|-------------------------------------|-----------|--------------|--------------|---------------------|
| 1 | ml\-100k | 943 | 1,682 | 100,000 | 93\.70% | Rating <br> \[1\-5\] |||| |
| 2 | ml\-1m | 6,040 | 3,952 | 1,000,209 | 95\.81% | Rating <br> \[1\-5\] |||| |
| 3 | ml\-10m | 69,878 | 10,681 | 10,000,054 | 98\.69% | Rating <br> \[0\.5\-5\] half\-stars || || |
| 4 | ml\-20m | 138,493 | 27,278 | 20,000,263 | 99\.47% | Rating <br> \[0\.5\-5\] half\-stars || || |
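
The README does not define the Sparsity column. The sketch below assumes the usual definition, one minus the fraction of filled cells in the user-item matrix, which does reproduce the ml-100k and ml-1m values listed above. The function name `sparsity` is hypothetical and not part of the repository.

```python
def sparsity(num_users: int, num_items: int, num_interactions: int) -> float:
    """Fraction of user-item pairs with no recorded interaction.

    Assumed definition: 1 - interactions / (users * items).
    """
    if num_users <= 0 or num_items <= 0:
        raise ValueError("num_users and num_items must be positive")
    return 1.0 - num_interactions / (num_users * num_items)


# Statistics copied from the MovieLens table above.
ml_100k = sparsity(943, 1_682, 100_000)
ml_1m = sparsity(6_040, 3_952, 1_000_209)

print(f"ml-100k: {ml_100k:.2%}")  # prints "ml-100k: 93.70%"
print(f"ml-1m: {ml_1m:.2%}")      # prints "ml-1m: 95.81%"
```

Rows that list `\-` for \#User or \#Item (for example Criteo and Avazu) have no user-item matrix size to divide by, which is presumably why their Sparsity cells are also `\-`.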
