The article "Content-Based Recommendation Systems" presents several aspects of content-based systems. The reading can be separated into two parts: item representation and user profiling. As for item representation, the authors first presented structured data for item characterization. They then compared it to unstructured data in terms of the different approaches needed to represent the item, their feedback, and their difficulty. Considerations for treating unstructured data were also addressed. As an example, the authors discussed how polysemous words and synonyms make document representation a challenging task when a document is characterized by word frequency. As for user profiling, different methods were presented. Decision trees, rule induction, k-NN, linear classifiers (SVM), probabilistic methods and naive Bayes are some examples. Finally, limitations and extensions of content-based recommender systems were addressed.
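To make the word-frequency problem concrete, here is a minimal sketch in Python. It is not taken from the article: the function names and the four toy documents are my own assumptions, and it uses plain term counts with cosine similarity rather than any weighting scheme the authors may prefer. It shows that two descriptions of the same topic written with synonyms share no terms, while two unrelated descriptions that share a polysemous word look partially similar.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphabetic tokens (a deliberately naive tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical item descriptions, invented for illustration only.
car_item = term_frequencies("car engine repair")
auto_item = term_frequencies("automobile motor maintenance")
river_item = term_frequencies("river bank fishing")
money_item = term_frequencies("bank loan interest")

# Same topic, different words: the representation sees no overlap.
synonym_similarity = cosine_similarity(car_item, auto_item)

# Different topics, shared ambiguous word "bank": spurious overlap.
polysemy_similarity = cosine_similarity(river_item, money_item)

print(synonym_similarity, polysemy_similarity)
```

Under these assumptions the synonym pair scores zero while the polysemy pair scores above zero, which is the opposite of what a recommender would want.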
This article is a presentation of content-based recommendation systems and their state of the art. What I disliked about the reading is its emphasis on text classification as the main topic of content-based RecSys. First, structured data and unstructured data were presented as different sources of information to represent items. That same section was unbalanced towards unstructured data, as it was discussed at greater length than structured data. Unstructured data was treated as text classification or topic extraction from an item's plain-text description. That in the first place is not a good generalization, as it only applies to items that have text descriptions. Afterwards, when presenting user profiling, the techniques shown were biased towards unstructured text as data for model training, especially in the sections Relevance Feedback and Rocchio's Algorithm, Linear Classifiers, and Probabilistic Methods and Naive Bayes. These sections correspond to a large portion of the article. The problem with this inclination towards unstructured text is that its field of study is different from content-based RecSys. It shifts the focus to another topic, and in order to be thorough with it, it is necessary to draw forms of analysis from other fields such as document retrieval. If the goal is to deliver a discussion of content-based recommender systems, deviating from it should be avoided.
As an extension of content-based recommender systems, the combination of content and collaborative information was presented as an alternative for a recommendation system. We've seen that collaborative filtering and other methods have had great results, and given that it is possible to combine content information with these methods, it may be possible to improve the quality of recommendations in various aspects. There are some domains in which content-based rules are the most straightforward way to arrive at recommendations. For instance, in music recommendation it is straightforward to recommend songs and albums from the same artists or genres the user has listened to before. I see content-based techniques as an aid or supplement to other methods that have shown great results.
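The music example can be sketched as a simple content-based rule. This is only an illustration under my own assumptions: the `Song` type, the `recommend_by_content` function, the placeholder catalog, and the choice to weight an artist match (2) above a genre match (1) are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    genre: str


def recommend_by_content(history, catalog):
    """Rank unheard songs that share an artist or genre with the history."""
    known_artists = {song.artist for song in history}
    known_genres = {song.genre for song in history}
    heard = set(history)

    scored = []
    for song in catalog:
        if song in heard:
            continue
        # Assumed weights: an artist match counts more than a genre match.
        score = 2 * (song.artist in known_artists) + (song.genre in known_genres)
        if score > 0:
            scored.append((score, song))

    # Highest score first; Python's sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: -pair[0])
    return [song for _, song in scored]


# Placeholder catalog, invented for illustration only.
catalog = [
    Song("Track 1", "Artist A", "rock"),
    Song("Track 2", "Artist A", "rock"),
    Song("Track 3", "Artist B", "rock"),
    Song("Track 4", "Artist C", "jazz"),
]
history = [catalog[0]]

recommended_titles = [song.title for song in recommend_by_content(history, catalog)]
print(recommended_titles)
```

A rule like this needs no ratings from other users, which is why it could serve as a supplement to a collaborative method, for example when a user or an item has little interaction data.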