Recommendation Engine: Collaborative Filtering with Mahout

Mahout is a collection of machine learning algorithms for recommendation (collaborative filtering), clustering and classification. To implement recommendation we first need an input data file in which every line contains one record. Each record holds the user ID, the item ID and the preference value, in that order, separated by commas.

Input File – input.txt

501,1002,5

501,1012,3

510,1002,2

515,1002,5

501,1020,1

…

Note that the user ID and item ID must be integers; alphanumeric IDs won’t serve our purpose. Also, the larger the input file, the better the quality of the recommendations generally is.
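To make the record layout concrete, here is a minimal sketch in plain Java of how such lines could be read into memory. It is not Mahout's own parsing code; the class name PreferenceParser and the nested-map layout are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative parser for "userID,itemID,preference" lines (hypothetical, not Mahout code). */
final class PreferenceParser {

    /** Returns userID -> (itemID -> preference), skipping blank lines. */
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split(",");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected userID,itemID,preference: " + line);
            }
            // Long.parseLong throws NumberFormatException for alphanumeric IDs.
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            prefs.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, value);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "501,1002,5", "501,1012,3", "510,1002,2", "515,1002,5", "501,1020,1");
        System.out.println(parse(sample));
    }
}
```

A record with a non-numeric ID fails at Long.parseLong, which is why alphanumeric IDs would have to be mapped to numbers before the file is written.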

Recommenders

Recommenders are broadly classified into two categories, based on the approach they use to generate recommendations:

1. User Based Recommendations

Recommendations are derived from how similar users are to other users, i.e. to make recommendations for a user (User1) we take into account the user or users who share similar tastes, and based on the items they possess we recommend items to User1.

2. Item Based Recommendations

Recommendations are derived from how similar items are to other items, i.e. based on the items a user already has, more similar items are recommended.
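The two approaches look at the same preference data from two directions: a user based recommender compares users by the items they hold, while an item based recommender compares items by the users who hold them. The sketch below, in plain Java, builds the second view from the first. The class name PreferenceViews and the sample IDs are assumptions for illustration only, and nothing here is Mahout code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative helper (hypothetical): turns the user view of the data into the item view. */
final class PreferenceViews {

    /** Inverts userID -> itemIDs into itemID -> userIDs. */
    static Map<Long, Set<Long>> invert(Map<Long, Set<Long>> itemsByUser) {
        Map<Long, Set<Long>> usersByItem = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
            for (Long itemId : entry.getValue()) {
                usersByItem.computeIfAbsent(itemId, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return usersByItem;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> itemsByUser = new TreeMap<>();
        itemsByUser.put(501L, new TreeSet<>(Arrays.asList(1002L, 1012L, 1020L)));
        itemsByUser.put(510L, new TreeSet<>(Arrays.asList(1002L)));
        itemsByUser.put(515L, new TreeSet<>(Arrays.asList(1002L)));
        // User view: compare these entries to find similar users.
        System.out.println("items by user: " + itemsByUser);
        // Item view: compare these entries to find similar items.
        System.out.println("users by item: " + invert(itemsByUser));
    }
}
```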

When we generate recommendations with Mahout, the key components involved are:

Data Model

It is an encapsulation used by Mahout to hold the input data. It gives the various recommender algorithms efficient access to that data.

Similarity Algorithm

Various kinds of similarity algorithms are available, and Mahout has implementations of the popular ones, such as Pearson correlation, cosine measure, Euclidean distance, log-likelihood and the Tanimoto coefficient.
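To show what some of these measures compute, here is a minimal sketch of the textbook formulas for Pearson correlation, cosine measure and the Tanimoto coefficient in plain Java. It is not Mahout's implementation; the class name SimilaritySketch is hypothetical, and the two numeric measures assume both vectors hold ratings for the same co-rated items in the same order.

```java
import java.util.HashSet;
import java.util.Set;

/** Textbook similarity formulas for illustration (hypothetical class, not Mahout code). */
final class SimilaritySketch {

    /** Pearson correlation in [-1, +1]; NaN when a vector is empty or has no variance. */
    static double pearson(double[] x, double[] y) {
        requireSameLength(x, y);
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n, meanY = sumY / n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    /** Cosine of the angle between the two rating vectors; NaN for a zero vector. */
    static double cosine(double[] x, double[] y) {
        requireSameLength(x, y);
        double dot = 0, normX = 0, normY = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        double denom = Math.sqrt(normX * normY);
        return denom == 0 ? Double.NaN : dot / denom;
    }

    /** Tanimoto coefficient: shared items divided by all items; ignores preference values. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? Double.NaN : (double) intersection.size() / unionSize;
    }

    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] user1 = {5, 3, 1};
        double[] user2 = {4, 3, 2};
        System.out.println("pearson = " + pearson(user1, user2));
        System.out.println("cosine  = " + cosine(user1, user2));
    }
}
```

Pearson and cosine use the preference values, whereas Tanimoto only looks at which items two users have in common, so it also suits data without explicit ratings.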

User Neighborhood

This applies to user based recommendations, which are made on the basis of user-to-user similarity. We form a neighborhood of the most similar users, who share almost the same tastes, so that we get better recommendations. The algorithms that we use to select the user neighborhood are:

1. Nearest N User Neighborhood

Here we specify the neighborhood size, i.e. the exact number of most similar users to be considered when generating recommendations, say 100 or 500.

2. Threshold User Neighborhood

We don’t specify the neighborhood size; instead we specify a similarity threshold, which is a value between -1 and +1. If we specify 0.7, then only the users whose similarity is greater than 0.7 are considered part of the neighborhood. The higher the value, the more similar the users are.
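The difference between the two neighborhood strategies can be sketched in a few lines of plain Java. This is only an illustration and not Mahout's NearestNUserNeighborhood or ThresholdUserNeighborhood code; the class name NeighborhoodSketch, the precomputed similarity map and the strict "greater than" comparison are assumptions taken from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative neighborhood selection (hypothetical class, not Mahout code). */
final class NeighborhoodSketch {

    /** Returns up to n user IDs with the highest similarity, most similar first. */
    static List<Long> nearestN(Map<Long, Double> similarityByUser, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarityByUser.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        int size = Math.min(Math.max(n, 0), entries.size());
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : entries.subList(0, size)) {
            result.add(entry.getKey());
        }
        return result;
    }

    /** Returns every user ID whose similarity is greater than the threshold, sorted by ID. */
    static List<Long> threshold(Map<Long, Double> similarityByUser, double threshold) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : similarityByUser.entrySet()) {
            // A NaN similarity never passes this comparison, so such users are left out.
            if (entry.getValue() > threshold) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Similarity of each other user to the target user (made-up values).
        Map<Long, Double> similarityByUser = new LinkedHashMap<>();
        similarityByUser.put(501L, 0.9);
        similarityByUser.put(515L, 0.4);
        similarityByUser.put(520L, 0.75);
        System.out.println("nearest 2:     " + nearestN(similarityByUser, 2));
        System.out.println("threshold 0.7: " + threshold(similarityByUser, 0.7));
    }
}
```

With a fixed N the neighborhood always has the same size, even if some of its members are only weakly similar; with a threshold the size varies from user to user and can be empty.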

Recommender

It is the final computing object, which couples together the data model, the similarity algorithm and the neighborhood to generate recommendations.
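As a rough picture of how these pieces fit together, the sketch below estimates a preference for each item the target user does not yet have, as a similarity-weighted average of the neighbors' preferences, and returns the best-scoring items. It is a simplified textbook scheme in plain Java under stated assumptions, not the logic of Mahout's GenericUserBasedRecommender; the class name UserBasedSketch, its parameters and the choice to ignore neighbors with non-positive similarity are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified user based recommendation step (hypothetical class, not Mahout code). */
final class UserBasedSketch {

    /**
     * @param targetPrefs        itemID -> preference of the user we recommend for
     * @param neighborPrefs      neighbor userID -> (itemID -> preference)
     * @param neighborSimilarity neighbor userID -> similarity to the target user
     * @param howMany            maximum number of item IDs to return
     */
    static List<Long> recommend(Map<Long, Double> targetPrefs,
                                Map<Long, Map<Long, Double>> neighborPrefs,
                                Map<Long, Double> neighborSimilarity,
                                int howMany) {
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> neighbor : neighborPrefs.entrySet()) {
            Double similarity = neighborSimilarity.get(neighbor.getKey());
            // Assumption of this sketch: only positively similar neighbors contribute.
            if (similarity == null || similarity.isNaN() || similarity <= 0) {
                continue;
            }
            for (Map.Entry<Long, Double> pref : neighbor.getValue().entrySet()) {
                long itemId = pref.getKey();
                if (targetPrefs.containsKey(itemId)) {
                    continue; // the target user already has this item
                }
                weightedSum.merge(itemId, similarity * pref.getValue(), Double::sum);
                weightTotal.merge(itemId, similarity, Double::sum);
            }
        }
        // Estimated preference = similarity-weighted average of the neighbors' preferences.
        Map<Long, Double> estimates = new HashMap<>();
        for (Long itemId : weightedSum.keySet()) {
            estimates.put(itemId, weightedSum.get(itemId) / weightTotal.get(itemId));
        }
        List<Long> items = new ArrayList<>(estimates.keySet());
        // Highest estimate first; ties are broken by the smaller item ID.
        items.sort((a, b) -> {
            int byEstimate = Double.compare(estimates.get(b), estimates.get(a));
            return byEstimate != 0 ? byEstimate : Long.compare(a, b);
        });
        int size = Math.min(Math.max(howMany, 0), items.size());
        return new ArrayList<>(items.subList(0, size));
    }

    public static void main(String[] args) {
        Map<Long, Double> target = new HashMap<>();
        target.put(1002L, 2.0);
        Map<Long, Double> prefsOf501 = new HashMap<>();
        prefsOf501.put(1002L, 5.0);
        prefsOf501.put(1012L, 3.0);
        prefsOf501.put(1020L, 1.0);
        Map<Long, Map<Long, Double>> neighbors = new HashMap<>();
        neighbors.put(501L, prefsOf501);
        Map<Long, Double> similarity = new HashMap<>();
        similarity.put(501L, 0.9); // made-up similarity value
        System.out.println(recommend(target, neighbors, similarity, 5));
    }
}
```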

Sample code snippets to generate user and item based recommendations are given below.

User Based Recommender

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserRecommender {

    public static void main(String[] args) {
        // User ID for which the recommendations are to be generated
        int userId = 510;
        // Number of recommendations to be generated
        int noOfRecommendations = 5;
        try {
            // Data model created to accept the input file
            FileDataModel dataModel = new FileDataModel(new File("D://input.txt"));
            // Specifies the similarity algorithm
            UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
            // NearestNUserNeighborhood is preferred when we need control over the exact number of neighbors
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, userSimilarity, dataModel);
            // Initializing the recommender
            Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, userSimilarity);
            // Calling the recommend method to generate recommendations
            List<RecommendedItem> recommendations = recommender.recommend(userId, noOfRecommendations);
            for (RecommendedItem recommendedItem : recommendations) {
                System.out.println(recommendedItem.getItemID());
            }
        } catch (IOException e) {
            // The input file could not be read
            e.printStackTrace();
        } catch (TasteException e) {
            // Mahout failed while computing the recommendations
            e.printStackTrace();
        }
    }
}

Item Based Recommender

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemRecommender {

    public static void main(String[] args) {
        // User ID for which the recommendations are to be generated
        int userId = 510;
        // Number of recommendations to be generated
        int noOfRecommendations = 5;
        try {
            // Data model created to accept the input file
            FileDataModel dataModel = new FileDataModel(new File("D://input.txt"));
            // Specifies the similarity algorithm
            ItemSimilarity itemSimilarity = new PearsonCorrelationSimilarity(dataModel);
            // Initializing the recommender; no user neighborhood is needed here
            ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);
            // Calling the recommend method to generate recommendations
            List<RecommendedItem> recommendations = recommender.recommend(userId, noOfRecommendations);
            for (RecommendedItem recommendedItem : recommendations) {
                System.out.println(recommendedItem.getItemID());
            }
        } catch (IOException e) {
            // The input file could not be read
            e.printStackTrace();
        } catch (TasteException e) {
            // Mahout failed while computing the recommendations
            e.printStackTrace();
        }
    }
}

Note: To get any recommendations you need a sufficiently large input file. A few lines of input won’t yield any recommendations.

Reference: Building a Social Recommendation Engine with Apache Mahout: http://www.ibm.com/developerworks/cn/java/j-lo-mahout/


Posted by IT瘾 on August 19, 2013 at 2:56:00 PM
