Building a real-time recommender system with implicit
Implicit is an open-source collaborative filtering project that includes several popular recommendation algorithms. Its main use case is recommendation based on implicit feedback. The main algorithms it includes are:
- ALS (Alternating Least Squares).
- BPR (Bayesian Personalized Ranking).
- Nearest-neighbor models using Cosine, TF-IDF, or BM25.
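Data preparation
Implicit expects input data in the form user_id/item_id/rating. For implicit feedback there is no explicit score, so the rating value can be defined to suit the specific scenario.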
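One possible way to derive the rating column is to sum weighted behavior events per user and item. The sketch below is only an illustration: the event types, the weights in `EVENT_WEIGHTS`, and the helper `build_ratings` are all hypothetical and not part of implicit.

```python
from collections import defaultdict

# Hypothetical weights per behavior type; tune these for your own data.
EVENT_WEIGHTS = {"view": 1.0, "favorite": 3.0, "order": 5.0}

def build_ratings(events, weights=EVENT_WEIGHTS):
    """Sum weighted events into {(user_id, item_id): rating}."""
    ratings = defaultdict(float)
    for user_id, item_id, event_type in events:
        weight = weights.get(event_type)
        if weight is None:
            continue  # skip event types that have no configured weight
        ratings[(user_id, item_id)] += weight
    return dict(ratings)

# Made-up sample events: (user_id, item_id, event_type)
events = [
    (10, 1024, "view"),
    (10, 1024, "view"),
    (10, 1024, "order"),
    (11, 2046, "favorite"),
]
ratings = build_ratings(events)

# Flatten to user_id/item_id/rating rows, ready to be written to a CSV file.
rows = [(u, i, r) for (u, i), r in sorted(ratings.items())]
```

The resulting rows have the same three columns that the training script reads from the CSV file.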
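Model training
    import pandas as pd
    import numpy as np
    import scipy.sparse as sparse
    import implicit

    # Expected columns: user_id, item_id, rating
    df = pd.read_csv("./data/user_visit.csv")

    # Map raw IDs to contiguous integer labels; *_idx maps labels back to IDs
    df['user_label'], user_idx = pd.factorize(df['user_id'])
    df['item_label'], item_idx = pd.factorize(df['item_id'])

    # Build both orientations of the sparse rating matrix
    sparse_item_user = sparse.csr_matrix((df['rating'].astype(float), (df['item_label'], df['user_label'])))
    sparse_user_item = sparse.csr_matrix((df['rating'].astype(float), (df['user_label'], df['item_label'])))

    # This follows the older implicit API (before 0.5), where fit() takes an
    # item-user matrix; newer releases expect a user-item matrix instead.
    model = implicit.als.AlternatingLeastSquares(factors=50, regularization=0.1, iterations=50)
    model.fit(sparse_item_user)

    # Save the factor matrices together with the label-to-item_id mapping
    data = {
        'model.item_factors': model.item_factors,
        'model.user_factors': model.user_factors,
        'item_labels': item_idx,
    }
    als_model_file = "user_visit.npz"
    np.savez(als_model_file, **data)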
Note:
Using the model
    # Load the model
    data = np.load(als_model_file, allow_pickle=True)
    model = implicit.als.AlternatingLeastSquares(factors=data['model.item_factors'].shape[1])
    model.item_factors = data['model.item_factors']
    model.user_factors = data['model.user_factors']
    model._YtY = model.item_factors.T.dot(model.item_factors)
    item_labels = data['item_labels']

    # Item-based recommendation (the items here are hotels): find similar items
    item_id = 1024
    item_label = list(item_labels).index(item_id)
    related = model.similar_items(item_label, N=10)
    for related_label, score in related:
        print(item_labels[related_label], score)

    # User-based recommendation
    # (user_idx, item_idx and sparse_item_user come from the training script)
    user_id = 10
    user_label = list(user_idx).index(user_id)
    sparse_user_items = sparse_item_user.T.tocsr()
    recommendations = model.recommend(user_label, sparse_user_items)
    for rec_label, score in recommendations:
        print(item_idx[rec_label], score)
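The lookups above use `list(...).index(...)`, which rebuilds a list and scans it on every call. For a service that answers many requests, a dictionary built once at load time may be a better fit. The following is a minimal sketch under that assumption; `build_label_lookup` is a hypothetical helper, and the small `item_labels` list stands in for the array loaded from the `.npz` file.

```python
def build_label_lookup(labels):
    """Map each original ID to its row index in the factor matrices."""
    return {original_id: label for label, original_id in enumerate(labels)}

# Hypothetical stand-in for data['item_labels'] loaded from the .npz file.
item_labels = [2046, 1024, 512]
item_lookup = build_label_lookup(item_labels)

# Constant-time lookup instead of list(item_labels).index(1024)
item_label = item_lookup[1024]

# .get() returns None for IDs the model has never seen, instead of raising
unknown = item_lookup.get(9999)
```

Using `.get()` also gives a natural place to handle items that were not present in the training data.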
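Real-time recommendation
The real-time approach combines the offline model with the user's real-time behavior, rather than deploying the whole model online and running it in real time. The main difference is that the user ID does not exist in the trained model, so recommendations cannot be made directly by user ID. It is implemented as follows: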
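    # Items the visitor has just interacted with, and a weight for each
    item_ids = [1024, 2046]
    item_weights = [2, 3]
    user_label = 0  # placeholder row: the visitor is not in the trained model

    item_lb = [list(item_labels).index(i) for i in item_ids]
    user_ll = [0] * len(item_ids)
    # Default to a confidence of 10 per item when no weights are given
    confidence = [10] * len(item_ids) if item_weights is None else item_weights
    # Single-row user-item matrix; the shape must span every item in the model
    user_items = sparse.csr_matrix((confidence, (user_ll, item_lb)), shape=(1, len(item_labels)))

    # recalculate_user=True recomputes the user vector from user_items
    recommendations = model.recommend(user_label, user_items, N=10, recalculate_user=True)
    for rec_label, score in recommendations:
        print(item_labels[rec_label], score)

    # Get the reason for a recommendation from the returned results:
    itemid = list(item_labels).index(2048)
    model.explain(user_label, user_items, itemid, user_weights=None, N=1)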
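To make the `recalculate_user=True` step less of a black box, the sketch below shows the standard implicit-feedback ALS "fold-in" for one new user: the user vector is solved from the item factors and the items the user just interacted with. This is an illustration of the technique in plain NumPy, not the library's exact code. The function `fold_in_user` and the random `item_factors` are hypothetical, and unlike `model.recommend` it does not filter out items the user has already seen.

```python
import numpy as np

def fold_in_user(item_factors, item_rows, confidences, regularization=0.1):
    """Solve for one user's factor vector from the items they interacted with.

    Computes x = (Y^T C Y + reg*I)^-1 Y^T C p for a single user, where Y is
    item_factors, C is the diagonal confidence matrix (1 for items without
    interaction) and p is 1 for interacted items and 0 otherwise.
    """
    n_factors = item_factors.shape[1]
    # Start from Y^T Y + reg*I, which is the same for every user.
    A = item_factors.T @ item_factors + regularization * np.eye(n_factors)
    b = np.zeros(n_factors)
    for row, conf in zip(item_rows, confidences):
        y = item_factors[row]
        A += (conf - 1.0) * np.outer(y, y)  # this item's share of Y^T (C - I) Y
        b += conf * y                       # this item's share of Y^T C p
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
item_factors = rng.normal(size=(6, 4))  # hypothetical: 6 items, 4 factors

user_vector = fold_in_user(item_factors, item_rows=[1, 3], confidences=[2.0, 3.0])
scores = item_factors @ user_vector     # one score per item
top_items = np.argsort(-scores)[:3]     # row indices of the 3 best-scoring items
```

Because only a small linear system of size factors x factors is solved per request, this step is cheap enough to run online while the item factors stay fixed from offline training.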