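I recently put together a condensed summary of material on data, algorithms, and data products, and I would like to share it here.
Train both the inner and the outer skills: fundamentals are the foundation, and techniques are what take you further. Go over the material again and again, take small steady steps, and iterate quickly.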
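Recommended books:
Data Mining: Concepts and Techniques
Principles of Data Mining http://book.douban.com/subject/1103515/ (a must-read classic)
Introduction to Data Mining
Machine Learning (CMU)
Data Mining: Practical Machine Learning Tools and Techniques (English edition, 3rd ed.)
Data Mining: Concepts, Models, Methods, and Algorithms
Pattern Recognition and Machine Learning
Programming Collective Intelligence
Pattern Classification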
Data mining algorithm papers:
[1]Agrawal R, Srikant R (1994) Fast algorithms for mining
association rules. In: Proceedings of the 20th VLDB conference, pp
487–499.
[2]Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification
and regression trees. Wadsworth, Belmont
[3]Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood
from incomplete data via the EM algorithm (with discussion). J Roy
Stat Soc B 39:1–38
[4]Langville AN, Meyer CD (2006) Google’s PageRank and beyond:
the science of search engine rankings. Princeton University Press,
Princeton
[5]Pei, Jian; Han, Jiawei; and Lakshmanan, Laks V. S.; Mining
frequent itemsets with convertible constraints, in Proceedings of
the 17th International Conference on Data Engineering, April 2–6,
2001, Heidelberg, Germany, 2001, pages 433-442.
[6]MacQueen, J. B. (1967). Some Methods for classification and
Analysis of Multivariate Observations. Proceedings of 5th Berkeley
Symposium on Mathematical Statistics and Probability. University of
California Press. pp. 281–297.
[7]Quinlan JR (1979) Discovering rules by induction from large
collections of examples. In: Michie D (ed), Expert systems in the
micro electronic age. Edinburgh University Press, Edinburgh
[8]Quinlan JR (1993) C4.5: Programs for machine learning.
Morgan Kaufmann Publishers, San Mateo
[9]Hosmer, David W.; Lemeshow, Stanley (2000). Applied
Logistic Regression (2nd ed.)
[10]Rish, Irina. An empirical study of the naive Bayes
classifier. IJCAI 2001 Workshop on Empirical Methods in Artificial
Intelligence.
[11]Ho, Tin Kam. Random Decision Forests. Proceedings of the
3rd International Conference on Document Analysis and Recognition,
Montreal, QC, 14–16 August 1995. pp. 278–282.
[12]Friedman, J. H. Greedy Function Approximation: A Gradient
Boosting Machine.
[13]Friedman, J. H. Stochastic Gradient Boosting.
[14]Jerry Ye, Jyh-Herng Chow, Jiang Chen, Zhaohui Zheng.
Stochastic Gradient Boosted Distributed Decision Trees. 2009.
[15]Fix E, Hodges JL, Jr (1951) Discriminatory analysis,
nonparametric discrimination. USAF School of Aviation Medicine,
Randolph Field, Tex., Project 21-49-004, Rept. 4, Contract
AF41(128)-31, February 1951
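Machine learning / data mining tools:
Statistical learning tools / model prototyping: R, SAS, Clementine
Support vector machine library: libsvm
Large-scale linear classification library: liblinear (linear SVM and logistic regression)
Machine learning library on Hadoop: Mahout
Data warehouse / ETL: Hive (HQL), SQL
Cloud computing: Hadoop
Data crawling / transformation / model prototyping: Python, PHP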
-------------------------------------------------------------
------- Data mining application, part one -------------------
Recommendation algorithm papers:
[1]Sarwar, B., Karypis, G., Konstan, J., & Riedl, J.
(2001). Item-Based Collaborative Filtering Recommendation
Algorithms. Proceedings of the 10th International Conference on
World Wide Web (pp. 285-295). Hong Kong: ACM.
[2]Jiahui Liu, Elin Pedersen, and Peter Dolan. Personalized
News Recommendation Based on Click Behavior. International
Conference on Intelligent User Interfaces (IUI), 2010.
[3]Y. Koren. Collaborative Filtering with Temporal Dynamics.
In KDD, 2009.
[4]Yi Ding, Xue Li, Time weight collaborative filtering,
Proceedings of the 14th ACM international conference on Information
and knowledge management, October 31-November 05, 2005, Bremen,
Germany.
[5]Shumeet Baluja, Rohan Seth, D. Sivakumar, Yushi Jing,
Jay Yagnik, Shankar Kumar, Deepak Ravichandran, Mohamed Aly,
Video suggestion and discovery for YouTube: taking random walks
through the view graph, Proceedings of the 17th international
conference on World Wide Web, April 21-25, 2008, Beijing,
China.
[6]Azarias Reda, Yubin Park, Mitul Tiwari, Christian Posse,
and Sam Shah. Metaphor: A System for Related Search
Recommendations. In the 21st International Conference on Information
and Knowledge Management (CIKM 2012).
[7]James Davidson, Benjamin Liebald, Junning Liu, Palash
Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike
Lambert, Blake Livingston, Dasarathi Sampath, The YouTube video
recommendation system, Proceedings of the fourth ACM conference on
Recommender systems, September 26-30, 2010, Barcelona, Spain.
[8]Abhinandan S. Das, Mayur Datar, Ashutosh Garg, Shyam
Rajaram, Google news personalization: scalable online collaborative
filtering, Proceedings of the 16th international conference on
World Wide Web, May 08-12, 2007, Banff, Alberta, Canada.
[9]Robert M. Bell, Yehuda Koren, Lessons from the Netflix
prize challenge, ACM SIGKDD Explorations Newsletter, v.9 n.2,
December 2007.
[10]Noam Koenigstein, Nir Nice, Ulrich Paquet, Nir
Schleyen, The Xbox recommender system, Proceedings of the sixth ACM
conference on Recommender systems, September 09-13, 2012, Dublin,
Ireland.
[11]G. Adomavicius, A. Tuzhilin, Toward the next generation of
recommender systems: a survey of the state-of-the-art and possible
extensions, IEEE Transactions on Knowledge and Data Engineering 17
(2005) 734-749.
[12]T. Zhou, J. Ren, M. Medo, Y.-C. Zhang, Bipartite network
projection and personal recommendation, Physical Review E 76 (2007)
046115.
[13] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J.R. Wakeling,
Y.-C. Zhang, Solving the apparent diversity–accuracy dilemma of
recommender systems, Proceedings of the National Academy of
Sciences of the United States of America 107 (2010)
4511-4515.
[14]Steffen Rendle (2012): Factorization Machines with libFM,
to appear in ACM Trans. Intell. Syst. Technol., 3(3), May.
[15]Daniel Lemire, Anna Maclachlan. Slope One Predictors for
Online Rating-Based Collaborative Filtering.
Data storage papers:
[1]Giuseppe DeCandia, Deniz Hastorun, Madan Jampani,
Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan
Sivasubramanian, Peter Vosshall, Werner Vogels. Dynamo: Amazon's
highly available key-value store. Proceedings of the twenty-first ACM
SIGOPS symposium on Operating systems principles.
[2]Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson Hsieh,
Deborah Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and
Robert Gruber. Bigtable: A Distributed Storage System for
Structured Data. In Proceedings of the 7th USENIX Symposium on
Operating Systems Design and Implementation (OSDI ’06), Berkeley,
CA, USA, 2006.
[3]Roshan Sumbaly, Jay Kreps, Alex Feinberg, Lei Gao, and Sam
Shah. Serving Large-Scale Batch Computed Data with Project
Voldemort. 10th USENIX conference on File and Storage Technologies
(FAST 2012).
[4]Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava,
Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz,
Daniel Weaver, Ramana Yerneni, PNUTS: Yahoo!'s hosted data
serving platform, Proceedings of the VLDB Endowment, v.1 n.2,
August 2008.
[5]Adam Silberstein, Jianjun Chen, David Lomax, Brad McMillan,
Masood Mortazavi, P. P. S. Narayan, Raghu Ramakrishnan, Rusty
Sears. PNUTS in Flight: Web-Scale Data Serving at Yahoo.
IEEE Internet Computing, Volume 16 Issue 1.
[6]Lamport, Leslie. Time, clocks, and the ordering of events
in a distributed system.
[7]Prince Mahajan, Lorenzo Alvisi, and Mike Dahlin.
Consistency, Availability, and Convergence. Technical Report (UTCS
TR-11-22)
Data storage tools:
Voldemort (LinkedIn's key-value store)
http://www.project-voldemort.com/voldemort/ (I love it)
Redis http://redis.io/
HBase http://hbase.apache.org/
Cassandra http://cassandra.apache.org/
Hypertable http://hypertable.org/
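Data products:
Search optimization
Recommendation engines
Ad delivery
CRM (member marketing)
Shopping guides
Statistical and macro-level analytics tools
Social lending for small and medium-sized borrowers (happy to exchange ideas)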
-----wish u good luck-------------