Programming Collective Intelligence: Reading Summary

Tags: 程序园 | Published: 2011-10-19 12:31 | Author: 崔添翼 (透明)

Source: http://cuitianyi.com

Making Recommendations (Collaborative Filtering)

User-based

Finding similar users

User as vector based on item score

Euclidean distance
Pearson correlation

By swapping users and items, we can find items similar to a given item

Sort and recommend items based on

sum(user similarity * user’s item score) / sum(user similarity), over the other users who scored the item
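The user-based steps above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the `prefs` dictionary and the user names are made up for the example, and dividing by the sum of similarities (so that items scored by many users do not win automatically) is an assumption about how the ranking is normalized.

```python
from math import sqrt

def pearson(prefs, a, b):
    """Pearson correlation between users a and b over items both have scored."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [prefs[a][i] for i in shared]
    ys = [prefs[b][i] for i in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def recommend(prefs, user, similarity=pearson):
    """Rank items the user has not scored by similarity-weighted average score."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue  # skip users with zero or negative correlation
        for item, score in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * score
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[i] / sim_sums[i], i) for i in totals]
    return sorted(ranked, reverse=True)

# Hypothetical toy scores, invented for this example.
prefs = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 3.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "C": 2.0, "D": 1.0},
}
print(recommend(prefs, "alice"))
```

With this toy data only bob correlates positively with alice, so item D is the single recommendation and it inherits bob's score.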

Item-based

Find item similarities

These results can be cached and periodically updated

Sort and recommend items based on

sum(item similarity * user’s item score) / sum(item similarity), over the items the user has scored

Significantly faster than user-based filtering on large datasets, and usually better on sparse datasets
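A rough sketch of the item-based scoring, assuming the item similarities have already been computed and cached. The `item_sims` layout (a list of `(similarity, other_item)` pairs per item) and all values are hypothetical.

```python
def recommend_items(user_scores, item_sims):
    """Score unseen items by a similarity-weighted average of the user's own scores.

    user_scores: {item: score} for one user.
    item_sims:   {item: [(similarity, other_item), ...]}, precomputed and cached.
    """
    totals, sim_sums = {}, {}
    for item, score in user_scores.items():
        for sim, other in item_sims.get(item, []):
            if other in user_scores:
                continue  # the user has already scored this item
            totals[other] = totals.get(other, 0.0) + sim * score
            sim_sums[other] = sim_sums.get(other, 0.0) + sim
    ranked = [(totals[i] / sim_sums[i], i) for i in totals if sim_sums[i] > 0]
    return sorted(ranked, reverse=True)

# Hypothetical cached similarities and one user's scores.
item_sims = {
    "A": [(0.5, "C"), (0.2, "D")],
    "B": [(0.1, "C"), (0.8, "D")],
}
user_scores = {"A": 4.0, "B": 2.0}
print(recommend_items(user_scores, item_sims))
```

Because the similarity table is built offline, a recommendation only needs to look at the items this one user has scored, which is where the speed advantage comes from.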

Discovering Groups (Clustering)

Supervised learning (for contrast; clustering itself is unsupervised)

use example inputs and outputs
neural networks, decision trees, support-vector machines, and Bayesian filtering

Word Vectors of texts
Hierarchical Clustering

choose two nearest vectors to combine
results in binary tree

Can cluster articles or words

transpose the matrix

Dendrogram drawing
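Hierarchical clustering as outlined above can be sketched as follows. The assumptions here are Euclidean distance, representing a merged cluster by the plain average of its two vectors, and returning the binary tree as nested tuples of row indices; none of these choices is prescribed by the notes.

```python
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hcluster(rows, distance=euclidean):
    """Agglomerative clustering; returns a binary tree as nested tuples.

    A leaf is a row index; an internal node is a (left, right) pair.
    """
    clusters = [(i, list(row)) for i, row in enumerate(rows)]  # (tree, vector)
    while len(clusters) > 1:
        best = None
        # find the two nearest clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (tree_i, vec_i), (tree_j, vec_j) = clusters[i], clusters[j]
        merged_vec = [(x + y) / 2 for x, y in zip(vec_i, vec_j)]
        # delete j first (j > i) so that index i stays valid
        del clusters[j]
        del clusters[i]
        clusters.append(((tree_i, tree_j), merged_vec))
    return clusters[0][0]

rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(hcluster(rows))
```

To cluster words instead of articles, pass the transposed matrix, for example `hcluster([list(col) for col in zip(*rows)])`.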
K-Means clustering

randomly place k centroids
assign every item to its nearest centroid, then move each centroid to the average location of the items assigned to it; repeat until the assignments stop changing
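A minimal k-means sketch under a few assumptions: squared Euclidean distance, random initial centroids drawn inside the bounding box of the data, and an optional `initial` argument (added here only so the example is reproducible).

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, initial=None, max_iterations=100, seed=0):
    rng = random.Random(seed)
    dims = len(rows[0])
    if initial is None:
        # randomly place k centroids inside the bounding box of the data
        lows = [min(r[d] for r in rows) for d in range(dims)]
        highs = [max(r[d] for r in rows) for d in range(dims)]
        centroids = [[rng.uniform(lows[d], highs[d]) for d in range(dims)]
                     for _ in range(k)]
    else:
        centroids = [list(c) for c in initial]
    assignment = None
    for _ in range(max_iterations):
        # assign every row to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_assignment == assignment:
            break  # converged: nothing moved
        assignment = new_assignment
        # move each centroid to the mean of its members (empty clusters stay put)
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(rows, 2, initial=[[1.0, 1.0], [9.0, 9.0]]))
```

With random starting points the result can depend on the seed, and a centroid that attracts no items simply stays where it is in this sketch.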

Searching and Ranking

word index stored in relational database
ranking

content-based

various metrics: word frequency, document location, word distance

use inbound links

simple count
PageRank algorithm

random walk
sparse matrix multiplication iterations
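The PageRank iteration can be sketched without any matrix library by storing only the outgoing links of each page, which is the sparse form of the matrix multiplication. The damping factor 0.85, the starting score of 1.0, and the tiny link graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=20):
    """links: {page: [pages it links to]}. Returns {page: score}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # each page q that links to p passes on an equal share of its score
            inbound = sum(rank[q] / len(links[q])
                          for q in pages if p in links.get(q, []))
            new_rank[p] = (1 - damping) + damping * inbound
        rank = new_rank
    return rank

# Hypothetical three-page web.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(links))
```

The damping term models the random walker occasionally jumping to an arbitrary page instead of following a link.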

use link text

learning from clicks

click-tracking neural network (multilayer perceptron, i.e. MLP network)

one hidden layer

Optimization

stochastic optimization

numerical solution
cost function

random searching
hill climbing

change the vector along its most promising dimension (move to the best neighboring solution) until nothing improves

simulated annealing

variable: temperature, starts very high and gradually gets lower
a worse solution may still be accepted, with a probability that depends on the temperature

genetic algorithms

mutate, crossover, …
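Hill climbing and simulated annealing from the list above can be compared on the same cost function. Everything specific here is assumed for the example: the integer search domain, the toy `example_cost`, the starting temperature, and the cooling rate.

```python
import math
import random

def example_cost(vec):
    # Hypothetical cost function: squared distance from the point (3, 7).
    return (vec[0] - 3) ** 2 + (vec[1] - 7) ** 2

def hill_climb(domain, cost, start):
    """Repeatedly move to the best neighbor (one dimension changed by +/-1)."""
    current = list(start)
    while True:
        neighbors = []
        for d, (lo, hi) in enumerate(domain):
            if current[d] > lo:
                neighbors.append(current[:d] + [current[d] - 1] + current[d + 1:])
            if current[d] < hi:
                neighbors.append(current[:d] + [current[d] + 1] + current[d + 1:])
        best = min(neighbors, key=cost)
        if cost(best) >= cost(current):
            return current  # no neighbor improves: a (possibly local) minimum
        current = best

def anneal(domain, cost, start, temp=10000.0, cooling=0.95, step=1, seed=0):
    rng = random.Random(seed)
    current = list(start)
    best = list(current)
    while temp > 0.1:
        d = rng.randrange(len(domain))
        lo, hi = domain[d]
        candidate = list(current)
        candidate[d] = min(hi, max(lo, candidate[d] + rng.randint(-step, step)))
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / temp), which shrinks as temp falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if cost(current) < cost(best):
            best = list(current)
        temp *= cooling  # gradually lower the temperature
    return best

domain = [(0, 10), (0, 10)]
print(hill_climb(domain, example_cost, [0, 0]))
print(anneal(domain, example_cost, [0, 0]))
```

On this convex toy cost hill climbing reaches the global minimum; annealing earns its keep on costs with many local minima, where accepting occasional worse moves lets the search escape them.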

Document Filtering (to be expanded…)

use words as features
naive Bayesian classifier
the Fisher method
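As a placeholder until this section is expanded, here is a small naive Bayes sketch that uses words as features. It is an illustration under assumptions: add-one smoothing is used for unseen words (a swapped-in choice, not necessarily the weighting the book uses), and the training sentences are invented.

```python
import math
from collections import defaultdict

class NaiveBayes:
    def __init__(self):
        # category -> word -> number of documents containing the word
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        # category -> number of documents seen
        self.category_counts = defaultdict(int)

    @staticmethod
    def features(text):
        return set(text.lower().split())

    def train(self, text, category):
        self.category_counts[category] += 1
        for word in self.features(text):
            self.feature_counts[category][word] += 1

    def log_score(self, text, category):
        total_docs = sum(self.category_counts.values())
        n = self.category_counts[category]
        score = math.log(n / total_docs)  # log prior P(category)
        for word in self.features(text):
            # add-one smoothing so an unseen word does not zero out the product
            p = (self.feature_counts[category][word] + 1) / (n + 2)
            score += math.log(p)  # "naive": words treated as independent
        return score

    def classify(self, text):
        return max(self.category_counts, key=lambda c: self.log_score(text, c))

# Hypothetical training data.
nb = NaiveBayes()
for text, category in [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project meeting notes", "ham"),
]:
    nb.train(text, category)
print(nb.classify("cheap pills"))
```

Working with log probabilities avoids numeric underflow when a document has many words.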

Modeling with Decision Trees

Algorithm: CART (Classification and Regression Trees)

choose the best split from all possible splits

Gini impurity
information entropy

-sum of p(x) * log2(p(x)) over all outcomes x

recursively build the whole tree
then can be used to classify new observations
pruning the tree

when it becomes overfitted
checking pairs of nodes that have a common parent to see if merging them would increase the entropy by less than a specified threshold
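The two split criteria and the "choose the best split" step can be sketched as below. Assumptions: each row is a tuple whose last element is the label, splits test equality only (a simplification that suits categorical columns), and the sample rows are invented.

```python
from collections import Counter
from math import log2

def gini_impurity(labels):
    """Chance that two randomly drawn labels differ."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """-sum of p(x) * log2(p(x)) over the label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(rows, score=entropy):
    """Try every (column, value) split; return (gain, column, value)."""
    base = score([r[-1] for r in rows])
    best = (0.0, None, None)
    for col in range(len(rows[0]) - 1):
        for value in {r[col] for r in rows}:
            left = [r[-1] for r in rows if r[col] == value]
            right = [r[-1] for r in rows if r[col] != value]
            if not left or not right:
                continue
            p = len(left) / len(rows)
            gain = base - p * score(left) - (1 - p) * score(right)
            if gain > best[0]:
                best = (gain, col, value)
    return best

# Hypothetical rows: (referrer, read_faq, chosen_plan).
rows = [
    ("google", "yes", "premium"),
    ("google", "no", "premium"),
    ("slashdot", "yes", "none"),
    ("slashdot", "no", "none"),
]
print(best_split(rows))
```

Building the tree means applying `best_split` recursively to each side until no split yields a positive gain.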

Dealing with

missing data

use both branches

numerical outcomes

use variance instead of entropy

Building Price Models

k-nearest neighbors (kNN)

weighted
may need scaling or normalizing
to estimate the probability density
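A sketch of weighted kNN for numeric prediction. The Gaussian weighting function, its `sigma`, and the one-dimensional sample data are assumptions; any function that decreases with distance could be used as the weight.

```python
from math import exp, sqrt

def gaussian_weight(dist, sigma=10.0):
    """Weight near 1 for close neighbors, falling smoothly toward 0."""
    return exp(-dist ** 2 / (2 * sigma ** 2))

def weighted_knn(data, query, k=3, weight=gaussian_weight):
    """data: list of (vector, value). Weighted average value of the k nearest."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(vec, query))), value)
        for vec, value in data
    )
    total = weight_sum = 0.0
    for dist, value in dists[:k]:
        w = weight(dist)
        total += w * value
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# Hypothetical (features, price) pairs.
data = [([1.0], 10.0), ([2.0], 20.0), ([3.0], 30.0), ([100.0], 1000.0)]
print(weighted_knn(data, [2.0], k=3))
```

If the feature columns have very different ranges, rescale them before computing distances, otherwise the widest column dominates the neighbor search.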

cross-validation

divide data into training sets and test sets
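Cross-validation can be sketched as repeated random splits followed by averaging the test error. The `predict(train, vector)` callback signature, the 5% test fraction, and the trial count are assumptions made for this example.

```python
import random

def split_data(data, test_fraction=0.05, rng=random):
    """Randomly divide rows into a training set and a test set."""
    train, test = [], []
    for row in data:
        (test if rng.random() < test_fraction else train).append(row)
    return train, test

def cross_validate(predict, data, trials=100, test_fraction=0.05, seed=0):
    """Mean squared error of predict(train, vector), averaged over random splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        train, test = split_data(data, test_fraction, rng)
        if not train or not test:
            continue  # skip degenerate splits
        mse = sum((predict(train, vec) - value) ** 2 for vec, value in test) / len(test)
        errors.append(mse)
    return sum(errors) / len(errors) if errors else float("nan")

def mean_predictor(train, vec):
    # Baseline that ignores the query and returns the training mean.
    return sum(value for _, value in train) / len(train)

# Hypothetical data: every item has the same value, so the baseline is exact.
data = [([float(i)], 5.0) for i in range(40)]
print(cross_validate(mean_predictor, data, trials=20, test_fraction=0.25))
```

Running the same routine with different predictors (for example different values of k or different weight functions in kNN) gives a fair way to choose between them.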

Advanced Classification: Kernel Methods and SVMs

basic linear classification

using dot-products to determine distance

kernel methods

define another dot-product == move the points into different space
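The claim that a different dot product is the same as moving the points into a different space can be checked numerically. The degree-2 polynomial kernel below is chosen only because its explicit mapping `phi` is short to write down; it is an illustrative assumption, not necessarily the kernel the book uses.

```python
from math import isclose, sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly2_kernel(a, b):
    """Degree-2 polynomial kernel: a replacement dot product on the original 2-D points."""
    return dot(a, b) ** 2

def phi(v):
    """Explicit map of a 2-D point into the 3-D space the kernel works in implicitly."""
    x, y = v
    return (x * x, sqrt(2) * x * y, y * y)

a, b = (1.0, 2.0), (3.0, 4.0)
# Both expressions give the same number: the kernel never builds phi(a) or phi(b).
print(poly2_kernel(a, b), dot(phi(a), phi(b)))
print(isclose(poly2_kernel(a, b), dot(phi(a), phi(b))))
```

Any linear classifier written purely in terms of dot products can therefore be made non-linear by swapping in a kernel function.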

support-vector machines

find the dividing line that is as far as possible from every class (the maximum-margin line)

Finding Independent Features

non-negative matrix factorization

factor the article-word matrix into two matrices

the features matrix: row for features, column for words
the weight matrix: row for articles, column for features
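A pure-Python sketch of the factorization using multiplicative update rules, one common way to compute NMF. The helper names, the random initialization, the small `eps` guard against division by zero, and the tiny input matrix are all assumptions for illustration. Here `w` is the weight matrix (articles by features) and `h` is the features matrix (features by words), so `v` is approximated by `w` times `h`.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def squared_error(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def factorize(v, n_features, iterations=200, seed=0, eps=1e-9):
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() for _ in range(n_features)] for _ in range(rows)]  # weights
    h = [[rng.random() for _ in range(cols)] for _ in range(n_features)]  # features
    for _ in range(iterations):
        # update features: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(n_features)]
        # update weights: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_features)]
             for i in range(rows)]
    return w, h

# Hypothetical article-word counts (3 articles, 3 words).
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
w, h = factorize(v, 2)
print(squared_error(v, matmul(w, h)))
```

Because every update multiplies by a ratio of non-negative numbers, the entries of both factors stay non-negative throughout.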

Evolving Intelligence

creating an algorithm that creates algorithms
mutation, crossover/breeding
use trees to represent algorithm to enable evolving

used to guess numerical functions or to build game AI

Algorithm Summary

Supervised Learning

Bayesian Classifier
Decision Tree Classifier
Neural Networks
Support-Vector Machines
k-Nearest Neighbors

Unsupervised Learning

Clustering
Multidimensional Scaling
Non-Negative Matrix Factorization

Optimization
