Deep Learning Fundamentals

Tags: Data mining | Posted: 2013-01-22 12:21 | Author: 黄言之
Source: http://blog.sina.com.cn/netreview

Deep Learning is a new field within machine learning research, and its goal is to bring machine learning closer to artificial intelligence. Two survey documents are listed here:

 

The document below is a recent introduction to Deep Learning:

Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), 2009

 

Depth

The computations involved in producing an output from an input can be represented by a flow graph: a graph representing a computation, in which each node represents an elementary computation and a value (the result of that computation applied to the values of the node's children). The set of computations allowed at each node, together with the set of possible graph structures, defines a family of functions. Input nodes have no children. Output nodes have no parents.

The flow graph for the expression sin(a^2+b/a) could be represented by a graph with two input nodes a and b, one node for the division b/a taking a and b as input (i.e. as children), one node for the square (taking only a as input), one node for the addition (whose value would be a^2+b/a, taking as input the nodes a^2 and b/a), and finally one output node computing the sine, with a single input coming from the addition node.

A particular property of such flow graphs is depth: the length of the longest path from an input to an output.
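
To make this concrete, here is a minimal Python sketch (my own illustration, not code from the survey) that builds the flow graph for sin(a^2 + b/a) and measures its depth as the longest input-to-output path:

    import math

    # A node applies an elementary operation to the values of its children;
    # input nodes have no children and take their values from outside.
    class Node:
        def __init__(self, op=None, children=()):
            self.op, self.children = op, children

        def value(self, env):
            if self.op is None:                  # input node
                return env[self]
            return self.op(*(c.value(env) for c in self.children))

        def depth(self):
            if not self.children:                # input nodes sit at depth 0
                return 0
            return 1 + max(c.depth() for c in self.children)

    # Flow graph for sin(a^2 + b/a)
    a, b = Node(), Node()
    square = Node(lambda x: x * x, (a,))
    div = Node(lambda x, y: x / y, (b, a))
    add = Node(lambda x, y: x + y, (square, div))
    out = Node(math.sin, (add,))

    print(out.value({a: 2.0, b: 3.0}))           # sin(2^2 + 3/2) = sin(5.5)
    print(out.depth())                           # 3: a -> square -> add -> sin

Under this convention, the depth-2 architectures discussed next correspond to graphs with a single intermediate layer of computation between inputs and output.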

Traditional feedforward neural networks can be considered to have depth equal to the number of layers (i.e. the number of hidden layers plus 1, for the output layer). Support Vector Machines (SVMs) have depth 2 (one for the kernel outputs or for the feature space, and one for the linear combination producing the output).

Motivations for Deep Architectures

The main motivations for studying learning algorithms for deep architectures are the following:

Insufficient depth can hurt

Depth 2 is enough in many cases (e.g. logical gates, formal [threshold] neurons, sigmoid-neurons, Radial Basis Function [RBF] units like in SVMs) to represent any function with a given target accuracy. But this may come with a price: that the required number of nodes in the graph (i.e. computations, and also number of parameters, when we try to learn the function) may grow very large. Theoretical results showed that there exist function families for which in fact the required number of nodes may grow exponentially with the input size. This has been shown for logical gates, formal neurons, and RBF units. In the latter case Hastad has shown families of functions which can be efficiently (compactly) represented with O(n) nodes (for n inputs) when depth is d , but for which an exponential number ( O(2^n)) of nodes is needed if depth is restricted to d-1 .

One can see a deep architecture as a kind of factorization. Most randomly chosen functions can’t be represented efficiently, whether with a deep or a shallow architecture. But many that can be represented efficiently with a deep architecture cannot be represented efficiently with a shallow one (see the polynomials example in the Bengio survey paper). The existence of a compact and deep representation indicates that some kind of structure exists in the underlying function to be represented. If there were no structure whatsoever, it would not be possible to generalize well.

The brain has a deep architecture

For example, the visual cortex is well-studied and shows a sequence of areas each of which contains a representation of the input, and signals flow from one to the next (there are also skip connections and at some level parallel paths, so the picture is more complex). Each level of this feature hierarchy represents the input at a different level of abstraction, with more abstract features further up in the hierarchy, defined in terms of the lower-level ones.

Note that representations in the brain are in between dense distributed and purely local: they are sparse: about 1% of neurons are active simultaneously in the brain. Given the huge number of neurons, this is still a very efficient (exponentially efficient) representation.

Cognitive processes seem deep

  • Humans organize their ideas and concepts hierarchically.
  • Humans first learn simpler concepts and then compose them to represent more abstract ones.
  • Engineers break up solutions into multiple levels of abstraction and processing.

It would be nice to learn / discover these concepts (knowledge engineering failed because of poor introspection?). Introspection of linguistically expressible concepts also suggests a sparse representation: only a small fraction of all possible words/concepts are applicable to a particular input (say a visual scene).

Breakthrough in Learning Deep Architectures

Before 2006, attempts at training deep architectures failed: training a deep supervised feedforward neural network tended to yield worse results (both in training and in test error) than shallow ones (with 1 or 2 hidden layers).

Three papers changed that in 2006, spearheaded by Hinton’s revolutionary work on Deep Belief Networks (DBNs):

  • Hinton, G. E., Osindero, S. and Teh, Y.-W., A Fast Learning Algorithm for Deep Belief Nets, Neural Computation 18, 2006
  • Bengio, Y., Lamblin, P., Popovici, D. and Larochelle, H., Greedy Layer-Wise Training of Deep Networks, NIPS 2006
  • Ranzato, M., Poultney, C., Chopra, S. and LeCun, Y., Efficient Learning of Sparse Representations with an Energy-Based Model, NIPS 2006

The following key principles are found in all three papers:

  • Unsupervised learning of representations is used to (pre-)train each layer.
  • Unsupervised training of one layer at a time, on top of the previously trained ones. The representation learned at each level is the input for the next layer.
  • Use supervised training to fine-tune all the layers (in addition to one or more additional layers that are dedicated to producing predictions).

In short, the core ideas of Deep Learning algorithms are (a minimal sketch in code follows below):

  • Unsupervised learning is used to (pre-)train each layer of the network.
  • One layer is trained at a time with unsupervised learning, and the representation it learns is used as the input to the next layer.
  • Supervised learning is then used to fine-tune all the layers.
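
As a concrete (and deliberately simplified) illustration of this schedule, here is a small numpy sketch, my own code rather than the procedure from the 2006 papers: each layer is a sigmoid auto-encoder trained with squared-error reconstruction and plain gradient descent, and its learned representation becomes the training data for the next layer; a supervised output layer and joint fine-tuning would follow in a full system.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class AutoEncoder:
        """One sigmoid auto-encoder layer trained by squared-error gradient descent."""
        def __init__(self, n_in, n_hidden, rng):
            self.W1 = 0.1 * rng.standard_normal((n_hidden, n_in))
            self.b1 = np.zeros((n_hidden, 1))
            self.W2 = 0.1 * rng.standard_normal((n_in, n_hidden))
            self.b2 = np.zeros((n_in, 1))

        def encode(self, X):                     # X has shape (n_in, n_samples)
            return sigmoid(self.W1 @ X + self.b1)

        def fit(self, X, lr=0.5, epochs=200):
            n = X.shape[1]
            for _ in range(epochs):
                H = self.encode(X)
                R = sigmoid(self.W2 @ H + self.b2)       # reconstruction of X
                d2 = (R - X) * R * (1 - R)               # backprop through the decoder
                d1 = (self.W2.T @ d2) * H * (1 - H)      # ... and through the encoder
                self.W2 -= lr * (d2 @ H.T) / n
                self.b2 -= lr * d2.mean(axis=1, keepdims=True)
                self.W1 -= lr * (d1 @ X.T) / n
                self.b1 -= lr * d1.mean(axis=1, keepdims=True)

    # Greedy layer-wise pretraining: train one layer at a time, each on the
    # representation produced by the layers trained before it.
    rng = np.random.default_rng(0)
    X = rng.random((20, 500))                    # toy data: 20 inputs, 500 samples
    layers, rep = [], X
    for n_hidden in (16, 8):
        ae = AutoEncoder(rep.shape[0], n_hidden, rng)
        ae.fit(rep)
        layers.append(ae)
        rep = ae.encode(rep)                     # input representation for the next layer

    # `rep` would now feed a supervised output layer, after which all layers
    # are fine-tuned jointly with backpropagation (the third step above).
    print(rep.shape)                             # (8, 500)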

 

The DBNs use RBMs for unsupervised learning of the representation at each layer. The Bengio et al paper explores and compares RBMs and auto-encoders (a neural network that predicts its own input, through a bottleneck internal layer of representation). The Ranzato et al paper uses a sparse auto-encoder (which is similar to sparse coding) in the context of a convolutional architecture. Auto-encoders and convolutional architectures will be covered later in the course.
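
For the RBM building block, the numpy sketch below (my own illustration, not the implementation from Hinton's paper) trains a binary RBM with one step of contrastive divergence (CD-1), the unsupervised learner that a DBN stacks layer by layer:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n_visible, n_hidden = 20, 8
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)                    # visible biases
    b_h = np.zeros(n_hidden)                     # hidden biases

    V0 = (rng.random((500, n_visible)) > 0.5).astype(float)   # toy binary data

    lr = 0.05
    for _ in range(100):
        # positive phase: hidden activations given the data
        H0 = sigmoid(V0 @ W + b_h)
        H0_sample = (rng.random(H0.shape) < H0).astype(float)
        # negative phase: one Gibbs step back down and up again
        V1 = sigmoid(H0_sample @ W.T + b_v)
        H1 = sigmoid(V1 @ W + b_h)
        # CD-1 approximation to the log-likelihood gradient
        W += lr * (V0.T @ H0 - V1.T @ H1) / len(V0)
        b_v += lr * (V0 - V1).mean(axis=0)
        b_h += lr * (H0 - H1).mean(axis=0)

    hidden_code = sigmoid(V0 @ W + b_h)          # representation passed to the next layer
    print(hidden_code.shape)                     # (500, 8)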

Since 2006, a plethora of other papers on the subject of deep learning has been published, some of them exploiting other principles to guide training of intermediate representations. See Learning Deep Architectures for AI for a survey.

 

--------------------------

Deep Learning is the learning of multiple levels of representation and abstraction, which helps make sense of data such as images, sound, and text. To get a deeper understanding of deep learning algorithms, take a look at the examples below:

Theano user guide

Theano is a Python library that makes it easier to write deep learning models, and it also provides options for training them on a GPU.

It is recommended to read the Theano basic tutorial first, and then work through the Getting Started chapter (which introduces the notation, how to download the datasets used by the algorithms, and the stochastic gradient descent optimization method we use); a minimal Theano example is sketched after the tutorial list below.

The purely supervised learning algorithms are meant to be read in order:

  1. Logistic Regression - using Theano for something simple
  2. Multilayer perceptron - introduction to layers
  3. Deep Convolutional Network - a simplified version of LeNet5
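
As a taste of what the first tutorial builds, here is a minimal logistic-regression sketch in Theano (my own toy-data version, not the tutorial's actual code), with the loss, gradients, and gradient-descent updates expressed symbolically:

    import numpy as np
    import theano
    import theano.tensor as T

    # Toy data standing in for MNIST: 400 samples, 784 features, 10 classes
    rng = np.random.RandomState(0)
    data_x = rng.randn(400, 784).astype(theano.config.floatX)
    data_y = rng.randint(0, 10, size=400).astype('int32')

    x = T.matrix('x')                            # symbolic inputs
    y = T.ivector('y')                           # symbolic class labels
    W = theano.shared(np.zeros((784, 10), dtype=theano.config.floatX), name='W')
    b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')

    p_y_given_x = T.nnet.softmax(T.dot(x, W) + b)
    loss = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])   # negative log-likelihood
    g_W, g_b = T.grad(loss, [W, b])              # symbolic gradients

    lr = 0.1
    train = theano.function([x, y], loss,
                            updates=[(W, W - lr * g_W), (b, b - lr * g_b)])
    predict = theano.function([x], T.argmax(p_y_given_x, axis=1))

    for epoch in range(20):                      # plain (full-batch) gradient descent
        print(epoch, train(data_x, data_y))
    print((predict(data_x) == data_y).mean())    # training accuracy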

The unsupervised and semi-supervised learning algorithms can be read in any order (the auto-encoders can be read independently of the RBM/DBN thread):

Building towards including the mcRBM model, we have a new tutorial on sampling from energy models:

  • HMC Sampling - hybrid (aka Hamiltonian) Monte-Carlo sampling with scan()
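
A bare-bones Hamiltonian (hybrid) Monte Carlo sampler looks roughly like the numpy sketch below (my own illustration; the actual tutorial builds the sampler symbolically with Theano's scan()):

    import numpy as np

    def hmc_sample(x0, log_prob, grad_log_prob, n_samples=2000,
                   eps=0.1, n_leapfrog=20, rng=None):
        """Draw samples from exp(log_prob) with Hamiltonian Monte Carlo."""
        rng = rng or np.random.default_rng(0)
        x = np.array(x0, dtype=float)
        samples = []
        for _ in range(n_samples):
            p = rng.normal(size=x.shape)                 # fresh Gaussian momentum
            x_new, p_new = x.copy(), p.copy()
            # leapfrog integration of the Hamiltonian dynamics
            p_new += 0.5 * eps * grad_log_prob(x_new)
            for _ in range(n_leapfrog - 1):
                x_new += eps * p_new
                p_new += eps * grad_log_prob(x_new)
            x_new += eps * p_new
            p_new += 0.5 * eps * grad_log_prob(x_new)
            # Metropolis accept/reject on the total energy
            h_old = -log_prob(x) + 0.5 * p @ p
            h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
            if np.log(rng.uniform()) < h_old - h_new:
                x = x_new
            samples.append(x.copy())
        return np.array(samples)

    # Usage: sample from a 2-D standard Gaussian energy model
    log_p = lambda x: -0.5 * x @ x
    grad_log_p = lambda x: -x
    draws = hmc_sample(np.zeros(2), log_p, grad_log_p)
    print(draws.mean(axis=0), draws.std(axis=0))   # roughly (0, 0) and (1, 1)
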
Building towards including the Contractive auto-encoders tutorial, we have the code for now:

 

References:

http://deeplearning.net/tutorial/

http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html

