
CalTech machine learning, video 10 note (Neural Network)

15:15 2014-09-27

start CalTech machine learning, video 10

Neural Networks

15:16 2014-09-27

stochastic gradient descent

15:20 2014-09-27

NN (Neural Network) is easy to implement

because of the algorithm I'm going to introduce today

15:21 2014-09-27

outline:

* Stochastic gradient descent

* Neural network model

* Backpropagation algorithm

15:22 2014-09-27

GD == Gradient Descent

15:23 2014-09-27

logistic regression
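
For reference, the in-sample error minimized in logistic regression (from the previous lecture) is the cross-entropy error; a sketch, assuming labels y_n in {-1, +1}:

E_{in}(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\left(1 + e^{-y_n w^{T} x_n}\right)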

15:25 2014-09-27

think of the average direction that you're 

going to descend along
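
In symbols (a sketch): batch gradient descent steps along the negative gradient of the in-sample error, which is the average of the per-example gradients,

w(t+1) = w(t) - \eta \, \nabla E_{in}(w(t)), \qquad \nabla E_{in}(w) = \frac{1}{N} \sum_{n=1}^{N} \nabla e(h(x_n), y_n)

SGD, introduced next, replaces that average with the gradient of a single randomly picked example.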

15:28 2014-09-27

SGD == Stochastic Gradient Descent

// randomized gradient descent

15:30 2014-09-27

this is an error surface, it's the typical

error surface that you encounter

15:33 2014-09-27

benefits of SGD:

* cheaper computation

* randomization

* simple
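
A minimal sketch of an SGD loop, assuming logistic regression with cross-entropy error and labels in {-1, +1}; the function name and parameters are illustrative, not from the lecture:

    import numpy as np

    def sgd_logistic(X, y, eta=0.1, epochs=100):
        # illustrative sketch, not the lecture's code
        # X: N x d input matrix (bias coordinate assumed included), y: labels in {-1, +1}
        N, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for n in np.random.permutation(N):    # one randomly picked example at a time
                # gradient of ln(1 + exp(-y_n w.x_n)) with respect to w
                grad = -y[n] * X[n] / (1.0 + np.exp(y[n] * np.dot(w, X[n])))
                w -= eta * grad                   # step along the negative gradient
        return w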

15:36 2014-09-27

learning rate

15:37 2014-09-27

biological function => biological structure

15:50 2014-09-27

biological inspirations

15:51 2014-09-27

PDE == Partial Differential Equation

15:59 2014-09-27

creating layers of perceptrons

16:04 2014-09-27

the multilayer perceptron

16:07 2014-09-27

so this multilayer perceptron implements something

that a single perceptron fails at (e.g. XOR).
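
For example, XOR (the lecture's example of what a single perceptron cannot do) can be built by layering perceptrons; a sketch with illustrative weights, inputs and outputs in {-1, +1}:

    import numpy as np

    def perceptron(w, x):
        # sign(w . x), where x[0] = 1 is the bias coordinate
        return np.sign(np.dot(w, x))

    def xor_mlp(h1, h2):
        # illustrative weights, not the lecture's exact numbers
        a = perceptron(np.array([-1.5,  1.0, -1.0]), np.array([1.0, h1, h2]))  # h1 AND (NOT h2)
        b = perceptron(np.array([-1.5, -1.0,  1.0]), np.array([1.0, h1, h2]))  # (NOT h1) AND h2
        return perceptron(np.array([1.5, 1.0, 1.0]), np.array([1.0, a, b]))    # a OR b

xor_mlp(+1, -1) and xor_mlp(-1, +1) return +1, while xor_mlp(+1, +1) and xor_mlp(-1, -1) return -1.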

16:08 2014-09-27

feedforward

16:09 2014-09-27

that's what neural networks do, that's the

only thing they do.

16:14 2014-09-27

they have a way to get that solution.

16:15 2014-09-27

and the way they're going to do it is, instead of

using perceptrons with a hard threshold, they're

going to soften the threshold.

16:15 2014-09-27

not that they like the soft threshold per se, but soft

thresholds are smoother and twice differentiable.

16:17 2014-09-27

the neural network

16:17 2014-09-27

each layer has a nonlinearity

16:17 2014-09-27

soft threshold

16:19 2014-09-27

the 1st column is the input x

16:19 2014-09-27

follow the rule for deriving one

layer from the previous one

16:20 2014-09-27

the intermediate values we're going to call

hidden layers; the user doesn't see them, they

just provide the input and see the output

16:20 2014-09-27

if you open the black box, you see there are layers

16:21 2014-09-27

the soft threshold, we're going to use the tanh

// hyperbolic tangent

16:23 2014-09-27

you can see now why we're using it.

16:24 2014-09-27

if your signal is very small, it behaves almost

linearly; if your signal is extremely large, it acts

like a hard threshold.
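
In formulas, the soft threshold used here and its two limits:

\theta(s) = \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}}, \qquad \tanh(s) \approx s \ \text{for small } |s|, \qquad \tanh(s) \approx \mathrm{sign}(s) \ \text{for large } |s|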

16:24 2014-09-27

how the network operates

16:25 2014-09-27

the parameters of the neural network are the weights w,

one for every connection from a neuron in one layer to a neuron in the next
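
A sketch of the weight indexing (following the lecture's notation):

w_{ij}^{(l)}: \quad 1 \le l \le L \ \text{(layers)}, \quad 0 \le i \le d^{(l-1)} \ \text{(inputs, with } x_0^{(l-1)} = 1 \text{ the bias)}, \quad 1 \le j \le d^{(l)} \ \text{(outputs)}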

16:26 2014-09-27

when you use the network, this is a recursive 

definition

16:35 2014-09-27

Apply x to the input layer of the network

16:36 2014-09-27

after iterating through the layers, you end up with the output h(x)

16:37 2014-09-27

that is the entire operation of a neural network
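
Spelled out, the recursion is (with \theta = \tanh, x^{(0)} the input, and the final output taken as the hypothesis):

x_j^{(l)} = \theta\left(s_j^{(l)}\right) = \theta\left(\sum_{i=0}^{d^{(l-1)}} w_{ij}^{(l)} x_i^{(l-1)}\right), \quad l = 1, \dots, L, \qquad h(x) = x_1^{(L)}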

16:37 2014-09-27

Applying SGD // Stochastic Gradient Descent

16:39 2014-09-27

all the weights w determine h(x) // final hypothesis

16:40 2014-09-27

by definition, I have some error

16:41 2014-09-27

Error on example (xn, yn) is e(h(xn), yn)
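
For instance, with squared error (the choice used in the lecture's derivation):

e(h(x_n), y_n) = \left(h(x_n) - y_n\right)^2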

16:41 2014-09-27

h is determined by w, which is the active 

quantity when we're learning

16:42 2014-09-27

to implement SGD, we need the gradient vector,

16:43 2014-09-27

it makes a bigger difference when you find

an efficient algorithm to do something

16:44 2014-09-27

FFT == Fast Fourier Transform

16:45 2014-09-27

that simple fact made the field enormously

active, just because of that algorithm (FFT)

16:47 2014-09-27

backpropagation algorithm

16:47 2014-09-27

let's take part of the neural network

16:48 2014-09-27

feeding through some weight into this guy

16:48 2014-09-27

contributing to the signal going into next guy

16:48 2014-09-27

signal goes into next nonlinearity to the output

16:49 2014-09-27

we can evaluate every partial derivative of e(w)

for every weight

16:49 2014-09-27

we can do this analytically, there is nothing mysterious

16:50 2014-09-27

propagating backward, hence the name backpropagation

16:59 2014-09-27

backpropagate δ to get the other δs,

this is the essence of the algorithm // backpropagation

17:06 2014-09-27

so I'm going to apply the chain rule again
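
A sketch of what the chain rule yields, assuming tanh units and squared error: each partial derivative factors into the output of the neuron the weight comes from and a per-neuron quantity δ, and the δ's satisfy a backward recursion:

\frac{\partial e}{\partial w_{ij}^{(l)}} = x_i^{(l-1)} \, \delta_j^{(l)}, \qquad \delta_j^{(l)} = \frac{\partial e}{\partial s_j^{(l)}}

\delta_1^{(L)} = 2\left(x_1^{(L)} - y_n\right)\left(1 - \left(x_1^{(L)}\right)^2\right), \qquad \delta_i^{(l-1)} = \left(1 - \left(x_i^{(l-1)}\right)^2\right) \sum_{j=1}^{d^{(l)}} w_{ij}^{(l)} \, \delta_j^{(l)}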

17:08 2014-09-27

backward propagation algorithm

17:13 2014-09-27

initialize all weights at random
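
Putting the pieces together, a minimal sketch of the whole training loop (random initialization, SGD, backpropagation), assuming tanh units everywhere, a single output, and squared error; names and parameters are illustrative, not from the lecture:

    import numpy as np

    def train(X, y, layer_sizes, eta=0.1, epochs=1000):
        # illustrative sketch, not the lecture's code
        # layer_sizes = [d_0, d_1, ..., d_L]; W[l] has shape (1 + d_l, d_{l+1})
        rng = np.random.default_rng(0)
        # initialize all weights at random (small values)
        W = [rng.normal(scale=0.1, size=(layer_sizes[l] + 1, layer_sizes[l + 1]))
             for l in range(len(layer_sizes) - 1)]
        for _ in range(epochs):
            for n in rng.permutation(len(X)):          # SGD: one example at a time
                # forward pass: keep every layer's output (bias coordinate prepended)
                xs = [np.concatenate(([1.0], X[n]))]
                for Wl in W:
                    xs.append(np.concatenate(([1.0], np.tanh(xs[-1] @ Wl))))
                out = xs[-1][1]                        # h(x_n), single output unit assumed
                # backward pass: delta at the output, then propagate it back
                delta = np.array([2.0 * (out - y[n]) * (1.0 - out ** 2)])
                for l in range(len(W) - 1, -1, -1):
                    grad = np.outer(xs[l], delta)      # de/dw^(l+1) = x^(l) * delta^(l+1)
                    delta = (1.0 - xs[l][1:] ** 2) * (W[l][1:] @ delta)
                    W[l] -= eta * grad                 # SGD weight update
        return W

For a 2-3-1 network on inputs X of shape N x 2 and targets y, this would be called as train(X, y, [2, 3, 1]).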

17:19 2014-09-27

final remark: hidden layers

17:20 2014-09-27

if you think what the hidden layers do,

they're just doing a nonlinear transform.

17:21 2014-09-27

learned nonlinear transform

17:22 2014-09-27

these are learned features

17:22 2014-09-27

we said: don't look at the data before you choose

the transform; yet the network is looking at the

data all it wants, it's actually just adjusting

the weights to get the proper transform

17:25 2014-09-27

I already charged for that in the proper VC dimension

17:25 2014-09-27

can I interpret what the hidden layers are doing?