CalTech machine learning, video 10 notes (Neural Networks)

start CalTech machine learning, video 10

Neural Networks

stochastic gradient descent

NN (Neural Network) is easy to implement

because of the algorithm I'm going to introduce today

* Stochastic gradient descent

* Neural network model

* Backpropagation algorithm

GD == Gradient Descent

logistic regression

think of the average direction that you're 

going to descend along

SGD == Stochastic Gradient Descent

// randomized gradient descent

this is an error surface, it's the typical

error surface that you encounter

benefits of SGD:

* cheaper computation

* randomization

* simple

learning rate
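
a minimal sketch of the difference between GD and SGD, assuming a toy one-parameter model y = w*x with squared error (the model, the data and the names `gd_step` / `sgd_step` are illustrative assumptions, not from the lecture). GD averages the gradient over all examples, SGD uses one randomly picked example per step; `eta` is the learning rate.

```python
import random

def grad_example(w, x, y):
    """Gradient of the per-example squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x

def gd_step(w, data, eta):
    """Batch GD: move against the gradient averaged over all examples."""
    avg = sum(grad_example(w, x, y) for x, y in data) / len(data)
    return w - eta * avg

def sgd_step(w, data, eta, rng):
    """SGD: pick one example at random and move against its gradient only."""
    x, y = rng.choice(data)
    return w - eta * grad_example(w, x, y)

rng = random.Random(0)
# Noiseless toy data generated with a "true" weight of 3 (an assumption for the demo).
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

w_gd = w_sgd = 0.0
for _ in range(200):
    w_gd = gd_step(w_gd, data, eta=0.1)
    w_sgd = sgd_step(w_sgd, data, eta=0.1, rng=rng)

print(w_gd, w_sgd)  # both should end up close to 3
```

each SGD step touches a single example, which is where the "cheaper computation" comes from.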

biological function => biological structure

biological inspirations

PDE == Partial Differential Equation

creating layers of perceptrons

the multilayer perceptron

so this multilayer perceptron implements something

that a single perceptron fails at.
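
a small sketch of the idea, assuming XOR on +1/-1 inputs as the target that a single perceptron cannot implement (XOR and the specific weights are my illustrative choice). OR and AND are each one perceptron; stacking them in layers gives XOR.

```python
def sign(s):
    """Hard threshold: +1 for positive signals, -1 otherwise."""
    return 1 if s > 0 else -1

def perceptron(w0, w1, w2, x1, x2):
    """A single perceptron with bias weight w0 and input weights w1, w2."""
    return sign(w0 + w1 * x1 + w2 * x2)

def OR(a, b):
    return perceptron(1.5, 1.0, 1.0, a, b)

def AND(a, b):
    return perceptron(-1.5, 1.0, 1.0, a, b)

def xor(x1, x2):
    # Negating an input is the same as flipping the sign of its weight.
    # Layer 1: (x1 AND NOT x2), (NOT x1 AND x2).  Layer 2: OR of the two.
    return OR(AND(x1, -x2), AND(-x1, x2))

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, xor(x1, x2))
```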

that's what neural networks do, that's the 

only thing they do. 

they have a way to get that solution.

and the way they're going to do it is, instead of

using perceptrons, which are hard thresholds, they're

going to soften the threshold.

not that they like soft thresholds for their own sake, but soft thresholds

are smoother, twice differentiable.

the neural network

each layer has a nonlinearity

soft threshold

the 1st column is the input x

follow the rules of derivation from one 

layer to another

the intermediate layers, we're going to call

hidden layers, the user doesn't see them, they

put in the input and get the output.

if you open the black box, you see there are layers

the soft threshold, we're going to use the tanh

// hyperbolic tangent

you can see now why we're using it.

if your signal is very small, it looks as if it's

linear, if your signal is extremely large, it's 

as if it's a hard threshold.
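
a quick numerical check of this behaviour, using `math.tanh` from the Python standard library (the names `theta` and `theta_prime` are just labels chosen here). For small signals tanh(s) is close to s, for large signals it saturates near +1 or -1; the derivative 1 - tanh(s)^2 is what makes it convenient for gradient descent.

```python
import math

def theta(s):
    """Soft threshold: hyperbolic tangent."""
    return math.tanh(s)

def theta_prime(s):
    """Derivative of tanh, written in terms of tanh itself."""
    return 1.0 - math.tanh(s) ** 2

for s in (0.01, 0.1, 1.0, 5.0, -5.0):
    print(f"s={s:6.2f}  tanh(s)={theta(s):+.6f}  slope={theta_prime(s):.6f}")
```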

how the network operates

the parameters of neural networks are called w

weights, belonging to any layer and any neuron

when you use the network, this is a recursive 


apply x to the input variables of the network

after long iteration, you will end up with

that is the entire operation of a neural network
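
a minimal sketch of that operation, assuming tanh at every layer and a constant bias unit 1 prepended to each layer's outputs (the list-of-lists weight layout and the example numbers are assumptions made for illustration). Each layer's outputs are computed from the previous layer's outputs, starting from x.

```python
import math

def forward(weights, x):
    """Compute h(x) layer by layer.

    weights[l][i][j] is the weight from unit i of the layer below
    (i = 0 is the constant bias unit) to unit j of the layer above.
    """
    outputs = [1.0] + list(x)                      # input layer plus bias unit
    for W in weights:
        signals = [sum(W[i][j] * outputs[i] for i in range(len(outputs)))
                   for j in range(len(W[0]))]
        outputs = [1.0] + [math.tanh(s) for s in signals]
    return outputs[1:]                             # drop the bias unit of the last layer

# Hypothetical 2-2-1 network with made-up weights.
weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],        # bias + 2 inputs -> 2 hidden units
    [[0.0], [1.0], [-1.0]],                        # bias + 2 hidden -> 1 output
]
h = forward(weights, [1.0, -1.0])
print(h)
```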

Applying SGD // Stochastic Gradient Descent

all the weights w determine h(x) // final hypothesis

by definition, I have some error

Error on example (xn, yn) is e(h(xn), yn)

h is determined by w, which is the active 

quantity when we're learning

to implement SGD, we need the gradient vector.

it makes a big difference when you find

an efficient algorithm to do something

FFT == Fast Fourier Transform

that simple factor made the field enormously

active, just because of that algorithm (FFT)

backpropagation algorithm

let's take part of the neural network

feeding through some weight into this guy

contributing to the signal going into next guy

signal goes into next nonlinearity to the output

we can evaluate every partial derivative of e(w)

for every weight

we can do this analytically, there is nothing mysterious
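
written out, assuming the notation w_ij^(l) for the weight from unit i in layer l-1 to unit j in layer l, s_j^(l) for the signal going into unit j, and x_i^(l-1) for the output of unit i (this indexing is an assumption, the notes do not fix it), the chain rule gives:

```latex
s_j^{(l)} = \sum_i w_{ij}^{(l)} \, x_i^{(l-1)},
\qquad
\frac{\partial e(\mathbf{w})}{\partial w_{ij}^{(l)}}
  = \frac{\partial e(\mathbf{w})}{\partial s_j^{(l)}} \cdot
    \frac{\partial s_j^{(l)}}{\partial w_{ij}^{(l)}}
  = \delta_j^{(l)} \, x_i^{(l-1)}
```

here δ_j^(l) is just a name for ∂e(w)/∂s_j^(l); the x values are already known from running the network forward.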

propagating backward, hence the name backpropagation

backpropagate δ to get the other δs, 

this is the essence of the algorithm // backpropagation

so I'm going to apply the chain rule again
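
a sketch of that second application of the chain rule, assuming δ_j^(l) denotes ∂e(w)/∂s_j^(l), the signal is s_j^(l) = Σ_i w_ij^(l) x_i^(l-1), and the soft threshold is tanh so that its derivative is 1 - x^2 (the index notation is an assumption, not taken from the notes):

```latex
\delta_i^{(l-1)}
  = \frac{\partial e(\mathbf{w})}{\partial s_i^{(l-1)}}
  = \sum_j \frac{\partial e(\mathbf{w})}{\partial s_j^{(l)}} \cdot
           \frac{\partial s_j^{(l)}}{\partial x_i^{(l-1)}} \cdot
           \frac{\partial x_i^{(l-1)}}{\partial s_i^{(l-1)}}
  = \left(1 - \bigl(x_i^{(l-1)}\bigr)^2\right) \sum_j w_{ij}^{(l)} \, \delta_j^{(l)}
```

so the δs of layer l-1 come from the δs of layer l, which is why the computation runs from the output layer backward.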

backward propagation algorithm

initialize all weights at random
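
putting the pieces together, a minimal sketch of SGD with backpropagation, assuming tanh units everywhere, squared error on a single example, and a list-of-lists weight layout with a bias unit at index 0 (all function names, the 2-3-1 architecture, the init range and the toy data are illustrative assumptions, not the lecture's code).

```python
import math
import random

def init_weights(sizes, rng):
    """sizes = [d_in, hidden..., d_out]; small random weights, bias row at index 0."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[l + 1])]
             for _ in range(sizes[l] + 1)]
            for l in range(len(sizes) - 1)]

def forward(weights, x):
    """Return the outputs of every layer, each with a leading bias unit 1."""
    xs = [[1.0] + list(x)]
    for W in weights:
        prev = xs[-1]
        out = [math.tanh(sum(W[i][j] * prev[i] for i in range(len(prev))))
               for j in range(len(W[0]))]
        xs.append([1.0] + out)
    return xs

def error(weights, x, y):
    """Squared error e(h(x), y) on one example."""
    out = forward(weights, x)[-1][1:]
    return sum((o - t) ** 2 for o, t in zip(out, y))

def gradients(weights, x, y):
    """Backpropagation: gradient of the error with respect to every weight."""
    xs = forward(weights, x)
    out = xs[-1][1:]
    # delta at the output layer: d e / d s = 2 (x - y) (1 - x^2)
    delta = [2.0 * (o - t) * (1.0 - o * o) for o, t in zip(out, y)]
    grads = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        prev, W = xs[l], weights[l]
        grads[l] = [[prev[i] * delta[j] for j in range(len(delta))]
                    for i in range(len(prev))]
        if l > 0:
            # propagate delta backward; the bias unit (index 0) has no incoming signal
            delta = [(1.0 - prev[i] ** 2) * sum(W[i][j] * delta[j] for j in range(len(delta)))
                     for i in range(1, len(prev))]
    return grads

def sgd_step(weights, x, y, eta):
    """One SGD update on a single example, in place."""
    for W, G in zip(weights, gradients(weights, x, y)):
        for i in range(len(W)):
            for j in range(len(W[i])):
                W[i][j] -= eta * G[i][j]

rng = random.Random(0)
weights = init_weights([2, 3, 1], rng)            # initialize all weights at random
data = [([-1, -1], [-1]), ([-1, 1], [1]), ([1, -1], [1]), ([1, 1], [-1])]
for _ in range(2000):
    x, y = rng.choice(data)                       # pick one example at random
    sgd_step(weights, x, y, eta=0.1)
```

the forward pass gives all the x values, the backward pass gives all the δs, and each weight gradient is the product of the x below it and the δ above it.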

final remark: hidden layer

if you think what the hidden layers do,

they're just doing a nonlinear transform.

learned nonlinear transform

these are learned features

don't look at the data before you choose

the transform, the network is looking at the 

data all it wants, it's actually just adjusting

the way to get the proper transform

I have already charged for the proper VC dimension

can I interpret what the hidden layers are doing?
