CalTech machine learning, video 10 note (Neural Network)
2014-09-27 18:38
15:15 2014-09-27
start CalTech machine learning, video 10
Neural Networks
15:16 2014-09-27
stochastic gradient descent
15:20 2014-09-27
NN (Neural Network) is easy to implement,
because of the algorithm I'm going to introduce today
15:21 2014-09-27
outline:
* Stochastic gradient descent
* Neural network model
* Backpropagation algorithm
15:22 2014-09-27
GD == Gradient Descent
15:23 2014-09-27
logistic regression
15:25 2014-09-27
think of the average direction that you're
going to descend along
15:28 2014-09-27
SGD == Stochastic Gradient Descent
// randomized gradient descent
15:30 2014-09-27
this is an error surface; it's the typical
error surface that you encounter
15:33 2014-09-27
benefits of SGD:
* cheaper computation
* randomization
* simple
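A minimal sketch of the SGD loop being described; `grad_e`, `eta`, and the epoch count are my placeholder names and defaults, not from the lecture. The point is only that each update uses one random example instead of the full batch:

```python
import random

def sgd(w, data, grad_e, eta=0.1, epochs=100):
    """Stochastic gradient descent (sketch).
    w      -- list of weights (floats)
    data   -- list of (x, y) training examples
    grad_e -- function(w, x, y) -> gradient of the single-example
              error e(h(x), y) w.r.t. w, same length as w
    Each step moves w along the negative gradient of ONE example:
    w <- w - eta * grad_e(w, xn, yn)."""
    for _ in range(epochs):
        random.shuffle(data)        # randomization: visit examples in random order
        for x, y in data:
            g = grad_e(w, x, y)     # cheaper: gradient of one example, not all N
            w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w
```

The three benefits above map directly onto this loop: one-example gradients (cheaper computation), the shuffle (randomization), and the whole thing being a few lines (simple).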
15:36 2014-09-27
learning rate
15:37 2014-09-27
biological function => biological structure
15:50 2014-09-27
biological inspirations
15:51 2014-09-27
PDE == Partial Differential Equation
15:59 2014-09-27
creating layers of perceptrons
16:04 2014-09-27
the multilayer perceptron
16:07 2014-09-27
so this multilayer perceptron implements something
that a single perceptron fails at (e.g. XOR).
16:08 2014-09-27
feedforward
16:09 2014-09-27
that's what neural networks do; that's the
only thing they do.
16:14 2014-09-27
they have a way to get that solution.
16:15 2014-09-27
and the way they're going to do it is, instead of
using perceptrons with a hard threshold, they're
going to soften the threshold.
16:15 2014-09-27
not that they like the soft threshold as such, but
soft thresholds are smoother: twice differentiable.
16:17 2014-09-27
the neural network
16:17 2014-09-27
each layer has a nonlinearity
16:17 2014-09-27
soft threshold
16:19 2014-09-27
the 1st column is the input x
16:19 2014-09-27
follow the rule for going from one
layer to the next
16:20 2014-09-27
the intermediate values we're going to call
hidden layers; the user doesn't see them, they
just put in the input
16:20 2014-09-27
if you open the black box, you see there are layers
16:21 2014-09-27
as the soft threshold, we're going to use tanh
// hyperbolic tangent
16:23 2014-09-27
you can see now why we're using it.
16:24 2014-09-27
if your signal is very small, it looks almost
linear; if your signal is extremely large, it
acts like a hard threshold.
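Writing the soft threshold out makes those two regimes explicit (standard facts about tanh):

$$\theta(s) = \tanh(s) = \frac{e^{s}-e^{-s}}{e^{s}+e^{-s}}, \qquad \tanh(s)\approx s \ \text{for } |s|\ll 1, \qquad \tanh(s)\to\pm 1 \ \text{as } |s|\to\infty$$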
16:24 2014-09-27
how the network operates
16:25 2014-09-27
the parameters of the neural network are the weights w;
there is a weight w_ij^(l) from every node in one layer
to every node in the next
16:26 2014-09-27
when you use the network, this is a recursive
definition
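The recursion, written out in the lecture's notation (w_ij^(l) is the weight into node j of layer l from node i of layer l-1; θ is the tanh above):

$$x_j^{(l)} = \theta\!\left(s_j^{(l)}\right) = \theta\!\left(\sum_{i=0}^{d^{(l-1)}} w_{ij}^{(l)}\, x_i^{(l-1)}\right), \qquad l = 1, \dots, L$$

where x_0^(l) = 1 is the constant bias node of each layer.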
16:35 2014-09-27
Apply x to the input variables of the network
16:36 2014-09-27
after iterating layer by layer, you will end up
with the output h(x)
16:37 2014-09-27
that is the entire operation of a neural network
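A sketch of that operation in code; the nested-list weight layout and the bias handling are my own conventions, chosen to mirror the w_ij^(l) notation above:

```python
import math

def forward(w, x):
    """Forward propagation (sketch).
    w[l][i][j] is the weight from node i of layer l (i = 0 is the bias node)
    into node j of layer l+1; x is the raw input vector.
    Returns the activations of every layer, which backpropagation
    will need again later."""
    xs = [[1.0] + list(x)]                      # layer 0: bias node + input x
    for wl in w:
        prev = xs[-1]
        s = [sum(prev[i] * wl[i][j] for i in range(len(wl)))
             for j in range(len(wl[0]))]        # s_j = sum_i w_ij * x_i
        xs.append([1.0] + [math.tanh(v) for v in s])   # soft threshold
    return xs                                   # the output h(x) is xs[-1][1]
```

(One simplification here: tanh is applied at every layer including the output; the lecture leaves the output node's nonlinearity as a design choice.)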
16:37 2014-09-27
Applying SGD // Stochastic Gradient Descent
16:39 2014-09-27
all the weights w determine h(x) // final hypothesis
16:40 2014-09-27
by definition, I have some error
16:41 2014-09-27
Error on example (xn, yn) is e(h(xn), yn)
16:41 2014-09-27
h is determined by w, which is the active
quantity when we're learning
16:42 2014-09-27
to implement SGD, we need the gradient vector ∇e(w):
the partial derivative ∂e(w)/∂w_ij^(l) for every weight
16:43 2014-09-27
it makes a bigger difference when you find
an efficient algorithm to do something
16:44 2014-09-27
FFT == Fast Fourier Transform
16:45 2014-09-27
that simple fact made the field enormously
active, just because of that one algorithm (the FFT)
16:47 2014-09-27
backpropagation algorithm
16:47 2014-09-27
let's take part of the neural network
16:48 2014-09-27
feeding through some weight into this guy
16:48 2014-09-27
contributing to the signal going into next guy
16:48 2014-09-27
signal goes into next nonlinearity to the output
16:49 2014-09-27
we can evaluate the partial derivative of e(w)
with respect to every weight
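The trick: define δ_j^(l) as the sensitivity of the error to the signal s_j^(l); then each partial derivative splits into two pieces we either already have or can propagate:

$$\frac{\partial e(w)}{\partial w_{ij}^{(l)}} = x_i^{(l-1)}\,\delta_j^{(l)}, \qquad \delta_j^{(l)} = \frac{\partial e(w)}{\partial s_j^{(l)}}$$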
16:49 2014-09-27
we can do this analytically, there is nothing mysterious
16:50 2014-09-27
we propagate backward, hence the name backpropagation
16:59 2014-09-27
backpropagate δ to get the other δs;
this is the essence of the algorithm // backpropagation
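Concretely, with θ = tanh we have θ'(s) = 1 - θ(s)², so each δ one layer back is:

$$\delta_i^{(l-1)} = \left(1-\left(x_i^{(l-1)}\right)^2\right)\sum_{j=1}^{d^{(l)}} w_{ij}^{(l)}\,\delta_j^{(l)}$$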
17:06 2014-09-27
so I'm going to apply the chain rule again
17:08 2014-09-27
backward propagation algorithm
17:13 2014-09-27
initialize all weights at random
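Putting the pieces together, a hedged sketch of one SGD step with backpropagation, reusing the forward() sketch above; squared error e = (h(x) - y)², the learning rate, and the initialization scale are my assumptions, not the lecture's:

```python
import random

def backprop_step(w, x, y, eta=0.1):
    """One SGD step: forward pass, backward deltas, weight update.
    Assumes squared error e = (h(x) - y)^2, tanh everywhere, and the
    weight layout of forward() above."""
    xs = forward(w, x)
    out = xs[-1][1]                      # h(x): single output node
    # output delta: de/ds = 2(h - y) * tanh'(s) = 2(h - y)(1 - h^2)
    delta = [2 * (out - y) * (1 - out * out)]
    for l in range(len(w) - 1, -1, -1):
        xl = xs[l]                       # activations feeding layer l+1
        if l > 0:                        # deltas one layer back, BEFORE updating w
            prev_delta = [(1 - xl[i] ** 2) *
                          sum(w[l][i][j] * delta[j] for j in range(len(delta)))
                          for i in range(1, len(xl))]
        for i in range(len(xl)):         # de/dw_ij = x_i * delta_j
            for j in range(len(delta)):
                w[l][i][j] -= eta * xl[i] * delta[j]
        if l > 0:
            delta = prev_delta

# tiny usage example: random small weights for a 2-2-1 network, one step
random.seed(0)
dims = [2, 2, 1]
w = [[[random.uniform(-0.1, 0.1) for _ in range(dims[l + 1])]
      for _ in range(dims[l] + 1)] for l in range(len(dims) - 1)]
backprop_step(w, [1.0, -1.0], 1.0)
```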
17:19 2014-09-27
final remark: hidden layer
17:20 2014-09-27
if you think what the hidden layers do,
they're just doing a nonlinear transform.
17:21 2014-09-27
learned nonlinear transform
17:22 2014-09-27
these are learned features
17:22 2014-09-27
you don't look at the data before you choose
the transform; the network looks at the data
all it wants, but it's actually just adjusting
the weights to get the proper transform
17:25 2014-09-27
we already charged for looking at the data in the VC dimension
17:25 2014-09-27
can I interpret what the hidden layers are doing?