Caffe: resuming training from an existing model
2015-08-19 15:20
1. Resuming from a snapshot
Caffe supports continuing training from someone else's existing model. The example shipped with the source is:
caffe-master0818\examples\imagenet\resume_training.sh
#!/usr/bin/env sh
./build/tools/caffe train \
    --solver=models/bvlc_reference_caffenet/solver.prototxt \
    --snapshot=models/bvlc_reference_caffenet/caffenet_train_10000.solverstate.h5
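Resuming with --snapshot restores the full solver state (weights plus the current iteration, learning rate, and momentum history). To start from another model's trained weights only, Caffe instead takes --weights with a .caffemodel file, which is the usual fine-tuning route; training then starts fresh at iteration 0. A minimal sketch of the two invocations, reusing the bvlc_reference_caffenet paths from the example above:

#!/usr/bin/env sh
# resume: restore weights plus solver state from a snapshot
./build/tools/caffe train \
    --solver=models/bvlc_reference_caffenet/solver.prototxt \
    --snapshot=models/bvlc_reference_caffenet/caffenet_train_10000.solverstate.h5

# fine-tune: initialize weights only; the solver starts over at iteration 0
./build/tools/caffe train \
    --solver=models/bvlc_reference_caffenet/solver.prototxt \
    --weights=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel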
2. Caffe also supports training with multiple learning-rate drops
For example, caffe-master0818\examples\cifar10\train_full.sh:
#!/usr/bin/env sh
TOOLS=./build/tools

$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_full_solver.prototxt

# reduce learning rate by factor of 10; the lr1 solver file carries the lower rate
$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_full_solver_lr1.prototxt \
    --snapshot=examples/cifar10/cifar10_full_iter_60000.solverstate.h5

# reduce learning rate by factor of 10 again; the lr2 solver file carries the lower rate
$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_full_solver_lr2.prototxt \
    --snapshot=examples/cifar10/cifar10_full_iter_65000.solverstate.h5
Training for another model works the same way:
#!/usr/bin/env sh
TOOLS=./build/tools

$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_quick_solver.prototxt

# reduce learning rate by factor of 10 after 8 epochs
$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt \
    --snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
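The *_iter_60000.solverstate.h5 files these scripts pass to --snapshot are written automatically during the earlier run, controlled by the snapshot fields of the solver. A sketch of the relevant settings (the interval and prefix here are illustrative, not copied from the shipped solver):

snapshot: 10000                                   # write a snapshot every 10K iterations
snapshot_prefix: "examples/cifar10/cifar10_full"  # prefix for .caffemodel / .solverstate files
snapshot_format: HDF5                             # emit HDF5 snapshots, hence the .h5 suffix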
Chaining solver files this way means you do not have to stop training and lower the learning rate by hand each time.
For large models, dropping the learning rate several times matters: experience shows that once the loss stops decreasing at the current learning rate, dropping the rate again can push the loss further down.
If the learning rate is too large, training overshoots and never settles at the minimum; if it is too small, it cannot escape a poor local optimum. So start with a relatively large learning rate to avoid getting trapped early.
-------------------------------------------------------------------------------------------------------------------------------------------
3. Configuring the schedule in the solver file
Of course, learning-rate drops can also be configured directly in the solver file.
Official Caffe documentation: http://caffe.berkeleyvision.org/tutorial/solver.html
To use a learning rate policy like this, you can put the following lines somewhere in your solver prototxt file:
base_lr: 0.01     # begin training at a learning rate of 0.01 = 1e-2
lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
                  # by a factor of gamma every stepsize iterations
gamma: 0.1        # drop the learning rate by a factor of 10
                  # (i.e., multiply it by a factor of gamma = 0.1)
stepsize: 100000  # drop the learning rate every 100K iterations
max_iter: 350000  # train for 350K iterations total
momentum: 0.9
Under the above settings, we'll always use a momentum of μ = 0.9. We'll begin training at a base_lr of α = 0.01 = 10⁻² for the first 100,000 iterations, then multiply the learning rate by gamma (γ) and train at α′ = αγ = (0.01)(0.1) = 0.001 = 10⁻³ for iterations 100K-200K, then at α″ = 10⁻⁴ for iterations 200K-300K, and finally train until iteration 350K (since we have max_iter: 350000) at α‴ = 10⁻⁵.
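In other words, the "step" policy computes the rate as α(t) = base_lr · γ^⌊t / stepsize⌋. A quick sketch to check the phase boundaries with plain awk, using the values from the solver above:

#!/usr/bin/env sh
# print the "step" policy learning rate at the start of each phase
for iter in 0 100000 200000 300000; do
    awk -v t="$iter" 'BEGIN { printf "iter %6d  lr %g\n", t, 0.01 * 0.1 ^ int(t / 100000) }'
done

Running it prints 0.01, 0.001, 0.0001, and 1e-05, matching the schedule described above.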
Note that the momentum setting μ effectively multiplies the size of your updates by a factor of 1/(1−μ) after many iterations of training, so if you increase μ, it may be a good idea to decrease α accordingly (and vice versa). For example, with μ = 0.9, we have an effective update size multiplier of 1/(1−0.9) = 10. If we increased the momentum to μ = 0.99, we've increased our update size multiplier to 100, so we should drop α (base_lr) by a factor of 10.
Note also that the above settings are merely guidelines, and they're definitely not guaranteed to be optimal (or even work at all!) in every situation. If learning diverges (e.g., you start to see very large or NaN or inf loss values or outputs), try dropping the base_lr (e.g., base_lr: 0.001) and re-training, repeating this until you find a base_lr value that works.
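When the drop points should not be evenly spaced, Caffe also offers the "multistep" policy, which behaves like "step" but takes an explicit list of drop iterations. The stepvalue numbers below are illustrative:

lr_policy: "multistep"
gamma: 0.1          # still multiply the rate by 0.1 at each drop
stepvalue: 100000   # first drop
stepvalue: 250000   # second drop, at an arbitrary later iteration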