
Deep Learning Experiment Log

2017-10-03 11:46
# **IMPORTANT**
# Please note that this learning rate schedule is heavily dependent on the
# hardware architecture, batch size and any changes to the model architecture
# specification. Selecting a finely tuned learning rate schedule is an
# empirical process that requires some experimentation. Please see README.md
# for more guidance and discussion.
#
# With 8 Tesla K40's and a batch size = 256, the following setup achieves
# precision @ 1 = 73.5% after 100 hours and 100K steps (20 epochs).
# Learning rate decay factor selected from http://arxiv.org/abs/1404.5997.
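
The schedule in question is an exponentially decayed learning rate. Below is a minimal TensorFlow 1.x sketch of that idea; the initial rate of 0.1, decay factor of 0.16 and 30 epochs per decay are the Inception training script's documented defaults, taken here as assumptions, and (as the note above warns) they would need retuning for a single-GPU, batch-size-32 setup.

import tensorflow as tf

num_examples = 1281167                    # ILSVRC-2012 training set size
batch_size = 32                           # what fits on the GTX 1060 here
batches_per_epoch = num_examples // batch_size

global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(
    learning_rate=0.1,                    # assumed initial rate
    global_step=global_step,
    decay_steps=batches_per_epoch * 30,   # assumed: decay every 30 epochs
    decay_rate=0.16,                      # factor cited from arXiv:1404.5997
    staircase=True)                       # step-wise rather than smooth decay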
Open TensorBoard: tensorboard --logdir=/tmp/imagenet_train

The ImageNet training data is roughly 1,000k images (the full ILSVRC-2012 set is 1,281,167). Training the Inception v3 network on a GTX 1060 with batch_size = 32 runs at about 32 examples/sec.

After 20 hours and 70k steps, the model had seen 32 × 70k ≈ 2,240k images, about 2 epochs of training data; the loss dropped from 13 to 8, and the downward trend flattened out.

After 55 hours and 204k steps, 32 × 204k ≈ 6,528k images, about 6 epochs; the loss dropped from 13 to 7 and has been nearly flat since step 120k.

After 4 days and 1 hour (97 h) and 360k steps, 32 × 360k ≈ 11,520k images, about 10 epochs; the loss is still around 7 and has stayed nearly flat since step 120k.
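
As a quick consistency check on those numbers, here is a plain-Python sketch; 1,281,167 is the actual ILSVRC-2012 train-set size, and everything else is read straight off the log above.

batch_size = 32
num_train = 1281167  # actual ILSVRC-2012 training images

# (hours, steps) pairs from the log entries above.
for hours, steps in [(20, 70000), (55, 204000), (97, 360000)]:
    seen = batch_size * steps              # images processed so far
    epochs = seen / float(num_train)       # epochs vs. the full set
    rate = seen / (hours * 3600.0)         # implied throughput
    print("%3d h, %6d steps: %5.2fM images, %.1f epochs, ~%.0f examples/sec"
          % (hours, steps, seen / 1e6, epochs, rate))

The implied throughput comes out at 31-33 examples/sec, matching the observed 32 examples/sec; measured against the full 1.28M-image set, the epoch counts are closer to 1.7, 5.1 and 9.0.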

Eval: precision @ 1 = 0.5584, recall @ 5 = 0.8052 [50016 examples]
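
For reference, precision @ 1 and recall @ 5 here are top-1 and top-5 accuracy over the validation examples. A minimal TensorFlow 1.x sketch of that computation (logits and labels are illustrative placeholders, not names from the eval script):

import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 1000])  # one score per class
labels = tf.placeholder(tf.int32, [None])          # ground-truth class ids

top_1 = tf.nn.in_top_k(logits, labels, k=1)   # is the true class the argmax?
top_5 = tf.nn.in_top_k(logits, labels, k=5)   # is it among the top 5?
precision_at_1 = tf.reduce_mean(tf.cast(top_1, tf.float32))
recall_at_5 = tf.reduce_mean(tf.cast(top_5, tf.float32))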

A similar case: https://stackoverflow.com/questions/38259166/training-tensorflow-inception-v3-imagenet-on-modest-hardware-setup (that author did not reach the published optimum either):

2016-06-06 12:07:52.245005: precision @ 1 = 0.5767 recall @ 5 = 0.8143 [50016 examples]
2016-06-09 22:35:10.118852: precision @ 1 = 0.5957 recall @ 5 = 0.8294 [50016 examples]
2016-06-14 15:30:59.532629: precision @ 1 = 0.6112 recall @ 5 = 0.8396 [50016 examples]
2016-06-20 13:57:14.025797: precision @ 1 = 0.6136 recall @ 5 = 0.8423 [50016 examples]

On a small hardware setup like yours, it will be difficult to achieve maximum performance. Generally speaking for CNN's, the best performance is with the largest batch sizes possible. This means that for CNN's the training procedure is often limited by the maximum batch size that can fit in GPU memory.
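
One common workaround when memory caps the batch size (not something this post tries) is gradient accumulation: sum the gradients over several small batches and apply a single update, emulating a larger effective batch. A rough TensorFlow 1.x sketch on a toy model, assuming dense gradients throughout:

import tensorflow as tf

ACCUM_STEPS = 8  # e.g. 8 micro-batches of 32 -> effective batch of 256

# Toy model standing in for the real network.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.int32, [None])
logits = tf.layers.dense(x, 5)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

opt = tf.train.RMSPropOptimizer(learning_rate=0.1)  # illustrative rate
grads_and_vars = opt.compute_gradients(loss)

# One non-trainable buffer per variable holds the running gradient sum.
accum = [tf.Variable(tf.zeros(v.shape, dtype=v.dtype), trainable=False)
         for _, v in grads_and_vars]
zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])
accum_op = tf.group(*[a.assign_add(g)
                      for a, (g, _) in zip(accum, grads_and_vars)])
apply_op = opt.apply_gradients(
    [(a / ACCUM_STEPS, v) for a, (_, v) in zip(accum, grads_and_vars)])

# Per update: run zero_op once, accum_op on ACCUM_STEPS different micro-batches,
# then apply_op to take one optimizer step with the averaged gradient.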

