Why C for deep learning and Scala for cluster machine learning?
2015-08-09 11:47
The choice of programming language for a machine learning task, in both academia and industry, is usually made to minimize the algorithm's execution time.
For a machine learning task whose algorithm is largely sequential, such as deep learning, the bottleneck is the sequential part of the computation. In that case, a well-written implementation in the language with the fastest average execution speed (here, C) yields the lowest achievable execution time.
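The weight of the sequential part can be made precise with Amdahl's law, which the post does not name but which matches its argument. As a sketch, assume a fraction $p$ of the single-threaded runtime can be parallelized and is spread over $n$ workers (both symbols are introduced here for illustration):

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

For example, with $p = 0.9$ and $n = 8$ the speedup is $1 / (0.1 + 0.1125) \approx 4.7$, and no number of workers can push it past $10$. When $p$ is small, adding workers barely helps, so the remaining lever is making the sequential code itself faster.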
When the algorithm contains substantial non-sequential components, the execution time of those components can be reduced in proportion to the amount of parallel execution achieved. Thus, for highly parallelizable algorithms such as TF-IDF, where the non-sequential computation is the bottleneck, a speedup can be achieved by allocating the parallel computations to different CPU hardware threads or CPUs. Distributing the computation, however, raises a maintainability problem: how to conveniently maintain the cluster when its nodes may not share the same OS environment or CPU type. JVM-based programming languages such as Scala solve this problem neatly by abstracting the computation environment away from the OS and the physical machine.
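To illustrate why TF-IDF parallelizes well, here is a minimal sketch in Java, chosen as a JVM language. It runs on a single JVM with parallel streams rather than on a real cluster, and the class and method names (`ParallelTfIdf`, `termFrequency`, `inverseDocumentFrequency`, `tfIdf`) are hypothetical, not taken from the post or from any library. The point is that each document's scores depend only on that document plus a shared, read-only IDF table, so documents can be processed independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ParallelTfIdf {

    // Term frequency for one document: occurrences of a word / total words.
    static Map<String, Double> termFrequency(List<String> doc) {
        Map<String, Double> tf = new HashMap<>();
        for (String word : doc) {
            tf.merge(word, 1.0, Double::sum);
        }
        tf.replaceAll((word, count) -> count / doc.size());
        return tf;
    }

    // Inverse document frequency: log(N / number of documents containing the word).
    static Map<String, Double> inverseDocumentFrequency(List<List<String>> docs) {
        // Count each word at most once per document, across documents in parallel.
        Map<String, Long> docCounts = docs.parallelStream()
                .flatMap(doc -> doc.stream().distinct())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
        Map<String, Double> idf = new HashMap<>();
        docCounts.forEach((word, n) -> idf.put(word, Math.log((double) docs.size() / n)));
        return idf;
    }

    // TF-IDF for every document; the per-document work runs in parallel.
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        Map<String, Double> idf = inverseDocumentFrequency(docs); // read-only from here on
        return docs.parallelStream()
                .map(doc -> {
                    Map<String, Double> scores = termFrequency(doc);
                    scores.replaceAll((word, tf) -> tf * idf.get(word));
                    return scores;
                })
                .collect(Collectors.toList()); // keeps the input document order
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(Arrays.asList("deep", "learning", "in", "c"));
        docs.add(Arrays.asList("cluster", "learning", "in", "scala"));
        List<Map<String, Double>> scores = tfIdf(docs);
        for (int i = 0; i < scores.size(); i++) {
            System.out.println("doc " + i + ": " + scores.get(i));
        }
    }
}
```

Parallel streams here use the JVM's common fork/join pool, so the same bytecode spreads over however many hardware threads the host provides, regardless of OS. A cluster framework would take over the step this sketch leaves out: shipping documents and the IDF table between machines.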