GPU Computing with Ubuntu 14.04 + Theano + OpenCL + libgpuarray
2016-07-08 12:40
This blog has moved to Marcovaldo's blog (http://marcovaldong.github.io/)
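My previous post showed how to use Theano with logistic regression for the Kaggle handwritten digit recognition task, and noted at the end that computing on the CPU was far too slow. After finishing that experiment I read the Theano documentation and learned that Theano officially supports GPU computing only through CUDA, not OpenCL, which means that officially only NVIDIA cards are supported. The reason is that CUDA and OpenCL are two separate GPU computing platforms: CUDA runs only on NVIDIA cards, whereas OpenCL is a vendor-neutral standard that cards from AMD, Intel and NVIDIA can implement (look up the detailed differences yourself). Unfortunately my laptop has Intel integrated graphics and an entry-level discrete AMD card. Theano does offer unofficial OpenCL support through libgpuarray, so I spent a great deal of time trying to install libgpuarray.
libgpuarray supports Debian 6, Ubuntu 14.04, Mac OS X 10.11 and Windows 7. The only two blog posts I could find online that describe a successful libgpuarray installation are both on Mac OS. The links are given here for reference: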
https://www.robberphex.com/2016/05/521
http://codechina.org/2016/04/how-to-install-theano-on-mac-os-x-ei-caption-with-opencl-support/
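My original OS was Windows 7. I spent nearly all of my free time in June trying to install libgpuarray, ran into countless pitfalls, and never got it to work. Here is what you need in order to install libgpuarray on Windows 7, for reference:
The latest AMD graphics driver (see the AMD website)
AMD APP SDK, which provides OpenCL
CMake >= 3.0 (cmake)
g++, usually installed through MinGW or TDM-GCC
Visual Studio
clBLAS (clblas)
libcheck
In July I set up Ubuntu 14.04 as a dual boot alongside Windows 7 and tried to get Theano + OpenCL GPU computing working on Ubuntu. In the end libgpuarray installed more or less successfully, but I still cannot compute on the AMD card. The specific problem is described at the end of this post. The whole process follows.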
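Installing the Ubuntu 14.04 dual boot
For the Windows 7 / Ubuntu 14.04 dual-boot installation I followed http://m.blog.csdn.net/article/details?id=43987599. The process is fairly simple, so I will not go into it here.
Installing the AMD graphics driver
This is where I got stuck at first. I broke the AMD driver installation several times, and each time the system could no longer boot into the graphical interface. The only option was to repair it from a tty or from the initramfs prompt, which is too hard for a first-time Linux user like me. Often the repaired system still did not work and I had to reinstall the OS, seven or eight times in total. Here is a driver installation method that is simple and quick (at least it worked for me on the first try). Right after installing Ubuntu 14.04, the first thing to do is switch drivers. Open Additional Drivers (the original post shows a screenshot here). The system initially uses the open-source driver. Select the proprietary fglrx driver, click "Apply Changes", wait for the installation to finish, and reboot.
After rebooting, open a terminal and run fglrxinfo. It prints the graphics card information, as shown below: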
marcovaldo@marcovaldong:~$ fglrxinfo
display: :0 screen: 0
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 7400M Series
OpenGL version string: 4.5.13399 Compatibility Profile Context 15.201.1151
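Then run fgl_glxgears in the terminal. A test window pops up (rotating cubes), which confirms that the driver installed successfully. These are the better driver installation guides I found, for reference: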
http://forum.ubuntu.org.cn/viewtopic.php?t=445434
http://www.tuicool.com/articles/6N3e2ir
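Installing the AMD APP SDK
Download the SDK from the AMD website (mind the OS and the 32/64-bit variant). I downloaded the 64-bit Linux version of AMD APP SDK 3.0. Extracting the archive yields a .sh file. Run it from a terminal:
sudo sh AMD-APP-SDK-v3.0.130.136-GA-linux64.sh
The AMD APP SDK installs under /opt/ by default. Running clinfo in a terminal now prints the OpenCL platform and compute device information. Here is the output from my laptop: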
marcovaldo@marcovaldong:~$ clinfo
Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.0 AMD-APP (1800.11)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: AMD Radeon HD 7400M Series
  Device Topology: PCI[ B#1, D#0, F#0 ]
  Max compute units: 2
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 0
  Max clock frequency: 700Mhz
  Address bits: 32
  Max memory allocation: 134217728
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 536870912
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Max pipe arguments: 0
  Max pipe active reservations: 0
  Max pipe packet size: 0
  Max global variable size: 0
  Max global variable preferred total size: 0
  Max read/write image args: 0
  Max on device events: 0
  Queue on device max size: 0
  Max on device queues: 0
  Queue on device preferred size: 0
  SVM capabilities:
    Coarse grain buffer: No
    Fine grain buffer: No
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Queue on Device properties:
    Out-of-Order: No
    Profiling : No
  Platform ID: 0x7f98e6833430
  Name: Caicos
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1800.11
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1800.11)
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event

  Device Type: CL_DEVICE_TYPE_CPU
  Vendor ID: 1002h
  Board name:
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 8
  Preferred vector width double: 4
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 8
  Native vector width double: 4
  Max clock frequency: 2299Mhz
  Address bits: 64
  Max memory allocation: 2147483648
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 64
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 32768
  Global memory size: 6161788928
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 2147483648
  Max global variable size: 1879048192
  Max global variable preferred total size: 1879048192
  Max read/write image args: 64
  Max on device events: 0
  Queue on device max size: 0
  Max on device queues: 0
  Queue on device preferred size: 0
  SVM capabilities:
    Coarse grain buffer: No
    Fine grain buffer: No
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 1
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Queue on Device properties:
    Out-of-Order: No
    Profiling : No
  Platform ID: 0x7f98e6833430
  Name: Intel(R) Core(TM) i3-2350M CPU @ 2.30GHz
  Vendor: GenuineIntel
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1800.11 (sse2,avx)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1800.11)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
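The clinfo dump is long, and the only parts that matter for choosing a device are the "Device Type" and "Name" lines. The sketch below pulls those pairs out of clinfo-style text. It is only an illustration: list_opencl_devices is a hypothetical helper, the sample string is abridged from the output above, and the idea that the listing order corresponds to the device index in a string such as opencl0:0 is an assumption, not something the libgpuarray documentation is quoted on here.

```python
import re


def list_opencl_devices(clinfo_text):
    """Return (device_type, name) pairs found in clinfo-style output."""
    devices = []
    current_type = None
    for raw_line in clinfo_text.splitlines():
        line = raw_line.strip()
        match = re.match(r"Device Type:\s*(\S+)", line)
        if match:
            current_type = match.group(1)
            continue
        # "Board name:" and "Platform Name:" do not match, because
        # re.match anchors at the start of the stripped line.
        match = re.match(r"Name:\s*(.+)", line)
        if match and current_type is not None:
            devices.append((current_type, match.group(1).strip()))
            current_type = None
    return devices


# Abridged sample taken from the clinfo output in this post.
sample = """\
  Platform Name: AMD Accelerated Parallel Processing
  Device Type: CL_DEVICE_TYPE_GPU
  Board name: AMD Radeon HD 7400M Series
  Name: Caicos
  Device Type: CL_DEVICE_TYPE_CPU
  Name: Intel(R) Core(TM) i3-2350M CPU @ 2.30GHz
"""

devices = list_opencl_devices(sample)
for index, (device_type, name) in enumerate(devices):
    print("device %d: %s (%s)" % (index, name, device_type))
```

On a real machine the text could come from running clinfo through subprocess instead of the hard-coded sample.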
You also need to add environment variables to /root/.bashrc (use ~/.bashrc instead if you work as a regular user rather than root), as follows:
# AMD APP SDK
export AMDAPPSDKROOT="/opt/AMDAPPSDK-3.0"
export AMDAPPSDKSAMPLESROOT="/opt/AMDAPPSDK-3.0/"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/opt/AMDAPPSDK-3.0/lib/x86_64":"/opt/AMDAPPSDK-3.0/lib/x86"
export ATISTREAMSDKROOT=$AMDAPPSDKROOT
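A typo in these exports is easy to miss, because the shell only complains much later when a library fails to load. The following sketch checks an environment mapping for the two settings that matter most. check_amd_sdk_env is a hypothetical helper written for this post, and the lib/x86_64 subdirectory is taken from the exports above rather than from any AMD documentation.

```python
import os


def check_amd_sdk_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    root = env.get("AMDAPPSDKROOT")
    if not root:
        problems.append("AMDAPPSDKROOT is not set")
        return problems
    # Assumed layout: the 64-bit libraries live in <root>/lib/x86_64.
    lib_dir = root.rstrip("/") + "/lib/x86_64"
    ld_paths = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in ld_paths:
        problems.append("%s is missing from LD_LIBRARY_PATH" % lib_dir)
    return problems


# Check the environment of the current process.
for message in check_amd_sdk_env(os.environ) or ["environment looks consistent"]:
    print(message)
```

Run it in a fresh terminal after editing the file, so that the new exports have actually been loaded.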
At this point the AMD APP SDK is installed. Here are the posts I referred to:
https://www.blackmoreops.com/2013/11/22/install-amd-app-sdk-kali-linux/
http://blog.csdn.net/vblittleboy/article/details/8979288
Upgrading Python
Ubuntu 14.04 ships with Python 2.7.6. I upgraded it to 2.7.11 by running these three commands in a terminal:
sudo add-apt-repository ppa:fkrull/deadsnakes-python2.7
sudo apt-get update
sudo apt-get upgrade
Installing libgpuarray
To keep installation errors from damaging the system-wide Python environment, we use a Python virtual environment:
sudo apt-get install python-virtualenv
sudo apt-get install python-pip
virtualenv venv
source venv/bin/activate
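Since everything that follows must run inside the virtual environment, it can help to confirm from Python itself that the activated interpreter is the one in use. This is a minimal sketch using a common idiom, with in_virtualenv as a hypothetical helper name: older virtualenv releases set sys.real_prefix, while newer tooling makes sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases add sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer tooling leaves sys.base_prefix pointing at the system install.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("inside a virtualenv: %s" % in_virtualenv())
print("interpreter prefix: %s" % sys.prefix)
```

If this reports False after `source venv/bin/activate`, the pip installs below would go into the system Python instead.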
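We are now inside a Python virtual environment named venv, and all the steps below are run inside it. First install the dependencies of Theano and libgpuarray (see the official libgpuarray documentation for the exact requirements):
pip install numpy
pip install Cython
pip install scipy
Installing scipy may fail. See the following link for a fix: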
http://stackoverflow.com/questions/11114225/installing-scipy-and-numpy-using-pip
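Next install Theano. Note that the stable Theano release 0.8.2 is out of sync with libgpuarray and raises an error when used (details at the end of this post). I installed Theano 0.9.0dev:
pip install git+https://github.com/Theano/Theano.git
# I actually used robberphex's CSDN mirror; many thanks to him
# pip install git+https://code.csdn.net/u010096836/theano.git
libcheck is also needed, so install it: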
sudo apt-get install check
Now install libgpuarray:
git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
mkdir Build
cd Build
# CMakeLists.txt is in the parent directory, hence "cmake .."
cmake .. -DCMAKE_INSTALL_PREFIX=../venv/ -DCMAKE_BUILD_TYPE=Release
make install
export LIBRARY_PATH=$LIBRARY_PATH:$PWD/../venv/lib
export CPATH=$CPATH:$PWD/../venv/include
# setup.py is in the repository root, so go back up before building pygpu
cd ..
python setup.py build
python setup.py install
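Before involving Theano at all, it is worth checking whether the freshly built pygpu module imports and can open a device. The sketch below assumes that pygpu exposes an init function taking a device string such as opencl0:0 (the same string this post later passes to Theano). check_pygpu is a hypothetical helper, and it deliberately catches every exception because the exact exception types pygpu raises are not verified here.

```python
def check_pygpu(device="opencl0:0"):
    """Try to import pygpu and create a context on the given device."""
    try:
        import pygpu
    except ImportError as exc:
        return "pygpu is not importable: %s" % exc
    try:
        # Assumption: pygpu.init(device_string) creates a GPU context.
        pygpu.init(device)
    except Exception as exc:
        return "pygpu imported, but %s failed to initialise: %s" % (device, exc)
    return "pygpu imported and %s initialised" % device


print(check_pygpu())
```

A failure at the import step points at the build or the LIBRARY_PATH/CPATH settings, while a failure at the init step points at the OpenCL driver or the device string.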
Now let's test the setup. The Theano documentation provides a test program. Save it as test.py:
from theano import function, config, shared, tensor, sandbox
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
First, with Theano on the CPU only:
(venv)marcovaldo@marcovaldong:~/desktop$ python test.py
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 7.7898850441 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753 1.62323285]
Used the cpu
Then with THEANO_FLAGS=mode=FAST_RUN added:
(venv)marcovaldo@marcovaldong:~/desktop$ THEANO_FLAGS=mode=FAST_RUN,floatX=float32 python test.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.86811089516 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
Used the cpu
(venv)marcovaldo@marcovaldong:~/desktop$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.84727883339 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
Used the cpu
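To put the three timings side by side, the short calculation below divides the first run's time by each of the others. The numbers are copied from the outputs above and the labels are just descriptions of the flags used. Note that the first run computed in float64 and the other two in float32 (visible in the printed TensorType), so the roughly twofold difference may come from the data type as much as from the mode flag; that reading is an interpretation, not something measured separately here.

```python
# Wall-clock times (seconds) copied from the three runs in this post.
timings = {
    "default flags (float64)": 7.7898850441,
    "FAST_RUN, floatX=float32": 3.86811089516,
    "FAST_RUN, device=cpu, floatX=float32": 3.84727883339,
}

baseline = timings["default flags (float64)"]
# Speedup of each run relative to the first one.
speedups = {label: baseline / seconds for label, seconds in timings.items()}

for label in sorted(speedups):
    print("%-40s %.2fx" % (label, speedups[label]))
```

A cleaner comparison would rerun the first command with floatX=float32 alone, so that only one setting changes at a time.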
Running with OpenCL, however, raises an error. I could not find a working fix online and would appreciate pointers from anyone who has run into it. The details:
(venv)marcovaldo@marcovaldong:~/desktop$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:0,floatX=float32 python test.py
ERROR (theano.sandbox.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/home/marcovaldo/myvenv/venv/local/lib/python2.7/site-packages/theano/sandbox/gpuarray/__init__.py", line 96, in <module>
    init_dev(config.device)
  File "/home/marcovaldo/myvenv/venv/local/lib/python2.7/site-packages/theano/sandbox/gpuarray/__init__.py", line 47, in init_dev
    "Make sure Theano and libgpuarray/pygpu "
RuntimeError: ('Wrong major API version for gpuarray:', -9997, 'Make sure Theano and libgpuarray/pygpu are in sync.')
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.86138486862 seconds
Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
Used the cpu
At this point, if you do not hit either of the errors below, your libgpuarray installation should be working.
RuntimeError: ('Wrong major API version for gpuarray:', -9997, 'Make sure Theano and libgpuarray/pygpu are in sync.')
RuntimeError: ('Wrong major API version for gpuarray:', -9998, 'Make sure Theano and libgpuarray/pygpu are in sync.')
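The error message itself asks that Theano and libgpuarray/pygpu be "in sync", so a reasonable first step is to record which versions are actually installed in the virtual environment. The sketch below does that with a hypothetical report_versions helper. Whether a given pygpu build exposes a __version__ attribute is an assumption, which is why the code falls back to "unknown" rather than failing.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__, or a note if unavailable."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        except Exception as exc:
            # Some packages fail during import for reasons other than absence.
            report[name] = "import failed: %s" % exc
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


for name, version in sorted(report_versions(["theano", "pygpu", "numpy"]).items()):
    print("%s: %s" % (name, version))
```

This only reports what is installed. It does not say which Theano revision matches which libgpuarray revision, so the printed versions still have to be compared against the projects' own release notes.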
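Next I will find time to translate the official libgpuarray installation guide, for the benefit of later readers.
Today's deep learning tools all officially support NVIDIA cards, which puts AMD cards at a real disadvantage. I hope the deep learning frameworks will add APIs that support AMD cards soon.
Finally, thanks to robberphex and Tinyfool, whose blog posts pointed me in the right direction.
References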
http://deeplearning.net/software/libgpuarray/installation.html
https://www.robberphex.com/2016/05/521
http://codechina.org/2016/04/how-to-install-theano-on-mac-os-x-ei-caption-with-opencl-support/
http://m.blog.csdn.net/article/details?id=43987599
http://forum.ubuntu.org.cn/viewtopic.php?t=445434
http://www.tuicool.com/articles/6N3e2ir
https://www.blackmoreops.com/2013/11/22/install-amd-app-sdk-kali-linux/
http://blog.csdn.net/vblittleboy/article/details/8979288
http://blog.csdn.net/zahuopuboss/article/details/50927432
http://stackoverflow.com/questions/27971707/using-pythontheano-with-opencl-in-an-amd-gpu
http://stackoverflow.com/questions/11114225/installing-scipy-and-numpy-using-pip