
[DL] GTX1080 + Ubuntu16.04 + CUDA 8.0RC + Tensorflow + Theano + keras

https://www.douban.com/note/568373446/?type=like





 Fizben 2016-07-05 10:17:01

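I recently built a machine with three GTX 1080 cards to try them out, and set up a TF + Theano + keras training environment on it.

There were quite a few pitfalls along the way, so here are my notes :)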


 
# Ubuntu USB install fails with "Failed to copy file from CD-ROM":

Fix: write the ISO image to the USB stick with win32diskimager

# At boot the system reports "nouveau error: unknown chipset"

# nouveau does not recognize the GTX 1080, so disable nouveau

sudo vi /etc/modprobe.d/blacklist.conf

# Add:

blacklist nouveau

sudo update-initramfs -u

sudo reboot
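
After the reboot it is worth confirming that the blacklist actually took effect. The following is a minimal sketch, assuming the kernel lists loaded modules in /proc/modules; the function name nouveau_loaded is made up for illustration.

```shell
# Succeeds (exit status 0) if nouveau appears in the module list.
# Reads /proc/modules by default; another file can be passed as $1.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# Example:
#   nouveau_loaded && echo "nouveau is still loaded" || echo "nouveau is disabled"
```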

# Prepare the system environment

sudo apt-get install build-essential wget

# Install gcc and g++ 4.8

sudo apt-get install gcc-4.8 gcc-4.8-multilib g++-4.8 g++-4.8-multilib

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 60

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50

# Switch between gcc and g++ versions

sudo update-alternatives --config gcc

sudo update-alternatives --config g++
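
Since the steps in this note switch compilers more than once (4.8 for CUDA, 5 for the driver), a non-interactive switch can be handy. This is a sketch only: switch_gcc and the DRY_RUN variable are hypothetical names, and it assumes the alternatives were registered with the paths used above.

```shell
# Point both gcc and g++ at the given version, e.g. "4.8" or "5".
# With DRY_RUN set to a non-empty value the commands are printed, not run.
switch_gcc() {
  local ver="$1"
  for tool in gcc g++; do
    ${DRY_RUN:+echo} sudo update-alternatives --set "$tool" "/usr/bin/$tool-$ver"
  done
}

# Example:
#   DRY_RUN=1 switch_gcc 4.8   # show what would be executed
#   switch_gcc 5               # actually switch
```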

# Remove the gcc and g++ 4.8 alternatives

# sudo update-alternatives --remove gcc /usr/bin/gcc-4.8

# sudo update-alternatives --remove g++ /usr/bin/g++-4.8

# CUDA 8.0RC

# https://developer.nvidia.com/cuda-release-candidate-download
# Install the CUDA toolkit

# Switch to gcc-4.8 first

sudo dpkg -i cuda-repo-ubuntu1604-8-0-rc_8.0.27-1_amd64.deb

sudo apt-get update

sudo apt-get install cuda

# Configure environment variables

echo "export CUDA_HOME=/usr/local/cuda" >> ~/.bashrc

# Single quotes keep $PATH and $LD_LIBRARY_PATH unexpanded until ~/.bashrc is sourced
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc

echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
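
Running the echo lines above a second time appends duplicate entries to ~/.bashrc. One way around that, shown here as a sketch with a hypothetical helper named append_once, is to append a line only when it is not already present.

```shell
# Append a line to a file unless an identical line is already there.
append_once() {
  local line="$1"
  local file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example:
#   append_once 'export CUDA_HOME=/usr/local/cuda' ~/.bashrc
#   append_once 'export PATH=/usr/local/cuda/bin:$PATH' ~/.bashrc
```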

# Install cuDNN

tar -xf cudnn-8.0-linux-x64-v5.0-ga.tgz

sudo cp -f cuda/lib64/*.* /usr/local/cuda/lib64/

sudo cp -f cuda/include/*.* /usr/local/cuda/include/
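
To double check which cuDNN version ended up under /usr/local/cuda, the header can be inspected. The sketch below assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain "#define NAME value" lines; cudnn_version is a hypothetical helper name.

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCHLEVEL, read from cudnn.h.
# Uses /usr/local/cuda/include/cudnn.h unless another path is given.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  awk '$1 == "#define" && $2 ~ /^CUDNN_(MAJOR|MINOR|PATCHLEVEL)$/ { v[$2] = $3 }
       END { print v["CUDNN_MAJOR"] "." v["CUDNN_MINOR"] "." v["CUDNN_PATCHLEVEL"] }' "$header"
}

# Example:
#   cudnn_version
```

The major number printed here is the value to enter when the TensorFlow ./configure script asks for the cuDNN version.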

# Note: GeForce GTX 1080 developers must re-install the latest driver from www.nvidia.com/drivers after installing any of these CUDA Toolkits.

# Note: gcc-4.8 cannot compile the NVIDIA driver

# Note: allow DKMS when installing the driver

# Switch to gcc-5

sudo sh NVIDIA-Linux-x86_64-*.run

# To uninstall the driver: sudo nvidia-uninstall

# Test

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

sudo make

./deviceQuery
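
On a multi-GPU box it helps to check the deviceQuery output automatically. The sketch below assumes the sample prints a line of the form "Detected N CUDA Capable device(s)" and ends with "Result = PASS" on success; both function names are hypothetical.

```shell
# Read deviceQuery output on stdin and print the number of detected GPUs.
detected_gpu_count() {
  sed -n 's/.*Detected \([0-9][0-9]*\) CUDA Capable device(s).*/\1/p'
}

# Read deviceQuery output on stdin; succeed only if it reports PASS.
device_query_passed() {
  grep -q 'Result = PASS'
}

# Example:
#   ./deviceQuery | detected_gpu_count
#   ./deviceQuery | device_query_passed && echo "CUDA runtime and driver are OK"
```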

# modprobe: ERROR: could not insert 'nvidia_361_uvm': Invalid argument

# This happens because CUDA 8.0 ships with the 361 series NVIDIA driver, which needs to be removed

sudo apt-get remove nvidia-361

---------------------------------------

The following packages will be REMOVED:

cuda cuda-8-0 cuda-demo-suite-8-0 cuda-drivers cuda-runtime-8-0 nvidia-361 nvidia-361-dev

0 upgraded, 0 newly installed, 7 to remove and 76 not upgraded.

After this operation, 312 MB disk space will be freed.

Do you want to continue? [Y/n] y (don't worry, this is fine)

---------------------------------------

# Tensorflow 0.9.0 pip install (does not support CUDA 8.0 at the time of writing)

sudo apt-get install python-pip python-dev

sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl
# Test

python -c "import tensorflow"

# ImportError: libcudart.so.7.5: cannot open shared object file: No such file or directory (the prebuilt wheel does not support CUDA 8.0 at the time of writing)
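
The error above means the prebuilt wheel looks for the CUDA 7.5 runtime. A quick way to see which runtime versions are actually installed is to list the libcudart files; this is a sketch with a hypothetical helper name, assuming the toolkit lives under /usr/local/cuda.

```shell
# List the libcudart shared objects in a CUDA lib directory, one per line.
list_cudart() {
  local dir="${1:-/usr/local/cuda/lib64}"
  find "$dir" -maxdepth 1 -name 'libcudart.so.*' -exec basename {} \; | sort
}

# Example:
#   list_cudart
```

If nothing ending in .7.5 shows up, the pip wheel cannot load, which is why the build from source is needed for CUDA 8.0.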

# Tensorflow 0.9.0 docker install (does not support CUDA 8.0 at the time of writing)

sudo docker pull tensorflow/tensorflow:r0.9-gpu

# Tensorflow 0.9.0 build from source

# Install bazel

sudo apt-get install openjdk-8-jdk

echo "deb http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list

curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -

sudo apt-get update

sudo apt-get install bazel

# Build tensorflow

sudo apt-get install python-numpy swig python-dev

mkdir ~/github && cd ~/github

git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd ~/github/tensorflow && ./configure

---------------------------------------

Please specify the location of python. [Default is /usr/bin/python]:

Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n

No Google Cloud Platform support will be enabled for TensorFlow

Do you wish to build TensorFlow with GPU support? [y/N] y

GPU support will be enabled for TensorFlow

Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]:

Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0

Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5 (not 5.0)

Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.

You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.

[Default is: "3.5,5.2"]: (left at the default here; note that the GTX 1080 is compute capability 6.1)

Setting up Cuda include

Setting up Cuda lib64

Setting up Cuda bin

Setting up Cuda nvvm

Setting up CUPTI include

Setting up CUPTI lib64

Configuration finished

---------------------------------------

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sudo pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl

# Test

python -c "import tensorflow"

# ImportError: cannot import name pywrap_tensorflow: a reboot is needed

sudo reboot

# Theano & keras

sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose libopenblas-dev git

sudo pip install Theano

sudo pip install keras

# Configure Theano

echo "[global]" > ~/.theanorc

echo "floatX = float32" >> ~/.theanorc

echo "device = gpu0" >> ~/.theanorc

echo "[nvcc]" >> ~/.theanorc

echo "fastmath = True" >> ~/.theanorc
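
The same configuration can be written in one step with a here-document, which avoids leaving a half-written file behind if one of the echo lines is skipped. A sketch, with write_theanorc as a hypothetical helper name; the settings are exactly the ones listed above.

```shell
# Write the Theano configuration to ~/.theanorc (or to the path given as $1),
# overwriting any existing file.
write_theanorc() {
  cat > "${1:-$HOME/.theanorc}" <<'EOF'
[global]
floatX = float32
device = gpu0
[nvcc]
fastmath = True
EOF
}

# Example:
#   write_theanorc
```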

# Test

python -c "import keras"

# matplotlib

sudo apt-get build-dep python-matplotlib

# E: You must put some 'source' URIs in your sources.list

sudo vi /etc/apt/sources.list

# Remove the leading # from every deb-src line

sudo apt-get update
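
Instead of editing sources.list by hand, the deb-src lines can be uncommented with sed. This is a sketch that assumes the commented entries look like "# deb-src ..." or "#deb-src ..."; uncomment_deb_src is a hypothetical name, and the function only filters stdin to stdout so it can be tried safely on a copy first.

```shell
# Read a sources.list on stdin and print it with all deb-src lines uncommented.
uncomment_deb_src() {
  sed 's/^#[[:space:]]*deb-src/deb-src/'
}

# Example (keeps a backup of the original file):
#   sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
#   uncomment_deb_src < /etc/apt/sources.list.bak | sudo tee /etc/apt/sources.list > /dev/null
#   sudo apt-get update
```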

sudo pip install matplotlib

# h5py

sudo apt-get install libhdf5-dev

sudo apt-get install cython

sudo pip install h5py

# Docker

# Update apt sources

sudo apt-get update

sudo apt-get install apt-transport-https ca-certificates

sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D

sudo vi /etc/apt/sources.list.d/docker.list

# Add (14.04):

deb https://apt.dockerproject.org/repo ubuntu-trusty main

# Add (16.04):

deb https://apt.dockerproject.org/repo ubuntu-xenial main
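
The two repository lines above differ only in the release codename, so the right one can be derived rather than typed. A sketch, assuming lsb_release is available and that the repository follows the ubuntu-<codename> pattern shown above; docker_repo_line is a hypothetical helper name.

```shell
# Print the docker apt source line for a codename (default: this system's).
docker_repo_line() {
  local codename="${1:-$(lsb_release -cs)}"
  echo "deb https://apt.dockerproject.org/repo ubuntu-$codename main"
}

# Example:
#   docker_repo_line | sudo tee /etc/apt/sources.list.d/docker.list
```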

sudo apt-get update

sudo apt-get install docker-engine

sudo service docker start

# add user group

sudo groupadd docker

sudo usermod -aG docker [your username]
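
The new group membership only applies to sessions started after the change, so log out and back in before running docker without sudo. To confirm the user was added, something like the following sketch can be used; in_group is a hypothetical helper name.

```shell
# Succeed if the given user belongs to the given group.
in_group() {
  local user="$1"
  local group="$2"
  id -nG "$user" | tr ' ' '\n' | grep -qxF -- "$group"
}

# Example:
#   in_group "$USER" docker && echo "docker can be used without sudo after re-login"
```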

© Copyright of this article belongs to Fizben. Please contact the author before reposting in any form.