TensorFlow from Beginner to Expert (1): Installation and Usage


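The most stable release at the time of writing is 0.12, which this article uses as its example. For other versions, check whether the installation steps need to be adjusted for your situation.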

TensorFlow supports the following installation methods:

PIP installation

Building from source

Docker image installation

PIP installation

PIP is a package management system used to install and manage software packages written in Python. (from [ Python PIP ])

Installing PIP

# Ubuntu/Linux 64-bit
$ sudo apt-get install python-pip python-dev

# CentOS, Fedora, RHEL
$ sudo yum install python-pip python-devel

# Mac OS X
$ sudo easy_install pip

Installing TensorFlow

# Python 2
$ sudo pip install --upgrade $TF_BINARY_URL

# Python 3
$ sudo pip3 install --upgrade $TF_BINARY_URL

Set the environment variable TF_BINARY_URL to match your environment. Typical options are:

# Ubuntu/Linux 64-bit, CPU only, Python 2.7
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp27-none-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled, Python 2.7
# Requires CUDA toolkit 8.0 and CuDNN v5. Other versions must be built from source.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl

# Mac OS X, CPU only, Python 2.7:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py2-none-any.whl

# Mac OS X, GPU enabled, Python 2.7:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-0.12.1-py2-none-any.whl

# Ubuntu/Linux 64-bit, CPU only, Python 3.4
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp34-cp34m-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled, Python 3.4
# Requires CUDA toolkit 8.0 and CuDNN v5. Other versions must be built from source.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp34-cp34m-linux_x86_64.whl

# Ubuntu/Linux 64-bit, CPU only, Python 3.5
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp35-cp35m-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled, Python 3.5
# Requires CUDA toolkit 8.0 and CuDNN v5. For other versions, see "Installing from sources" below.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp35-cp35m-linux_x86_64.whl

# Mac OS X, CPU only, Python 3.4 or 3.5:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py3-none-any.whl

# Mac OS X, GPU enabled, Python 3.4 or 3.5:
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-0.12.1-py3-none-any.whl
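The URLs above follow a regular pattern, which can be sketched as a small lookup. This is only an illustration: the function name `tf_binary_url` and its parameters are hypothetical, and the URL pieces are copied from the 0.12.1 list above rather than taken from any official API, so other releases may not follow the same naming.

```python
BASE = "https://storage.googleapis.com/tensorflow"

# Wheel filename suffixes copied from the 0.12.1 URLs listed above.
# Keys are (os_name, python_version); the Mac wheels are shared by 3.4 and 3.5.
SUFFIX = {
    ("linux", "2.7"): "cp27-none-linux_x86_64",
    ("linux", "3.4"): "cp34-cp34m-linux_x86_64",
    ("linux", "3.5"): "cp35-cp35m-linux_x86_64",
    ("mac", "2.7"): "py2-none-any",
    ("mac", "3.4"): "py3-none-any",
    ("mac", "3.5"): "py3-none-any",
}


def tf_binary_url(os_name, python_version, gpu=False, version="0.12.1"):
    """Return the wheel URL for one of the combinations listed above."""
    key = (os_name, python_version)
    if key not in SUFFIX:
        raise ValueError("no prebuilt wheel listed for %s / Python %s" % key)
    device = "gpu" if gpu else "cpu"
    package = "tensorflow_gpu" if gpu else "tensorflow"
    return "%s/%s/%s/%s-%s-%s.whl" % (
        BASE, os_name, device, package, version, SUFFIX[key])


if __name__ == "__main__":
    # Prints the URL to export as TF_BINARY_URL for Linux, Python 2.7, GPU.
    print(tf_binary_url("linux", "2.7", gpu=True))
```

The printed value can then be exported as TF_BINARY_URL before running the pip command shown earlier.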

My approach is to download the whl file to a local machine first, then copy it to the other machines and install it there.


# pip install tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
You are using pip version 7.1.0, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Processing ./tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
Collecting mock>=2.0.0 (from tensorflow-gpu==0.12.1)
Using cached mock-2.0.0-py2.py3-none-any.whl
Collecting protobuf>=3.1.0 (from tensorflow-gpu==0.12.1)
Downloading protobuf-3.1.0.post1-py2.py3-none-any.whl (347kB)
100% |████████████████████████████████| 348kB 164kB/s
Collecting numpy>=1.11.0 (from tensorflow-gpu==0.12.1)
Downloading numpy-1.11.3.zip (4.7MB)
100% |████████████████████████████████| 4.7MB 34kB/s
Collecting wheel (from tensorflow-gpu==0.12.1)
Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB)
100% |████████████████████████████████| 69kB 58kB/s
Collecting six>=1.10.0 (from tensorflow-gpu==0.12.1)
Downloading six-1.10.0-py2.py3-none-any.whl
Collecting funcsigs>=1 (from mock>=2.0.0->tensorflow-gpu==0.12.1)
Downloading funcsigs-1.0.2-py2.py3-none-any.whl
Collecting pbr>=0.11 (from mock>=2.0.0->tensorflow-gpu==0.12.1)
Downloading pbr-1.10.0-py2.py3-none-any.whl (96kB)
100% |████████████████████████████████| 98kB 93kB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu==0.12.1)
Installing collected packages: funcsigs, six, pbr, mock, protobuf, numpy, wheel, tensorflow-gpu
Found existing installation: six 1.9.0
Uninstalling six-1.9.0:
Successfully uninstalled six-1.9.0
Running setup.py install for numpy
Successfully installed funcsigs-1.0.2 mock-2.0.0 numpy-1.11.3 pbr-1.10.0 protobuf-3.1.0.post1 six-1.10.0 tensorflow-gpu-0.12.1 wheel-0.29.0

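As the log shows, pip installs the dependencies automatically when it installs TensorFlow. If you need to install TensorFlow on another machine that has no network access, you must package the dependencies and copy them over as well, or TensorFlow will not run.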
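One way to prepare for an offline machine is to let pip collect the wheel's dependencies into a directory, copy that directory over, and install from it without touching the network. The sketch below only builds the two command lines. The helper names and the `wheelhouse` directory are hypothetical, and `pip download` assumes pip 8.0 or newer, which is newer than the pip 7.1.0 seen in the log above. The downloaded files must also match the target machine's platform and Python version.

```python
import sys


def download_cmd(wheel, dest="wheelhouse"):
    """Command for the networked machine: fetch the wheel's dependencies into dest."""
    return [sys.executable, "-m", "pip", "download", "--dest", dest, wheel]


def offline_install_cmd(wheel, src="wheelhouse"):
    """Command for the offline machine: resolve dependencies from src only."""
    return [sys.executable, "-m", "pip", "install",
            "--no-index", "--find-links", src, wheel]


if __name__ == "__main__":
    wheel = "tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl"
    # Print the commands instead of running them, so nothing is installed by accident.
    print(" ".join(download_cmd(wheel)))
    print(" ".join(offline_install_cmd(wheel)))
```

Returning the commands as lists keeps them ready for `subprocess.check_call` once you have confirmed they are what you want to run.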

Pros and cons of PIP installation


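Cons: it cannot be flexibly customized; the operating system, GPU hardware, CUDA version, and cuDNN version must all match the officially stated configuration.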


Building from source

The installation method in this subsection suits cases where you need to customize TensorFlow.


$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow

Installing Bazel

See http://bazel.io/docs/install.html


$ ./configure

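Answer the series of questions according to your actual setup. Once you have answered, bazel configures the environment. The machine needs internet access at this point to fetch build dependencies, and some packages may only be reachable through a proxy that bypasses the firewall.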


CPU only, no GPU support:

$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package

With GPU support:

$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

- Build the pip package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

- Install it with PIP

$ pip install /tmp/tensorflow_pkg/tensorflow-x.x.x-py2-none-linux_x86_64.whl
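After installing the wheel, a quick sanity check is to import the package and print its version. A minimal sketch: the helper name `installed_tf_version` is hypothetical, and it simply returns None when TensorFlow cannot be imported by the current interpreter.

```python
def installed_tf_version():
    """Return the installed TensorFlow version string, or None if the import fails."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


if __name__ == "__main__":
    version = installed_tf_version()
    if version is None:
        print("TensorFlow is not importable from this Python interpreter")
    else:
        print("TensorFlow version: %s" % version)
```

Run it with the same interpreter (python or python3) that pip installed the wheel into, otherwise the check can report a missing package even though the install succeeded.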




Docker image installation

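Docker is an open-source application container engine that lets developers package their applications and dependencies into a portable container and then publish it to any popular Linux machine; it can also provide virtualization. (from [ Docker ])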

When you install and run TensorFlow through Docker, it is completely isolated from the packages already installed on your machine.


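Four official Docker images are available:

CPU only, without the development environment:

CPU only, with the development environment:

GPU support, without the development environment:

GPU support, with the development environment: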

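Images matching a specific release are also provided; just replace latest in the tags above with the release version number. The detailed installation steps follow: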

Creating the Docker user group

This allows ordinary users to start containers without sudo.

$ sudo usermod -a -G docker YOURNAME

Starting a Docker container

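Pick one of the four images above and create a container from it. The first time you run the command the image is downloaded automatically; later runs do not download it again.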

$ docker run -it gcr.io/tensorflow/tensorflow

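If you use a GPU-enabled container, the command needs extra arguments that expose the host's GPU devices to the container. A script shipped in the TensorFlow source tree takes care of this: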

$ cd $TENSORFLOW_ROOT/tensorflow/tools/docker/
$ ./docker_run_gpu.sh gcr.io/tensorflow/tensorflow:gpu


#!/usr/bin/env bash
set -e

export CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}

# Abort early if the CUDA libraries are not where we expect them on the host.
if [ ! -d ${CUDA_HOME}/lib64 ]; then
  echo "Failed to locate CUDA libs at ${CUDA_HOME}/lib64."
  exit 1
fi

# Build one "-v path:path" mount per libcuda.* file and one "--device" flag per GPU node.
export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | \
                 xargs -I{} echo '-v {}:{}')
export DEVICES=$(\ls /dev/nvidia* | \
                 xargs -I{} echo '--device {}:{}')

if [[ "${DEVICES}" = "" ]]; then
  echo "Failed to locate NVidia device(s). Did you want the non-GPU container?"
  exit 1
fi

docker run -it $CUDA_SO $DEVICES "$@"


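Set CUDA_HOME on the host (defaulting to /usr/local/cuda) and check that its lib64 directory exists;

Expose the host's libcuda.* shared libraries to the container;

Expose the host's /dev/nvidia* devices to the container.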
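The argument-building part of the script can be mirrored in Python to make the logic easier to follow. This is a sketch, not a replacement for docker_run_gpu.sh: the function name `docker_gpu_command` is hypothetical, and it takes the library and device paths as inputs so that it can be tried on a machine without a GPU.

```python
import glob


def docker_gpu_command(image_args, cuda_libs, devices):
    """Build a `docker run` command that mounts CUDA libs and exposes GPU devices."""
    if not devices:
        raise RuntimeError("Failed to locate NVidia device(s).")
    cmd = ["docker", "run", "-it"]
    for lib in cuda_libs:
        # Mount each library at the same path inside the container.
        cmd += ["-v", "%s:%s" % (lib, lib)]
    for dev in devices:
        # Pass each /dev/nvidia* node through to the container.
        cmd += ["--device", "%s:%s" % (dev, dev)]
    return cmd + list(image_args)


if __name__ == "__main__":
    libs = sorted(glob.glob("/usr/lib/x86_64-linux-gnu/libcuda.*"))
    devs = sorted(glob.glob("/dev/nvidia*"))
    try:
        cmd = docker_gpu_command(["gcr.io/tensorflow/tensorflow:gpu"], libs, devs)
        print(" ".join(cmd))
    except RuntimeError as err:
        print(err)
```

Printing the command rather than executing it makes it easy to inspect which mounts and devices would be passed before starting a container.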

Pros and cons of Docker image installation


Cons: users behind the firewall will have a hard time pulling the images, and Docker adds its own learning curve.


Assuming you have installed the GPU build of TensorFlow 0.12 by following the steps above, you can now run the classic example (MNIST):

# python -m tensorflow.models.image.mnist.convolutional

Output on a K40

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.25GiB
Free memory: 11.12GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4c81230
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.25GiB
Free memory: 11.12GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40m, pci bus id: 0000:03:00.0)
Step 0 (epoch 0.00), 67.1 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 14.2 ms
Minibatch loss: 3.303, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.0%
Step 200 (epoch 0.23), 13.8 ms
Minibatch loss: 3.481, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 4.0%
Step 8500 (epoch 9.89), 13.9 ms
Minibatch loss: 1.618, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.8%
Test error: 0.8%

Output on an M40

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:06:00.0
Total memory: 11.18GiB
Free memory: 11.07GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x37b6440
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:87:00.0
Total memory: 11.18GiB
Free memory: 11.07GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla M40, pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla M40, pci bus id: 0000:87:00.0)
Step 0 (epoch 0.00), 23.6 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 6.7 ms
Minibatch loss: 3.297, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 7.3%
Step 200 (epoch 0.23), 6.5 ms
Minibatch loss: 3.448, learning rate: 0.010000
Minibatch error: 10.9%
Validation error: 3.8%
Step 8500 (epoch 9.89), 6.4 ms
Minibatch loss: 1.605, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Test error: 0.8%

Output on a GTX 1080

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:118] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.48GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:138] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:148] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:868] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Step 0 (epoch 0.00), 8.2 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 5.1 ms
Minibatch loss: 3.289, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.0%
Step 200 (epoch 0.23), 5.1 ms
Minibatch loss: 3.476, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 3.7%
Step 8500 (epoch 9.89), 4.9 ms
Minibatch loss: 1.612, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 1.0%
Test error: 0.8%

Output on a P40

$ python -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:06:00.0
Total memory: 22.41GiB
Free memory: 22.24GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x3dbfc90
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:87:00.0
Total memory: 22.41GiB
Free memory: 22.24GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P40, pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P40, pci bus id: 0000:87:00.0)
Step 0 (epoch 0.00), 962.4 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 4.6 ms
Minibatch loss: 3.249, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.6%
Step 200 (epoch 0.23), 4.5 ms
Minibatch loss: 3.342, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 4.3%
Step 8500 (epoch 9.89), 4.5 ms
Minibatch loss: 1.614, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.8%
Test error: 0.8%
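The per-step times in the four logs can be compared directly. The sketch below parses `Step N (epoch E), T ms` lines with a regular expression and computes speedups relative to the K40. The function name `step_times` is hypothetical, and the sample lines are the step-8500 entries copied from the logs above.

```python
import re

# Matches lines such as "Step 100 (epoch 0.12), 14.2 ms".
STEP_RE = re.compile(r"Step (\d+) \(epoch ([\d.]+)\), ([\d.]+) ms")


def step_times(log_text):
    """Return {step: milliseconds} for every 'Step ...' line in log_text."""
    return {int(m.group(1)): float(m.group(3)) for m in STEP_RE.finditer(log_text)}


# Step-8500 lines copied from the four logs above.
LOGS = {
    "K40": "Step 8500 (epoch 9.89), 13.9 ms",
    "M40": "Step 8500 (epoch 9.89), 6.4 ms",
    "GTX1080": "Step 8500 (epoch 9.89), 4.9 ms",
    "P40": "Step 8500 (epoch 9.89), 4.5 ms",
}

if __name__ == "__main__":
    baseline = step_times(LOGS["K40"])[8500]
    for name in ["K40", "M40", "GTX1080", "P40"]:
        ms = step_times(LOGS[name])[8500]
        print("%-8s %5.1f ms/step  %.2fx vs K40" % (name, ms, baseline / ms))
```

Treat the ratios as indicative only: each log is a single run, and the differing source line numbers in the log prefixes suggest the machines were not all running the same TensorFlow build.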


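(1) If you need GPU support, install CUDA and cuDNN first.

(2) The GTX 1080 and P40 use the Pascal architecture and need CUDA 8.0 + cuDNN 5.1.5 to run properly.

(3) TensorFlow has supported multi-machine, multi-GPU distributed computation since 0.8.0rc; earlier versions support only a single compute node.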
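For note (2), the startup log already reports what is needed to tell whether a card is Pascal: the `major: X minor: Y` line is the GPU's compute capability, and Pascal cards report major version 6. A small sketch, with hypothetical function names, that extracts these values from a log:

```python
import re

# Matches the compute capability line, e.g. "major: 6 minor: 1 memoryClockRate ...".
CAP_RE = re.compile(r"major: (\d+) minor: (\d+)")


def compute_capabilities(log_text):
    """Return a list of (major, minor) tuples, one per GPU reported in log_text."""
    return [(int(major), int(minor)) for major, minor in CAP_RE.findall(log_text)]


def is_pascal(capability):
    """Pascal GPUs report compute capability 6.x."""
    return capability[0] == 6


if __name__ == "__main__":
    sample = ("name: Tesla P40\n"
              "major: 6 minor: 1 memoryClockRate (GHz) 1.531\n")
    for cap in compute_capabilities(sample):
        if is_pascal(cap):
            print("compute capability %d.%d: Pascal, use CUDA 8.0" % cap)
        else:
            print("compute capability %d.%d: not Pascal" % cap)
```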
