1.3 Building tensorflow-gpu from source on CentOS 7

Date: 2022-06-22
Updated: 2019-4-5

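As luck would have it, I managed to build and install the GPU version of TensorFlow from source. TensorFlow has reached version 1.13, and the official Linux binaries are built against glibc 2.23, whereas CentOS 7 only provides glibc 2.17. As a result, after installing with pip install tensorflow-gpu, using the package fails with: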

ImportError: /lib64/libc.so.6: version `GLIBC_2.23' not found (required by /usr/local/python3.6/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow.so)

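Upgrading glibc is not a realistic option: after the upgrade the system would no longer boot. The only way out is to rebuild TensorFlow from source, which avoids the error. The build process follows, for what was then the latest version, 1.13: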
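Before committing to a source build, it can help to confirm that the system glibc really is older than what the wheel asks for. The following is a minimal sketch, assuming GNU grep and GNU sort (for `sort -V`); the helper name `highest_glibc` is made up for illustration, and the libc path is the one from the error message above.

```shell
#!/usr/bin/env bash
# Print the highest GLIBC_x.y symbol version found on stdin.
# highest_glibc is a hypothetical helper name, not part of any tool.
highest_glibc() {
    grep -ao 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -uV | tail -n 1
}

# Path taken from the ImportError above; adjust if libc lives elsewhere.
libc=/lib64/libc.so.6
if [ -r "$libc" ]; then
    echo "highest glibc symbol version: $(highest_glibc < "$libc")"
else
    echo "$libc not readable; point libc at your C library"
fi
```

If the reported version is lower than the one named in the ImportError, the prebuilt wheel cannot load on this system.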

1. Prepare CUDA

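This step needs little explanation, since there are plenty of tutorials online. I used CUDA 10.0 and cuDNN 7.5.0; for reference, see https://www.jianshu.com/p/a201b91b3d96. Note: be sure to record your CUDA version, your cuDNN version, and the CUDA install location, because they are needed later.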
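One way to record those versions is to read them from the installed files. This is a sketch under assumptions: it expects CUDA 10.0's version.txt and a cuDNN 7.x cudnn.h (which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL) under the CUDA prefix; `cudnn_version` is a hypothetical helper name and the default prefix is only a guess.

```shell
#!/usr/bin/env bash
# Print "major.minor.patch" from a cuDNN 7.x header (cudnn.h).
# cudnn_version is a hypothetical helper name.
cudnn_version() {
    awk '/^#define CUDNN_MAJOR /      {major = $3}
         /^#define CUDNN_MINOR /      {minor = $3}
         /^#define CUDNN_PATCHLEVEL / {patch = $3}
         END {print major "." minor "." patch}' "$1"
}

# Assumed install prefix; replace with the real CUDA location.
cuda_home=${CUDA_HOME:-/usr/local/cuda-10.0}
if [ -r "$cuda_home/version.txt" ]; then
    cat "$cuda_home/version.txt"
fi
if [ -r "$cuda_home/include/cudnn.h" ]; then
    echo "cuDNN $(cudnn_version "$cuda_home/include/cudnn.h")"
fi
```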

2. Prepare NCCL

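NCCL is required by the GPU build of TensorFlow; the current version is 2.4.2. Download it from https://developer.nvidia.com/nccl/nccl-download. The download should be an rpm file, installed with: rpm -ivh nccl-repo-rhel7-2.4.2-ga-cuda10.0-1-1.x86_64.rpm. Oddly, this does not install NCCL itself; it only unpacks three further rpm files. Running rpm -qpl nccl-repo-rhel7-2.4.2-ga-cuda10.0-1-1.x86_64.rpm shows where those files were placed.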

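Go to that directory and install the three rpm files. They should install to /usr/lib64 by default; if you are unsure, check the install location with rpm -qpl xxx.rpm. Note: record the NCCL version and install location here.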
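After the three rpm files are installed, a quick search confirms where the NCCL library actually landed. This is a minimal sketch: `find_nccl_dir` is a hypothetical helper, and apart from /usr/lib64 (the location reported above) the candidate directories are assumptions.

```shell
#!/usr/bin/env bash
# Print the first directory among the arguments that contains libnccl.so*.
# find_nccl_dir is a hypothetical helper name.
find_nccl_dir() {
    local dir
    for dir in "$@"; do
        if ls "$dir"/libnccl.so* >/dev/null 2>&1; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; /usr/lib64 is the one reported above.
find_nccl_dir /usr/lib64 /usr/local/cuda-10.0/lib64 /usr/lib/x86_64-linux-gnu \
    || echo "libnccl not found in the candidate directories"
```

The directory it prints is the value to give when ./configure asks for the NCCL location.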

3. Install bazel

bazel is Google's build tool. TensorFlow is built with it, so it must be installed. Download link: https://github.com/bazelbuild/bazel/releases. Pick the latest release and download it.

After downloading, create a directory named bazel, move the archive into it, and unzip it:

unzip bazel-0.24.1-dist.zip

Then compile it:

./compile.sh 
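compile.sh bootstraps bazel from source, so it needs a few tools on the machine first. The check below is a sketch; the tool list is an assumption based on bazel's documented bootstrap requirements (a JDK, Python, zip, unzip, a C++ compiler), and `missing_tools` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print every command from the argument list that is not on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tool list is an assumption; adjust it to the bazel release being built.
missing=$(missing_tools java javac python zip unzip g++)
if [ -n "$missing" ]; then
    echo "install these before running ./compile.sh:" $missing
else
    echo "all bootstrap tools found"
fi
```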

After a while it reports success. The compiled binary is in the bazel/output directory. Add that directory to PATH in ~/.bashrc.

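The directory must be exactly right. Then run source ~/.bashrc and type bazel on the command line; if bazel prints its usage information, the installation succeeded.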
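The PATH line itself is not preserved in the text, so here is a sketch of what it could look like, assuming the bazel directory was created under $HOME and the binary ended up in $HOME/bazel/output. `prepend_path` is a hypothetical helper that avoids adding the same entry twice when ~/.bashrc is sourced repeatedly.

```shell
#!/usr/bin/env bash
# Prepend a directory to PATH unless it is already listed.
# prepend_path is a hypothetical helper; in ~/.bashrc a plain
#   export PATH="$HOME/bazel/output:$PATH"
# has the same effect.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumed location: the bazel directory was created under $HOME.
prepend_path "$HOME/bazel/output"
command -v bazel || echo "bazel not found on PATH; check the directory"
```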

4. Install tensorflow

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow

Start the build configuration:

./configure

Note: answer Y to everything related to CUDA and NCCL, and no to everything else:

Please specify the location of python. [Default is /usr/bin/python]: 
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 10.0]:10.0
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.5.0
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:  /usr/local/cuda-10.0
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 2]: 2.4.2
Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/lib64
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1] (just press Enter here; there is nothing to choose, the default is fine)
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:N
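The same answers can be supplied through environment variables so that ./configure asks fewer questions. This is a sketch: the variable names are the ones I understand configure.py to read in the TensorFlow 1.13 era, and they change between releases, so check them against configure.py in your checkout. Any prompt not covered by a variable is still asked interactively.

```shell
#!/usr/bin/env bash
# Answers from the transcript above, expressed as environment variables.
# Variable names are assumed to match configure.py in TF 1.13; verify in your checkout.
export PYTHON_BIN_PATH=/usr/bin/python
export TF_ENABLE_XLA=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.0
export CUDA_TOOLKIT_PATH=/usr/local/cuda-10.0
export TF_CUDNN_VERSION=7.5.0
export CUDNN_INSTALL_PATH=/usr/local/cuda-10.0
export TF_NEED_TENSORRT=0
export TF_NCCL_VERSION=2.4.2
export NCCL_INSTALL_PATH=/usr/lib64
export TF_CUDA_COMPUTE_CAPABILITIES=6.1
export TF_CUDA_CLANG=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_MPI=0
export CC_OPT_FLAGS="-march=native"
export TF_SET_ANDROID_WORKSPACE=0

# Only run configure when executed from the root of the tensorflow checkout.
if [ -x ./configure ] && [ -f ./configure.py ]; then
    ./configure
else
    echo "run this from the tensorflow source directory"
fi
```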

Build with:

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package 

Wait for it to finish; this takes a while. If the build completes without errors, the hard part is done.

Convert the build output to a whl file:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install the file with pip:

pip install /tmp/tensorflow_pkg/*.whl
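A short import test shows whether the new wheel loads and sees the GPU. This is a sketch assuming a TensorFlow 1.x build, where `tf.test.is_gpu_available()` exists; `check_tf` is a hypothetical helper name. It runs from / because importing from inside the source checkout would pick up the tensorflow/ source directory instead of the installed package.

```shell
#!/usr/bin/env bash
# Import TensorFlow with the given interpreter and report version and GPU visibility.
# check_tf is a hypothetical helper name; tf.test.is_gpu_available() is a TF 1.x call.
check_tf() {
    local py=${1:-python}
    # Run from / so Python does not pick up the tensorflow/ source directory.
    (cd / && "$py" -c 'import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())')
}

if check_tf python; then
    echo "TensorFlow import OK"
else
    echo "TensorFlow import failed; read the traceback above"
fi
```

Pass a different interpreter as the first argument if pip installed into one other than the default python.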

5. Troubleshooting a failed build

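  1. bazel version: TensorFlow has bazel version requirements. In general, the latest TensorFlow builds without trouble using the latest bazel.
  2. CUDA, cuDNN and NCCL: the install locations and versions must be correct, and must be entered correctly during configuration. For NCCL in particular, check the install location, otherwise configuration will not find it.
  3. Do not enable options you do not need, and make sure every configuration answer is correct.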
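For the first point, the installed bazel version can be compared against the range the checkout accepts. The sketch below assumes GNU sort (for `sort -V`); `version_in_range` is a hypothetical helper, and BAZEL_MIN and BAZEL_MAX are placeholder variables whose real values have to be looked up in the TensorFlow tree being built.

```shell
#!/usr/bin/env bash
# Succeed if version $1 lies in the inclusive range [$2, $3], compared with sort -V.
# version_in_range is a hypothetical helper name.
version_in_range() {
    local lowest highest
    lowest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
    highest=$(printf '%s\n' "$1" "$3" | sort -V | tail -n 1)
    [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
}

# Placeholder bounds; set them to what your TensorFlow checkout requires.
min=${BAZEL_MIN:-0.0.0}
max=${BAZEL_MAX:-999.0.0}
if command -v bazel >/dev/null 2>&1; then
    have=$(bazel version 2>/dev/null | sed -n 's/^Build label: \([0-9.]*\).*/\1/p')
    if version_in_range "$have" "$min" "$max"; then
        echo "bazel $have is within $min..$max"
    else
        echo "bazel $have is outside $min..$max"
    fi
else
    echo "bazel is not on PATH"
fi
```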