Installing the GPU driver and using TensorFlow-GPU on an Azure GPU-series Ubuntu 16.04 virtual machine


1. source activate python36
2. source activate tensorflow-gpu
3. pip install tensorflow-gpu (pip reported installing this version: tensorflow_gpu-1.12.0-cp36-cp36m-m)

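4. List the GPUs
from tensorflow.python.client import device_lib


def get_available_gpus():
    """
    Print all local devices and the GPU devices, and return the GPU count.

    Command to inspect the GPUs: nvidia-smi
    Command to see what is using them: ps aux | grep PID
    :return: number of GPUs
    """
    local_device_protos = device_lib.list_local_devices()
    gpus = [x.name for x in local_device_protos if x.device_type == 'GPU']
    # print() is called as a function because the python36 environment is Python 3
    print("all: %s" % [x.name for x in local_device_protos])
    print("gpu: %s" % gpus)
    return len(gpus)


get_available_gpus()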

This fails with ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory, so CUDA 9 needs to be installed.
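The ImportError means the dynamic loader cannot find a CUDA 9.0 library. A minimal sketch of checking this directly, without importing TensorFlow, is shown here. Only libcublas.so.9.0 comes from the error message; the other library names and the helper can_load are assumptions added for illustration.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name, else False."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# libcublas.so.9.0 is the file named in the ImportError; the other two are
# CUDA 9.0 libraries assumed here to be worth checking as well.
for name in ("libcublas.so.9.0", "libcudart.so.9.0", "libcufft.so.9.0"):
    print("%-20s %s" % (name, "found" if can_load(name) else "MISSING"))
```

Running it before and after the CUDA installation shows whether the loader search path has picked up the new libraries.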

5. Download CUDA 9.0 from https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux.
The commands are:
cd /opt
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
sudo sh cuda_9.0.176_384.81_linux-run

Install location: /usr/local/cuda-9.0
Installer output:
Linux platform:

/usr/local/cuda-#.#
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/adai ]:

Installing the CUDA Toolkit in /usr/local/cuda-9.0 …

Installing the CUDA Toolkit in /usr/local/cuda-9.0 …
Installing the CUDA Samples in /home/adai …
Copying samples to /home/adai/NVIDIA_CUDA-9.0_Samples now…
Finished copying samples.

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /home/adai

Please make sure that
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add
/usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in
/usr/local/cuda-9.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf
for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the
CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0
functionality to work.
To install the driver using this installer, run the following command,
replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_32689.log
Signal caught, cleaning up
(tensorflow-gpu) root@adailearninggpu:/opt#
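
The warning in the log says CUDA 9.0 needs a driver of version 384.00 or later. A small sketch of that comparison follows. The helper names are hypothetical, and the sample versions are only illustrative inputs (415.27 is the driver installed later in this post).

```python
def parse_version(text):
    """Turn a dotted version such as '384.81' into a tuple of ints, e.g. (384, 81)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(installed, minimum="384.00"):
    """Compare versions component by component rather than as plain strings."""
    return parse_version(installed) >= parse_version(minimum)


print(driver_is_new_enough("415.27"))
print(driver_is_new_enough("375.66"))
```

Once a driver is installed, its version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed in as the installed argument.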

6. Run the test from step 4 again to list the GPUs. This time it reports:
libnvidia-fatbinaryloader.so.415.27: cannot open shared object file: No such file or directory

7. Fix: download the driver from https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run&lang=us&type=TITAN
Run:
cd /opt/
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run
chmod +x NVIDIA-Linux-x86_64-415.27.run
./NVIDIA-Linux-x86_64-415.27.run
If the installation fails, uninstall the existing NVIDIA driver with sudo apt-get --purge remove nvidia-*
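
The driver version to download can be read off the missing library name in the error message. The sketch below pulls the version out of the message and fills it into the download URL used in this step. Both helper names are hypothetical, and the URL pattern is only confirmed by this post for 415.27; whether it holds for other versions is an assumption.

```python
import re


def driver_version_from_error(message):
    """Extract the version suffix from a missing libnvidia-*.so.<version> name."""
    match = re.search(r"libnvidia-[\w-]+\.so\.(\d+(?:\.\d+)+)", message)
    return match.group(1) if match else None


def runfile_url(version):
    """Fill a version into the download URL pattern used above (assumed pattern)."""
    return ("http://us.download.nvidia.com/XFree86/Linux-x86_64/"
            "%s/NVIDIA-Linux-x86_64-%s.run" % (version, version))


error = ("libnvidia-fatbinaryloader.so.415.27: cannot open shared object file: "
         "No such file or directory")
version = driver_version_from_error(error)
print(version)
print(runfile_url(version))
```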

8. Edit /etc/profile and append the following lines to the end, then run: source /etc/profile
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}:/usr/lib/nvidia-415/
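
After sourcing /etc/profile, it can help to confirm that the current shell really sees the new entries. This is a minimal sketch; the helper missing_entries is hypothetical, and the directories are the ones from the export lines in this step.

```python
import os


def missing_entries(value, required):
    """Return the directories in required that are absent from a ':'-separated value."""
    # Trailing slashes are ignored so /usr/lib/nvidia-415/ matches /usr/lib/nvidia-415
    present = {entry.rstrip("/") for entry in value.split(":") if entry}
    return [d for d in required if d.rstrip("/") not in present]


checks = {
    "PATH": ["/usr/local/cuda-9.0/bin"],
    "LD_LIBRARY_PATH": ["/usr/local/cuda-9.0/lib64", "/usr/lib/nvidia-415"],
}
for var, required in checks.items():
    missing = missing_entries(os.environ.get(var, ""), required)
    print("%s: %s" % (var, "ok" if not missing else "missing " + ", ".join(missing)))
```

If a variable is reported as missing an entry, the shell that launches Python has probably not sourced /etc/profile yet.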

9. Run the test from step 4 again. On success it lists both the CPU and GPU devices.


https://www.dearcloud.cn/2019/02/19/20200310-cnblogs-old-posts/20190219-使用Azure的GPU系列虚拟机Ubuntu-16.0.4安装GPU驱动并使用Tensorflow-GPU的过程/
Author: 宋兴柱
Published: February 19, 2019