Installing the GPU driver and using TensorFlow-GPU on an Azure GPU-series virtual machine running Ubuntu 16.04
1. source activate python36
2. source activate tensorflow-gpu
3. pip install tensorflow-gpu (pip reported installing this version: tensorflow_gpu-1.12.0-cp36-cp36m-m)
4. List the GPUs:
from tensorflow.python.client import device_lib

def get_available_gpus():
    """
    Command to inspect the GPUs: nvidia-smi
    Command to see which process is using a GPU: ps aux | grep PID
    :return: number of GPUs
    """
    local_device_protos = device_lib.list_local_devices()
    gpus = [x.name for x in local_device_protos if x.device_type == 'GPU']
    # The environment is Python 3.6, so print must be called as a function.
    print("all: %s" % [x.name for x in local_device_protos])
    print("gpu: %s" % gpus)
    return len(gpus)

get_available_gpus()
This fails with ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory, so CUDA 9.0 needs to be installed.
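The ImportError above means the dynamic loader cannot find the CUDA 9.0 cuBLAS library. As a rough sketch using only the Python standard library, the same condition can be checked without importing TensorFlow. The helper name can_load is hypothetical, and the only library name used is the one quoted in the error message.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # ctypes raises OSError when the library cannot be found or opened.
        return False


if __name__ == "__main__":
    # libcublas.so.9.0 is the library named in the ImportError.
    for name in ["libcublas.so.9.0"]:
        print("%s: %s" % (name, "found" if can_load(name) else "missing"))
```

If this still prints "missing" after CUDA 9.0 is installed, the library directory is probably not yet on LD_LIBRARY_PATH.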
5. Download CUDA 9.0 from https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux.
The commands are:
cd /opt
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
sudo sh cuda_9.0.176_384.81_linux-run
Install location: /usr/local/cuda-9.0
Installer output:
Linux platform:
/usr/local/cuda-#.#
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/adai ]:
Installing the CUDA Toolkit in /usr/local/cuda-9.0 …
Installing the CUDA Samples in /home/adai …
Copying samples to /home/adai/NVIDIA_CUDA-9.0_Samples now…
Finished copying samples.
===========
= Summary =
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /home/adai
Please make sure that
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add
/usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in
/usr/local/cuda-9.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf
for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the
CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0
functionality to work.
To install the driver using this installer, run the following command,
replacing
sudo
Logfile is /tmp/cuda_install_32689.log
Signal caught, cleaning up
(tensorflow-gpu) root@adailearninggpu:/opt#
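The installer warning says CUDA 9.0 needs a driver of version at least 384.00. A minimal sketch of checking that from Python, assuming nvidia-smi is on PATH once a driver is installed; the function names and the MIN_DRIVER constant are hypothetical and introduced only for illustration.

```python
import shutil
import subprocess

MIN_DRIVER = (384, 0)  # from the installer warning: "at least 384.00"


def parse_version(text):
    """Turn a driver version string such as '415.27' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(version_text, minimum=MIN_DRIVER):
    """Compare a driver version string against the minimum required version."""
    return parse_version(version_text) >= minimum


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # nvidia-smi prints one line per GPU; take the first one.
    return result.stdout.strip().splitlines()[0]


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("No working NVIDIA driver found")
    else:
        print("driver %s, new enough: %s" % (version, driver_is_new_enough(version)))
```

Because the driver installation was declined in the installer dialog above, this is expected to report no driver, or an older one, until a suitable driver is installed separately.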
6. Run the GPU listing from step 4 again. This time it reports:
libnvidia-fatbinaryloader.so.415.27: cannot open shared object file: No such file or directory
7. Fix: install the NVIDIA driver whose version matches the missing library (415.27). Download page: https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run&lang=us&type=TITAN
Run:
cd /opt/
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run
chmod 777 NVIDIA-Linux-x86_64-415.27.run
./NVIDIA-Linux-x86_64-415.27.run
If the installation fails, remove the existing NVIDIA driver with sudo apt-get --purge remove 'nvidia-*' and then retry.
8. Edit /etc/profile and append the following lines to the end, then run: source /etc/profile
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}:/usr/lib/nvidia-415/
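After sourcing /etc/profile, it can help to confirm that the current shell actually picked up the new entries. A minimal sketch, assuming the directories are exactly the ones exported above; the helper name missing_entries is hypothetical.

```python
import os


def missing_entries(value, required):
    """Return the required directories not present in a colon-separated list."""
    # Ignore trailing slashes so "/usr/lib/nvidia-415/" matches "/usr/lib/nvidia-415".
    present = {entry.rstrip("/") for entry in value.split(":") if entry}
    return [d for d in required if d.rstrip("/") not in present]


if __name__ == "__main__":
    checks = {
        "PATH": ["/usr/local/cuda-9.0/bin"],
        "LD_LIBRARY_PATH": ["/usr/local/cuda-9.0/lib64", "/usr/lib/nvidia-415/"],
    }
    for variable, required in checks.items():
        missing = missing_entries(os.environ.get(variable, ""), required)
        print("%s: %s" % (variable, "ok" if not missing else "missing %s" % missing))
```

Run it from the same shell that will start Python for TensorFlow, since the variables only apply to shells that have sourced /etc/profile.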
9. Run the test from step 4 again. On success it lists both the CPU and GPU devices.