TensorFlow itself installs easily via pip, but enabling GPU support takes a little more care.
https://www.tensorflow.org/install/gpu#software_requirements
#Here is how I installed my NVIDIA GPU environment.
sudo apt-get install libcupti-dev #already installed in my case
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
tar -xzvf cudnn-10.2-linux-x64-v8.0.1.13.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
I have not installed it so far.
https://github.com/tensorflow/tensorflow/issues/38194 https://github.com/tensorflow/tensorflow/issues/36426
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
...
2020-07-04 08:59:38.683571: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.2/lib64
...
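The warning above names the exact shared library TensorFlow tried to open (libcudnn.so.7). As a rough sketch, the same lookup can be reproduced from plain Python with the standard ctypes module, which asks the dynamic loader to open a library by name. The helper name can_load is hypothetical, and libcudnn.so.8 is only an assumed soname for the cuDNN v8 archive unpacked earlier.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is not found on the loader's search path
        # (including LD_LIBRARY_PATH as it was set when Python started).
        return False


# libcudnn.so.7 is the name from the TensorFlow warning;
# libcudnn.so.8 is an assumed name for the cuDNN v8 install.
for name in ("libcudnn.so.7", "libcudnn.so.8"):
    print(name, "loadable:", can_load(name))
```

If libcudnn.so.7 is reported as not loadable, TensorFlow will most likely print the same warning again.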
Then I additionally installed cuDNN v7, since the warning shows TensorFlow looking for libcudnn.so.7.
curl https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.2_20191118/cudnn-10.2-linux-x64-v7.6.5.32.tgz -O
tar -xzvf cudnn-10.2-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
cuDNN v7 worked!
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
2020-07-04 09:18:18.395229: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-04 09:18:19.410004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: Quadro M5000 computeCapability: 5.2
coreClock: 1.038GHz coreCount: 16 deviceMemorySize: 7.94GiB deviceMemoryBandwidth: 196.99GiB/s
2020-07-04 09:18:19.410817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:81:00.0 name: Quadro M5000 computeCapability: 5.2
coreClock: 1.038GHz coreCount: 16 deviceMemorySize: 7.94GiB deviceMemoryBandwidth: 196.99GiB/s
2020-07-04 09:18:19.411129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-04 09:18:19.413339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-04 09:18:19.415276: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-04 09:18:19.415610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-04 09:18:19.418134: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-04 09:18:19.419456: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-04 09:18:19.424393: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-04 09:18:19.427474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
Num GPUs Available: 2
https://www.tensorflow.org/guide/distributed_training
TensorFlow provides a data-parallel training method called “MirroredStrategy”.
https://www.tensorflow.org/guide/distributed_training#mirroredstrategy
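As a minimal sketch of how MirroredStrategy is typically used with Keras, the snippet below builds a tiny model inside the strategy scope and trains it for one epoch. The model, the random toy data, and the batch size are assumptions made up for illustration; they are not from my actual setup.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy keeps one copy of the model on each visible GPU.
# With no GPU available it runs with a single replica.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Variables must be created inside the scope so they are mirrored.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# Hypothetical toy data; the global batch is split across the replicas.
global_batch_size = 16 * strategy.num_replicas_in_sync
x = np.random.rand(256, 4).astype("float32")
y = np.random.rand(256, 1).astype("float32")

history = model.fit(x, y, batch_size=global_batch_size, epochs=1, verbose=0)
print("Final loss:", history.history["loss"][-1])
```

On the two-GPU machine above, the number of replicas should be reported as 2.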
There are two types of data-parallel training: synchronous and asynchronous.
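To make the difference concrete, here is a toy pure-Python simulation with no TensorFlow involved. It fits the one-parameter model y = w * x, and the function names, data, and learning rate are all hypothetical. In the synchronous step every replica computes a gradient from the same weights and the averaged gradient is applied once. In the asynchronous step, simplified here, every replica also starts from the same (by then stale) weights but applies its own update without waiting for the others.

```python
def grad(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def sync_step(w, shards, lr):
    """All replicas use the same weights; gradients are averaged, then applied once."""
    grads = [grad(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)


def async_step(w, shards, lr):
    """Each replica computes from the weights it read at the start (stale)
    and applies its own update to the shared weight independently."""
    stale_w = w
    for shard in shards:
        w = w - lr * grad(stale_w, shard)
    return w


# Toy data generated from y = 3 * x, split evenly over two replicas.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
shards = [data[:2], data[2:]]

print("sync :", sync_step(0.0, shards, 0.01))
print("async:", async_step(0.0, shards, 0.01))
print("full :", 0.0 - 0.01 * grad(0.0, data))
```

With equally sized shards, the synchronous step matches a single step on the full batch, while the asynchronous step moves further because the two updates are not averaged.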
As for performance, it is worth a look.