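Background

Users who deploy TensorFlow on TKE are often unsure how to deploy it and how to verify whether it is actually using the GPU rather than the CPU. This article shows how to deploy TensorFlow on TKE and how to verify that it can use the GPU there.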

Adding a GPU node to TKE

Add a GPU node in the TKE console.


Check the status: a node status of Healthy means the node was added successfully.
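The same check can also be scripted. The following is a minimal sketch, not part of the original walkthrough: it parses the JSON printed by `kubectl get nodes -o json` and reports, per node, whether it is Ready and how many GPUs it advertises. It assumes the GPU device plugin exposes the resource as `nvidia.com/gpu`; the function name `gpu_nodes` and the sample node name are made up for illustration.

```python
import json


def gpu_nodes(node_list: dict) -> dict:
    """Map node name -> (is_ready, allocatable GPU count) for `kubectl get nodes -o json` data."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        status = node.get("status", {})
        # A node is healthy when its "Ready" condition has status "True".
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        # Assumption: GPUs are advertised under the "nvidia.com/gpu" resource name.
        gpus = int(status.get("allocatable", {}).get("nvidia.com/gpu", "0"))
        result[name] = (ready, gpus)
    return result


# Hypothetical sample shaped like the kubectl output, trimmed to the fields used above.
sample = json.loads("""
{"items": [{"metadata": {"name": "gpu-node-1"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}],
                       "allocatable": {"cpu": "8", "nvidia.com/gpu": "1"}}}]}
""")
print(gpu_nodes(sample))
```

With cluster access, the real data could be fed in by piping `kubectl get nodes -o json` into a script that calls `json.load(sys.stdin)`.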

Deploying TensorFlow

For this deployment we use the official image tensorflow/tensorflow:latest-gpu-jupyter (tag: latest-gpu-jupyter). We pick the Jupyter variant of the image because it makes online debugging easier.
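The original configures the workload through the console. As a rough equivalent, the sketch below builds a Deployment and a Service as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. Everything except the image name is an assumption made for illustration: the `tensorflow` names and label, a request for one GPU through the `nvidia.com/gpu` resource, Jupyter listening on port 8888, and a LoadBalancer Service to obtain a public IP.

```python
import json

IMAGE = "tensorflow/tensorflow:latest-gpu-jupyter"
LABELS = {"app": "tensorflow"}  # hypothetical label used to tie the Service to the Pod

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "tensorflow"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [{
                    "name": "tensorflow",
                    "image": IMAGE,
                    "ports": [{"containerPort": 8888}],  # assumed Jupyter port
                    # Assumption: one GPU requested via the NVIDIA device plugin resource.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }]
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "tensorflow"},
    "spec": {
        "type": "LoadBalancer",  # assumed, so that the Service gets a public IP
        "selector": LABELS,
        "ports": [{"port": 8888, "targetPort": 8888}],
    },
}

# A v1 List lets both objects be applied from a single file.
manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it with `kubectl apply -f` would be one way to reproduce the console steps from the command line.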


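After the deployment finishes, find the newly created Service under Services and Routes in the TKE console and note its public IP.

Access test: open the Service's public IP in a browser. The Jupyter login page should appear.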

Getting the token

In the TKE console, log in to the TensorFlow container and run the following command:

jupyter notebook list


Enter this token when logging in.
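The command prints one line per running server, with the token embedded in the URL. As a small illustration, the sketch below pulls the token out of that text. The sample listing and the helper name `extract_tokens` are hypothetical; real output will contain a different, much longer token.

```python
from urllib.parse import urlparse, parse_qs


def extract_tokens(listing: str) -> list:
    """Return the token of every server line in `jupyter notebook list` output."""
    tokens = []
    for line in listing.splitlines():
        line = line.strip()
        # Server lines look like "<url> :: <notebook dir>"; skip the header line.
        if not line.startswith("http"):
            continue
        url = line.split(" :: ", 1)[0]
        tokens.extend(parse_qs(urlparse(url).query).get("token", []))
    return tokens


# Hypothetical sample output, shortened for readability.
sample = """Currently running servers:
http://0.0.0.0:8888/?token=0123456789abcdef :: /tf
"""
print(extract_tokens(sample))
```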


At this point the service is deployed.

Verifying the GPU

On the TensorFlow Jupyter web page, choose New -> Python 3.

Enter the following code:

import tensorflow as tf
print(tf.__version__)

print('GPU', tf.config.list_physical_devices('GPU'))

a = tf.constant(2.0)
b = tf.constant(4.0)
print(a + b)

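Click Run. The output should include a line like this:

GPU [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

A non-empty device list means TensorFlow can see the GPU and use it for computation.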
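Listing the device shows that the GPU is visible. To also confirm where an operation ran, one option is to inspect the `device` attribute of the result tensor. The sketch below assumes eager execution (the TensorFlow 2 default); the helper names are made up for illustration, and TensorFlow is imported inside the function so that the string helper can be used on its own.

```python
def device_kind(device_name: str) -> str:
    """Extract the device type ("GPU", "CPU", ...) from a TensorFlow device string."""
    # Device strings look like "/job:localhost/replica:0/task:0/device:GPU:0".
    return device_name.rsplit("device:", 1)[-1].split(":")[0]


def where_does_add_run() -> str:
    """Run the a + b example and report the type of device that produced the result."""
    import tensorflow as tf  # imported here so device_kind works without TensorFlow

    a = tf.constant(2.0)
    b = tf.constant(4.0)
    return device_kind((a + b).device)
```

In the notebook, `print(where_does_add_run())` is expected to report GPU when the GPU is usable and CPU otherwise.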

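Limiting GPU memory growth

By default, TensorFlow maps nearly all of the memory of every GPU visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to reduce memory fragmentation and make more efficient use of the relatively precious GPU memory on the device. To restrict TensorFlow to a specific set of GPUs, we use the tf.config.experimental.set_visible_devices method.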

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)
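Since visibility also depends on CUDA_VISIBLE_DEVICES, setting that variable before TensorFlow initializes the GPU is another way to hide devices from the process. The following is a minimal sketch under that assumption; the helper name `restrict_visible_gpus` is hypothetical.

```python
import os


def restrict_visible_gpus(indices) -> str:
    """Expose only the given GPU indices to CUDA; call this before TensorFlow touches the GPU."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the first GPU; TensorFlow should be imported after this point.
restrict_visible_gpus([0])
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

In a Jupyter kernel where TensorFlow has already initialized the GPU, the kernel would need a restart for the change to take effect.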


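In some cases it is preferable for the process to allocate only a subset of the available memory, or to grow its memory usage only as the process needs it. TensorFlow provides two methods to control this.

The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth. It attempts to allocate only as much GPU memory as the runtime allocations require: it starts out allocating very little memory, and as the program runs and needs more GPU memory, the GPU memory region allocated to the TensorFlow process is extended. Note that memory is not released, because doing so can cause memory fragmentation. To turn on memory growth for a specific GPU, use the following code before allocating any tensors or executing any ops.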

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)


Another way to enable this option is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform-specific.
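On TKE, a natural place to set this variable is the container spec of the workload rather than the notebook. The sketch below adds it to a dictionary shaped like a Kubernetes container entry. The helper name `with_env` and the container name are assumptions made for illustration.

```python
import copy
import json


def with_env(container: dict, name: str, value: str) -> dict:
    """Return a copy of a Kubernetes container spec with the env var set, replacing any existing entry."""
    updated = copy.deepcopy(container)
    env = [e for e in updated.get("env", []) if e.get("name") != name]
    env.append({"name": name, "value": value})
    updated["env"] = env
    return updated


container = {"name": "tensorflow", "image": "tensorflow/tensorflow:latest-gpu-jupyter"}
container = with_env(container, "TF_FORCE_GPU_ALLOW_GROWTH", "true")
print(json.dumps(container, indent=2))
```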

The second method is to configure a virtual GPU device with tf.config.experimental.set_virtual_device_configuration and set a hard limit on the total memory that can be allocated on the GPU.

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
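Recent TensorFlow 2.x releases also expose this setting without the experimental prefix, as `tf.config.set_logical_device_configuration` and `tf.config.LogicalDeviceConfiguration`. The sketch below uses those names and adds a small helper for deriving the limit from a fraction of the card's memory. The helper names and the 16384 MB card size in the usage note are assumptions for illustration, and TensorFlow is imported inside the function so the arithmetic helper can be used on its own.

```python
def fraction_to_mb(total_mb: int, fraction: float) -> int:
    """Convert a fraction of a card's total memory into a memory_limit value in MB."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def limit_first_gpu(memory_limit_mb: int) -> bool:
    """Cap the first GPU at memory_limit_mb; return True if the limit was applied."""
    import tensorflow as tf  # imported here so fraction_to_mb works without TensorFlow

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return False
    try:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
        )
    except RuntimeError as e:
        # Logical devices must be configured before the GPU has been initialized.
        print(e)
        return False
    return True
```

For example, `limit_first_gpu(fraction_to_mb(16384, 0.25))` would cap TensorFlow at a quarter of a hypothetical 16384 MB card, provided it runs before any tensors are allocated.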


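For more scenarios, see the official documentation (parts of the content above are taken from the official TensorFlow documentation).

Reference: https://www.tensorflow.org/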

Tencent Cloud TKE GPU case study: Using TensorFlow in TKE
