TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.

1 Overview

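In the official TensorFlow example, Use TensorFlow Serving with Kubernetes, the model is copied into the image. This is somewhat inflexible: updating the model means rebuilding the image and then updating the corresponding Pods.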

TensorFlow Serving itself supports rolling model updates, and it can read model files directly from S3, so the model can be kept on S3 instead of inside the image.

Before following the demo, it helps to read Amazon S3 Tools Usage to get familiar with s3cmd.

2 Practice

To deploy a model stored on S3 and update it at any time, prepare the following in advance.

Serving image
Model files and an s3cmd environment

2.1 Serving image

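The Serving image is available from the official TensorFlow Serving image repository. When choosing an image, be sure to check whether it is the GPU version. Then tag it for the TenC image registry.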

# Pull the official image
docker pull tensorflow/serving

2.2 Model files and s3cmd environment

This is the example model provided by tensorflow/serving. The model's location is here.

.
└── 00000123
    ├── assets
    │   └── foo.txt
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index

Assuming you are already familiar with s3cmd: accessing the bucket requires an Access key and a Secret key.

s3cmd put --recursive --access_key=runzhliu-demo-xxx --secret_key=runzhliu-demo-xxx --no-ssl --host=xxx.db:7480 resnet_v2_fp32_savedmodel_NHWC_jpg s3://runzhliu__demo/Tensorflow_Serving/

2.3 Deployment

When creating the Pod, pass in the environment variables that connect TensorFlow to S3; otherwise Serving cannot load the model from S3.

# The keys can be found in the Ceph Key module of the Ceph storage section above
export AWS_ACCESS_KEY_ID=runzhliu-demo-key
export AWS_SECRET_ACCESS_KEY=runzhliu-demo-secret
export S3_ENDPOINT=9.25.151.xxx:7480
export S3_USE_HTTPS=0
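
One way to hand these variables to Serving is the env list of the container in the Pod spec. The sketch below builds such a container spec as a Python dict and prints it as JSON. The function name, the container name, and the model path are assumptions made for illustration. It also assumes that the official tensorflow/serving image reads MODEL_NAME and MODEL_BASE_PATH in its entrypoint and serves from MODEL_BASE_PATH/MODEL_NAME, which is worth verifying against the image version in use. In a real cluster the two keys would more likely come from a Secret than from literal values.

```python
import json


def serving_container(model_name, model_base_path, s3_endpoint,
                      access_key, secret_key, use_https=False):
    """Build a container spec (as a dict) for a TensorFlow Serving Pod."""
    env = {
        # Assumed to be read by the image's entrypoint script.
        "MODEL_NAME": model_name,
        "MODEL_BASE_PATH": model_base_path,
        # The four S3 variables from the export lines above.
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "S3_ENDPOINT": s3_endpoint,
        "S3_USE_HTTPS": "1" if use_https else "0",
    }
    return {
        "name": "tensorflow-serving",  # hypothetical container name
        "image": "tensorflow/serving",
        "ports": [{"containerPort": 8501}],  # REST port used in this post
        "env": [{"name": k, "value": v} for k, v in env.items()],
    }


if __name__ == "__main__":
    container = serving_container(
        model_name="saved_model_half_plus_two_cpu",
        # Hypothetical path: the bucket prefix used with s3cmd above.
        model_base_path="s3://runzhliu__demo/Tensorflow_Serving",
        s3_endpoint="9.25.151.xxx:7480",
        access_key="runzhliu-demo-key",
        secret_key="runzhliu-demo-secret",
    )
    print(json.dumps(container, indent=2))
```

The printed JSON can be placed under spec.containers of a Pod or Deployment manifest.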

2.5 Updating the model

The update tested here is to add a model folder with a larger version number next to the existing one, and then upload it to Ceph storage with s3cmd put.

.
|-- 00000123 // original model
|   |-- assets
|   |   `-- foo.txt
|   |-- saved_model.pb
|   `-- variables
|       |-- variables.data-00000-of-00001
|       `-- variables.index
`-- 00000124 // updated model
    |-- assets
    |   `-- foo.txt
    |-- saved_model.pb
    `-- variables
        |-- variables.data-00000-of-00001
        `-- variables.index

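Then check the model status with curl. Version 124 is now AVAILABLE, while version 123 has moved to END. This is determined by Serving's default Version Policy, which automatically loads the model with the largest version number.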

# curl http://172.17.91.182:8501/v1/models/saved_model_half_plus_two_cpu
{
 "model_version_status": [
  {
   "version": "124",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  },
  {
   "version": "123",
   "state": "END",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}
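
The selection rule behind this behaviour can be pictured as picking the numerically largest version directory under the model base path. The sketch below only illustrates that rule. It is not TensorFlow Serving's implementation, and the function name is hypothetical.

```python
def latest_version(dir_names):
    """Return the numerically largest all-digit directory name, or None."""
    # Non-numeric entries are not treated as model versions here.
    versions = [name for name in dir_names if name.isdecimal()]
    if not versions:
        return None
    # Compare as integers so that leading zeros do not matter.
    return max(versions, key=int)


if __name__ == "__main__":
    print(latest_version(["00000123", "00000124"]))
```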

3 Testing

Next, we test the deployed tensorflow-serving Workload with a few simple curl requests. For the test environment, see Serving_Curl.

/ # curl http://6.16.240.189:8501/v1/models/saved_model_half_plus_two_cpu
{
 "model_version_status": [
  {
   "version": "123",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}
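
A script can run the same check by parsing the status response. This is a minimal sketch that assumes the response has the shape shown above. The function name is hypothetical.

```python
import json


def available_versions(status_json):
    """Return the versions (as ints) whose state is AVAILABLE."""
    status = json.loads(status_json)
    return [
        int(entry["version"])
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


if __name__ == "__main__":
    sample = json.dumps({
        "model_version_status": [
            {"version": "123", "state": "AVAILABLE",
             "status": {"error_code": "OK", "error_message": ""}},
        ]
    })
    print(available_versions(sample))
```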

# curl -d '{"instances": [1.0, 2.0, 5.0]}'  -X POST http://6.16.240.189:8501/v1/models/saved_model_half_plus_two_cpu:predict
{
    "predictions": [2.5, 3.0, 4.5]
}
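
The same predict call can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and half_plus_two is a local reference function inferred from the model name and the predictions above (y = x / 2 + 2), not something read from the model itself.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances):
    """Return a urllib Request for the REST predict endpoint (not yet sent)."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, model_name, instances, timeout=5.0):
    """Send the request and return the "predictions" list from the reply."""
    request = build_predict_request(base_url, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]


def half_plus_two(x):
    """Local reference for what the demo model appears to compute."""
    return 0.5 * x + 2.0
```

Calling predict("http://6.16.240.189:8501", "saved_model_half_plus_two_cpu", [1.0, 2.0, 5.0]) from a host that can reach the Pod should return the same list as the curl call, and comparing it with half_plus_two applied to each input gives a quick sanity check.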

Next, curl the deployed Pod's IP on HTTP port 8501 to view the metadata of the deployed model. The output includes signature_def and related information.

# curl http://6.16.240.189:8501/v1/models/saved_model_half_plus_two_cpu/versions/123/metadata
{
"model_spec":{
 "name": "saved_model_half_plus_two_cpu",
 "signature_name": "",
 "version": "123"
}
,
"metadata": {"signature_def": {
 "signature_def": {
  "regress_x_to_y2": {
   "inputs": {
    "inputs": {
     "dtype": "DT_STRING",
     "tensor_shape": {
      "dim": [],
      "unknown_rank": true
     },
     "name": "tf_example:0"
    }
   },
   "outputs": {
    "outputs": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "y2:0"
    }
   },
   "method_name": "tensorflow/serving/regress"
  },
  "classify_x_to_y": {
   "inputs": {
    "inputs": {
     "dtype": "DT_STRING",
     "tensor_shape": {
      "dim": [],
      "unknown_rank": true
     },
     "name": "tf_example:0"
    }
   },
   "outputs": {
    "scores": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "y:0"
    }
   },
   "method_name": "tensorflow/serving/classify"
  },
  "regress_x2_to_y3": {
   "inputs": {
    "inputs": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "x2:0"
    }
   },
   "outputs": {
    "outputs": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "y3:0"
    }
   },
   "method_name": "tensorflow/serving/regress"
  },
  "serving_default": {
   "inputs": {
    "x": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "x:0"
    }
   },
   "outputs": {
    "y": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "y:0"
    }
   },
   "method_name": "tensorflow/serving/predict"
  },
  "regress_x_to_y": {
   "inputs": {
    "inputs": {
     "dtype": "DT_STRING",
     "tensor_shape": {
      "dim": [],
      "unknown_rank": true
     },
     "name": "tf_example:0"
    }
   },
   "outputs": {
    "outputs": {
     "dtype": "DT_FLOAT",
     "tensor_shape": {
      "dim": [
       {
        "size": "-1",
        "name": ""
       },
       {
        "size": "1",
        "name": ""
       }
      ],
      "unknown_rank": false
     },
     "name": "y:0"
    }
   },
   "method_name": "tensorflow/serving/regress"
  }
 }
}
}
}
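
The metadata response is long, so a short summary per signature is often easier to read. The sketch below assumes the response has already been parsed into a dict with the nesting shown above. The function name is hypothetical.

```python
def summarize_signatures(metadata_response):
    """Map each signature name to its method and its input/output keys."""
    # The response nests "signature_def" twice, as in the output above.
    signatures = metadata_response["metadata"]["signature_def"]["signature_def"]
    summary = {}
    for name, signature in signatures.items():
        summary[name] = {
            "method": signature["method_name"],
            "inputs": sorted(signature["inputs"]),
            "outputs": sorted(signature["outputs"]),
        }
    return summary


if __name__ == "__main__":
    # Trimmed copy of the serving_default entry from the response above.
    sample = {
        "metadata": {"signature_def": {"signature_def": {
            "serving_default": {
                "inputs": {"x": {"dtype": "DT_FLOAT", "name": "x:0"}},
                "outputs": {"y": {"dtype": "DT_FLOAT", "name": "y:0"}},
                "method_name": "tensorflow/serving/predict",
            },
        }}},
    }
    print(summarize_signatures(sample))
```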

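Requests can also go through the name or the cluster IP of the Service in front of the Serving Pod, which are tensorflow-serving and 172.17.91.182 respectively. So even if the Pod is updated and its Pod IP changes, either of these two addresses still routes to the serving service.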

# curl http://tensorflow-serving:8501/v1/models/saved_model_half_plus_two_cpu
{
 "model_version_status": [
  {
   "version": "123",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}

# curl http://172.17.91.182:8501/v1/models/saved_model_half_plus_two_cpu
{
 "model_version_status": [
  {
   "version": "123",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}

4 Summary

Deploying TensorFlow Serving on Kubernetes is very convenient today, and managing models through S3 also makes rolling model updates possible.

TensorFlow Serving on S3 in a Kubernetes environment