k8s Basics: Health Check Mechanism

Date: 2022-07-22

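Probe types

  • ExecAction
    • Runs an arbitrary command inside the container and checks its exit status. An exit code of 0 means the probe succeeded; any other exit code counts as a failure. For a liveness probe, the container is restarted once the failure threshold is reached.
  • TCPSocketAction
    • Tries to open a TCP connection to the specified port of the container. If the connection is established, the probe succeeds; otherwise it fails and, for a liveness probe, the container is restarted.
  • HTTPGetAction
    • Sends an HTTP GET request to the container's IP address. The probe succeeds if a response arrives and its status code does not indicate an error (2xx or 3xx). An error status code or no response at all counts as a failure, and for a liveness probe the container is restarted.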
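
The walkthrough in this article only uses the HTTP GET variant, so here is a minimal sketch of what the other two probe types might look like in a manifest. The pod name, container names, the busybox image, and the /tmp/healthy marker file are all assumptions made up for illustration; they do not come from the original setup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-types-demo          # hypothetical name
spec:
  containers:
  - name: exec-probe              # hypothetical container using ExecAction
    image: busybox:latest
    # Create a marker file, then keep the container running.
    command: ["sh", "-c", "touch /tmp/healthy && sleep 3600"]
    livenessProbe:
      exec:
        # Exit code 0 (file exists) counts as success.
        command: ["cat", "/tmp/healthy"]
  - name: tcp-probe               # hypothetical container using TCPSocketAction
    image: nginx:latest
    ports:
    - containerPort: 80
      protocol: TCP
    livenessProbe:
      tcpSocket:
        # Success if a TCP connection to port 80 can be opened.
        port: 80
```

Deleting /tmp/healthy inside the first container should make the exec probe start failing, which is a convenient way to watch a restart being triggered.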

Example

Create a liveness probe of the HTTP GET type

# cat nginx-health.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-health
spec:
  nodeSelector:
    server: 'backend'
  containers:
  - image: nginx:latest
    name: nginx-health
    ports:
    - containerPort: 80
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /index.php
        port: 80

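Here I deliberately probe a path that does not exist, so the container is bound to be restarted once the probe fails. Start the pod, and once it is up, check its logs and its description.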

# kubectl logs -f nginx-health
2019/10/14 08:56:34 [error] 6#6: *1 open() "/usr/share/nginx/html/index.php" failed (2: No such file or directory), client: 192.168.152.168, server: localhost, request: "GET /index.php HTTP/1.1", host: "192.168.166.155:80"
192.168.152.168 - - [14/Oct/2019:08:56:34 +0000] "GET /index.php HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"

Now look at the pod details

# kubectl describe pod nginx-health
Name:         nginx-health
Namespace:    default
Priority:     0
Node:         node1/192.168.152.168
Start Time:   Mon, 14 Oct 2019 04:50:10 -0400
Labels:       <none>
Annotations:  cni.projectcalico.org/podIP: 192.168.166.155/32
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"nginx-health","namespace":"default"},"spec":{"containers":[{"image":"...
Status:       Running
IP:           192.168.166.155
Containers:
  nginx-health:
    Container ID:   docker://36e07faa8b8d0eb7f3e5465186cc2f23cf8198776d45c546f9ead3264e901c02
    Image:          nginx:latest
    Image ID:       docker-pullable://nginx@sha256:aeded0f2a861747f43a01cf1018cf9efe2bdd02afd57d2b11fcc7fcadc16ccd1
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 14 Oct 2019 05:00:09 -0400
      Finished:     Mon, 14 Oct 2019 05:00:34 -0400
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:80/index.php delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ps4lj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-ps4lj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ps4lj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  server=backend
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  11m                  default-scheduler  Successfully assigned default/nginx-health to node1
  Normal   Created    9m21s (x3 over 11m)  kubelet, node1     Created container nginx-health
  Normal   Started    9m21s (x3 over 11m)  kubelet, node1     Started container nginx-health
  Normal   Pulling    8m52s (x4 over 11m)  kubelet, node1     Pulling image "nginx:latest"
  Normal   Killing    8m52s (x3 over 10m)  kubelet, node1     Container nginx-health failed liveness probe, will be restarted
  Normal   Pulled     6m3s (x6 over 11m)   kubelet, node1     Successfully pulled image "nginx:latest"
  Warning  Unhealthy  82s (x22 over 10m)   kubelet, node1     Liveness probe failed: HTTP probe failed with statuscode: 404

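The events show the probe failures, and the Liveness line in the description shows the probe settings in effect:

  • delay=0s: probing starts as soon as the container has started
  • timeout=1s: the container must respond within one second, otherwise the probe counts as failed
  • period=10s: the probe runs every 10 seconds
  • #success=1: a single successful probe marks the container as healthy again
  • #failure=3: the container is restarted after three consecutive failed probes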
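
To make these settings concrete, the probe from the first manifest can be written with every default spelled out. This is only a sketch: the field names are the ones listed by kubectl explain for livenessProbe, and the values are the defaults reported in the describe output above.

```yaml
livenessProbe:
  httpGet:
    path: /index.php
    port: 80
  initialDelaySeconds: 0    # delay=0s: no extra wait after container start
  timeoutSeconds: 1         # timeout=1s
  periodSeconds: 10         # period=10s
  successThreshold: 1       # #success=1
  failureThreshold: 3       # #failure=3
```

With these values, three consecutive failures at 10-second intervals mean a restart roughly 20 to 30 seconds after the container starts. That is in line with the 25 seconds between Started and Finished in the Last State section of the output above.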

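Now let's point the probe at a path that exists and adjust some of the settings mentioned above, for example waiting ten seconds after the container starts before probing. How do we find out which settings are available? Kubernetes has built-in help for this: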

# kubectl explain pods.spec.containers.livenessProbe
KIND:     Pod
VERSION:  v1

RESOURCE: livenessProbe <Object>

DESCRIPTION:
     Periodic probe of container liveness. Container will be restarted if the
     probe fails. Cannot be updated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

     Probe describes a health check to be performed against a container to
     determine whether it is alive or ready to receive traffic.

FIELDS:
   exec <Object>
     One and only one of the following should be specified. Exec specifies the
     action to take.

   failureThreshold     <integer>
     Minimum consecutive failures for the probe to be considered failed after
     having succeeded. Defaults to 3. Minimum value is 1.

   httpGet      <Object>
     HTTPGet specifies the http request to perform.

   initialDelaySeconds  <integer>
     Number of seconds after the container has started before liveness probes
     are initiated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   periodSeconds        <integer>
     How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
     value is 1.

   successThreshold     <integer>
     Minimum consecutive successes for the probe to be considered successful
     after having failed. Defaults to 1. Must be 1 for liveness. Minimum value
     is 1.

   tcpSocket    <Object>
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet
     supported

   timeoutSeconds       <integer>
     Number of seconds after which the probe times out. Defaults to 1 second.
     Minimum value is 1. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

The modified yml file looks like this

# cat nginx-health.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-health
spec:
  nodeSelector:
    server: 'backend'
  containers:
  - image: nginx:latest
    name: nginx-health
    ports:
    - containerPort: 80
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /index.html
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 10

After starting the pod, check the logs again

# kubectl logs -f nginx-health
192.168.152.168 - - [14/Oct/2019:09:03:39 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"
192.168.152.168 - - [14/Oct/2019:09:03:49 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.15" "-"

Check whether the modified settings have taken effect

# kubectl describe pod nginx-health
Name:         nginx-health
Namespace:    default
Priority:     0
Node:         node1/192.168.152.168
Start Time:   Mon, 14 Oct 2019 05:17:49 -0400
Labels:       <none>
Annotations:  cni.projectcalico.org/podIP: 192.168.166.157/32
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"nginx-health","namespace":"default"},"spec":{"containers":[{"image":"...
Status:       Running
IP:           192.168.166.157
Containers:
  nginx-health:
    Container ID:   docker://011be58ccbe6fbc6e490588ec5a1f60028e1593a1f28a59022fda72ff544cffc
    Image:          nginx:latest
    Image ID:       docker-pullable://nginx@sha256:aeded0f2a861747f43a01cf1018cf9efe2bdd02afd57d2b11fcc7fcadc16ccd1
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 14 Oct 2019 05:18:18 -0400
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/index.html delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ps4lj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-ps4lj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ps4lj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  server=backend
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  67s   default-scheduler  Successfully assigned default/nginx-health to node1
  Normal  Pulling    66s   kubelet, node1     Pulling image "nginx:latest"
  Normal  Pulled     38s   kubelet, node1     Successfully pulled image "nginx:latest"
  Normal  Created    38s   kubelet, node1     Created container nginx-health
  Normal  Started    38s   kubelet, node1     Started container nginx-health