Fixing a CoreDNS pod stuck in CrashLoopBackOff on Kubernetes
Date: 2020-12-30
First, a record of the dead ends I hit along the way.
1 - Check the logs: kubectl logs shows the actual error:
[root@i-F998A4DE ~]# kubectl logs -n kube-system coredns-fb8b8dccf-hhkfm
log is DEPRECATED and will be removed in a future version. Use logs instead.
E1230 03:03:51.298180 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E1230 03:03:51.298180 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-fb8b8dccf-hhkfm.unknownuser.log.ERROR.20201230-030351.1: no such file or directory
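The line that matters in this output is the "no route to host" error against 10.96.0.1:443, which with a default kubeadm setup is the in-cluster Service address of the API server. The "cannot create log" line is most likely only a side effect of the process exiting. A minimal sketch of picking these patterns out of a log follows; the function name classify_coredns_log and the hint labels are hypothetical, chosen for illustration only.

```shell
#!/bin/sh
# classify_coredns_log: read CoreDNS log text on stdin and print one hint per
# recognised failure pattern. The name and the hint labels are hypothetical.
classify_coredns_log() {
    log=$(cat)
    found=0
    case "$log" in
        *"no route to host"*)
            echo "apiserver-unreachable: pod cannot reach the Service IP (check iptables/kube-proxy rules)"
            found=1 ;;
    esac
    case "$log" in
        *"cannot create log"*)
            echo "secondary: logger could not create its file under /tmp while exiting"
            found=1 ;;
    esac
    [ "$found" -eq 1 ] || echo "unknown: no known pattern matched"
}

# Usage against a live cluster (pod name taken from the output above):
#   kubectl logs -n kube-system coredns-fb8b8dccf-hhkfm | classify_coredns_log
```

Only the first hint points at a likely root cause; the second is listed so that it is not mistaken for one.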
2 - Inspect the pod: kubectl describe pod returns information that is mostly not much help:
[root@i-F998A4DE ~]# kubectl describe po -n kube-system coredns-fb8b8dccf-s2nj9
Name:               coredns-fb8b8dccf-s2nj9
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               master/10.252.37.41
Start Time:         Wed, 30 Dec 2020 10:28:40 +0800
Labels:             k8s-app=kube-dns
                    pod-template-hash=fb8b8dccf
Annotations:        <none>
Status:             Running
IP:                 10.244.0.3
Controlled By:      ReplicaSet/coredns-fb8b8dccf
Containers:
  coredns:
    Container ID:  docker://50bab6b378f236af89bec945083bfe1af293a71f1276c3c8df324cfbe6540a54
    Image:         k8s.gcr.io/coredns:1.3.1
    Image ID:      docker://sha256:eb516548c180f8a6e0235034ccee2428027896af16a509786da13022fe95fe8c
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 30 Dec 2020 10:29:00 +0800
      Finished:     Wed, 30 Dec 2020 10:29:01 +0800
    Ready:          False
    Restart Count:  2
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-2gw5w (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-2gw5w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-2gw5w
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  38s (x4 over 48s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         30s                default-scheduler  Successfully assigned kube-system/coredns-fb8b8dccf-s2nj9 to master
  Normal   Pulled            11s (x3 over 29s)  kubelet, master    Container image "k8s.gcr.io/coredns:1.3.1" already present on machine
  Normal   Created           11s (x3 over 29s)  kubelet, master    Created container coredns
  Normal   Started           10s (x3 over 28s)  kubelet, master    Started container coredns
  Warning  BackOff           2s (x6 over 26s)   kubelet, master    Back-off restarting failed container
3 - Editing the coredns Deployment had no effect either
[root@master ~]# kubectl edit deployment coredns -n kube-system
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-12-30T02:28:07Z"
  generation: 3
  labels:
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "6088"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/coredns
  uid: a718d791-4a46-11eb-91a1-d00df998a4de
spec:
  progressDeadlineSeconds: 600
  replicas: 2  # set this to 0 first; once k8s has applied the update, change it back to 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: k8s.gcr.io/coredns:1.3.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          procMount: Default
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: Default
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2020-12-30T03:00:38Z"
    lastUpdateTime: "2020-12-30T03:00:38Z"
    message: ReplicaSet "coredns-fb8b8dccf" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-12-30T03:38:49Z"
    lastUpdateTime: "2020-12-30T03:38:49Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  readyReplicas: 2
  replicas: 2  # set this to 0 first; after k8s has updated, leave this one alone
  updatedReplicas: 2
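The same scale-to-zero-and-back step can be done without opening the editor, using kubectl scale. The sketch below only prints the commands unless RUN=1 is set; the helper name restart_by_scaling and the RUN switch are hypothetical. As noted above, recreating the pods this way did not fix the crash loop in this case, so it is shown only as a tidier way to perform the attempt.

```shell
#!/bin/sh
# restart_by_scaling NAMESPACE DEPLOYMENT REPLICAS
# Scale a Deployment to 0 and then back to REPLICAS.
# Prints the kubectl commands by default; set RUN=1 to execute them.
# (Helper name and RUN switch are hypothetical, for illustration.)
restart_by_scaling() {
    ns=$1
    name=$2
    replicas=$3
    for n in 0 "$replicas"; do
        # Build the command as positional parameters so it can be run or printed.
        set -- kubectl scale deployment "$name" -n "$ns" --replicas="$n"
        if [ "${RUN:-0}" = "1" ]; then
            "$@"
        else
            echo "$*"
        fi
    done
}

# Usage:
#   restart_by_scaling kube-system coredns 2          # print only
#   RUN=1 restart_by_scaling kube-system coredns 2    # actually scale
```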
4 - Force-deleting the coredns pods had no effect
[root@i-F998A4DE ~]# kubectl delete po coredns-fb8b8dccf-hhkfm --grace-period=0 --force -n kube-system
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "coredns-fb8b8dccf-hhkfm" force deleted
[root@i-F998A4DE flannel-dashboard]# kubectl delete po coredns-fb8b8dccf-ll2mp --grace-period=0 --force -n kube-system
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "coredns-fb8b8dccf-ll2mp" force deleted
5 - The kubelet logs likewise show only that coredns is failing
[root@i-F998A4DE ~]# journalctl -f -u kubelet
-- Logs begin at Tue 2020-12-29 11:56:05 CST. --
Dec 30 11:30:38 master kubelet[20570]: W1230 11:30:38.307384 20570 container.go:409] Failed to create summary reader for "/libcontainer_16449_systemd_test_default.slice": none of the resources are being tracked.
Dec 30 11:30:40 master kubelet[20570]: E1230 11:30:40.356882 20570 pod_workers.go:190] Error syncing pod 2c0dffd5-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"
Dec 30 11:30:41 master kubelet[20570]: E1230 11:30:41.375798 20570 pod_workers.go:190] Error syncing pod 2c0dffd5-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-jnj5h_kube-system(2c0dffd5-4a4f-11eb-8c6b-d00df998a4de)"
Dec 30 11:30:45 master kubelet[20570]: E1230 11:30:45.899200 20570 pod_workers.go:190] Error syncing pod 1ed96c42-4a4f-11eb-8c6b-d00df998a4de ("coredns-fb8b8dccf-24sxn_kube-system(1ed96c42-4a4f-11eb-8c6b-d00df998a4de)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=coredns pod=coredns-fb8b8dccf-24sxn_kube-system(1ed96c42-4a4f-11eb-8c6b-d00df998a4de)"
6 - The host's local DNS configuration also looks fine
[root@i-F998A4DE ~]# cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 114.114.114.114
nameserver 8.8.8.8
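One reason to look at this file at all is that a loopback nameserver in the host's resolv.conf (for example 127.0.0.53) is a well-known cause of CoreDNS crash loops, because queries forwarded upstream come straight back to CoreDNS. Both entries here are public resolvers, so that cause can be ruled out. A minimal sketch of the check, assuming a hypothetical helper name has_loopback_nameserver:

```shell
#!/bin/sh
# has_loopback_nameserver FILE
# Exit 0 if FILE contains a "nameserver" entry pointing at 127.x.x.x or ::1,
# exit 1 otherwise. (Helper name is hypothetical, for illustration.)
has_loopback_nameserver() {
    awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { found = 1 }
         END { exit found ? 0 : 1 }' "$1"
}

# Usage:
#   if has_loopback_nameserver /etc/resolv.conf; then
#       echo "loopback resolver found: CoreDNS may detect a forwarding loop"
#   else
#       echo "no loopback resolver in /etc/resolv.conf"
#   fi
```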
Final solution
The problem is most likely caused by corrupted or stale iptables rules. Running the following commands in order resolves it:
[root@master ~]# systemctl stop kubelet
[root@master ~]# systemctl stop docker
[root@master ~]# iptables --flush
[root@master ~]# iptables -t nat --flush
[root@master ~]# systemctl start kubelet
[root@master ~]# systemctl start docker
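Because flushing discards every rule in the filter and nat tables, it can be worth saving the current rules first so they can be put back with iptables-restore if something else breaks. The sketch below wraps the steps above with such a backup; the helper names and the default backup path /root/iptables.backup are hypothetical, and by default it only prints the commands (set RUN=1 to execute them as root).

```shell
#!/bin/sh
# run_or_print CMD: execute CMD when RUN=1, otherwise just print it.
run_or_print() {
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$1"
    else
        echo "$1"
    fi
}

# flush_with_backup [BACKUP_FILE]
# Save the current iptables rules, then repeat the flush sequence from above.
# (Helper names and the default backup path are hypothetical.)
flush_with_backup() {
    backup=${1:-/root/iptables.backup}
    # Abort if the backup cannot be written, so rules are never flushed unsaved.
    run_or_print "iptables-save > $backup" || return 1
    run_or_print "systemctl stop kubelet"
    run_or_print "systemctl stop docker"
    run_or_print "iptables --flush"
    run_or_print "iptables -t nat --flush"
    run_or_print "systemctl start kubelet"
    run_or_print "systemctl start docker"
}

# Usage:
#   flush_with_backup                   # print the plan
#   RUN=1 flush_with_backup             # execute it
#   iptables-restore < /root/iptables.backup   # roll back if needed
```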
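A side note: the affected machine is a cloud server, and it needs iptables rules in place before it can be reached remotely over ssh (22), vnc (3389), or \\ip (445). If the firewall is enabled and those access rules were not set up through firewall-cmd, flushing the iptables rules will cut off remote access. If the ports were opened with firewall-cmd, however, flushing the iptables rules does not affect the system firewall's own configuration. Since CentOS 7, the iptables service's startup script is no longer used by default, and firewalld takes its place. On RHEL 7, firewalld manages the netfilter subsystem by default, although the command it calls underneath is still iptables. firewalld is a front-end controller for iptables that provides persistent network traffic rules.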
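For reference, opening ports through firewalld so that they survive an iptables flush might look like the sketch below. It only prints the firewall-cmd commands for the given TCP ports; the helper name firewalld_port_commands is hypothetical, and the port list in the usage line simply reuses the ports mentioned above.

```shell
#!/bin/sh
# firewalld_port_commands PORT...
# Print the firewall-cmd commands that open each TCP port permanently,
# followed by a reload so the permanent rules take effect.
# (Helper name is hypothetical, for illustration.)
firewalld_port_commands() {
    for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    echo "firewall-cmd --reload"
}

# Usage:
#   firewalld_port_commands 22 3389 445        # review the commands
#   firewalld_port_commands 22 3389 445 | sh   # apply them (as root)
```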
This is my first blog post; if anything is wrong, corrections are welcome!
Reference:
Original article: https://www.cnblogs.com/Enlencode/p/14213351.html