28 February 2018

Debugging your Kubernetes cluster

If you are a developer or an operations engineer on today's cutting-edge tech stack, working with Kubernetes is probably part of your business as usual. So it is always better to have a test cluster of your own where you can do all your R&D before trying anything at the enterprise level.

If you have not experienced setting up your own cluster yet, here is another article that will walk you through the process of launching your own k8s cluster in different ways.

There are many possible failures you can run into in a k8s cluster, so here is a basic approach you can follow to debug them.
 1. If a node is going into the NotReady state, as below:

 $ kubectl get nodes
   NAME          STATUS     ROLES    AGE    VERSION
   controlPlane  Ready      master   192d   v1.12.10+1.0.15.el7
   worker1       Ready      <none>   192d   v1.12.10+1.0.14.el7
   worker2       NotReady   <none>   192d   v1.12.10+1.0.14.el7
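
A quick way to surface only the unhealthy nodes is to filter on the STATUS column. This is a minimal sketch, assuming the default `kubectl get nodes` column layout; the here-doc reuses the sample output above, so on a real cluster you would replace it with the live command shown in the comment:

```shell
# Print the name and status of every node that is not Ready.
# On a live cluster:
#   kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1, $2}'
awk '$2 != "Ready" {print $1, $2}' <<'EOF'
controlPlane  Ready    master   192d   v1.12.10+1.0.15.el7
worker1       Ready    <none>   192d   v1.12.10+1.0.14.el7
worker2       NotReady <none>   192d   v1.12.10+1.0.14.el7
EOF
```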

 # check whether any of your cluster components are unhealthy or failed
 $ kubectl get pods -n kube-system
   NAME                                     READY   STATUS    RESTARTS   AGE
   calico-kube-controllers-bc8f7d57-p5vhk   1/1     Running    1          192d
   calico-node-9xtfr                        1/1     NodeLost   1          192d
   calico-node-tpjz9                        1/1     Running    1          192d
   calico-node-vh766                        1/1     Running    1          192d
   coredns-bb49df795-9fn9g                  1/1     Running    1          192d
   coredns-bb49df795-qq6cm                  1/1     Running    1          192d
   etcd-bld09758002                         1/1     Running    1          192d
   kube-apiserver-bld09758002               1/1     Running    1          192d
   kube-controller-manager-bld09758002      1/1     Running    1          192d
   kube-proxy-57n8h                         1/1     NodeLost   1          192d
   kube-proxy-gvbkh                         1/1     Running    1          192d
   kube-proxy-tzknm                         1/1     Running    1          192d
   kube-scheduler-bld09758002               1/1     Running    1          192d
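
To zero in on the broken components, filter the kube-system listing for anything whose status is not Running. A sketch over a few of the sample rows above (assumption: the default column order; adding `-o wide` would also show which node each pod runs on, which helps tie a NodeLost pod back to the NotReady node):

```shell
# Show kube-system pods that are not Running, skipping the header row.
# On a live cluster:
#   kubectl get pods -n kube-system | awk 'NR > 1 && $3 != "Running" {print $1, $3}'
awk 'NR > 1 && $3 != "Running" {print $1, $3}' <<'EOF'
NAME                READY   STATUS     RESTARTS   AGE
calico-node-9xtfr   1/1     NodeLost   1          192d
kube-proxy-57n8h    1/1     NodeLost   1          192d
kube-proxy-gvbkh    1/1     Running    1          192d
EOF
```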

 # describe the node to see what is causing the issue
 $ kubectl describe node <node-name>
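
The Conditions table in the describe output is usually where the cause shows up: a lost node reports Ready as Unknown (the kubelet stopped posting status), while resource trouble shows up as MemoryPressure, DiskPressure, or PIDPressure flipping to True. A sketch that pulls out every condition not sitting at False; the here-doc is an illustrative sample, not output from a real cluster:

```shell
# Flag node conditions whose Status is not "False" -- on a healthy node
# the pressure conditions are all False and only Ready is True.
# On a live cluster the same data is available via jsonpath:
#   kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
awk 'NR > 1 && $2 != "False" {print $1, $2}' <<'EOF'
Type             Status   Reason
MemoryPressure   False    KubeletHasSufficientMemory
DiskPressure     False    KubeletHasNoDiskPressure
PIDPressure      False    KubeletHasSufficientPID
Ready            Unknown  NodeStatusUnknown
EOF
```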
 
 # ssh to the node and make sure the kubelet and docker services are running
 $ sudo systemctl status kubelet
 $ sudo systemctl status docker

 # dig deeper into the service logs if either of them is not running
 $ sudo journalctl -u kubelet
 $ sudo journalctl -u docker

 # if the services keep failing, reload the systemd configuration and restart them
 $ sudo systemctl daemon-reload
 $ sudo systemctl restart kubelet



Happy troubleshooting!
