To get an immediate view of the state of MachineDeployments and Nodes, run:
kubectl -n kube-system get machinedeployment,machineset,machine,no -o wide
NAME                                      AGE   DELETED   REPLICAS   AVAILABLEREPLICAS   PROVIDER    OS       VERSION
machinedeployment.cluster.k8s.io/worker   59d             2          2                   openstack   ubuntu   1.30.1

NAME                                           AGE   DELETED   REPLICAS   AVAILABLEREPLICAS   MACHINEDEPLOYMENT   PROVIDER    OS       VERSION
machineset.cluster.k8s.io/worker-677cf94d4d    8d              0                              worker              openstack   ubuntu   1.30.1
machineset.cluster.k8s.io/worker-77c8c559d6    8d              2          2                   worker              openstack   ubuntu   1.30.1

NAME                                             AGE   DELETED   MACHINESET          ADDRESS        NODE                      PROVIDER    OS       VERSION
machine.cluster.k8s.io/worker-77c8c559d6-mknks   8d              worker-77c8c559d6   192.168.1.20   worker-77c8c559d6-mknks   openstack   ubuntu   1.30.1
machine.cluster.k8s.io/worker-77c8c559d6-rfd8r   8d              worker-77c8c559d6   192.168.1.9    worker-77c8c559d6-rfd8r   openstack   ubuntu   1.30.1

NAME                           STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
node/worker-77c8c559d6-mknks   Ready    <none>   8d    v1.30.1   192.168.1.20                 Ubuntu 22.04.4 LTS   5.15.0-107-generic   containerd://1.6.33
node/worker-77c8c559d6-rfd8r   Ready    <none>   8d    v1.30.1   192.168.1.9                  Ubuntu 22.04.4 LTS   5.15.0-107-generic   containerd://1.6.33
Note:
To inspect a specific Machine, describe it or check its Events:
kubectl -n kube-system events --for machine/worker-77c8c559d6-wvlcg
LAST SEEN               TYPE     REASON                     OBJECT                            MESSAGE
9m58s                   Normal   Created                    Machine/worker-77c8c559d6-wvlcg   Successfully created instance
7m26s (x5 over 9m53s)   Normal   InstanceFound              Machine/worker-77c8c559d6-wvlcg   Found instance at cloud provider, status: running
6m57s (x2 over 6m59s)   Normal   LabelsAnnotationsUpdated   Machine/worker-77c8c559d6-wvlcg   Successfully updated labels/annotations
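Alternatively, describing the Machine (using the same example name) shows its spec, status and the recent Events in one view:
kubectl -n kube-system describe machine worker-77c8c559d6-wvlcg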
To see the status of different Node conditions, run:
kubectl describe node $node
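If you only need the condition summary rather than the full describe output, a jsonpath query along the following lines should work as well (it simply walks the standard .status.conditions of the Node object):
kubectl get node $node -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'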
To list all Pods scheduled on a particular Node, run:
kubectl get pods --all-namespaces --field-selector spec.nodeName=$node
In case the Node isn't responsive, you may choose to force the immediate deletion of all Pods on a Node:
When using the --force flag, Kubernetes does not wait until the Pods and their containers are terminated.
The applications may continue to run!
Under normal circumstances you should rely on Kubelet to gracefully tear down the Pods.
kubectl delete pods --all-namespaces --field-selector spec.nodeName=$node --force
Prerequisites
Adding an SSH key after a Machine is provisioned is only possible if Kubelet on the Node is running and healthy.
Get public IP of Node
IP=$(kubectl get node $node -o jsonpath='{.status.addresses[?(@.type == "ExternalIP")].address}')
echo $IP
Establish SSH session
ssh ubuntu@$IP
Do at your own risk
The Pod has full access to the Node!
Make sure to verify the integrity of the tooling and container image that is used!
Prerequisites
By creating a Pod with a privileged container that shares the host's PID namespace, you can switch into the kernel namespaces of the init process.
To do this, you can use a tool like node-shell.
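For illustration, a raw variant of this approach without the node-shell tool could look like the following sketch; the Pod name node-debug and the alpine image (which must provide nsenter) are arbitrary choices here, not fixed values:
# starts a privileged Pod in the host's PID namespace on $node and enters the namespaces of PID 1
kubectl run node-debug --rm -it --image=alpine --overrides='
{
  "apiVersion": "v1",
  "spec": {
    "nodeName": "'"$node"'",
    "hostPID": true,
    "containers": [{
      "name": "node-debug",
      "image": "alpine",
      "stdin": true,
      "tty": true,
      "securityContext": {"privileged": true},
      "command": ["nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "--", "sh"]
    }]
  }
}'
Because the container is privileged and shares the host's PID namespace, nsenter can switch into the namespaces of the init process, so the resulting shell effectively runs on the Node itself.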
Prerequisites
Shell access on a Node
See SSH or node shell.
To tail the logs of Kubelet, run:
journalctl -exu kubelet -f
It's possible to get some output from the initialization process of a Node.
These logs may contain valuable information on how far the initialization has progressed or surface potential errors (e.g. DNS).
To show the logs of a Node using the OpenStack CLI:
openstack console log show $node
The three most important phases during Node provisioning are:
Server creation
To verify this:
Query server with OpenStack CLI
openstack server show $node
Node initialization
Issues during this phase can be investigated by inspecting Kubelet logs or the OpenStack console output.
Initialize Node daemons
By this point the Node has already been registered with the Kubernetes cluster.
Check the Node daemons by getting their logs.
To get the logs of e.g. the Canal Pod running on the particular Node, run:
kubectl -n kube-system logs -l k8s-app=canal --field-selector spec.nodeName="$node"
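To first verify that the daemon Pods have actually been scheduled onto the Node, listing the kube-system Pods on it (reusing the field selector from above) should be sufficient:
kubectl -n kube-system get pods -o wide --field-selector spec.nodeName=$node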
The following steps help to determine why a Node isn't being deleted:
Check if the corresponding Machine has a deletion timestamp (DELETED column):
kubectl -n kube-system get machine worker-77c8c559d6-rfd8r
NAME                      AGE   DELETED   MACHINESET          ADDRESS       NODE                      PROVIDER    OS       VERSION
worker-77c8c559d6-rfd8r   8d    91s       worker-77c8c559d6   192.168.1.9   worker-77c8c559d6-rfd8r   openstack   ubuntu   1.30.1
Check if the Node cannot be drained
MetaKube will drain the Node, meaning it evicts the Pods running on the Node (with exceptions such as DaemonSet Pods).
The eviction API attempts to gracefully delete Pods.
A Pod may not be allowed to be evicted, e.g. if a matching PodDisruptionBudget doesn't allow any further disruptions.
You can safely try draining the Node yourself:
kubectl drain --ignore-daemonsets --delete-emptydir-data $node
It may tell you that some Pods are not safe to evict and why.
To get a list of Pods running on a Node, see above.
Another reason why a Node cannot be drained is that the Pods don't leave the Terminating state.
This may be because of an unresponsive Kubelet.
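As a quick check, filtering the Pod listing from above for the Terminating state should reveal such Pods:
kubectl get pods --all-namespaces --field-selector spec.nodeName=$node | grep Terminating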
Server can't be deleted
If the Node is fully drained, but it still remains in the cluster, there may be issues with deleting the cloud server.
In that case, check the Machine Events for errors.
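For example, re-checking the Machine Events and the server on the cloud side (reusing the commands from above with the Machine from this example) could look like:
kubectl -n kube-system events --for machine/worker-77c8c559d6-rfd8r
openstack server show worker-77c8c559d6-rfd8r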
For issues related to autoscaling, see here.