Cluster Lifecycle Management
Daily operations and management of Kubernetes clusters, including node maintenance, certificate management, kubeconfig configuration, and cluster addons management.
Overview
Cluster lifecycle management covers daily operations such as node maintenance, certificate renewal, kubeconfig configuration management, and addons deployment. This is a key area of the CKA exam, involving the continuous operation and maintenance of the cluster.
1. Node Management
1.1 cordon / uncordon
Mark a node as unschedulable (SchedulingDisabled). Already-running Pods are not affected.
# Mark a node as unschedulable
kubectl cordon node-1
# Verify node status (STATUS shows Ready,SchedulingDisabled)
kubectl get nodes
# Restore a node to schedulable
kubectl uncordon node-1
# Check node scheduling status
kubectl describe node node-1 | grep "Taints"
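For scripting, the scheduling state can also be read straight from the node spec instead of parsing `describe` output. The following is a minimal sketch, assuming `kubectl` is already configured for the cluster; `is_cordoned` is a hypothetical helper name, not a kubectl subcommand.

```shell
# is_cordoned NODE: succeeds if NODE is marked unschedulable.
# (is_cordoned is a hypothetical helper name used only for illustration.)
is_cordoned() {
  local state
  # .spec.unschedulable is "true" on a cordoned node and empty otherwise
  state=$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}') || return 2
  [ "$state" = "true" ]
}

# Usage:
#   is_cordoned node-1 && echo "node-1 is cordoned"
```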
1.2 drain -- Node Maintenance
Gracefully evict Pods from a node so that their controllers recreate them on other nodes. drain also marks the node as unschedulable.
# Basic drain operation
kubectl drain node-1
# Ignore DaemonSet-managed Pods (commonly used)
kubectl drain node-1 --ignore-daemonsets
# Force eviction (also deletes Pods not managed by a controller; these Pods are not recreated)
kubectl drain node-1 --ignore-daemonsets --force
# Delete Pods with local data (e.g., emptyDir)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
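# Override each Pod's termination grace period in seconds (default -1 uses the Pod's own setting)
kubectl drain node-1 --ignore-daemonsets --grace-period=120
# Limit how long drain waits before giving up (default 0 waits indefinitely)
kubectl drain node-1 --ignore-daemonsets --timeout=300s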
# Complete node maintenance workflow
kubectl cordon node-1 # 1. Prevent new Pod scheduling
kubectl drain node-1 --ignore-daemonsets # 2. Evict existing Pods
# ... Perform node maintenance ...
kubectl uncordon node-1 # 3. Restore scheduling
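The workflow above can be wrapped in a small function so that the node is only returned to service when the maintenance step succeeds. This is a sketch under stated assumptions: `maintain_node` is a hypothetical helper, the 300s timeout is an arbitrary choice, and the maintenance command is whatever the caller passes in.

```shell
# maintain_node NODE CMD...: drain NODE, run CMD, uncordon on success.
# (maintain_node is a hypothetical helper; adjust the drain flags to your cluster.)
maintain_node() {
  local node
  node="$1"
  shift
  # drain cordons the node itself, so a separate cordon step is optional
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s || return 1
  if "$@"; then
    kubectl uncordon "$node"
  else
    # Leave the node cordoned so nothing is scheduled onto a half-maintained host
    echo "maintenance failed; $node stays cordoned for inspection" >&2
    return 1
  fi
}

# Usage:
#   maintain_node node-1 ssh node-1 "sudo apt-get update"
```

Leaving the node cordoned after a failed maintenance step is a deliberate choice here; a different policy (always uncordon) may suit other environments.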
1.3 Delete a Node
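# First evict the workloads from the node
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# Then remove the node from the cluster
kubectl delete node worker-1
# Then, on that node, reset the state installed by kubeadm
sudo kubeadm reset -f
sudo rm -rf /etc/kubernetes/
sudo rm -rf /var/lib/kubelet/
sudo rm -rf /var/lib/etcd/ # only relevant on control plane nodes that ran a local etcd
# Verify the node has been removed
kubectl get nodes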
1.4 Node Labels and Taints
# Add a label
kubectl label node node-1 disktype=ssd
# Remove a label
kubectl label node node-1 disktype-
# Add a taint
kubectl taint node node-1 key=value:NoSchedule
kubectl taint node node-1 key=value:PreferNoSchedule
kubectl taint node node-1 key=value:NoExecute
# Remove a taint
kubectl taint node node-1 key:NoSchedule-
# View node taints
kubectl describe node node-1 | grep Taints
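To see how the label and taint above interact with scheduling, the sketch below prints a Pod manifest that selects nodes labeled disktype=ssd and tolerates the key=value:NoSchedule taint. The function name, Pod name, and image are placeholders chosen for illustration.

```shell
# tolerating_pod_manifest: print a Pod that targets disktype=ssd nodes and
# tolerates the key=value:NoSchedule taint used in the commands above.
# (Function name, Pod name, and image are illustrative placeholders.)
tolerating_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  nodeSelector:
    disktype: ssd
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx
EOF
}

# Usage:
#   tolerating_pod_manifest | kubectl apply -f -
```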
2. Certificate Management
2.1 View Certificate Information
# Check certificate expiration
kubeadm certs check-expiration
# Example output:
[check-expiration] Reading configuration from /etc/kubernetes/admin.conf
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 May 25, 2027 18:27 UTC   364d                                    no
apiserver                  May 25, 2027 18:27 UTC   364d            ca                      no
apiserver-kubelet-client   May 25, 2027 18:27 UTC   364d            ca                      no
controller-manager.conf    May 25, 2027 18:27 UTC   364d                                    no
scheduler.conf             May 25, 2027 18:27 UTC   364d                                    no
front-proxy-client         May 25, 2027 18:27 UTC   364d            front-proxy-ca          no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      May 23, 2035 18:22 UTC   9y              no
front-proxy-ca          May 23, 2035 18:22 UTC   9y              no
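Individual certificate files can also be checked with openssl, which is handy for alerting before a certificate lapses. A minimal sketch, assuming openssl is installed on the node; `cert_expires_within` is a hypothetical helper, and an unreadable file is reported the same way as an expiring one.

```shell
# cert_expires_within FILE DAYS: succeeds if the certificate in FILE
# expires within DAYS days (hypothetical helper built on openssl x509).
cert_expires_within() {
  local seconds
  seconds=$(( $2 * 86400 ))
  # -checkend exits 0 when the certificate is still valid N seconds from now
  if openssl x509 -in "$1" -noout -checkend "$seconds" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Usage (run as root on a control plane node):
#   for crt in /etc/kubernetes/pki/*.crt; do
#     cert_expires_within "$crt" 30 && echo "$crt expires within 30 days"
#   done
```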
2.2 Certificate Renewal
# Renew all certificates
kubeadm certs renew all
# Renew specific certificates
kubeadm certs renew apiserver
kubeadm certs renew admin.conf
kubeadm certs renew etcd-healthcheck-client
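# After certificate renewal, restart the control plane components so they load the new certificates.
# They run as static Pods, which the kubelet manages from /etc/kubernetes/manifests:
# move the manifests out, wait for the kubelet to stop the Pods, then move them back.
# (/tmp/manifests-backup is an arbitrary example path.)
sudo mkdir -p /tmp/manifests-backup
sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /tmp/manifests-backup/'
sleep 20 # at least one kubelet fileCheckFrequency period (20s by default)
sudo sh -c 'mv /tmp/manifests-backup/*.yaml /etc/kubernetes/manifests/'
# Note: restarting the kubelet alone does not restart the static Pod containers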
2.3 Manually Update kubeconfig
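# Generate a kubeconfig for an additional user (printed to stdout);
# depending on the kubeadm version, --config <kubeadm-config.yaml> may also be needed
kubeadm kubeconfig user --client-name=my-user
# "kubeadm certs renew admin.conf" rewrites /etc/kubernetes/admin.conf in place;
# copy it to your local kubeconfig so kubectl uses the renewed certificate
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config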
2.4 CSR Management and Approval
# View pending certificate signing requests
kubectl get csr
kubectl get certificatesigningrequests
# View CSR details
kubectl describe csr <csr-name>
# Approve a CSR
kubectl certificate approve <csr-name>
# Deny a CSR
kubectl certificate deny <csr-name>
# Delete a completed CSR
kubectl delete csr <csr-name>
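Before a CSR can be approved, someone has to submit it. The sketch below shows one way to build the CertificateSigningRequest object from an openssl-generated request file; `csr_manifest` is a hypothetical helper and the user name `my-user` in the usage lines is only an example.

```shell
# csr_manifest NAME CSR_FILE: print a CertificateSigningRequest for a
# client certificate. (Hypothetical helper; NAME and CSR_FILE come from the caller.)
csr_manifest() {
  local request
  [ -r "$2" ] || return 1
  # spec.request must be the base64-encoded PEM request on a single line
  request=$(base64 < "$2" | tr -d '\n')
  cat <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: $1
spec:
  request: $request
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth
EOF
}

# Usage:
#   openssl genrsa -out my-user.key 2048
#   openssl req -new -key my-user.key -subj "/CN=my-user" -out my-user.csr
#   csr_manifest my-user my-user.csr | kubectl apply -f -
#   kubectl certificate approve my-user
#   kubectl get csr my-user -o jsonpath='{.status.certificate}' | base64 -d > my-user.crt
```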
3. kubeconfig and Context Management
3.1 kubectl config Commands
# View the merged configuration (certificate data and tokens are redacted)
kubectl config view
# View the full kubeconfig, including raw certificate data and tokens
kubectl config view --raw
# View current context
kubectl config current-context
# List all contexts
kubectl config get-contexts
# Switch the current context
kubectl config use-context <context-name>
# Create a new context
kubectl config set-context my-context \
--cluster=my-cluster \
--user=my-user \
--namespace=my-namespace
# Modify the default namespace of the current context
kubectl config set-context --current --namespace=my-namespace
3.2 Multi-cluster Management
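# Merge multiple kubeconfigs
export KUBECONFIG=/path/to/config1:/path/to/config2
# Write to a temporary file first: redirecting straight into a file that is
# also being read would truncate it before kubectl loads it
kubectl config view --flatten > /tmp/merged-config
mv /tmp/merged-config ~/.kube/config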
# List merged contexts
kubectl config get-contexts
# Switch clusters
kubectl config use-context cluster-1
kubectl get nodes
kubectl config use-context cluster-2
kubectl get nodes
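Switching contexts back and forth changes shared state in the kubeconfig, which is easy to forget. As an alternative sketch, the per-command `--context` flag runs the same command against every context without touching current-context; `for_each_context` is a hypothetical helper and assumes context names contain no whitespace.

```shell
# for_each_context ARGS...: run "kubectl ARGS" once per context in the
# active kubeconfig, leaving current-context unchanged (hypothetical helper).
for_each_context() {
  local ctx
  for ctx in $(kubectl config get-contexts -o name); do
    echo "== $ctx =="
    kubectl --context "$ctx" "$@"
  done
}

# Usage:
#   for_each_context get nodes
```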
3.3 kubeconfig Structure Explanation
apiVersion: v1
kind: Config
# Cluster list
clusters:
- cluster:
    certificate-authority-data: <base64-ca-cert>
    server: https://192.168.1.10:6443
  name: kubernetes
# User list
users:
- name: admin
  user:
    client-certificate-data: <base64-cert>
    client-key-data: <base64-key>
# Context list
contexts:
- context:
    cluster: kubernetes
    user: admin
    namespace: default
  name: admin@kubernetes
# Currently active context
current-context: admin@kubernetes
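Individual fields of this structure can be pulled out with a JSONPath query rather than read by eye. A small sketch, assuming a configured kubectl; `current_api_server` is a hypothetical helper name.

```shell
# current_api_server: print the API server URL used by the current context.
# (Hypothetical helper name.)
current_api_server() {
  # --minify keeps only the entries referenced by current-context,
  # so clusters[0] is the active cluster
  kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
}

# Usage:
#   echo "Talking to $(current_api_server)"
```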
3.4 Manually Create kubeconfig
# Create a kubeconfig for a ServiceAccount
# 1. Create ServiceAccount
kubectl create serviceaccount my-sa
kubectl create clusterrolebinding my-sa-binding --clusterrole=view --serviceaccount=default:my-sa
# 2. Get a token (short-lived; typically expires after about an hour unless --duration is set)
TOKEN=$(kubectl create token my-sa)
# 3. Configure kubeconfig
kubectl config set-credentials my-sa-user --token="$TOKEN"
kubectl config set-cluster my-cluster --server=https://<api-server>:6443 --certificate-authority=/etc/kubernetes/pki/ca.crt --embed-certs=true
kubectl config set-context my-sa-context --cluster=my-cluster --user=my-sa-user
kubectl config use-context my-sa-context
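The commands above modify the kubeconfig you are currently using. When the ServiceAccount credentials are meant to be handed to someone else, it may be cleaner to write them into a separate file with `--kubeconfig`. This is a sketch only: `make_sa_kubeconfig`, the cluster entry name `my-cluster`, and the argument order are assumptions for illustration.

```shell
# make_sa_kubeconfig SA NAMESPACE SERVER CA_FILE OUT_FILE
# Build a standalone kubeconfig for a ServiceAccount without touching
# ~/.kube/config. (Hypothetical helper; "my-cluster" is a placeholder name.)
make_sa_kubeconfig() {
  local sa ns server ca out token
  sa="$1"; ns="$2"; server="$3"; ca="$4"; out="$5"
  token=$(kubectl create token "$sa" -n "$ns") || return 1
  kubectl config --kubeconfig="$out" set-cluster my-cluster \
    --server="$server" --certificate-authority="$ca" --embed-certs=true || return 1
  kubectl config --kubeconfig="$out" set-credentials "$sa" --token="$token" || return 1
  kubectl config --kubeconfig="$out" set-context "$sa@my-cluster" \
    --cluster=my-cluster --user="$sa" --namespace="$ns" || return 1
  kubectl config --kubeconfig="$out" use-context "$sa@my-cluster"
}

# Usage:
#   make_sa_kubeconfig my-sa default https://<api-server>:6443 /etc/kubernetes/pki/ca.crt ./my-sa.kubeconfig
#   kubectl --kubeconfig=./my-sa.kubeconfig get pods
```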
4. Cluster Addons Management
4.1 CoreDNS
# Check CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
# View CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml
# Modify CoreDNS configuration (e.g., custom DNS resolution)
kubectl edit configmap -n kube-system coredns
# View CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns
# Scale CoreDNS replicas
kubectl scale deployment -n kube-system coredns --replicas=3
# Test DNS resolution
kubectl run dns-test --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default
4.2 metrics-server
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify metrics-server is running
kubectl get pods -n kube-system -l k8s-app=metrics-server
# View metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server
# Use metrics-server to view resource usage
kubectl top nodes
kubectl top pods
kubectl top pods --all-namespaces
kubectl top pod <pod-name> -n <namespace>
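When hunting for the heaviest consumers, `kubectl top` can sort its own output. A brief sketch, assuming metrics-server is already serving metrics; `top_pods_by` is a hypothetical helper and the default of 5 rows is arbitrary.

```shell
# top_pods_by RESOURCE [COUNT]: show the COUNT Pods using the most
# RESOURCE (cpu or memory) across all namespaces. (Hypothetical helper.)
top_pods_by() {
  kubectl top pods --all-namespaces --sort-by="$1" --no-headers | head -n "${2:-5}"
}

# Usage:
#   top_pods_by cpu 3
#   top_pods_by memory
```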
4.3 Kubernetes Dashboard
# Install Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
# Create access user
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
# Get access token
kubectl -n kubernetes-dashboard create token admin-user
# Start proxy
kubectl proxy
# Access: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
4.4 Installing and Uninstalling Addons
# Install required system addons
kubectl apply -f <manifest-url>
# Check addon running status
kubectl get pods --all-namespaces
# Update addons
kubectl apply -f <updated-manifest-url>
# Delete addons
kubectl delete -f <manifest-url>
5. Viewing Cluster Component Logs
# kubelet logs
sudo journalctl -u kubelet -f
sudo journalctl -u kubelet --since "5 min ago"
sudo journalctl -u kubelet -n 50 --no-pager
# Container runtime logs
sudo journalctl -u containerd -n 50 --no-pager
sudo journalctl -u crio -n 50 --no-pager
# Control plane component logs (static Pods)
kubectl logs -n kube-system kube-apiserver-<node-name>
kubectl logs -n kube-system kube-scheduler-<node-name>
kubectl logs -n kube-system kube-controller-manager-<node-name>
kubectl logs -n kube-system etcd-<node-name>
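The `kubectl logs` commands above only work while the API server is reachable. When it is down, the same logs can usually be read from the files the kubelet writes on the node. A sketch, assuming the common /var/log/pods layout; `static_pod_logs` is a hypothetical helper and the optional second argument exists only to point it at a different root directory.

```shell
# static_pod_logs COMPONENT [LOG_ROOT]: list on-disk log files for a
# kube-system static Pod such as kube-apiserver or etcd. (Hypothetical helper.)
static_pod_logs() {
  local root
  root="${2:-/var/log/pods}"
  # Layout: <root>/<namespace>_<pod-name>_<uid>/<container>/<restart-count>.log
  ls -1 "$root"/kube-system_"$1"-*/*/*.log 2>/dev/null
}

# Usage (on the control plane node, as root):
#   tail -n 50 "$(static_pod_logs kube-apiserver | tail -n 1)"
```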
CKA Exam Key Points
- drain must include --ignore-daemonsets: DaemonSet Pods cannot be evicted, and drain fails with an error if the flag is omitted
- Restart the control plane components after certificate renewal: after running kubeadm certs renew all, restart the static Pods (kube-apiserver, kube-controller-manager, kube-scheduler, etcd) so they load the new certificates
- Configure the context namespace: kubectl config set-context --current --namespace=xxx is very practical in the exam
- apt-mark hold: hold kubeadm/kubelet/kubectl so they are not upgraded automatically
- Node maintenance trifecta: cordon -> drain -> uncordon
🧪 Complete Hands-on Example: Perform Maintenance on a Worker Node
Scenario Description
Perform planned maintenance on worker node worker-1: first mark it as unschedulable, gracefully evict all Pods, then restore scheduling after maintenance is complete.
Prerequisites
- A running Kubernetes cluster (at least 1 control plane + 2 worker nodes)
- Non-DaemonSet Pods running in the cluster, with other nodes having enough resources to accept the evicted Pods
Steps
Step 1: Cordon the node (mark as unschedulable)
kubectl cordon worker-1
# node/worker-1 cordoned
# Verify node status
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# control-plane-1 Ready control-plane 1h v1.31.0
# worker-1 Ready,SchedulingDisabled <none> 1h v1.31.0 <-- Note the status change
# worker-2 Ready <none> 1h v1.31.0
Step 2: Drain the node (evict Pods)
# Evict all non-DaemonSet Pods
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# node/worker-1 already cordoned
# evicting pod default/nginx-deploy-xxxxxxxxx-xxxxx
# evicting pod kube-system/metrics-server-xxxxxxxxx-xxxxx
# ...
# node/worker-1 drained
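# If eviction gets stuck, override each Pod's termination grace period; --force also deletes Pods not managed by a controller
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data --grace-period=120 --force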
Step 3: Perform node maintenance (simulated)
# At this point you can perform system upgrades, kernel updates, hardware replacements, etc. on worker-1
# Example: Check remaining Pods on the node (should only be DaemonSets)
kubectl get pods --field-selector spec.nodeName=worker-1 --all-namespaces
# NAMESPACE NAME READY STATUS RESTARTS AGE
# kube-system kube-proxy-xxxxx 1/1 Running 0 1h
# calico-system calico-node-xxxxx 1/1 Running 0 1h
# Perform maintenance operations (simulated)
# ssh worker-1 "sudo apt-get update && sudo apt-get upgrade -y linux-image"
Step 4: Uncordon the node (restore scheduling)
# Restore node scheduling after maintenance is complete
kubectl uncordon worker-1
# node/worker-1 uncordoned
# Verify node status
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# control-plane-1 Ready control-plane 1h v1.31.0
# worker-1 Ready <none> 1h v1.31.0 <-- SchedulingDisabled removed
# worker-2 Ready <none> 1h v1.31.0
Verification Results
# Verify that newly created Pods can land on the restored node (evicted Pods do not move back automatically)
kubectl get pods -o wide
# NAME READY STATUS RESTARTS AGE NODE
# nginx-deploy-xxxxxxxxx-xxxxx 1/1 Running 0 1m worker-1 <-- New Pod scheduled back
# nginx-deploy-xxxxxxxxx-xxxxx 1/1 Running 0 5m worker-2
Exam Tips
- Drain must include --ignore-daemonsets: DaemonSet Pods cannot be evicted; without this flag, drain fails with an error
- Cordon does not affect existing Pods: it only prevents new Pods from being scheduled to the node
- Complete maintenance trifecta: cordon -> drain -> uncordon, in that order
- If drain reports that it cannot delete DaemonSet-managed Pods, add --ignore-daemonsets
- If drain reports that it cannot delete Pods with local storage (emptyDir), add --delete-emptydir-data
- In the exam, after uncordon, run kubectl get nodes to confirm SchedulingDisabled has been removed