Qingular

Cluster Lifecycle Management

· CKA k8s Practice

Daily operations and management of Kubernetes clusters, including node maintenance, certificate management, kubeconfig configuration, and cluster addons management.

Overview

Cluster lifecycle management covers day-to-day operations: node maintenance, certificate renewal, kubeconfig management, and addon deployment. It is a key area of the CKA exam and concerns the ongoing operation and maintenance of a running cluster.


1. Node Management

1.1 cordon / uncordon

Mark a node as unschedulable (SchedulingDisabled). Already-running Pods are not affected.

# Mark a node as unschedulable
kubectl cordon node-1

# Verify node status (STATUS shows Ready,SchedulingDisabled)
kubectl get nodes

# Restore a node to schedulable
kubectl uncordon node-1

# Check the scheduling status (a cordoned node carries the node.kubernetes.io/unschedulable:NoSchedule taint)
kubectl describe node node-1 | grep "Taints"
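
Cordon works by setting the node's spec.unschedulable field to true, which is what SchedulingDisabled reflects. A minimal sketch that reads the field directly; is_cordoned is a hypothetical helper name, not a kubectl subcommand:

```shell
# is_cordoned is a hypothetical helper, not part of kubectl.
# It succeeds (exit 0) when the node's spec.unschedulable field is "true".
is_cordoned() {
  local node="$1"
  local flag
  # jsonpath prints an empty string when the field is unset (schedulable node)
  flag="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}')"
  [ "$flag" = "true" ]
}

# Usage (requires cluster access):
#   is_cordoned node-1 && echo "node-1 is cordoned"
```

The exit status makes the check easy to use in scripts, which is harder with the human-readable STATUS column.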

1.2 drain -- Node Maintenance

Drain first cordons the node (marks it unschedulable) and then gracefully evicts its Pods; Pods managed by a controller such as a Deployment are recreated on other nodes.

# Basic drain operation
kubectl drain node-1

# Ignore DaemonSet-managed Pods (commonly used)
kubectl drain node-1 --ignore-daemonsets

# Also delete Pods that are not managed by a controller (they will not be recreated)
kubectl drain node-1 --ignore-daemonsets --force

# Allow eviction of Pods that use emptyDir volumes (the emptyDir data is lost)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# Override each Pod's termination grace period (seconds) and cap the total wait with --timeout
# (by default drain waits indefinitely)
kubectl drain node-1 --ignore-daemonsets --grace-period=120 --timeout=5m

# Complete node maintenance workflow
kubectl cordon node-1                    # 1. Prevent new Pod scheduling
kubectl drain node-1 --ignore-daemonsets # 2. Evict existing Pods
# ... Perform node maintenance ...
kubectl uncordon node-1                  # 3. Restore scheduling
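
The workflow above can be wrapped in a small function so the node is only returned to service when the maintenance step succeeds. This is a sketch under assumptions: with_node_drained is a hypothetical helper name, and leaving the node cordoned on failure is a design choice, not something the workflow requires.

```shell
# with_node_drained is a hypothetical helper: it drains a node, runs the given
# maintenance command, and uncordons the node only if that command succeeds.
with_node_drained() {
  local node="$1"
  shift
  # drain cordons the node itself, so a separate cordon step is optional
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  if "$@"; then
    kubectl uncordon "$node"
  else
    echo "maintenance failed; $node left cordoned" >&2
    return 1
  fi
}

# Usage (requires cluster access and SSH access to the node):
#   with_node_drained node-1 ssh node-1 'sudo apt-get update'
```

Keeping a failed node cordoned avoids scheduling new Pods onto a machine in an unknown state; uncordon it manually once it has been checked.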

1.3 Delete a Node

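# First drain the node, then remove it from the cluster
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-1

# Then reset kubeadm state on that node
sudo kubeadm reset -f
sudo rm -rf /etc/kubernetes/
sudo rm -rf /var/lib/kubelet/
sudo rm -rf /var/lib/etcd/   # exists only on control plane nodes with local etcd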

# Verify the node has been removed
kubectl get nodes

1.4 Node Labels and Taints

# Add a label
kubectl label node node-1 disktype=ssd

# Remove a label
kubectl label node node-1 disktype-

# Add a taint
kubectl taint node node-1 key=value:NoSchedule
kubectl taint node node-1 key=value:PreferNoSchedule
kubectl taint node node-1 key=value:NoExecute

# Remove a taint
kubectl taint node node-1 key:NoSchedule-

# View node taints
kubectl describe node node-1 | grep Taints

2. Certificate Management

2.1 View Certificate Information

# Check certificate expiration
kubeadm certs check-expiration

# Example output:
[check-expiration] Reading configuration from /etc/kubernetes/admin.conf
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 May 25, 2027 18:27 UTC   364d                                    no
apiserver                  May 25, 2027 18:27 UTC   364d            ca                      no
apiserver-kubelet-client   May 25, 2027 18:27 UTC   364d            ca                      no
controller-manager.conf    May 25, 2027 18:27 UTC   364d                                    no
scheduler.conf             May 25, 2027 18:27 UTC   364d                                    no
front-proxy-client         May 25, 2027 18:27 UTC   364d            front-proxy-ca           no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      May 23, 2035 18:22 UTC   9y              no
front-proxy-ca          May 23, 2035 18:22 UTC   9y              no
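
A single certificate file can also be inspected with openssl, which helps when only one component is in question. A minimal sketch, assuming openssl is installed on the node; cert_expires_within is a hypothetical helper name, and the path in the usage lines is the kubeadm default location:

```shell
# cert_expires_within is a hypothetical helper.
# It succeeds (exit 0) when the certificate expires within the given number of days.
cert_expires_within() {
  local cert="$1" days="$2"
  [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
  # openssl -checkend N exits 0 when the certificate is still valid N seconds from now
  if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
    return 1   # not expiring within the window
  fi
  return 0     # expires (or has already expired) within the window
}

# Usage on a control plane node (run as root, or adjust file permissions):
#   openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate
#   cert_expires_within /etc/kubernetes/pki/apiserver.crt 30 && echo "renew soon"
```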

2.2 Certificate Renewal

# Renew all certificates
kubeadm certs renew all

# Renew specific certificates
kubeadm certs renew apiserver
kubeadm certs renew admin.conf
kubeadm certs renew etcd-healthcheck-client

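# After renewal, restart the control plane static Pods (kube-apiserver, kube-controller-manager,
# kube-scheduler, etcd) so they load the new certificates. kubectl cannot restart static Pods:
# move the manifests out of the kubelet's manifest directory, wait, then move them back
sudo mkdir -p /tmp/manifests-backup
sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /tmp/manifests-backup/'
sleep 20   # give the kubelet time to notice the change and stop the Pods
sudo sh -c 'mv /tmp/manifests-backup/*.yaml /etc/kubernetes/manifests/'

# Note: restarting the kubelet by itself leaves already-running static Pod containers untouched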

2.3 Manually Update kubeconfig

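# kubeadm certs renew admin.conf rewrites /etc/kubernetes/admin.conf; refresh your local copy
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Generate a kubeconfig for an additional user (kubeadm prints it to stdout)
kubeadm kubeconfig user --client-name=my-user > my-user.conf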

2.4 CSR Management and Approval

# View pending certificate signing requests
kubectl get csr
kubectl get certificatesigningrequests

# View CSR details
kubectl describe csr <csr-name>

# Approve a CSR
kubectl certificate approve <csr-name>

# Deny a CSR
kubectl certificate deny <csr-name>

# Delete a completed CSR
kubectl delete csr <csr-name>
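
The commands above operate on CSR objects that already exist. For context, here is a sketch of how a client certificate request for a new user is typically created and submitted. csr_manifest is a hypothetical helper name and my-user is an example user name; the signer and usage values are those used for API server client certificates.

```shell
# csr_manifest is a hypothetical helper: it prints a CertificateSigningRequest
# manifest for an existing PEM-encoded CSR file.
csr_manifest() {
  local name="$1" csr_file="$2"
  local request
  # The API expects the PEM CSR base64-encoded on a single line
  request="$(base64 < "$csr_file" | tr -d '\n')"
  cat <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${name}
spec:
  request: ${request}
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth
EOF
}

# Usage (my-user is an example name; requires openssl and cluster access):
#   openssl genrsa -out my-user.key 2048
#   openssl req -new -key my-user.key -subj "/CN=my-user" -out my-user.csr
#   csr_manifest my-user my-user.csr | kubectl apply -f -
#   kubectl certificate approve my-user
#   kubectl get csr my-user -o jsonpath='{.status.certificate}' | base64 -d > my-user.crt
```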

3. kubeconfig and Context Management

3.1 kubectl config Commands

# View current configuration
kubectl config view

# View full kubeconfig
kubectl config view --raw

# View current context
kubectl config current-context

# List all contexts
kubectl config get-contexts

# Switch to another context (it becomes the current context)
kubectl config use-context <context-name>

# Create a new context
kubectl config set-context my-context \
    --cluster=my-cluster \
    --user=my-user \
    --namespace=my-namespace

# Modify the default namespace of the current context
kubectl config set-context --current --namespace=my-namespace

3.2 Multi-cluster Management

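# Merge multiple kubeconfigs into one file
export KUBECONFIG=/path/to/config1:/path/to/config2
kubectl config view --flatten > /tmp/merged-config   # write to a temp file, then replace
cp ~/.kube/config ~/.kube/config.bak                 # back up the existing config, if there is one
mv /tmp/merged-config ~/.kube/config
unset KUBECONFIG                                     # use the merged ~/.kube/config from now on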

# List merged contexts
kubectl config get-contexts

# Switch clusters
kubectl config use-context cluster-1
kubectl get nodes

kubectl config use-context cluster-2
kubectl get nodes
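
Switching contexts changes the current-context stored in the kubeconfig file. For one-off checks across clusters, the global --context flag avoids that side effect. A minimal sketch; each_context is a hypothetical helper name:

```shell
# each_context is a hypothetical helper: it runs one kubectl command per
# context without changing current-context.
each_context() {
  local ctx
  for ctx in $(kubectl config get-contexts -o name); do
    echo "== $ctx =="
    kubectl --context "$ctx" "$@"
  done
}

# Usage (requires access to every cluster in the kubeconfig):
#   each_context get nodes
```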

3.3 kubeconfig Structure Explanation

apiVersion: v1
kind: Config
# Cluster list
clusters:
- cluster:
    certificate-authority-data: <base64-ca-cert>
    server: https://192.168.1.10:6443
  name: kubernetes

# User list
users:
- name: admin
  user:
    client-certificate-data: <base64-cert>
    client-key-data: <base64-key>

# Context list
contexts:
- context:
    cluster: kubernetes
    user: admin
    namespace: default
  name: admin@kubernetes

# Currently active context
current-context: admin@kubernetes
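
Individual fields of this structure can be read with a jsonpath query instead of opening the file. A small sketch; current_server is a hypothetical helper name:

```shell
# current_server is a hypothetical helper: it prints the API server URL of the
# cluster referenced by the current context.
current_server() {
  # --minify keeps only the entries used by the current context,
  # so index 0 is the active cluster
  kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
}

# Usage:
#   current_server
```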

3.4 Manually Create kubeconfig

# Create a kubeconfig for a ServiceAccount

# 1. Create ServiceAccount
kubectl create serviceaccount my-sa
kubectl create clusterrolebinding my-sa-binding --clusterrole=view --serviceaccount=default:my-sa

# 2. Get token
TOKEN=$(kubectl create token my-sa)

# 3. Configure kubeconfig
kubectl config set-credentials my-sa-user --token=$TOKEN
kubectl config set-cluster my-cluster --server=https://<api-server>:6443 --certificate-authority=/etc/kubernetes/pki/ca.crt --embed-certs=true
kubectl config set-context my-sa-context --cluster=my-cluster --user=my-sa-user
kubectl config use-context my-sa-context

4. Cluster Addons Management

4.1 CoreDNS

# Check CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns

# View CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml

# Modify CoreDNS configuration (e.g., custom DNS resolution)
kubectl edit configmap -n kube-system coredns

# View CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Scale CoreDNS replicas
kubectl scale deployment -n kube-system coredns --replicas=3

# Test DNS resolution
kubectl run dns-test --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default

4.2 metrics-server

# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics-server is running
kubectl get pods -n kube-system -l k8s-app=metrics-server

# View metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Use metrics-server to view resource usage
kubectl top nodes
kubectl top pods
kubectl top pods --all-namespaces
kubectl top pod <pod-name> -n <namespace>

4.3 Kubernetes Dashboard

# Install Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

# Create access user
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF

# Get access token
kubectl -n kubernetes-dashboard create token admin-user

# Start proxy
kubectl proxy
# Access: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

4.4 Installing and Uninstalling Addons

# Install required system addons
kubectl apply -f <manifest-url>

# Check addon running status
kubectl get pods --all-namespaces

# Update addons
kubectl apply -f <updated-manifest-url>

# Delete addons
kubectl delete -f <manifest-url>

5. Viewing Cluster Component Logs

# kubelet logs
sudo journalctl -u kubelet -f
sudo journalctl -u kubelet --since "5 min ago"
sudo journalctl -u kubelet -n 50 --no-pager

# Container runtime logs
sudo journalctl -u containerd -n 50 --no-pager
sudo journalctl -u crio -n 50 --no-pager

# Control plane component logs (static Pods)
kubectl logs -n kube-system kube-apiserver-<node-name>
kubectl logs -n kube-system kube-scheduler-<node-name>
kubectl logs -n kube-system kube-controller-manager-<node-name>
kubectl logs -n kube-system etcd-<node-name>
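
kubectl logs goes through the API server, so it cannot help when the API server itself is down. In that case the container runtime can be queried directly on the control plane node. A sketch, assuming crictl is installed and configured for the node's runtime; static_pod_logs is a hypothetical helper name:

```shell
# static_pod_logs is a hypothetical helper: it prints recent log lines of a
# control plane container by asking the container runtime directly.
static_pod_logs() {
  local name="$1"
  local id
  # -a includes exited containers; -q prints only IDs; take the first match
  id="$(sudo crictl ps -a --name "$name" -q | head -n 1)"
  [ -n "$id" ] || { echo "no container found for $name" >&2; return 1; }
  sudo crictl logs --tail=50 "$id"
}

# Usage on the control plane node:
#   static_pod_logs kube-apiserver
```

The kubelet also keeps container log files under /var/log/pods/ on the node, which can be read even when the runtime CLI is unavailable.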

CKA Exam Key Points

  1. drain must include --ignore-daemonsets -- DaemonSet Pods cannot be evicted; omitting this flag will cause an error
  2. Restart the control plane Pods after certificate renewal -- After running kubeadm certs renew all, the control plane static Pods must be restarted so they load the new certificates
  3. Configure context namespace -- kubectl config set-context --namespace=xxx is very practical in the exam
  4. apt-mark hold -- Hold kubeadm/kubelet/kubectl so that routine package upgrades do not change their versions unintentionally
  5. Node maintenance trifecta: cordon -> drain -> uncordon
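
Key point 4 refers to apt-mark hold, which is not shown elsewhere on this page. A minimal sketch for Debian/Ubuntu nodes; the function names and the K8S_PACKAGES variable are hypothetical:

```shell
# Hypothetical helpers for Debian/Ubuntu nodes.
K8S_PACKAGES="kubeadm kubelet kubectl"

# Pin the packages so apt upgrades skip them
hold_k8s_packages() {
  # word splitting of $K8S_PACKAGES is intended
  sudo apt-mark hold $K8S_PACKAGES
}

# Release the pin before a deliberate version upgrade
unhold_k8s_packages() {
  sudo apt-mark unhold $K8S_PACKAGES
}

# Usage:
#   hold_k8s_packages
#   apt-mark showhold        # lists the held packages
```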

🧪 Complete Hands-on Example: Perform Maintenance on a Worker Node

Scenario Description

Perform planned maintenance on worker node worker-1: first mark it as unschedulable, gracefully evict all Pods, then restore scheduling after maintenance is complete.

Prerequisites

  • A running Kubernetes cluster (at least 1 control plane + 2 worker nodes)
  • Non-DaemonSet Pods running in the cluster, with other nodes having enough resources to accept the evicted Pods

Steps

Step 1: Cordon the node (mark as unschedulable)

kubectl cordon worker-1
# node/worker-1 cordoned

# Verify node status
kubectl get nodes
# NAME              STATUS                     ROLES           AGE   VERSION
# control-plane-1   Ready                      control-plane   1h    v1.31.0
# worker-1          Ready,SchedulingDisabled   <none>          1h    v1.31.0   <-- Note the status change
# worker-2          Ready                      <none>          1h    v1.31.0

Step 2: Drain the node (evict Pods)

# Evict all non-DaemonSet Pods
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# node/worker-1 already cordoned
# evicting pod default/nginx-deploy-xxxxxxxxx-xxxxx
# evicting pod kube-system/metrics-server-xxxxxxxxx-xxxxx
# ...
# node/worker-1 drained

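# If drain stops on Pods without a controller or termination is slow: --force deletes unmanaged Pods,
# --grace-period caps each Pod's termination time (seconds), and --timeout bounds the overall wait
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data --grace-period=120 --timeout=5m --force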

Step 3: Perform node maintenance (simulated)

# At this point you can perform system upgrades, kernel updates, hardware replacements, etc. on worker-1
# Example: Check remaining Pods on the node (should only be DaemonSets)
kubectl get pods --field-selector spec.nodeName=worker-1 --all-namespaces
# NAMESPACE     NAME              READY   STATUS    RESTARTS   AGE
# kube-system   kube-proxy-xxxxx  1/1     Running   0          1h
# calico-system calico-node-xxxxx 1/1     Running   0          1h

# Perform maintenance operations (simulated)
# ssh worker-1 "sudo apt-get update && sudo apt-get upgrade -y linux-image"

Step 4: Uncordon the node (restore scheduling)

# Restore node scheduling after maintenance is complete
kubectl uncordon worker-1
# node/worker-1 uncordoned

# Verify node status
kubectl get nodes
# NAME              STATUS   ROLES           AGE   VERSION
# control-plane-1   Ready    control-plane   1h    v1.31.0
# worker-1          Ready    <none>          1h    v1.31.0   <-- SchedulingDisabled removed
# worker-2          Ready    <none>          1h    v1.31.0

Verification Results

# Existing Pods are not moved back automatically; only newly created Pods can be scheduled to worker-1
# (for example after scaling or restarting the Deployment)
kubectl get pods -o wide
# NAME                                READY   STATUS    RESTARTS   AGE   NODE
# nginx-deploy-xxxxxxxxx-xxxxx        1/1     Running   0          1m    worker-1  <-- New Pod scheduled to the restored node
# nginx-deploy-xxxxxxxxx-xxxxx        1/1     Running   0          5m    worker-2

Exam Tips

  • Drain usually needs --ignore-daemonsets -- DaemonSet Pods cannot be evicted; without this flag, drain aborts with an error as soon as it finds one
  • Cordon does not affect existing Pods -- It only prevents new Pods from being scheduled to the node
  • Complete maintenance trifecta: cordon -> drain -> uncordon, in that order (drain also cordons the node if it is not already cordoned)
  • If drain reports "cannot delete DaemonSet-managed Pods", add --ignore-daemonsets
  • If drain reports "cannot delete Pods with local storage", add --delete-emptydir-data
  • In the exam, after uncordon, check kubectl get nodes to confirm SchedulingDisabled has been removed
