High Availability Cluster

2026-05-27 · CKA k8s Practice

Kubernetes high availability cluster architecture design, including etcd topology selection, multi-control-plane node deployment, and load balancer configuration.


Overview

High Availability (HA) clusters eliminate single points of failure through redundant control plane components, ensuring the cluster continues to function when some nodes fail. The CKA exam requires understanding the basic concepts of HA architecture and kubeadm configuration methods.

1. HA Topology Architecture

1.1 Stacked etcd Topology

                       ┌─────────────────────┐
                       │   Load Balancer     │
                       │   (HAProxy/Nginx)   │
                       │   192.168.1.100     │
                       │   port 6443         │
                       └──────────┬──────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            │                     │                     │
   ┌────────┴─────────┐  ┌────────┴─────────┐  ┌────────┴─────────┐
   │ CP-1             │  │ CP-2             │  │ CP-3             │
   │ API Server       │  │ API Server       │  │ API Server       │
   │ Scheduler        │  │ Scheduler        │  │ Scheduler        │
   │ Controller-Mgr   │  │ Controller-Mgr   │  │ Controller-Mgr   │
   │ etcd (member)    │  │ etcd (member)    │  │ etcd (member)    │
   └──────────────────┘  └──────────────────┘  └──────────────────┘

Characteristics:

etcd runs on the same nodes as the control plane
Fewer nodes, lower cost
Requires at least 3 control plane nodes (odd number)
etcd failure affects the control plane on that node

1.2 External etcd Topology

                       ┌─────────────────────┐
                       │   Load Balancer     │
                       │   (HAProxy/Nginx)   │
                       │   192.168.1.100     │
                       │   port 6443         │
                       └──────────┬──────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            │                     │                     │
   ┌────────┴─────────┐  ┌────────┴─────────┐  ┌────────┴─────────┐
   │ CP-1             │  │ CP-2             │  │ CP-3             │
   │ API Server       │  │ API Server       │  │ API Server       │
   │ Scheduler        │  │ Scheduler        │  │ Scheduler        │
   │ Controller-Mgr   │  │ Controller-Mgr   │  │ Controller-Mgr   │
   └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘
            │                     │                     │
            └─────────────────────┼─────────────────────┘
                                  │
                           ┌──────┴──────┐
                           │  etcd nodes │
                           │  3 or 5     │
                           │  dedicated  │
                           └─────────────┘

Characteristics:

etcd runs on dedicated nodes
Control plane and etcd failures are isolated
Requires more nodes, higher cost
Suitable for large-scale production clusters
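
With external etcd, kubeadm has to be told where the etcd cluster lives instead of creating a local member. The sketch below prints the etcd stanza that would go into the ClusterConfiguration. The etcd addresses 192.168.1.20 to 192.168.1.22 and the helper name external_etcd_stanza are assumptions for illustration; the certificate paths follow the layout commonly used in the kubeadm HA guide and assume the client certificates were copied from the etcd cluster to the first control plane node.

```shell
#!/usr/bin/env bash
# Print the `etcd:` stanza of a kubeadm ClusterConfiguration for external etcd.
# Arguments: one or more etcd client URLs (hypothetical addresses below).
external_etcd_stanza() {
    local endpoint
    echo "etcd:"
    echo "  external:"
    echo "    endpoints:"
    for endpoint in "$@"; do
        echo "      - \"${endpoint}\""
    done
    # Assumed paths: the etcd CA and the API server's etcd client cert/key
    echo "    caFile: /etc/kubernetes/pki/etcd/ca.crt"
    echo "    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt"
    echo "    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key"
}

external_etcd_stanza \
    https://192.168.1.20:2379 \
    https://192.168.1.21:2379 \
    https://192.168.1.22:2379
```

The printed stanza replaces the etcd.local section used in the stacked topology; the rest of the ClusterConfiguration stays the same.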

1.3 Topology Comparison

Feature	Stacked etcd	External etcd
Node count	3+ (control plane)	3+ (control plane) + 3+ (etcd)
Cost	Lower	Higher
Fault isolation	etcd failure affects same-node control plane	Fully isolated
Performance	etcd and control plane contend for resources	Dedicated resources
Operational complexity	Lower	Higher
Use case	Small to medium clusters	Large production clusters

2. Multi-Control-Plane Node Setup

2.1 Initializing the First Control Plane

# On the first control plane node
sudo kubeadm init --config=kubeadm-config.yaml

# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.1.10"
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "v1.31.0"
controlPlaneEndpoint: "192.168.1.100:6443"  # Load balancer address
networking:
  serviceSubnet: "10.96.0.0/12"
  podSubnet: "10.244.0.0/16"
  dnsDomain: "cluster.local"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: "systemd"

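# To let other control planes join, pass --upload-certs on that same init run
# (kubeadm init cannot be run a second time on an initialized node)
sudo kubeadm init --config=kubeadm-config.yaml --upload-certs

# The output will include:
# - kubeadm join command (including token and CA certificate hash)
# - --certificate-key (for control plane nodes to join; the uploaded certificates expire after 2 hours)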

2.2 Adding Additional Control Plane Nodes

# On the second control plane node
sudo kubeadm join 192.168.1.100:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane \
    --certificate-key <certificate-key>

# On the third control plane node
sudo kubeadm join 192.168.1.100:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane \
    --certificate-key <certificate-key>
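
If the init output is no longer available, the join command can be rebuilt on an existing control plane node. The following is a minimal sketch: build_control_plane_join is a hypothetical helper that appends the control plane flags to a worker join command, and the usage comment assumes the certificate key is the last line printed by kubeadm init phase upload-certs.

```shell
#!/usr/bin/env bash
# Turn a worker join command into a control plane join command.
#   $1: output of `kubeadm token create --print-join-command`
#   $2: certificate key printed by `kubeadm init phase upload-certs --upload-certs`
build_control_plane_join() {
    local worker_join="$1" cert_key="$2"
    if [ -z "$worker_join" ] || [ -z "$cert_key" ]; then
        echo "usage: build_control_plane_join <join-command> <certificate-key>" >&2
        return 1
    fi
    printf '%s --control-plane --certificate-key %s\n' "$worker_join" "$cert_key"
}

# Usage on an existing control plane node (not executed here):
#   join_cmd=$(sudo kubeadm token create --print-join-command)
#   cert_key=$(sudo kubeadm init phase upload-certs --upload-certs | tail -n 1)
#   build_control_plane_join "$join_cmd" "$cert_key"
```

The resulting command is then run with sudo on each additional control plane node.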

2.3 Verifying the HA Cluster

# View all nodes
kubectl get nodes

# View control plane components
kubectl get pods -n kube-system | grep -E "kube-apiserver|kube-scheduler|kube-controller"

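# View etcd members (etcdctl needs the etcd client certificates to reach the TLS endpoint)
kubectl exec -n kube-system etcd-cp-1 -- etcdctl \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list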

# Verify API Server high availability (access via load balancer address)
curl -k https://192.168.1.100:6443/version

# Simulate a control plane node failure
# Power off one control plane node (stopping only kubelet leaves its static pod containers running)
# Verify the cluster is still available
kubectl get nodes
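
During a failure drill it can help to reduce the node listing to a single number. A small sketch, assuming the default column order of kubectl get nodes (NAME STATUS ROLES AGE VERSION); count_ready_control_planes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count control plane nodes whose status starts with "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# "Ready,SchedulingDisabled" is counted; "NotReady" is not.
count_ready_control_planes() {
    awk '$2 ~ /^Ready/ && $3 ~ /control-plane/ { n++ } END { print n + 0 }'
}

# Usage (not executed here):
#   kubectl get nodes --no-headers | count_ready_control_planes
```

With three control plane nodes, the count should drop from 3 to 2 once the failed node is marked NotReady, while kubectl itself keeps working through the load balancer.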

3. API Server Load Balancer Configuration

3.1 HAProxy Configuration

# Install HAProxy
sudo apt install -y haproxy

# /etc/haproxy/haproxy.cfg
frontend kubernetes-frontend
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server cp-1 192.168.1.10:6443 check fall 3 rise 2
    server cp-2 192.168.1.11:6443 check fall 3 rise 2
    server cp-3 192.168.1.12:6443 check fall 3 rise 2

# Restart HAProxy
sudo systemctl restart haproxy
sudo systemctl enable haproxy

# Verify load balancing
curl -k https://192.168.1.100:6443/version
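
A request through the load balancer only proves that at least one backend answers. To check every backend, each API server can be probed directly. The sketch below reads the server lines from the HAProxy configuration; list_backend_servers and probe_backends are hypothetical helpers, and the probe assumes /readyz is readable without credentials, which is the case on a default kubeadm cluster.

```shell
#!/usr/bin/env bash
# Print "name address" for each `server` line of an HAProxy config file.
list_backend_servers() {
    awk '$1 == "server" { print $2, $3 }' "$1"
}

# Probe each backend directly, bypassing the load balancer.
probe_backends() {
    list_backend_servers "$1" | while read -r name addr; do
        if curl -sk --max-time 3 "https://${addr}/readyz" | grep -qx 'ok'; then
            echo "$name $addr ready"
        else
            echo "$name $addr NOT READY"
        fi
    done
}

# Usage (not executed here): validate the file first, then probe.
#   haproxy -c -f /etc/haproxy/haproxy.cfg && probe_backends /etc/haproxy/haproxy.cfg
```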

3.2 Nginx Load Balancer

# /etc/nginx/nginx.conf (the stream block sits at the top level, outside http, and needs the stream module)
stream {
    upstream kubernetes {
        server 192.168.1.10:6443;
        server 192.168.1.11:6443;
        server 192.168.1.12:6443;
    }

    server {
        listen 6443;
        proxy_pass kubernetes;
        proxy_connect_timeout 1s;
        proxy_timeout 10m;   # idle timeout; a value of a few seconds would cut long-lived watch and log streams
    }
}

# Restart Nginx
sudo systemctl restart nginx
sudo systemctl enable nginx

3.3 keepalived (Virtual IP)

# Install keepalived
sudo apt install -y keepalived

# /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER           # Set to MASTER on the primary node, BACKUP on standby
    interface eth0
    virtual_router_id 51
    priority 100           # Primary 100, standby 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100/24   # Virtual IP
    }
}

# Start keepalived
sudo systemctl restart keepalived
sudo systemctl enable keepalived

# Verify virtual IP
ip addr show | grep 192.168.1.100
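
The grep above also matches longer addresses that merely contain the string. A stricter check, sketched here with a hypothetical helper holds_vip, compares the address field exactly and returns an exit status that scripts can use. It assumes the one-line output format of ip -o -4 addr show, where the fourth field is ADDRESS/PREFIX.

```shell
#!/usr/bin/env bash
# Succeed if the given IPv4 address is assigned to a local interface.
# Reads `ip -o -4 addr show` output on stdin.
holds_vip() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }'
}

# Usage (not executed here):
#   ip -o -4 addr show | holds_vip 192.168.1.100 && echo "this node holds the VIP"
```

Running the check on the MASTER, stopping keepalived there, and running it again on the BACKUP shows whether the virtual IP actually moves.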

4. Control Plane Component Redundancy

4.1 API Server Redundancy

The API Server is stateless; multiple instances provide service externally through a load balancer.

# /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.1.10
    - --etcd-servers=https://192.168.1.10:2379,https://192.168.1.11:2379,https://192.168.1.12:2379
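    # Note: with external etcd, --etcd-servers lists every etcd endpoint as shown here;
    # with kubeadm stacked etcd, each API server uses only its local member (https://127.0.0.1:2379)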

4.2 Scheduler Redundancy

The Scheduler achieves high availability through leader election; only one instance performs scheduling at a time.

# /etc/kubernetes/manifests/kube-scheduler.yaml
spec:
  containers:
  - command:
    - kube-scheduler
    - --leader-elect=true
    - --leader-elect-lease-duration=15s
    - --leader-elect-renew-deadline=10s
    - --leader-elect-retry-period=2s

4.3 Controller Manager Redundancy

Also uses leader election; only one instance executes controller logic at a time.

# /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --leader-elect=true
    - --leader-elect-lease-duration=15s
    - --leader-elect-renew-deadline=10s
    - --leader-elect-retry-period=2s

4.4 Viewing Leader Election Status

# Check who is the current scheduler leader (leader election is recorded in Lease objects)
kubectl get lease -n kube-system kube-scheduler -o yaml

# Check who is the current controller-manager leader
kubectl get lease -n kube-system kube-controller-manager -o yaml

# Output example (spec.holderIdentity shows the current leader):
# spec:
#   holderIdentity: cp-1_<uuid>
#   leaseDurationSeconds: 15
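
On current releases the leader is recorded in a Lease object in kube-system, and its holderIdentity has the form hostname_uuid. The sketch below extracts just the node name; leader_node and current_leader are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Strip the unique suffix from a Lease holderIdentity such as "cp-1_<uuid>".
leader_node() {
    echo "${1%%_*}"
}

# Print the node currently holding the named kube-system Lease.
current_leader() {
    local holder
    holder=$(kubectl get lease -n kube-system "$1" \
        -o jsonpath='{.spec.holderIdentity}') || return 1
    leader_node "$holder"
}

# Usage (not executed here):
#   current_leader kube-scheduler
#   current_leader kube-controller-manager
```

After powering off the node that holds a lease, another instance should take it over once the lease duration has elapsed.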

5. Complete kubeadm HA Cluster Configuration Example

5.1 Configuration File Initialization

# ha-cluster-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.1.10"
  bindPort: 6443
nodeRegistration:
  criSocket: "unix:///var/run/containerd/containerd.sock"
  name: "control-plane-1"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "v1.31.0"
controlPlaneEndpoint: "192.168.1.100:6443"
apiServer:
  certSANs:
  - "192.168.1.100"    # Load balancer IP
  - "loadbalancer.example.com"  # Load balancer domain
  - "127.0.0.1"
controllerManager: {}
scheduler: {}
networking:
  serviceSubnet: "10.96.0.0/12"
  podSubnet: "10.244.0.0/16"
  dnsDomain: "cluster.local"
etcd:
  local:
    dataDir: "/var/lib/etcd"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: "systemd"

6. Verifying HA Functionality

# 1. Verify all control plane nodes are running
kubectl get nodes -o wide

# 2. Verify etcd cluster health
kubectl exec -n kube-system etcd-cp-1 -- etcdctl \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health --cluster

# 3. Verify API Server load balancing
for i in {1..10}; do
    curl -sk https://192.168.1.100:6443/version | jq -r .gitVersion
done

# 4. Simulate a failure
# Power off one control plane node (stopping only kubelet leaves its static pod containers running)
# Verify the cluster can still operate normally
kubectl get pods --all-namespaces
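
To judge the etcd health output at a glance, the lines can be counted and compared against quorum. This is a sketch under the assumption that etcdctl prints one line per endpoint containing "is healthy" or "is unhealthy"; summarize_etcd_health is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Summarise `etcdctl endpoint health --cluster` output read from stdin.
# Exit status is 0 only if the healthy members still form a quorum.
summarize_etcd_health() {
    awk '
        / is healthy/   { healthy++ }
        / is unhealthy/ { unhealthy++ }
        END {
            total = healthy + unhealthy
            quorum = int(total / 2) + 1
            printf "healthy=%d total=%d quorum=%d\n", healthy, total, quorum
            exit !(total > 0 && healthy >= quorum)
        }'
}

# Usage (not executed here); 2>&1 because failures may be written to stderr:
#   kubectl exec -n kube-system etcd-cp-1 -- etcdctl \
#       --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#       --cert=/etc/kubernetes/pki/etcd/server.crt \
#       --key=/etc/kubernetes/pki/etcd/server.key \
#       endpoint health --cluster 2>&1 | summarize_etcd_health
```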

CKA Exam Key Points

controlPlaneEndpoint must be configured as the load balancer address -- not a single node IP
--upload-certs -- Used to share certificates with other control plane nodes
etcd should have an odd number of members (3 or 5) -- quorum is floor(N/2)+1, so an even member count adds no extra fault tolerance
Scheduler and Controller Manager have leader election enabled by default -- no manual configuration needed
HA clusters require at least 3 control plane nodes -- with 2 nodes, losing either one breaks the etcd majority, so no failure can be tolerated
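
The quorum rule behind the last points can be worked out with integer arithmetic. A minimal sketch (quorum, fault_tolerance, and quorum_table are illustrative names, not part of any tool):

```shell
#!/usr/bin/env bash
# Quorum arithmetic for an etcd cluster of N voting members.

# Members that must agree before a write is committed: floor(N/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Members that can fail while the cluster keeps quorum: floor((N-1)/2).
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

quorum_table() {
    local n
    for n in 1 2 3 4 5; do
        printf 'members=%d quorum=%d tolerated_failures=%d\n' \
            "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
    done
}

quorum_table
# members=1 quorum=1 tolerated_failures=0
# members=2 quorum=2 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=4 quorum=3 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

Going from 3 to 4 members raises the quorum without raising the number of tolerated failures, which is why 3 and 5 are the usual choices.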

🧪 Complete Hands-on Example: Configuring High Availability Control Plane (Stacked etcd)

Scenario Description

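With an existing first control plane node (control-plane-1), add a second control plane node (control-plane-2) to form a stacked etcd topology, and access the cluster via the load balancer address. Note that two etcd members cannot tolerate the loss of either one; joining a third control plane node with the same command is what gives the cluster real fault tolerance.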

Prerequisites

An initialized control plane node (control-plane-1) already exists
The second control plane node (control-plane-2) has completed infrastructure setup
A load balancer (e.g., HAProxy) is configured at 192.168.1.100:6443 and forwards to the control plane nodes
Network connectivity between the two control plane nodes

Steps

Step 1: Verify the first control plane node

# On control-plane-1
kubectl get nodes
# NAME               STATUS   ROLES           AGE   VERSION
# control-plane-1    Ready    control-plane   1h    v1.31.0

kubectl get pods -n kube-system | grep -E "kube-apiserver|etcd"
# etcd-control-plane-1                      1/1     Running   0   1h
# kube-apiserver-control-plane-1            1/1     Running   0   1h

Step 2: Check the kubeadm-config.yaml used for the first control plane (controlPlaneEndpoint must have been set at init time)

# kubeadm-config.yaml (on control-plane-1)
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "192.168.1.10"
  bindPort: 6443
nodeRegistration:
  criSocket: "unix:///var/run/containerd/containerd.sock"
  name: "control-plane-1"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "v1.31.0"
controlPlaneEndpoint: "192.168.1.100:6443"
apiServer:
  certSANs:
  - "192.168.1.100"
  - "control-plane-1"
  - "control-plane-2"
networking:
  serviceSubnet: "10.96.0.0/12"
  podSubnet: "10.244.0.0/16"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: "systemd"

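Step 3: Initialize the first control plane (with --upload-certs), or regenerate the join credentials

# Only if control-plane-1 has not been initialized yet:
sudo kubeadm init --config=kubeadm-config.yaml --upload-certs

# The output will include:
# You can now join any number of control-plane nodes by running the following on each as root:
#   kubeadm join 192.168.1.100:6443 --token <token> \
#     --discovery-token-ca-cert-hash sha256:<hash> \
#     --control-plane --certificate-key <certificate-key>
#
# Save the token, hash, and certificate-key!

# If control-plane-1 is already initialized, do not run init again; regenerate the values instead:
sudo kubeadm init phase upload-certs --upload-certs   # prints a new certificate key
sudo kubeadm token create --print-join-command        # prints a new join command with token and hash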

Step 4: Join the second control plane node to the cluster

# Run on control-plane-2 (get values from Step 3 output)
sudo kubeadm join 192.168.1.100:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane \
    --certificate-key <certificate-key>

# Output example:
# This node has joined the cluster and a new control plane instance was created
# ...
# To start using your cluster, you need to run the following as a regular user:
#   mkdir -p $HOME/.kube
#   sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
#   sudo chown $(id -u):$(id -g) $HOME/.kube/config

Step 5: Verify both control plane nodes

# On any control plane node
kubectl get nodes
# NAME               STATUS   ROLES           AGE   VERSION
# control-plane-1    Ready    control-plane   1h    v1.31.0
# control-plane-2    Ready    control-plane   5m    v1.31.0

# Verify etcd cluster members
kubectl exec -n kube-system etcd-control-plane-1 -- etcdctl \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list
# 8e9e05c52164694d, started, control-plane-1, https://192.168.1.10:2380, https://192.168.1.10:2379
# 6a4d1c8352a47abd, started, control-plane-2, https://192.168.1.11:2380, https://192.168.1.11:2379

Verification

# Access the cluster via the load balancer address
curl -k https://192.168.1.100:6443/version
# {
#   "major": "1",
#   "minor": "31",
#   ...
# }

# Verify control plane component high availability
kubectl get pods -n kube-system | grep -E "kube-apiserver|kube-scheduler|kube-controller"
# kube-apiserver-control-plane-1            1/1     Running   0   1h
# kube-apiserver-control-plane-2            1/1     Running   0   5m
# ...

Exam Tips

controlPlaneEndpoint must be set to the load balancer address, not a single control plane node IP
--upload-certs is used to securely share control plane certificates with other control plane nodes
etcd should have an odd number of members (3 or 5); a majority (quorum) of members must be available, and an even count adds no extra fault tolerance
When joining control plane nodes, the --control-plane and --certificate-key parameters must be used
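If the token expires (24 hours by default), use kubeadm token create --print-join-command to regenerate it; if the certificate key expires (2 hours), re-upload the certificates with kubeadm init phase upload-certs --upload-certs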