Qingular

Scaling and HPA

Manual scaling, HorizontalPodAutoscaler auto-scaling, metrics-server configuration

Overview

The CKA exam requires mastering manual scaling of Deployments and configuring HPA auto-scaling. HPA relies on metrics-server to provide Pod resource metrics.


1. Manual Scaling

1.1 kubectl scale

# Scale a Deployment (the two forms below are equivalent)
kubectl scale deployment/nginx --replicas=5
kubectl scale deployment nginx --replicas=3

# Scale ReplicaSet
kubectl scale rs/web-rs --replicas=4

# Scale StatefulSet
kubectl scale sts/web --replicas=5

# Check current replica count
kubectl get deployment nginx
kubectl get rs
kubectl get pods

# Conditional scaling (--current-replicas is a precondition on the current replica count)
kubectl scale deployment/nginx --current-replicas=5 --replicas=3
# If the current count is not 5, the command fails and the replica count is left unchanged

1.2 Scaling After a Rollback

# View revision history
kubectl rollout history deployment nginx

# Roll back to revision 2
kubectl rollout undo deployment nginx --to-revision=2

# Scale up
kubectl scale deployment nginx --replicas=5

2. HorizontalPodAutoscaler (HPA)

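HPA automatically adjusts the replica count of Deployments / StatefulSets so that the average CPU or memory utilization of their Pods, measured as a percentage of the Pods' resource requests, stays near a target value.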
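
As a rough illustration of how utilization turns into a replica count, the Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), with the result kept between minReplicas and maxReplicas. The sketch below reproduces only that arithmetic in POSIX shell. The function name hpa_desired_replicas is hypothetical, and the real controller also applies a tolerance, Pod readiness checks, and scaling behavior rules that are left out here.

```shell
#!/bin/sh
# Hypothetical helper for illustration; it is not part of kubectl.
# Usage: hpa_desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
hpa_desired_replicas() {
  current=$1 util=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * util / target
  desired=$(( (current * util + target - 1) / target ))
  # Clamp to the configured bounds
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

hpa_desired_replicas 2 70 50 2 10    # prints 3
hpa_desired_replicas 4 200 50 2 10   # prints 10 (16 clamped to max)
hpa_desired_replicas 4 10 50 2 10    # prints 2 (1 clamped to min)
```

Utilization values are whole percentages, so integer arithmetic is enough for a quick estimate during the exam.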

2.1 Prerequisites: metrics-server

# Install metrics-server (usually pre-installed in the exam environment)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get pods -n kube-system | grep metrics-server

# View node and Pod metrics
kubectl top nodes
kubectl top pods

# If kubectl top reports "metrics not available yet", wait about 30-60s for metrics-server to collect data, then retry

2.2 Create HPA

Method 1: Imperative (kubectl autoscale)

# Create HPA: target 50% average CPU utilization (relative to CPU requests), min 2 replicas, max 10
kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10

# Generate the HPA YAML without creating the object
kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10 --dry-run=client -o yaml

Method 2: Declarative YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
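
When an HPA lists several metrics, as the manifest above does with cpu and memory, the controller works out a desired replica count for each metric and uses the largest. The following is a minimal sketch of that selection under the simplified formula ceil(replicas × current / target). The helper names ceil_div and hpa_multi_metric are hypothetical, and the utilization figures are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# Usage: hpa_multi_metric REPLICAS CPU_UTIL CPU_TARGET MEM_UTIL MEM_TARGET
hpa_multi_metric() {
  by_cpu=$(ceil_div $(( $1 * $2 )) "$3")
  by_mem=$(ceil_div $(( $1 * $4 )) "$5")
  # The larger proposal wins
  if [ "$by_cpu" -ge "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

hpa_multi_metric 4 70 50 60 80    # prints 6 (CPU proposes 6, memory proposes 3)
hpa_multi_metric 4 40 50 120 80   # prints 6 (CPU proposes 4, memory proposes 6)
```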

2.3 View HPA Status

# View HPA
kubectl get hpa
kubectl get horizontalpodautoscaler

# View HPA details
kubectl describe hpa nginx-hpa

# View HPA YAML
kubectl get hpa nginx-hpa -o yaml

# Monitor HPA
kubectl get hpa -w

2.4 Example HPA Scaling Behavior Output

NAME         REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-hpa    Deployment/nginx    30%/50%   2         10        2          5m
nginx-hpa    Deployment/nginx    70%/50%   2         10        4          6m
nginx-hpa    Deployment/nginx    45%/50%   2         10        4          7m
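
The last row of the sample output stays at 4 replicas even though utilization is below the target. One reason is the controller's tolerance: by default it skips scaling when the ratio of current to target utilization is within 10% of 1.0. The sketch below checks that condition with integer arithmetic. The function name within_tolerance is hypothetical, and the 10% figure assumes the cluster uses the default tolerance.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when |util/target - 1| <= 10% (assumed default tolerance).
within_tolerance() {
  util=$1 target=$2
  diff=$(( util - target ))
  if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi
  # |util - target| / target <= 0.10, rewritten without fractions
  [ $(( diff * 10 )) -le "$target" ]
}

for u in 30 45 70; do
  if within_tolerance "$u" 50; then
    echo "$u%/50%: within tolerance, replica count unchanged"
  else
    echo "$u%/50%: outside tolerance, replica count recalculated"
  fi
done
```

With a 50% target, only readings from 45% to 55% fall inside the tolerance band.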

2.5 Generate Load to Test Scaling

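# Start a Pod that sends continuous HTTP requests to nginx, which raises CPU usage on the nginx Pods
# (assumes a Service named nginx exposes the Deployment on port 80)
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"

# Or run it interactively (stop with Ctrl+C)
kubectl run -i --tty load-generator --image=busybox --restart=Never -- sh -c "while true; do wget -q -O- http://nginx:80; done"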

# Check if HPA is scaling
kubectl get hpa -w

# Delete the load generator after testing
kubectl delete pod load-generator

3. Advanced HPA Configuration

3.1 Custom Metrics (autoscaling/v2)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
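  # Pods metrics are served by the custom metrics API, which needs an adapter
  # (for example Prometheus Adapter); metrics-server alone does not provide them
  - type: Pods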
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: 1000
  behavior:                           # Scaling behavior control
    scaleDown:
      stabilizationWindowSeconds: 300 # Scale-down stabilization window (default 5 minutes)
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0   # No waiting needed for scale-up
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

3.2 Behavior Policy Explanation

Field                         Description
stabilizationWindowSeconds    Stabilization window; prevents frequent scaling (flapping)
scaleDown.policies            Scale-down policies: maximum percentage or Pod count removed per periodSeconds
scaleUp.policies              Scale-up policies: maximum percentage or Pod count added per periodSeconds
selectPolicy                  Which policy applies when several are listed: Max (largest change), Min (smallest change), or Disabled (no scaling in that direction)
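
To make the scaleUp policies from section 3.1 concrete, here is a minimal sketch of the ceiling they impose in a single period: the Percent policy allows growth by a percentage of the current replicas, the Pods policy allows a fixed number of added Pods, and selectPolicy: Max takes the more permissive of the two. The function name scale_up_limit is hypothetical, and the sketch assumes no other scaling happened earlier in the same period.

```shell
#!/bin/sh
# Hypothetical helper. Usage: scale_up_limit CURRENT_REPLICAS PERCENT PODS
# Prints the highest replica count reachable in one period with selectPolicy: Max.
scale_up_limit() {
  current=$1 percent=$2 pods=$3
  by_percent=$(( (current * (100 + percent) + 99) / 100 ))  # ceil(current * (1 + percent/100))
  by_pods=$(( current + pods ))
  if [ "$by_percent" -ge "$by_pods" ]; then echo "$by_percent"; else echo "$by_pods"; fi
}

scale_up_limit 2 100 4    # prints 6  (Pods policy wins: 2 + 4)
scale_up_limit 10 100 4   # prints 20 (Percent policy wins: 10 doubled)
```

The Pods policy matters most for small Deployments, where doubling alone would add only a handful of replicas.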

4. metrics-server

4.1 Installation

# Quick install
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# If metrics-server does not become ready after installation, you may need to adjust its arguments
kubectl edit deployment metrics-server -n kube-system
# Add to spec.template.spec.containers[0].args:
# - --kubelet-insecure-tls                          (skips kubelet certificate verification; lab clusters only)
# - --kubelet-preferred-address-types=InternalIP

4.2 Verification

# Wait for Pod to be ready
kubectl wait --namespace kube-system --for=condition=ready pod -l k8s-app=metrics-server --timeout=120s

# Test metric collection
kubectl top nodes
kubectl top pods

# If no data for a long time, check metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

5. Useful Exam Commands

# 1. Generate a Deployment manifest (HPA requires Pods to have CPU requests)
kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > nginx.yaml

# 2. Edit the manifest to add resources.requests.cpu, then apply it
vim nginx.yaml
kubectl apply -f nginx.yaml

# 3. Create HPA
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=5

# 4. Verify
kubectl get hpa
kubectl get pods -w

# 5. Create CronJobs for scheduled scaling (not HPA, but useful for the exam)
# Note: the Job's ServiceAccount needs RBAC permission to scale the Deployment; the default ServiceAccount normally lacks it
kubectl create cronjob scale-up --image=bitnami/kubectl --schedule="0 8 * * 1-5" -- kubectl scale deployment nginx --replicas=10
kubectl create cronjob scale-down --image=bitnami/kubectl --schedule="0 18 * * 1-5" -- kubectl scale deployment nginx --replicas=2

🧪 Complete Hands-on Example: Manual Scaling + Configure HPA Auto-scaling

Scenario

First manually scale a Deployment, then configure CPU-based HPA auto-scaling.

Prerequisites

  • A working Kubernetes cluster (minikube or kind recommended)
  • kubectl is configured to connect to the cluster
  • Pods in the Deployment must have CPU requests set (required by HPA)

Steps

Step 1: Create a Deployment with Resource Requests

cat <<'EOF' > nginx-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "200m"
            memory: "128Mi"
        ports:
        - containerPort: 80
EOF

kubectl apply -f nginx-deploy.yaml
# Expected output: deployment.apps/nginx created

kubectl get deployment nginx
# Expected output: NAME    READY   UP-TO-DATE   AVAILABLE   AGE
#          nginx   2/2     2            2           <seconds>

Step 2: Manual Scaling

# Scale up to 5 replicas
kubectl scale deployment nginx --replicas=5
# Expected output: deployment.apps/nginx scaled

kubectl get deployment nginx
# Expected output: NAME    READY   UP-TO-DATE   AVAILABLE   AGE
#          nginx   5/5     5            5           <seconds>

# Scale down back to 2 replicas
kubectl scale deployment nginx --replicas=2
# Expected output: deployment.apps/nginx scaled

Step 3: Verify metrics-server is Installed

kubectl get pods -n kube-system | grep metrics-server
# Expected output: metrics-server-<hash>   1/1     Running   0   <time>

# If not installed, install it
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics are available
kubectl top pods
# Expected output: NAME                    CPU(cores)   MEMORY(bytes)
#          nginx-<hash>-<pod>     1m           10Mi
#          nginx-<hash>-<pod>     2m           12Mi

Step 4: Create HPA (CPU-based)

kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10
# Expected output: horizontalpodautoscaler.autoscaling/nginx autoscaled

kubectl get hpa nginx
# Expected output: NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
#          nginx   Deployment/nginx   0%/50%    2         10        2          <seconds>
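
The 0% in TARGETS follows from the kubectl top sample in Step 3. As far as I understand the controller, it divides the total CPU usage of the Pods by their total CPU requests and truncates to a whole percentage. The sketch below repeats that calculation. The function name cpu_utilization_percent is hypothetical, and the 120m and 140m readings in the second call are made-up values chosen to reproduce a 65% reading.

```shell
#!/bin/sh
# Hypothetical helper. Usage: cpu_utilization_percent REQUEST_PER_POD_M USAGE_M [USAGE_M...]
# All values are in millicores; at least one usage value is required.
cpu_utilization_percent() {
  request=$1; shift
  total_usage=0 total_request=0
  for usage in "$@"; do
    total_usage=$(( total_usage + usage ))
    total_request=$(( total_request + request ))
  done
  # Integer division truncates, so 0.75% is reported as 0%
  echo $(( total_usage * 100 / total_request ))
}

cpu_utilization_percent 200 1 2       # prints 0  (3m used of 400m requested)
cpu_utilization_percent 200 120 140   # prints 65 (260m used of 400m requested)
```

Because the denominator is the request, a Pod without a CPU request gives the controller nothing to divide by, which is why the request is mandatory.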

Step 5: Generate CPU Load to Trigger Auto-scaling

# Expose a Service for the load generator to access
kubectl expose deployment nginx --port=80 --target-port=80
# Expected output: service/nginx exposed

# Start load generator
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"
# Expected output: pod/load-generator created

# Monitor HPA (open another terminal or append &)
kubectl get hpa nginx -w
# Expected output (changes start after about 1-2 minutes):
# nginx   Deployment/nginx   0%/50%   2         10        2          2m
# nginx   Deployment/nginx   65%/50%  2         10        4          3m
# nginx   Deployment/nginx   80%/50%  2         10        5          4m
# nginx   Deployment/nginx   45%/50%  2         10        5          5m

Verification

# View HPA final state
kubectl get hpa nginx
# Expected output: TARGETS shows CPU utilization near or below 50% once the added replicas absorb the load

# View replica count changes
kubectl get deployment nginx
# Expected output: READY may show more than 2 replicas (auto-scaled)

# Observe scale-down after stopping the load (returns to 2 replicas once the default 5-minute stabilization window has passed)
kubectl delete pod load-generator
# Expected output: pod "load-generator" deleted

# Cleanup
kubectl delete deployment nginx
kubectl delete service nginx
kubectl delete hpa nginx

Exam Tips

  • Pods must have CPU requests set to be recognized by HPA, otherwise HPA cannot calculate CPU utilization
  • kubectl autoscale is the fastest way to create HPA, suitable for the time-pressured CKA exam
  • The HPA controller evaluates metrics every 15 seconds by default. Scale-up has no stabilization window, while scale-down has a default 5-minute stabilization window
  • metrics-server is usually pre-installed in the exam environment, but it's recommended to verify with kubectl top nodes first
  • If kubectl top returns "metrics not available yet", wait about 30-60 seconds and retry
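
The scale-down stabilization window mentioned in the tips can be pictured as follows: the controller remembers the replica counts it recommended during the window and, when scaling down, acts on the highest of them. This is a minimal sketch of that rule. The function name stabilized_replicas is hypothetical, and the recommendation lists are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper. Arguments: replica recommendations made during the window (oldest first).
stabilized_replicas() {
  highest=0
  for rec in "$@"; do
    if [ "$rec" -gt "$highest" ]; then highest=$rec; fi
  done
  echo "$highest"
}

stabilized_replicas 5 5 3 2   # prints 5 (a recommendation of 5 is still inside the window)
stabilized_replicas 2 2 2 2   # prints 2 (all higher recommendations have aged out)
```

This is why replicas stay high for several minutes after the load generator is deleted.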