Network Troubleshooting

2026-05-27·CKA k8s Practice

CKA Exam Domain 5 — Service unreachable, DNS resolution, CoreDNS, NetworkPolicy, kube-proxy, CNI troubleshooting

Kubernetes networking involves multiple layers: Pod networking, Service networking, DNS resolution, network policies, and more. Network troubleshooting is both a difficult and key topic in the CKA exam.

1. Steps to Troubleshoot an Unreachable Service

# Step 1: Check whether the Service has been created
kubectl get svc

# Step 2: Check whether the Service has Endpoints
kubectl get endpoints <service-name>
# If Endpoints is empty, the Service's selector matches no Pods, or the matching Pods are not Ready

# Step 3: Check whether the Service's backend Pods are running normally
kubectl get pods -l <service-selector>

# Step 4: Check whether the Pod is listening on the correct port
# (the container image must include netstat; ss -tlnp is an alternative where available)
kubectl exec -it <pod-name> -- netstat -tlnp

# Step 5: Test Service access from inside the cluster
kubectl run test --image=busybox -it --rm -- wget -qO- http://<service-name>.<namespace>.svc.cluster.local

# Step 6: Check kube-proxy
kubectl get pods -n kube-system | grep kube-proxy

Troubleshooting flowchart:

Service unreachable
    │
    ├─ kubectl get endpoints <svc> -- is it non-empty?
    │   ├─ Empty → Selector does not match any Pod
    │   └─ Non-empty → continue
    │
    ├─ Access Service from a Pod inside the cluster
    │   ├─ Unreachable → check kube-proxy
    │   └─ Reachable → check external access configuration
    │
    ├─ Check whether kube-proxy is healthy
    ├─ Check the CNI plugin
    └─ Check firewall/security groups
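The first branches of the flowchart can be sketched as a small shell function. This is only an illustration: `diagnose_service` is a hypothetical helper name, the namespace defaults to `default`, and the exit codes (1 for a missing Service, 2 for empty Endpoints) are arbitrary choices made for this sketch.

```shell
# diagnose_service SERVICE [NAMESPACE]
# Hypothetical helper that walks the first two branches of the flowchart.
diagnose_service() {
  svc="$1"
  ns="${2:-default}"

  # Branch 0: the Service object itself must exist.
  if ! kubectl get svc "$svc" -n "$ns" >/dev/null 2>&1; then
    echo "Service $svc not found in namespace $ns"
    return 1
  fi

  # Branch 1: collect ready endpoint IPs; empty output means no ready backend.
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
  if [ -z "$ips" ]; then
    echo "No endpoints: compare the Service selector with Pod labels and readiness"
    return 2
  fi

  # Remaining branches need a test Pod and node access, so only point to them.
  echo "Endpoints present: $ips"
  echo "Next: test from a Pod in the cluster, then check kube-proxy, CNI, and firewalls"
  return 0
}
```

For example, `diagnose_service my-app-service default` prints which branch failed, so the remaining checks can start from the right layer.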

2. DNS Resolution Issues

CoreDNS Pod Status Check

# Check whether CoreDNS Pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Check CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml

# Check the CoreDNS Service
kubectl get svc -n kube-system kube-dns

DNS Resolution Testing

# Use busybox to test DNS resolution
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup kubernetes.default.svc.cluster.local

# Use the short name (resolved through the search domains in the Pod's /etc/resolv.conf)
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup kubernetes

# Use dig (needs an image that ships dig; nicolaka/netshoot is one example)
kubectl run dns-test --image=nicolaka/netshoot --rm -it -- dig kubernetes.default.svc.cluster.local

# Test external domain resolution
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup www.google.com

Note: nslookup in busybox 1.28 is more stable than in newer versions of busybox. Using the default latest tag may cause nslookup to fail.
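When lookups fail even though CoreDNS is running, it can help to confirm that the Pod is actually pointed at the cluster DNS Service. The sketch below compares the first `nameserver` in a Pod's `/etc/resolv.conf` with the ClusterIP of the `kube-dns` Service. The function names `first_nameserver` and `check_pod_dns` are hypothetical, and the comparison assumes the Pod uses the default `ClusterFirst` DNS policy without a node-local DNS cache; in other setups a mismatch is expected.

```shell
# first_nameserver: print the first "nameserver" entry from resolv.conf text on stdin.
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }'
}

# check_pod_dns POD [NAMESPACE]
# Hypothetical helper: compares the Pod's resolver with the kube-dns ClusterIP.
check_pod_dns() {
  pod="$1"
  ns="${2:-default}"
  dns_ip=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
  pod_dns=$(kubectl exec "$pod" -n "$ns" -- cat /etc/resolv.conf | first_nameserver)
  if [ -n "$dns_ip" ] && [ "$dns_ip" = "$pod_dns" ]; then
    echo "OK: $pod uses cluster DNS at $dns_ip"
  else
    echo "CHECK: $pod uses nameserver '$pod_dns', kube-dns Service is '$dns_ip'"
    return 1
  fi
}
```

A mismatch usually points at the Pod's `dnsPolicy` or the kubelet's cluster DNS setting rather than at CoreDNS itself.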

Common CoreDNS Failure Causes

Problem	Diagnosis	Solution
CoreDNS Pods not running	`kubectl get pods -n kube-system`	Check Pod status and logs
CoreDNS CrashLoopBackOff	`kubectl logs -n kube-system coredns-xxx`	Check ConfigMap configuration
CoreDNS out of memory	`kubectl describe pod -n kube-system coredns-xxx`	Adjust resource limits
DNS timeout	Check network connectivity	Check kube-proxy and CNI
Custom domain resolution fails	Check CoreDNS ConfigMap	Add rewrite or forward rules

3. Network Connectivity Testing

# Create a test Pod
kubectl run test --image=busybox --rm -it -- sh

# Inside the test Pod:
# Test Service connectivity
wget -qO- http://<service-name>
wget -qO- http://<service-name>.<namespace>.svc.cluster.local

# Test Pod IP connectivity (requires knowing the Pod IP)
wget -qO- http://<pod-ip>:<port>

# Test node port
wget -qO- http://<node-ip>:<node-port>

4. NetworkPolicy Blocking Traffic -- Troubleshooting

# View all NetworkPolicies
kubectl get networkpolicy

# View NetworkPolicy details
kubectl describe networkpolicy <policy-name>

# Test whether traffic is being blocked by a NetworkPolicy
# Start a temporary Pod for testing
kubectl run test-pod --image=busybox --rm -it -- sh

# Inside test-pod, test connectivity to the target Pod
wget -qO- http://<target-pod-ip>:<port>

# Understand NetworkPolicy scope
# NetworkPolicy is an allow-list ("whitelist") mechanism: Pods that no policy selects accept all traffic.
# Once a policy selects a Pod, traffic in each direction listed under policyTypes is denied unless a rule allows it

Troubleshooting approach:

Check whether the target Pod's namespace has any NetworkPolicy
Check whether the NetworkPolicy's podSelector selects the target Pod
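Check whether the ingress rules of policies selecting the target Pod allow the source Pod (by labels, namespace, or IP block) and the port
Check whether the egress rules of policies selecting the source Pod allow the target Pod's IP/labels and port, plus DNS (port 53) if the Service name is used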
If a NetworkPolicy exists but does not allow this traffic, add a corresponding rule
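The first two checks above can be partly automated by flagging policies whose podSelector is empty, since those select every Pod in the namespace. The following is a sketch under assumptions: `flag_select_all` is a hypothetical helper, and it expects tab-separated lines produced by the jsonpath query shown in the comment. Depending on the kubectl version, an empty selector prints as `{}` or `map[]`, so both forms are matched.

```shell
# flag_select_all: read "name<TAB>podSelector<TAB>policyTypes" lines on stdin
# and report every policy whose podSelector is empty (selects all Pods).
flag_select_all() {
  awk -F '\t' '$2 == "{}" || $2 == "map[]" {
    printf "%s selects every Pod in its namespace (policyTypes: %s)\n", $1, $3
  }'
}

# Possible usage on a live cluster (the namespace "default" is only an example):
# kubectl get networkpolicy -n default \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podSelector}{"\t"}{.spec.policyTypes}{"\n"}{end}' \
#   | flag_select_all
```

Policies with a non-empty selector are not reported here; those still need the label comparison described in the second check.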

5. kube-proxy Modes and Issues

# Check kube-proxy Pod status
kubectl get pods -n kube-system -l k8s-app=kube-proxy

# View kube-proxy logs
kubectl logs -n kube-system kube-proxy-<node> --tail=50

# Check kube-proxy configuration
kubectl get configmap -n kube-system kube-proxy -o yaml

# Check kube-proxy mode (iptables, ipvs, or nftables on recent releases; an empty value means the platform default, iptables on Linux)
kubectl get configmap -n kube-system kube-proxy -o yaml | grep mode

kube-proxy mode comparison:

Mode	Description	Performance
`iptables`	Default mode, based on iptables rule forwarding	Medium
`IPVS`	Based on IPVS, supports more load balancing algorithms	High (large clusters)
`userspace`	User-space proxy (removed in Kubernetes v1.26)	Low

Common issues:

kube-proxy not running → Service ClusterIP cannot be accessed
iptables rules accidentally modified → delete the kube-proxy Pod so the DaemonSet recreates it (or systemctl restart kube-proxy if it runs as a systemd service)
Missing kernel modules in IPVS mode → load ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh modules

# Check IPVS kernel modules
lsmod | grep ip_vs

# Check iptables rules
iptables-save | grep <service-name>
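Because `grep mode` often returns just `mode: ""`, a small parser makes the result easier to read. This is a sketch, not part of kube-proxy: `proxy_mode` is a hypothetical helper, and the usage comment assumes a kubeadm-style ConfigMap that stores the configuration under the `config.conf` key. The match is a simplification that takes the first line whose key is exactly `mode:`.

```shell
# proxy_mode: read kube-proxy configuration YAML on stdin and print the
# configured mode, or "default" when the field is empty or absent.
proxy_mode() {
  mode=$(awk '$1 == "mode:" { gsub(/"/, "", $2); print $2; exit }')
  if [ -z "$mode" ]; then
    echo "default"
  else
    echo "$mode"
  fi
}

# Possible usage (assumes the kubeadm ConfigMap layout):
# kubectl get configmap -n kube-system kube-proxy \
#   -o jsonpath='{.data.config\.conf}' | proxy_mode
```

If the result is `ipvs`, the kernel module check with `lsmod | grep ip_vs` is the natural next step.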

6. CNI Plugin Troubleshooting

# View CNI Pod status
kubectl get pods -n kube-system | grep -E "calico|flannel|weave|cilium"

# View CNI logs (Calico example; the label and namespace differ per plugin and install method)
kubectl logs -n kube-system -l k8s-app=calico-node --tail=50

# Check CNI configuration files
ls /etc/cni/net.d/

# Check CNI binary files
ls /opt/cni/bin/

# View node network interfaces
ip a
ip route

Common CNI issues:

Problem	Cause	Solution
Pods cannot communicate with each other	CNI plugin abnormal	Restart CNI DaemonSet
CoreDNS CrashLoopBackOff	CNI not properly configured for Pod networking	Check CNI configuration
Node NotReady	CNI plugin not deployed	Deploy per CNI documentation
Pods stuck without an IP, or Pod IP conflict	CNI IP pool exhausted, or overlapping Pod CIDRs	Check IPAM configuration
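For the "CNI plugin not deployed" case, the two directory checks can be combined into one function that runs on the node. This is a minimal sketch: `check_cni_files` is a hypothetical helper, and the default paths are the ones used in the commands above. It only confirms that some configuration file and some executable plugin exist, not that they are valid or belong to the same plugin.

```shell
# check_cni_files [CONF_DIR] [BIN_DIR]
# Hypothetical node-side check; returns non-zero if config or binaries are missing.
check_cni_files() {
  conf_dir="${1:-/etc/cni/net.d}"
  bin_dir="${2:-/opt/cni/bin}"
  cni_rc=0

  # Look for any CNI config file (.conf, .conflist, or .json).
  conf=""
  for f in "$conf_dir"/*.conf "$conf_dir"/*.conflist "$conf_dir"/*.json; do
    if [ -f "$f" ]; then conf="$f"; break; fi
  done
  if [ -n "$conf" ]; then
    echo "CNI config found: $conf"
  else
    echo "No CNI config in $conf_dir"
    cni_rc=1
  fi

  # Look for at least one executable plugin binary.
  bin=""
  for f in "$bin_dir"/*; do
    if [ -f "$f" ] && [ -x "$f" ]; then bin="$f"; break; fi
  done
  if [ -n "$bin" ]; then
    echo "CNI binary found: $bin"
  else
    echo "No executable CNI plugins in $bin_dir"
    cni_rc=1
  fi

  return "$cni_rc"
}
```

An empty configuration directory typically goes together with a NotReady node, which matches the third row of the table.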

7. Node Port Connectivity Testing

# Test locally from the node (localhost NodePorts may not answer in IPVS mode; use the node IP if this fails)
curl http://localhost:<node-port>

# Test from other nodes within the cluster
curl http://<node-ip>:<node-port>

# Test from outside the cluster
telnet <node-ip> <node-port>

# Test port with nc
nc -zv <node-ip> <node-port>
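To run the same `nc` probe against several nodes at once, a loop like the one below may be convenient. It is a sketch with assumptions: `probe_nodeport` is a hypothetical helper, and it relies on an `nc` build that supports `-z` and `-w`, which is not guaranteed on every image.

```shell
# probe_nodeport PORT NODE_IP...
# Hypothetical helper: probes one NodePort on each given node IP.
# Returns non-zero if at least one node did not accept the connection.
probe_nodeport() {
  port="$1"
  shift
  failed=0
  for ip in "$@"; do
    if nc -z -w 2 "$ip" "$port" >/dev/null 2>&1; then
      echo "$ip:$port open"
    else
      echo "$ip:$port closed or filtered"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

If only some nodes answer, it is worth checking whether the Service sets `externalTrafficPolicy: Local`, because in that case only nodes that host a backend Pod accept the traffic.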

8. Practical Command Quick Reference

# View Services and Endpoints
kubectl get svc,ep

# View all network policies
kubectl get netpol --all-namespaces

# Create a temporary test Pod
kubectl run test --image=busybox -it --rm -- sh

# Test internal DNS
kubectl run test --image=busybox:1.28 --rm -it -- nslookup kubernetes.default

# View kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy

# View CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Check whether kubelet is configured with the correct node IP
kubectl describe node <node-name>

9. Exam Key Points

When a Service is unreachable, first check Endpoints
For DNS resolution issues, first check CoreDNS Pod status
Use busybox:1.28 for DNS testing (newer busybox versions have nslookup issues)
NetworkPolicy is an allow-list (whitelist) mechanism; once a policy selects a Pod, traffic in the policy's directions that no rule allows is denied
kube-proxy issues will cause the Service's ClusterIP to be unreachable
CNI plugin issues will cause Pod network connectivity failures

🧪 Complete Hands-on Example: Troubleshooting an Unreachable Service

Scenario

A Service has just been created, but other Pods within the cluster cannot access the target Pod by the Service name. This is a complete end-to-end troubleshooting process.

Prerequisites

Multiple Pods and Services in the cluster
Permission to create temporary test Pods

Steps

Step 1: Confirm Service Status

kubectl get svc
# NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
# kubernetes      ClusterIP   10.96.0.1        <none>        443/TCP   10d
# my-app-service  ClusterIP   10.108.12.34     <none>        80/TCP    5m

kubectl describe svc my-app-service
# Name:              my-app-service
# Namespace:         default
# Labels:            app=my-app
# Selector:          app=my-app,tier=frontend
# Type:              ClusterIP
# IP Family Policy:  SingleStack
# IP Families:       IPv4
# IP:                10.108.12.34
# Port:              http  80/TCP
# TargetPort:        80/TCP
# Endpoints:         <none>       ← Endpoints is empty!
# Session Affinity:  None
# Events:            <none>

Step 2: Found Endpoints Empty -- Check Selector Matching

# View the Service's Selector
# Selector: app=my-app,tier=frontend

# Check whether any Pods match the labels
kubectl get pods -l app=my-app,tier=frontend
# No resources found in default namespace.

# View all Pod labels
kubectl get pods --show-labels
# NAME                          READY   STATUS    RESTARTS   AGE   LABELS
# my-app-6b8f9c7d8b-abc12       1/1     Running   0          10m   app=my-app
# my-app-6b8f9c7d8b-def34       1/1     Running   0          10m   app=my-app
# ← Pods only have app=my-app, missing the tier=frontend label!
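The matching rule behind this result is that a Pod must carry every key=value pair in the Service selector, while extra Pod labels are ignored. The sketch below expresses that rule for equality-based selectors; `selector_matches` is a hypothetical helper that takes the comma-separated strings printed by `kubectl describe svc` and `kubectl get pods --show-labels`.

```shell
# selector_matches SELECTOR LABELS
# Both arguments are comma-separated key=value lists.
# Succeeds only if every selector pair is present in the label list.
selector_matches() {
  selector="$1"
  labels=",$2,"
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                 # this pair is present, keep checking
      *) IFS=$old_ifs; return 1 ;;    # a required label is missing
    esac
  done
  IFS=$old_ifs
  return 0
}
```

With the values from this scenario, `selector_matches "app=my-app,tier=frontend" "app=my-app"` fails, which is exactly why the Endpoints list is empty.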

Step 3: Fix the Label Matching Issue

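# Add the missing label to the running Pods (quick fix; Pods recreated by the Deployment lose it,
# so also update the Deployment's Pod template or the Service selector)
kubectl label pod -l app=my-app tier=frontend --overwrite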
# pod/my-app-6b8f9c7d8b-abc12 labeled
# pod/my-app-6b8f9c7d8b-def34 labeled

# Verify Endpoints have been updated
kubectl get endpoints my-app-service
# NAME              ENDPOINTS                     AGE
# my-app-service    10.244.1.2:80,10.244.1.3:80   6m
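Labels added directly to Pods disappear when the Pods are recreated, so a more durable fix is to put the label into the Deployment's Pod template. The following sketch wraps a `kubectl patch` merge patch; `add_template_label` is a hypothetical helper, and the Deployment name `my-app` is an assumption inferred from the Pod names in this scenario. Patching the template triggers a rollout, after which the new Pods carry the label.

```shell
# add_template_label DEPLOYMENT KEY VALUE [NAMESPACE]
# Hypothetical helper: adds one label to the Deployment's Pod template.
add_template_label() {
  deploy="$1"
  key="$2"
  value="$3"
  ns="${4:-default}"
  # Build a JSON merge patch that touches only spec.template.metadata.labels.
  patch=$(printf '{"spec":{"template":{"metadata":{"labels":{"%s":"%s"}}}}}' "$key" "$value")
  kubectl patch deployment "$deploy" -n "$ns" --type merge -p "$patch"
}

# Possible usage for this scenario (Deployment name is assumed):
# add_template_label my-app tier frontend
```

The alternative is to relax the Service selector to `app=my-app`; which side to change depends on whether the `tier=frontend` label is meant to distinguish this workload from others.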

Step 4: Test Service Connectivity from Another Pod

# Create a test Pod
kubectl run test-pod --image=busybox --rm -it -- sh
# If entering an interactive environment, run:
# wget -qO- http://my-app-service
# Or use non-interactive mode:
kubectl run test-pod --image=busybox --rm -it -- wget -qO- http://my-app-service
# <!DOCTYPE html>
# <html>... (returns HTML normally)

Step 5: Check DNS Resolution (If Service Name Resolution Fails)

kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup my-app-service
# Server:    10.96.0.10
# Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
#
# Name:      my-app-service
# Address 1: 10.108.12.34 my-app-service.default.svc.cluster.local
# → DNS resolution is normal, returning the correct ClusterIP

Step 6: Check Whether kube-proxy Is Healthy (If Endpoints Exist but Still Cannot Access)

kubectl get pods -n kube-system -l k8s-app=kube-proxy
# NAME                   READY   STATUS    RESTARTS   AGE
# kube-proxy-abc12       1/1     Running   0          10d
# kube-proxy-def34       1/1     Running   0          10d

# View kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20
# Logs should show no errors; otherwise, restart kube-proxy on the corresponding node

Step 7: Check Whether a NetworkPolicy Is Blocking Traffic

kubectl get networkpolicy --all-namespaces
# NAMESPACE   NAME                 POD-SELECTOR   AGE
# default     deny-all             {}             5d
# → A NetworkPolicy exists!

kubectl describe networkpolicy deny-all
# PodSelector: <empty> (selects all Pods)
# Policy Types: Ingress
# Ingress: <none>  ← No ingress rules, default deny all inbound traffic

A deny-all policy has been found that denies all inbound traffic; an allow rule needs to be added.

# Add a NetworkPolicy that allows ingress
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-service-access
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
EOF

Step 8: Verify Access Again

kubectl run test-pod --image=busybox --rm -it -- wget -qO- http://my-app-service
# <!DOCTYPE html>
# <html>... (success)

Verification

# Verify Service Endpoints
kubectl get endpoints my-app-service
# NAME              ENDPOINTS                     AGE
# my-app-service    10.244.1.2:80,10.244.1.3:80   10m

# Verify DNS resolution
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup my-app-service.default.svc.cluster.local

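# Verify the complete communication chain (Service → Pod)
# The earlier test-pod was removed by --rm, so start a fresh one instead of using kubectl exec
kubectl run test-pod --image=busybox --rm -i --restart=Never -- wget -qO- http://my-app-service | head -5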

Exam Tips

When a Service is unreachable, first check Endpoints -- if empty, the Selector matches no Pod or the matching Pods are not Ready
A Pod must carry every label in the Selector with exactly the same key and value (extra labels on the Pod are fine)
kube-proxy issues will cause the ClusterIP to be unreachable; check kube-proxy Pods in the kube-system namespace
Use busybox:1.28 for DNS testing; nslookup in newer versions of busybox may fail or give misleading results
Once a NetworkPolicy selects a Pod, traffic that no rule allows is denied; check whether a policy is blocking traffic