Network Troubleshooting
CKA Exam Domain 5 — Service unreachable, DNS resolution, CoreDNS, NetworkPolicy, kube-proxy, CNI troubleshooting
Kubernetes networking spans several layers: Pod networking, Service networking, DNS resolution, network policies, and more. Network troubleshooting is one of the more difficult topics on the CKA exam, and also one of the most important.
1. Steps to Troubleshoot an Unreachable Service
# Step 1: Check whether the Service has been created
kubectl get svc
# Step 2: Check whether the Service has Endpoints
kubectl get endpoints <service-name>
# If ENDPOINTS is empty, the Service's selector matches no Pod, or none of the matching Pods is Ready
# Step 3: Check whether the Service's backend Pods are running normally
kubectl get pods -l <service-selector>
# Step 4: Check whether the Pod is listening on the correct port (the Service's targetPort)
# (the container image must include netstat or ss for this to work)
kubectl exec -it <pod-name> -- netstat -tlnp
# Step 5: Test Service access from inside the cluster (append :<port> if the Service port is not 80)
kubectl run test --image=busybox -it --rm -- wget -qO- http://<service-name>.<namespace>.svc.cluster.local
# Step 6: Check kube-proxy
kubectl get pods -n kube-system | grep kube-proxy
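Step 2 is the check that most often settles the question, so it is worth scripting. The sketch below is a minimal example, assuming a hypothetical helper named endpoint_count that receives the ready IPs printed by kubectl's JSONPath output.

```shell
# endpoint_count (hypothetical helper): count the ready endpoint IPs of a Service.
# Argument: the space-separated IP list printed by
#   kubectl get endpoints <service-name> -o jsonpath='{.subsets[*].addresses[*].ip}'
# A count of 0 means the selector matches no Ready Pod.
endpoint_count() (
  set -f        # disable globbing while the list is word-split
  set -- $1     # split the IP list on whitespace into positional parameters
  echo "$#"
)

# Usage:
# ips=$(kubectl get endpoints <service-name> -o jsonpath='{.subsets[*].addresses[*].ip}')
# [ "$(endpoint_count "$ips")" -eq 0 ] && echo "No ready endpoints: check the selector and Pod readiness"
```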
Troubleshooting flowchart:
Service unreachable
│
├─ kubectl get endpoints <svc> -- is it non-empty?
│ ├─ Empty → Selector does not match any Pod
│ └─ Non-empty → continue
│
├─ Access Service from a Pod inside the cluster
│ ├─ Unreachable → check kube-proxy
│ └─ Reachable → check external access configuration
│
├─ Check whether kube-proxy is healthy
├─ Check the CNI plugin
└─ Check firewall/security groups
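The first two branches of the flowchart can be written as a small decision function. This is an illustrative sketch only; next_check and its yes/no arguments are hypothetical names, not part of kubectl.

```shell
# next_check (hypothetical helper): encode the first two decisions of the flowchart.
# $1: "yes" if kubectl get endpoints <svc> lists at least one address, else "no"
# $2: "yes" if the Service answers from a test Pod inside the cluster, else "no"
# Prints the area to check next: selector, kube-proxy, or external.
next_check() (
  if [ "$1" != "yes" ]; then
    echo selector      # empty Endpoints: the selector matches no Pod
  elif [ "$2" != "yes" ]; then
    echo kube-proxy    # after kube-proxy: the CNI plugin, then firewall/security groups
  else
    echo external      # in-cluster access works: check external access configuration
  fi
)
```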
2. DNS Resolution Issues
CoreDNS Pod Status Check
# Check whether CoreDNS Pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns
# Check CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml
# Check the CoreDNS Service
kubectl get svc -n kube-system kube-dns
DNS Resolution Testing
# Use busybox to test DNS resolution
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup kubernetes.default.svc.cluster.local
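# Use the short name (resolved through the Pod's DNS search domains)
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup kubernetes
# Use dig (busybox has no dig; use an image that ships dnsutils, such as the one in the Kubernetes DNS debugging guide)
kubectl run dns-test --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 --rm -it --command -- dig kubernetes.default.svc.cluster.local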
# Test external domain resolution
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup www.google.com
Note: nslookup in busybox 1.28 is more stable than in newer versions of busybox. Using the default latest tag may cause nslookup to fail.
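All of the tests above rely on the <service>.<namespace>.svc.<cluster-domain> naming scheme. A minimal helper that builds such a name, assuming the common default cluster domain cluster.local (the function name svc_fqdn is hypothetical):

```shell
# svc_fqdn (hypothetical helper): build a Service's fully qualified DNS name,
# <service>.<namespace>.svc.<cluster-domain>.
# cluster.local is the usual default; pass a third argument if your cluster uses another domain.
svc_fqdn() (
  svc=$1
  ns=${2:-default}
  domain=${3:-cluster.local}
  echo "${svc}.${ns}.svc.${domain}"
)

# Usage:
# kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup "$(svc_fqdn my-app-service default)"
```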
Common CoreDNS Failure Causes
| Problem | Diagnosis | Solution |
|---|---|---|
| CoreDNS Pods not running | kubectl get pods -n kube-system | Check Pod status and logs |
| CoreDNS CrashLoopBackOff | kubectl logs -n kube-system coredns-xxx | Check ConfigMap configuration |
| CoreDNS out of memory | kubectl describe pod -n kube-system coredns-xxx | Adjust resource limits |
| DNS timeout | Check network connectivity | Check kube-proxy and CNI |
| Custom domain resolution fails | Check CoreDNS ConfigMap | Add rewrite or forward rules |
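For the last row, the first thing to look at is where CoreDNS sends queries it is not authoritative for. A minimal sketch, assuming the Corefile writes each forward directive on one line as the kubeadm default does (forward . /etc/resolv.conf); corefile_forwards is a hypothetical name:

```shell
# corefile_forwards (hypothetical helper): list the forward rules of a Corefile read
# from stdin, one "<zone> <first-upstream>" pair per line.
corefile_forwards() {
  awk '$1 == "forward" { print $2, $3 }'
}

# Usage:
# kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' | corefile_forwards
```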
3. Network Connectivity Testing
# Create a test Pod
kubectl run test --image=busybox --rm -it -- sh
# Inside the test Pod:
# Test Service connectivity (the short name resolves only from Pods in the Service's own namespace)
wget -qO- http://<service-name>
wget -qO- http://<service-name>.<namespace>.svc.cluster.local
# Test Pod IP connectivity (requires knowing the Pod IP)
wget -qO- http://<pod-ip>:<port>
# Test node port
wget -qO- http://<node-ip>:<node-port>
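Right after a Service or Pod is created, Endpoints, kube-proxy rules, and DNS records may take a moment to appear, so a single failed wget is not conclusive. A minimal retry wrapper in POSIX sh, which should also run in the busybox shell of the test Pod; retry is a hypothetical helper, not a busybox applet:

```shell
# retry (hypothetical helper): run a command up to <attempts> times,
# sleeping <delay> seconds between attempts. Succeeds as soon as the command succeeds.
# Usage: retry <attempts> <delay> <command> [args...]
retry() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  until "$@"; do
    [ "$i" -ge "$attempts" ] && exit 1   # give up after the last attempt
    i=$((i + 1))
    sleep "$delay"
  done
)

# Usage inside the test Pod:
# retry 5 2 wget -qO- http://<service-name>
```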
4. NetworkPolicy Blocking Traffic -- Troubleshooting
# View all NetworkPolicies
kubectl get networkpolicy
# View NetworkPolicy details
kubectl describe networkpolicy <policy-name>
# Test whether traffic is being blocked by a NetworkPolicy
# Start a temporary Pod for testing
kubectl run test-pod --image=busybox --rm -it -- sh
# Inside test-pod, test connectivity to the target Pod
wget -qO- http://<target-pod-ip>:<port>
# Understand NetworkPolicy scope
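# NetworkPolicy is an allow-list mechanism: once any NetworkPolicy selects a Pod for a direction
# (Ingress or Egress), all traffic in that direction that no policy explicitly allows is denied.
# Policies are enforced only if the CNI plugin supports NetworkPolicy.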
Troubleshooting approach:
- Check whether the target Pod's namespace has any NetworkPolicy
- Check whether the NetworkPolicy's podSelector selects the target Pod
- Check whether the ingress rules allow the source Pod's IP/labels
- Check whether the egress rules of policies that select the source Pod allow the target Pod's IP/labels
- If a NetworkPolicy exists but does not allow this traffic, add a corresponding rule
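When the last step applies, the missing rule is often an ingress allow for the target Pods. The sketch below prints such a policy, assuming the target Pods are identified by a single label and that every Pod in the same namespace should be allowed in (podSelector: {} inside from); allow_ingress_policy is a hypothetical name, and the rule should be narrowed for real use.

```shell
# allow_ingress_policy (hypothetical helper): print a NetworkPolicy that lets every Pod
# in the same namespace reach the Pods labelled <key>=<value>.
# Usage: allow_ingress_policy <policy-name> <key> <value> | kubectl apply -n <namespace> -f -
allow_ingress_policy() (
  name=$1 key=$2 value=$3
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ${name}
spec:
  podSelector:
    matchLabels:
      ${key}: ${value}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}
EOF
)
```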
5. kube-proxy Modes and Issues
# Check kube-proxy Pod status
kubectl get pods -n kube-system -l k8s-app=kube-proxy
# View kube-proxy logs (Pod names are kube-proxy-<suffix>; add -o wide to the command above to see each Pod's node)
kubectl logs -n kube-system kube-proxy-<suffix> --tail=50
# Check kube-proxy configuration
kubectl get configmap -n kube-system kube-proxy -o yaml
# Check kube-proxy mode (iptables / ipvs / nftables; an empty value means the default, iptables on Linux)
kubectl get configmap -n kube-system kube-proxy -o yaml | grep mode
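Because the mode field is often left empty, grep alone can be misleading. A minimal sketch that reports the effective mode, assuming the KubeProxyConfiguration embedded in the ConfigMap has a single mode: line and that an empty value means the Linux default, iptables; proxy_mode is a hypothetical name:

```shell
# proxy_mode (hypothetical helper): read kube-proxy configuration YAML on stdin and
# print the effective proxy mode. An empty or missing value is reported as iptables
# (the Linux default).
proxy_mode() {
  awk '$1 == "mode:" { m = $2; gsub(/"/, "", m) }
       END { if (m == "") m = "iptables"; print m }'
}

# Usage:
# kubectl get configmap -n kube-system kube-proxy -o yaml | proxy_mode
```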
kube-proxy mode comparison:
| Mode | Description | Performance |
|---|---|---|
| iptables | Default mode on Linux, based on iptables rule forwarding | Medium |
| IPVS | Based on IPVS, supports more load balancing algorithms | High (large clusters) |
| userspace | User-space proxy (removed in Kubernetes v1.26) | Low |
Common issues:
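- kube-proxy not running → Service ClusterIPs cannot be accessed
- iptables rules accidentally modified → restart kube-proxy so it resyncs its rules: delete the kube-proxy Pod on that node (the DaemonSet recreates it), or run systemctl restart kube-proxy if kube-proxy runs as a systemd service
- Missing kernel modules in IPVS mode → load the ip_vs, ip_vs_rr, ip_vs_wrr, and ip_vs_sh modules (for example, modprobe ip_vs)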
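# Check IPVS kernel modules (run on the node)
lsmod | grep ip_vs
# Check iptables rules for a Service (run on the node; kube-proxy's rule comments include <namespace>/<service-name>)
iptables-save | grep <service-name>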
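Checking the four IPVS modules one by one is error-prone. A minimal sketch that reads lsmod output and lists the ones that are not loaded; missing_ipvs_modules is a hypothetical name, and modules compiled into the kernel never show up in lsmod, so treat its output as a hint rather than proof:

```shell
# missing_ipvs_modules (hypothetical helper): read lsmod output on stdin and print each
# of ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh that is not listed as loaded.
missing_ipvs_modules() (
  loaded=$(awk 'NR > 1 { print $1 }')   # skip the "Module Size Used by" header
  for mod in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh; do
    printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
  done
)

# Usage (on the node, as root):
# for m in $(lsmod | missing_ipvs_modules); do modprobe "$m"; done
```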
6. CNI Plugin Troubleshooting
# View CNI Pod status
kubectl get pods -n kube-system | grep -E "calico|flannel|weave|cilium"
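# View CNI logs (Calico shown; the label differs per CNI plugin)
kubectl logs -n kube-system -l k8s-app=calico-node --tail=50
# Check CNI configuration files (run on the node)
ls /etc/cni/net.d/
# Check CNI binary files (run on the node)
ls /opt/cni/bin/
# View node network interfaces (run on the node)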
ip a
ip route
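An empty /etc/cni/net.d is a common reason for a NotReady node, and when several files are present, container runtimes such as containerd typically load the first one in sorted order. A minimal sketch that reports which file that would be, assuming the usual .conf, .conflist, and .json extensions; first_cni_config is a hypothetical name:

```shell
# first_cni_config (hypothetical helper): print the name of the first CNI config file
# (.conf, .conflist or .json) in a directory, in sorted order; fail if there is none.
first_cni_config() (
  dir=${1:-/etc/cni/net.d}
  for f in "$dir"/*; do            # pathname expansion returns sorted results
    case $f in
      *.conf|*.conflist|*.json)
        if [ -f "$f" ]; then
          basename "$f"
          exit 0
        fi
        ;;
    esac
  done
  exit 1                           # no config file found
)

# Usage (on the node):
# first_cni_config || echo "No CNI config in /etc/cni/net.d"
```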
Common CNI issues:
| Problem | Cause | Solution |
|---|---|---|
| Pods cannot communicate with each other | CNI plugin abnormal | Check CNI Pod logs; restart the CNI DaemonSet Pods |
| CoreDNS Pods stuck in Pending or ContainerCreating | CNI not installed or not configured for Pod networking | Check CNI configuration |
| Node NotReady | CNI plugin not deployed | Deploy per CNI documentation |
| Pods stuck in ContainerCreating (IP allocation fails) | CNI IP pool exhausted | Check IPAM configuration |
7. Node Port Connectivity Testing
# Test locally from the node
curl http://localhost:<node-port>
# Test from other nodes within the cluster
curl http://<node-ip>:<node-port>
# Test from outside the cluster
telnet <node-ip> <node-port>
# Test port with nc
nc -zv <node-ip> <node-port>
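A NodePort test can only succeed if the port is actually a NodePort. A minimal sketch that validates a port number before testing it, assuming the default API server range 30000-32767 (set by --service-node-port-range); valid_nodeport is a hypothetical name:

```shell
# valid_nodeport (hypothetical helper): succeed if $1 is a number inside the NodePort range.
# Defaults to 30000-32767; pass min and max explicitly if the cluster uses another range.
valid_nodeport() (
  port=$1 min=${2:-30000} max=${3:-32767}
  case $port in
    ''|*[!0-9]*) exit 1 ;;         # empty or not a plain number
  esac
  [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
)

# Usage:
# port=$(kubectl get svc <service-name> -o jsonpath='{.spec.ports[0].nodePort}')
# valid_nodeport "$port" && nc -zv <node-ip> "$port"
```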
8. Practical Command Quick Reference
# View Services and Endpoints
kubectl get svc,ep
# View all network policies
kubectl get netpol --all-namespaces
# Create a temporary test Pod
kubectl run test --image=busybox -it --rm -- sh
# Test internal DNS
kubectl run test --image=busybox:1.28 --rm -it -- nslookup kubernetes.default
# View kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy
# View CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns
# Check whether kubelet is configured with the correct node IP
kubectl describe node <node-name>
9. Exam Key Points
- When a Service is unreachable, first check Endpoints
- For DNS resolution issues, first check CoreDNS Pod status
- Use busybox:1.28 for DNS testing (newer busybox versions have nslookup issues)
- NetworkPolicy is an allow-list mechanism; once a policy selects a Pod, all traffic in the selected direction (Ingress and/or Egress) that no policy allows is denied
- kube-proxy issues will cause the Service's ClusterIP to be unreachable
- CNI plugin issues will cause Pod network connectivity failures
🧪 Complete Hands-on Example: Troubleshooting an Unreachable Service
Scenario
A Service has just been created, but other Pods in the cluster cannot reach the target Pods through the Service name. The steps below walk through the troubleshooting process end to end.
Prerequisites
- Multiple Pods and Services in the cluster
- Permission to create temporary test Pods
Steps
Step 1: Confirm Service Status
kubectl get svc
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 10d
# my-app-service ClusterIP 10.108.12.34 <none> 80/TCP 5m
kubectl describe svc my-app-service
# Name: my-app-service
# Namespace: default
# Labels: app=my-app
# Selector: app=my-app,tier=frontend
# Type: ClusterIP
# IP Family Policy: SingleStack
# IP Families: IPv4
# IP: 10.108.12.34
# Port: http 80/TCP
# TargetPort: 80/TCP
# Endpoints: <none> ← Endpoints is empty!
# Session Affinity: None
# Events: <none>
Step 2: Endpoints Are Empty, So Check Selector Matching
# View the Service's Selector
# Selector: app=my-app,tier=frontend
# Check whether any Pods match the labels
kubectl get pods -l app=my-app,tier=frontend
# No resources found in default namespace.
# View all Pod labels
kubectl get pods --show-labels
# NAME READY STATUS RESTARTS AGE LABELS
# my-app-6b8f9c7d8b-abc12 1/1 Running 0 10m app=my-app
# my-app-6b8f9c7d8b-def34 1/1 Running 0 10m app=my-app
# ← Pods only have app=my-app, missing the tier=frontend label!
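Comparing the selector with the label list by eye is easy to get wrong when there are many labels. A minimal sketch that prints the selector pairs a Pod is missing, assuming both sides use the comma-separated key=value format shown above; missing_labels is a hypothetical name:

```shell
# missing_labels (hypothetical helper): print each key=value pair of a selector that a
# Pod's label set lacks. A Pod matches only when nothing is printed.
# Usage: missing_labels "<selector>" "<pod-labels>"
missing_labels() (
  selector=$1 labels=",$2,"
  IFS=,
  set -f
  for pair in $selector; do
    case $labels in
      *",$pair,"*) ;;              # pair is present on the Pod
      *) echo "$pair" ;;
    esac
  done
)

# Example: missing_labels "app=my-app,tier=frontend" "app=my-app"   prints tier=frontend
```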
Step 3: Fix the Label Matching Issue
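# Add the missing label to the running Pods (a quick fix: Pods that the Deployment
# recreates from its template will not carry this label)
kubectl label pod -l app=my-app tier=frontend --overwrite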
# pod/my-app-6b8f9c7d8b-abc12 labeled
# pod/my-app-6b8f9c7d8b-def34 labeled
# Verify Endpoints have been updated
kubectl get endpoints my-app-service
# NAME ENDPOINTS AGE
# my-app-service 10.244.1.2:80,10.244.1.3:80 6m
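To make the fix survive Pod replacement, the label can go into the Deployment's Pod template instead, which triggers a rollout. The sketch below only builds the strategic merge patch; it assumes the Deployment is named my-app, which the Pod names suggest but the output above does not confirm, and template_label_patch is a hypothetical name:

```shell
# template_label_patch (hypothetical helper): print a strategic merge patch that adds
# one label to a Deployment's Pod template.
# Usage: template_label_patch <key> <value>
template_label_patch() (
  printf '{"spec":{"template":{"metadata":{"labels":{"%s":"%s"}}}}}\n' "$1" "$2"
)

# Usage (assumes the Deployment is named my-app):
# kubectl patch deployment my-app -p "$(template_label_patch tier frontend)"
```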
Step 4: Test Service Connectivity from Another Pod
# Create a test Pod
kubectl run test-pod --image=busybox --rm -it -- sh
# If entering an interactive environment, run:
# wget -qO- http://my-app-service
# Or use non-interactive mode:
kubectl run test-pod --image=busybox --rm -it -- wget -qO- http://my-app-service
# <!DOCTYPE html>
# <html>... (returns HTML normally)
Step 5: Check DNS Resolution (If Service Name Resolution Fails)
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup my-app-service
# Server: 10.96.0.10
# Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
#
# Name: my-app-service
# Address 1: 10.108.12.34 my-app-service.default.svc.cluster.local
# → DNS resolution is normal, returning the correct ClusterIP
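To confirm that DNS returns the Service's current ClusterIP, the address can be extracted from the nslookup output and compared. A minimal sketch, assuming the busybox 1.28 output format shown above (newer nslookup output with a plain "Address:" line is handled too); resolved_ip is a hypothetical name:

```shell
# resolved_ip (hypothetical helper): read nslookup output on stdin and print the first
# address listed after the "Name:" line.
# Handles "Address 1: <ip> <name>" (busybox 1.28) and "Address: <ip>" (newer format).
resolved_ip() {
  awk '/^Name:/ { seen = 1; next }
       seen && /^Address/ { ip = ($1 == "Address:") ? $2 : $3; print ip; exit }'
}

# Usage: compare the DNS answer with the Service's ClusterIP
# dns_ip=$(kubectl run dns-test --image=busybox:1.28 --rm -i --restart=Never -- nslookup my-app-service | resolved_ip)
# [ "$dns_ip" = "$(kubectl get svc my-app-service -o jsonpath='{.spec.clusterIP}')" ] && echo "DNS matches ClusterIP"
```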
Step 6: Check Whether kube-proxy Is Healthy (If Endpoints Exist but Still Cannot Access)
kubectl get pods -n kube-system -l k8s-app=kube-proxy
# NAME READY STATUS RESTARTS AGE
# kube-proxy-abc12 1/1 Running 0 10d
# kube-proxy-def34 1/1 Running 0 10d
# View kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20
# The logs should show no errors; if they do, delete the kube-proxy Pod on the affected node so the DaemonSet recreates it
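kube-proxy logs in the klog format, where each line starts with a severity letter (I, W, E, or F) followed by the date as MMDD. A minimal filter that keeps only error and fatal lines; proxy_log_errors is a hypothetical name:

```shell
# proxy_log_errors (hypothetical helper): keep only error (E) and fatal (F) lines from
# klog-formatted logs on stdin, e.g. "E0312 10:15:02.123456 1 proxier.go:42] message".
proxy_log_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage:
# kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=200 | proxy_log_errors
```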
Step 7: Check Whether a NetworkPolicy Is Blocking Traffic
kubectl get networkpolicy --all-namespaces
# NAMESPACE NAME POD-SELECTOR AGE
# default deny-all {} 5d
# → A NetworkPolicy exists!
kubectl describe networkpolicy deny-all
# PodSelector: <empty> (selects all Pods)
# Policy Types: Ingress
# Ingress: <none> ← No ingress rules, default deny all inbound traffic
The deny-all policy selects every Pod in the namespace and allows no inbound traffic, so an allow rule for the my-app Pods needs to be added.
# Add a NetworkPolicy that allows ingress
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-service-access
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}   # any Pod in the same namespace
EOF
Step 8: Verify Access Again
kubectl run test-pod --image=busybox --rm -it -- wget -qO- http://my-app-service
# <!DOCTYPE html>
# <html>... (success)
Verification
# Verify Service Endpoints
kubectl get endpoints my-app-service
# NAME ENDPOINTS AGE
# my-app-service 10.244.1.2:80,10.244.1.3:80 10m
# Verify DNS resolution
kubectl run dns-test --image=busybox:1.28 --rm -it -- nslookup my-app-service.default.svc.cluster.local
# Verify the complete communication chain (Service → Pod); test-pod was removed by --rm, so start a new one
kubectl run test-pod --image=busybox --rm -it -- sh -c 'wget -qO- http://my-app-service | head -5'
Exam Tips
- When a Service is unreachable, first check Endpoints -- if empty, the Selector is not matching any Pod
- Selector labels must match exactly (all labels must be present)
- kube-proxy issues will cause the ClusterIP to be unreachable; check the kube-proxy Pods in the kube-system namespace
- Use busybox:1.28 for DNS testing; nslookup in newer versions of busybox may not work reliably
- Once a NetworkPolicy selects a Pod, all traffic in the selected direction that no policy allows is denied; check whether a policy is blocking traffic