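CRD and Operator
CustomResourceDefinitions (CRDs) and the Operator pattern are central to Kubernetes extensibility: they let users define their own resources and automate operational logic.
Overview
A CustomResourceDefinition (CRD) extends the Kubernetes API with user-defined resource types. The Operator pattern combines CRDs with controller logic to automate application management. The CKA exam expects a basic understanding of both.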
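Part 1: CRD (CustomResourceDefinition)
1. CRD concept
A CRD is a Kubernetes extension mechanism that lets users define new resource types (such as Database, Backup, or Application). The Kubernetes API server then serves a full RESTful API for these custom resources.
Once the CRD has been created:
- The API server automatically exposes RESTful endpoints (for namespaced resources): /apis/<group>/<version>/namespaces/<ns>/<resource-plural>/
- kubectl operations work (get, create, delete, etc.)
- RBAC authorization applies
- Objects are stored in etcd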
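As a rough illustration of the endpoint layout above, the following Python sketch assembles the REST path for a custom resource. The helper name `custom_resource_path` and the sample values are assumptions made for this example; the sketch also assumes that cluster-scoped resources simply omit the `namespaces/<ns>` segment.

```python
from typing import Optional


def custom_resource_path(group: str, version: str, plural: str,
                         namespace: Optional[str] = None,
                         name: Optional[str] = None) -> str:
    """Build the API path for a custom resource collection or single object."""
    parts = ["/apis", group, version]
    if namespace is not None:
        # Namespaced resources carry the namespace in the path.
        parts += ["namespaces", namespace]
    parts.append(plural)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


if __name__ == "__main__":
    # Collection of Database objects in the "default" namespace.
    print(custom_resource_path("example.com", "v1", "databases", namespace="default"))
    # A single object of a hypothetical cluster-scoped resource.
    print(custom_resource_path("example.com", "v1", "databases", name="my-db"))
```

The same path is what `kubectl get --raw` would accept if you wanted to query the endpoint directly.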
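CRD vs. built-in resources
| Feature | Built-in resources (Pod, Service, ...) | CRD |
|---|---|---|
| Defined in | Kubernetes source code | CRD YAML manifest |
| Validation | Built in | OpenAPI v3 schema |
| Controller | Built-in controllers | Custom controller (Operator) |
| Storage | etcd | etcd |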
2. CRD YAML structure
2.1 Basic CRD example
# crd-database.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com   # must have the form <plural>.<group>
spec:
  group: example.com            # API group
  names:
    plural: databases           # plural name (used in URLs and by kubectl)
    singular: database          # singular name
    shortNames:                 # short names
      - db
    kind: Database              # resource type (the kind field in YAML)
    listKind: DatabaseList      # list type
  scope: Namespaced             # scope: Namespaced or Cluster
  versions:
    - name: v1                  # API version
      served: true              # whether the API server serves this version
      storage: true             # whether this is the version stored in etcd
      schema:                   # OpenAPI v3 validation schema
        openAPIV3Schema:
          type: object
          required:
            - spec
          properties:
            spec:
              type: object
              required:
                - engine
                - version
              properties:
                engine:
                  type: string
                  enum:
                    - mysql
                    - postgres
                    - mongodb
                version:
                  type: string
                  # major.minor with an optional patch part, so that "14.5" is accepted
                  pattern: '^\d+\.\d+(\.\d+)?$'
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
                  default: 1
                storage:
                  type: string
                  pattern: '^\d+(Gi|Ti)$'
                  default: "10Gi"
                adminUser:
                  type: string
                  default: "admin"
                backup:
                  type: object
                  properties:
                    enabled:
                      type: boolean
                      default: false
                    schedule:
                      type: string
                      pattern: '^(\d+|\*)(/\d+)?(\s+(\d+|\*)(/\d+)?){4}$'
                      description: "Cron schedule for backups"
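To make the constraint keywords in the schema (`enum`, `minimum`/`maximum`, `pattern`) more concrete, here is a minimal Python sketch that mimics a subset of them client-side. It is not how the API server validates objects; the function name `validate_spec` is hypothetical, the `version` field is deliberately left out, and Python's `re` module only approximates the regex dialect used by the API server.

```python
import re

# Values copied from the schema in crd-database.yaml.
ENGINES = {"mysql", "postgres", "mongodb"}
STORAGE_RE = re.compile(r"^\d+(Gi|Ti)$")
CRON_RE = re.compile(r"^(\d+|\*)(/\d+)?(\s+(\d+|\*)(/\d+)?){4}$")


def validate_spec(spec: dict) -> list:
    """Return a list of violations; an empty list means the checks passed."""
    errors = []
    if spec.get("engine") not in ENGINES:
        errors.append("engine must be one of mysql, postgres, mongodb")
    replicas = spec.get("replicas", 1)  # 1 is the schema default
    if not isinstance(replicas, int) or not 1 <= replicas <= 10:
        errors.append("replicas must be an integer between 1 and 10")
    # Assumes storage, when present, is a string ("10Gi" is the schema default).
    if not STORAGE_RE.match(spec.get("storage", "10Gi")):
        errors.append("storage must look like 10Gi or 2Ti")
    schedule = spec.get("backup", {}).get("schedule")
    if schedule is not None and not CRON_RE.match(schedule):
        errors.append("backup.schedule must be a five-field cron expression")
    return errors


if __name__ == "__main__":
    good = {"engine": "postgres", "replicas": 3, "storage": "100Gi",
            "backup": {"enabled": True, "schedule": "0 2 * * *"}}
    bad = {"engine": "oracle", "replicas": 11, "storage": "100GB"}
    print(validate_spec(good))
    print(validate_spec(bad))
```

In a real cluster the API server rejects an invalid object at `kubectl apply` time, so a check like this is only useful for understanding what each keyword constrains.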
2.2 Creating a custom resource from the CRD
# database-sample.yaml
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-production-db
  labels:
    environment: production
    team: backend
spec:
  engine: postgres
  version: "14.5"
  replicas: 3
  storage: 100Gi
  adminUser: dbadmin
  backup:
    enabled: true
    schedule: "0 2 * * *"
# Apply the CRD
kubectl apply -f crd-database.yaml
# Inspect the CRD
kubectl get crd
kubectl get crd databases.example.com -o yaml
# Create a custom resource
kubectl apply -f database-sample.yaml
# Work with custom resources (same commands as for built-in resources)
kubectl get databases
kubectl get db # using the shortName
kubectl get databases --all-namespaces
kubectl describe database my-production-db
kubectl delete database my-production-db
kubectl get databases -o wide
kubectl get databases -o yaml
2.3 Additional CRD features
# Advanced CRD configuration example
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  # ... basic configuration ...
  versions:
    - name: v1
      # ... schema ...
      additionalPrinterColumns:   # extra columns for kubectl get
        - name: Engine
          type: string
          jsonPath: .spec.engine
          description: Database engine type
        - name: Version
          type: string
          jsonPath: .spec.version
        - name: Replicas
          type: integer
          jsonPath: .spec.replicas
        - name: Status
          type: string
          jsonPath: .status.phase
      subresources:               # subresources
        status: {}                # enables the /status subresource
        scale:                    # enables the /scale subresource
          specReplicasPath: .spec.replicas
          statusReplicasPath: .status.replicas
  conversion:                     # version conversion (set at the spec level, not per version)
    strategy: None                # or Webhook
# With additionalPrinterColumns in place:
kubectl get databases
# NAME               ENGINE     VERSION   REPLICAS   STATUS
# my-production-db   postgres   14.5      3          Running
# Inspect the status written through the status subresource
kubectl get database my-production-db -o json | jq '.status'
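The way each printer column maps a `jsonPath` onto an object can be sketched in a few lines of Python. This is a simplification under stated assumptions: real JSONPath supports filters, indexes, and more, whereas the hypothetical `resolve` helper below only follows plain dotted field paths, and rendering a missing field as an empty cell is an assumption about display behaviour.

```python
def resolve(obj: dict, json_path: str):
    """Follow a simple dotted path such as '.spec.engine'; None if absent."""
    current = obj
    for key in json_path.strip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Column names and paths taken from the additionalPrinterColumns example.
COLUMNS = [("ENGINE", ".spec.engine"), ("VERSION", ".spec.version"),
           ("REPLICAS", ".spec.replicas"), ("STATUS", ".status.phase")]


def row(obj: dict) -> list:
    """Build one table row: NAME first, then one cell per column."""
    cells = [obj["metadata"]["name"]]
    for _, path in COLUMNS:
        value = resolve(obj, path)
        cells.append("" if value is None else str(value))
    return cells


if __name__ == "__main__":
    db = {"metadata": {"name": "my-production-db"},
          "spec": {"engine": "postgres", "version": "14.5", "replicas": 3}}
    print(row(db))  # STATUS stays empty until something writes .status.phase
    db["status"] = {"phase": "Running"}
    print(row(db))
```

The STATUS column only shows a value once a controller has written `.status.phase`; creating the CR alone does not populate it.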
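2.4 CRD version management
# A CRD can serve several versions side by side; version conversion enables smooth upgrades
# List the versions of a CRD
kubectl get crd databases.example.com -o json | jq '.spec.versions[].name'
# Access the resource through a specific version
kubectl get databases.v1.example.com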
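When several versions coexist, each entry in `spec.versions` carries its own `served` and `storage` flags, and exactly one version must be the storage version. The Python sketch below inspects a CRD represented as a plain dict; the helper names and the extra `v1beta1` version are assumptions made for illustration.

```python
def storage_version(crd: dict) -> str:
    """Return the single version flagged storage: true, or raise ValueError."""
    stored = [v["name"] for v in crd["spec"]["versions"] if v.get("storage")]
    if len(stored) != 1:
        raise ValueError(f"expected exactly one storage version, found {stored}")
    return stored[0]


def served_versions(crd: dict) -> list:
    """Return the names of the versions flagged served: true."""
    return [v["name"] for v in crd["spec"]["versions"] if v.get("served")]


if __name__ == "__main__":
    # Hypothetical CRD with an older version that is still served.
    crd = {"spec": {"versions": [
        {"name": "v1beta1", "served": True, "storage": False},
        {"name": "v1", "served": True, "storage": True},
    ]}}
    print(storage_version(crd))
    print(served_versions(crd))
```

The same information can be read from a live cluster by feeding the output of `kubectl get crd databases.example.com -o json` into these functions.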
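3. The Operator pattern
3.1 Operator concept
An Operator is a pattern for packaging, deploying, and managing a Kubernetes application. It extends the API with CRDs and uses a custom controller to keep each custom resource at its desired state.
Operator = CRD + Controller + Domain Knowledge
How it works:
1. A user creates a custom resource (for example, a Database instance)
2. The Operator's controller watches that resource for changes
3. The controller creates and manages the underlying resources (StatefulSet, Service, PVC, ...) according to its business logic
4. The controller continuously adjusts the actual state until it matches the state the user asked for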
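Step 3 above, deriving underlying resources from a custom resource, might look roughly like the following Python sketch. The function `desired_children`, the naming scheme, and the trimmed-down manifests are all assumptions for illustration (the manifests omit selectors and Pod templates, so they could not be applied as written). The `ownerReferences` entry links each child to the CR that owns it.

```python
def desired_children(cr: dict) -> list:
    """Derive child manifests (only a few fields shown) from a Database CR."""
    meta, spec = cr["metadata"], cr["spec"]
    owner = {
        "apiVersion": cr["apiVersion"],
        "kind": cr["kind"],
        "name": meta["name"],
        "uid": meta["uid"],
        "controller": True,  # marks the CR as the managing owner
    }
    common = {
        "namespace": meta.get("namespace", "default"),
        "ownerReferences": [owner],
    }
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": meta["name"], **common},
        "spec": {"replicas": spec.get("replicas", 1),
                 "serviceName": meta["name"]},
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": meta["name"], **common},
        "spec": {"clusterIP": "None"},  # headless Service for the StatefulSet
    }
    return [statefulset, service]


if __name__ == "__main__":
    cr = {"apiVersion": "example.com/v1", "kind": "Database",
          "metadata": {"name": "my-production-db", "namespace": "default",
                       "uid": "1234"},  # placeholder uid
          "spec": {"engine": "postgres", "version": "14.5", "replicas": 3}}
    for child in desired_children(cr):
        print(child["kind"], child["metadata"]["name"])
```

Keeping this mapping a pure function of the CR makes it easy to recompute the desired children on every pass and compare them with what exists in the cluster.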
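3.2 Operator vs. traditional deployment
| Aspect | Traditional approach | Operator approach |
|---|---|---|
| Deployment | Manually create multiple YAML files | Create a single CR instance |
| Scaling | Manually edit the Deployment | Change the CR's replicas field |
| Upgrade | Manually update the image version | Change the CR's version field |
| Backup and restore | Run scripts manually | Handled automatically by the Operator |
| Failure recovery | Manual diagnosis and repair | Detected and recovered automatically by the Operator |
| Rollback | Performed manually | Automated by the Operator |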
3.3 Operator workflow
┌────────────────────────────┐
│ API Server (CRs in etcd)   │
└─────────────┬──────────────┘
              │ watch
┌─────────────▼──────────────┐
│ Reconcile loop             │
│ ────────────────────────── │
│ 1. Read CR (desired state) │
│ 2. Query actual state      │
│ 3. Compare the two         │
│ 4. Apply adjustments       │
│ 5. Update CR status        │
└────────────────────────────┘
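The five steps in the diagram can be sketched as a single reconcile pass in Python. This is a toy model under stated assumptions, not a real controller: the actual state is reduced to one replica count, the phase names `Running` and `Reconciling` are made up for the example, and `run_until_converged` stands in for the repeated invocations a controller framework would trigger.

```python
def reconcile(cr: dict, actual_replicas: int) -> tuple:
    """One reconcile pass: compare desired and actual, return (action, status)."""
    desired = cr["spec"].get("replicas", 1)      # 1. read the desired state
    diff = desired - actual_replicas             # 2-3. observe and compare
    if diff > 0:                                 # 4. decide on an adjustment
        action = f"scale up by {diff}"
    elif diff < 0:
        action = f"scale down by {-diff}"
    else:
        action = "nothing to do"
    phase = "Running" if diff == 0 else "Reconciling"
    status = {"replicas": actual_replicas, "phase": phase}  # 5. report status
    return action, status


def run_until_converged(cr: dict, actual: int = 0, max_passes: int = 20) -> tuple:
    """Repeat reconcile, changing one replica per pass, until the state is stable."""
    for _ in range(max_passes):
        action, status = reconcile(cr, actual)
        if status["phase"] == "Running":
            return actual, status
        actual += 1 if action.startswith("scale up") else -1
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    cr = {"spec": {"replicas": 3}}
    print(reconcile(cr, 1))
    print(run_until_converged(cr, actual=5))
```

Note that each pass looks only at the current desired and actual state rather than at the event that triggered it, which is why running the loop again after convergence changes nothing.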
4. Using existing Operators
4.1 etcd-operator
# Install etcd-operator
# Note: the coreos/etcd-operator repository has been archived; the commands are kept here to illustrate the workflow
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/service_account.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/rbac/cluster-role.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/rbac/cluster-role-binding.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/deployment.yaml
# Check the Operator
kubectl get pods -l name=etcd-operator
# Create an etcd cluster
cat <<EOF | kubectl apply -f -
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.5.15"
EOF
# Inspect the etcd cluster
kubectl get etcdclusters
kubectl get pods -l etcd_cluster=example-etcd-cluster
# Scale the etcd cluster out
kubectl patch etcdcluster example-etcd-cluster --type='json' -p='[{"op": "replace", "path": "/spec/size", "value": 5}]'
# Change the etcd version (the Operator performs a rolling upgrade); the new value must differ from the current "3.5.15"
kubectl patch etcdcluster example-etcd-cluster --type='json' -p='[{"op": "replace", "path": "/spec/version", "value": "3.5.16"}]'
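The `--type='json'` patches above are JSON Patch documents (RFC 6902). To show what a `replace` operation does to the object, here is a minimal Python sketch that supports only that one operation on nested dicts. The function name `apply_replace` is hypothetical, and array indexes and JSON Pointer escapes are intentionally not handled.

```python
import copy


def apply_replace(doc: dict, patch: list) -> dict:
    """Apply JSON Patch 'replace' operations to a copy of doc (small subset)."""
    result = copy.deepcopy(doc)  # leave the caller's object untouched
    for op in patch:
        if op["op"] != "replace":
            raise NotImplementedError(f"unsupported op: {op['op']}")
        # Split the pointer into keys; '~0'/'~1' escapes are ignored here.
        *parents, last = op["path"].lstrip("/").split("/")
        target = result
        for key in parents:
            target = target[key]
        if last not in target:
            # 'replace' requires the target location to exist already.
            raise KeyError(f"path does not exist: {op['path']}")
        target[last] = op["value"]
    return result


if __name__ == "__main__":
    cluster = {"spec": {"size": 3, "version": "3.5.15"}}
    patch = [{"op": "replace", "path": "/spec/size", "value": 5}]
    print(apply_replace(cluster, patch))
```

Once the patched object is stored, the Operator's controller notices the changed `spec.size` and adjusts the number of etcd members to match.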
4.2 prometheus-operator
# Install kube-prometheus-stack (which includes prometheus-operator) with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
# Check the operator
kubectl get pods -l app.kubernetes.io/name=prometheus-operator
# List the custom resources (prometheus-operator registers many CRDs)
kubectl get crd | grep monitoring.coreos.com
# prometheuses.monitoring.coreos.com
# alertmanagers.monitoring.coreos.com
# servicemonitors.monitoring.coreos.com
# podmonitors.monitoring.coreos.com
# prometheusrules.monitoring.coreos.com
# thanosrulers.monitoring.coreos.com
# Configure scraping with a ServiceMonitor
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
EOF
# List ServiceMonitors
kubectl get servicemonitors
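The `selector.matchLabels` block in the ServiceMonitor picks Services by label: a Service matches only if every listed key/value pair appears among its labels. A minimal Python sketch of that matching rule follows; the helper names and the sample Services are assumptions for illustration.

```python
def match_labels(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_services(services: list, selector: dict) -> list:
    """Return the names of Services whose labels satisfy the selector."""
    return [svc["metadata"]["name"] for svc in services
            if match_labels(selector, svc["metadata"].get("labels", {}))]


if __name__ == "__main__":
    # Hypothetical Services in the cluster.
    services = [
        {"metadata": {"name": "my-app", "labels": {"app": "my-app", "tier": "web"}}},
        {"metadata": {"name": "other", "labels": {"app": "other"}}},
        {"metadata": {"name": "unlabeled"}},
    ]
    print(select_services(services, {"app": "my-app"}))
```

Extra labels on a Service do not prevent a match; only the pairs named in the selector are compared.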
4.3 cert-manager Operator
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
# Check the cert-manager Pods
kubectl get pods -n cert-manager
# Custom resources it registers
kubectl get crd | grep cert-manager
# certificates.cert-manager.io
# issuers.cert-manager.io
# clusterissuers.cert-manager.io
# Create an Issuer and a Certificate
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-tls-cert
spec:
  dnsNames:
    - example.com
  secretName: my-tls-secret
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
EOF
# Inspect the certificate and the Secret it produces
kubectl get certificates
kubectl get certificate my-tls-cert -o yaml
kubectl get secret my-tls-secret
5. OLM (Operator Lifecycle Manager)
OLM is a package manager for Operators; it handles their installation, upgrades, and lifecycle management.
# Install OLM
curl -sL https://github.com/Operator-framework/operator-lifecycle-manager/releases/download/v0.25.0/install.sh | bash -s v0.25.0
# Check the OLM components
kubectl get pods -n olm
# List the available Operators (from OperatorHub)
kubectl get packagemanifests -n olm
# Install an Operator (through a Subscription)
cat <<EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: prometheus
  namespace: operators
spec:
  channel: stable
  name: prometheus
  source: operatorhubio-catalog
  sourceNamespace: olm
EOF
# List installed Operators (the Subscription lives in the operators namespace)
kubectl get operators
kubectl get subscriptions -n operators
# Check Operator versions
kubectl get clusterserviceversion
kubectl get csv -n operators
6. kubebuilder / operator-sdk (tools for building Operators)
# Install operator-sdk
# The *) branch falls back to the raw machine name (for example arm64 on macOS)
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')
curl -sL https://github.com/operator-framework/operator-sdk/releases/download/v1.33.0/operator-sdk_${OS}_${ARCH} -o operator-sdk
chmod +x operator-sdk && sudo mv operator-sdk /usr/local/bin/
# Create an Operator project
operator-sdk init --domain example.com --repo github.com/example/database-operator
# Create an API (CRD)
operator-sdk create api --group example --version v1 --kind Database --resource --controller
# Generate code and CRD manifests
make generate
make manifests
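CKA exam essentials
- Basic CRD structure -- know the group, names, scope, versions, and schema fields
- Operating on CRD resources with kubectl -- custom resources are used through kubectl just like built-in ones
- Operator = CRD + Controller -- understand the core idea of the Operator pattern
- additionalPrinterColumns -- customizes the columns shown by kubectl get
- CRD scope -- the difference between Namespaced and Cluster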
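🧪 Complete walkthrough: create a CRD and use an Operator
Scenario
Create a CRD named databases.example.com, then install prometheus-operator (through kube-prometheus-stack) with Helm, and verify that both the custom resource and the Operator work.
Prerequisites
- A running Kubernetes cluster
- kubectl configured
- Helm v3 installed
Steps
Step 1: Create the CRD YAML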
cat <<'EOF' > ~/crd-database.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    plural: databases
    singular: database
    shortNames:
      - db
    kind: Database
    listKind: DatabaseList
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          required:
            - spec
          properties:
            spec:
              type: object
              required:
                - engine
                - version
              properties:
                engine:
                  type: string
                  enum:
                    - mysql
                    - postgres
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
                  default: 1
EOF
Step 2: Apply the CRD
kubectl apply -f ~/crd-database.yaml
# customresourcedefinition.apiextensions.k8s.io/databases.example.com created
# Verify the CRD
kubectl get crd
# NAME                    CREATED AT
# databases.example.com   2026-05-27T10:00:00Z
kubectl get crd databases.example.com -o yaml
Step 3: Create a custom resource
cat <<'EOF' | kubectl apply -f -
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-production-db
spec:
  engine: postgres
  version: "14.5"
  replicas: 3
EOF
# database.example.com/my-production-db created
# Work with the custom resource through kubectl (same as a built-in resource)
# The CRD from Step 1 defines no additionalPrinterColumns, so only NAME and AGE are shown
kubectl get databases
# NAME               AGE
# my-production-db   10s
kubectl get db # using the shortName
# NAME               AGE
# my-production-db   10s
kubectl describe database my-production-db
Step 4: Install prometheus-operator (kube-prometheus-stack)
# Add the prometheus-community repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack (which includes prometheus-operator)
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
# NAME: prometheus
# LAST DEPLOYED: Wed May 27 10:00:00 2026
# NAMESPACE: monitoring
# STATUS: deployed
# REVISION: 1
Step 5: List the CRDs registered by the Operator
# CRDs registered by prometheus-operator
kubectl get crd | grep monitoring.coreos.com
# alertmanagerconfigs.monitoring.coreos.com
# alertmanagers.monitoring.coreos.com
# podmonitors.monitoring.coreos.com
# prometheuses.monitoring.coreos.com
# prometheusrules.monitoring.coreos.com
# servicemonitors.monitoring.coreos.com
# Check the Operator Pods
kubectl get pods -n monitoring
# NAME                                                  READY   STATUS    RESTARTS   AGE
# prometheus-kube-prometheus-operator-xxxxxxxxx-xxxxx   1/1     Running   0          2m
# prometheus-kube-state-metrics-xxxxxxxxx-xxxxx         1/1     Running   0          2m
# prometheus-prometheus-node-exporter-xxxxx             1/1     Running   0          2m
Step 6: Create a ServiceMonitor using the Operator's CRD
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: metrics
      interval: 30s
EOF
# servicemonitor.monitoring.coreos.com/example-app-monitor created
# List ServiceMonitors
kubectl get servicemonitors -n monitoring
# NAME                  AGE
# example-app-monitor   10s
Verification
# Verify the custom resource (only NAME and AGE appear because the Step 1 CRD defines no additionalPrinterColumns)
kubectl get databases
# NAME               AGE
# my-production-db   5m
# Verify that the Operator is running
kubectl get pods -n monitoring | grep operator
# prometheus-kube-prometheus-operator-xxxxxxxxx-xxxxx   1/1   Running   0   5m
# Verify the Prometheus instance managed by the Operator
kubectl get prometheus -n monitoring
# NAME                                    VERSION   REPLICAS   AGE
# prometheus-kube-prometheus-prometheus   v2.xx.x   1          5m
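Exam tips
- Custom resources are handled exactly like built-in ones: kubectl get/create/delete/describe <crd-resource>
- A CRD's metadata.name must follow the <plural>.<group> format (e.g. databases.example.com)
- scope determines whether the resource is Namespaced or Cluster-scoped
- Operator = CRD + Controller; installing an Operator registers its CRDs automatically
- The CKA exam only requires a basic understanding of CRDs and Operators, not writing Operator code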
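The naming rule from the tips above is easy to check mechanically. The following Python sketch compares a CRD's `metadata.name` with `<plural>.<group>` built from its own spec; the helper names are hypothetical and the CRD is represented as a plain dict.

```python
def expected_crd_name(plural: str, group: str) -> str:
    """Compose the name a CRD must carry: <plural>.<group>."""
    return f"{plural}.{group}"


def check_crd_name(crd: dict) -> bool:
    """True if metadata.name equals <spec.names.plural>.<spec.group>."""
    spec = crd["spec"]
    expected = expected_crd_name(spec["names"]["plural"], spec["group"])
    return crd["metadata"]["name"] == expected


if __name__ == "__main__":
    crd = {"metadata": {"name": "databases.example.com"},
           "spec": {"group": "example.com", "names": {"plural": "databases"}}}
    print(check_crd_name(crd))
    crd["metadata"]["name"] = "database.example.com"  # singular: does not match
    print(check_crd_name(crd))
```

A mismatch such as using the singular name is a common reason for the API server to reject a CRD manifest, so it is worth double-checking under exam time pressure.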