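Kubernetes Pod Scheduling Strategies Explained
Pod scheduling is a core function of Kubernetes cluster management: it decides which Node each Pod runs on. A well-configured scheduling policy makes efficient use of cluster resources, keeps workloads highly available, and puts specialized hardware (such as GPU or high-memory Nodes) to full use.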
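1. How the Default Scheduler Works
The default Kubernetes scheduler (kube-scheduler) picks a target Node in two phases:
- Filtering: rules out Nodes that cannot run the Pod (insufficient resources, taints the Pod does not tolerate, host port conflicts, and so on)
- Scoring: ranks the remaining Nodes on factors such as balanced resource usage, data locality, and Pod affinity, then picks the highest-scoring Node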
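As a rough illustration of how the scoring factors are weighed against each other: kube-scheduler can be started with a configuration file (the --config flag) in which each scoring plugin carries a weight. The sketch below assumes a cluster recent enough to serve the kubescheduler.config.k8s.io/v1 API and that you control the scheduler's configuration, which is often not the case on managed clusters. The weight of 2 is an arbitrary value chosen for illustration, not a recommendation.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        # Disable the default entry, then re-enable it with a custom weight
        disabled:
          - name: NodeResourcesBalancedAllocation
        enabled:
          - name: NodeResourcesBalancedAllocation
            weight: 2   # illustrative: count balanced resource usage more heavily
```

Filtering is not affected by these weights: a Node that fails a filter is never scored at all.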
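2. Resource Requests and Limits

    apiVersion: v1
    kind: Pod
    metadata:
      name: app                  # example name; a Pod manifest requires metadata.name
    spec:
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:            # amount the scheduler reserves on the Node
              cpu: "500m"        # 0.5 core
              memory: "256Mi"
            limits:              # maximum usage at runtime
              cpu: "1000m"       # 1 core (usage above this is throttled, not killed)
              memory: "512Mi"    # usage above this gets the container OOM-killed
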
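Recommendation: set requests to the workload's actual average usage and limits to roughly 2x its peak. Requests that are too low let the scheduler pack more Pods onto a Node than it can really serve, which overloads the Node; requests that are too high reserve capacity that is never used.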
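One way to make sure no container reaches the scheduler without requests is a namespace-level LimitRange, which fills in defaults for containers that omit them. This is a minimal sketch: the object name, the namespace `demo`, and all the numbers are assumptions chosen for illustration and should be derived from your own usage data.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults   # hypothetical name
  namespace: demo            # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no requests
        cpu: "250m"
        memory: "128Mi"
      default:               # applied when a container sets no limits
        cpu: "500m"
        memory: "256Mi"
```

Containers that declare their own requests and limits keep them; the defaults only apply where values are missing.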
3. Node Selection (nodeSelector & Node Affinity)

    # nodeSelector: simple label matching
    spec:
      nodeSelector:
        disk-type: ssd   # schedule only onto Nodes labeled disk-type=ssd

    # Node affinity: more expressive node selection
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:    # hard requirement
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone         # well-known zone label
                    operator: In
                    values: ["cn-shanghai-a", "cn-shanghai-b"]
          preferredDuringSchedulingIgnoredDuringExecution:   # soft preference
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values: ["high-memory"]

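Node affinity also supports operators other than In (NotIn, Exists, DoesNotExist, Gt, Lt). The fragment below is a sketch that reuses the `disk-type` and `node-type` labels from the examples above; whether your Nodes carry these labels is an assumption. Expressions inside one matchExpressions list must all match (AND), while multiple nodeSelectorTerms entries are alternatives (OR).

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disk-type
                operator: Exists          # any value, as long as the label is present
              - key: node-type
                operator: NotIn           # keep this Pod off high-memory Nodes
                values: ["high-memory"]
```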
4. Pod Affinity and Anti-Affinity

    # Pod anti-affinity: spread Pods of the same app across Nodes (high availability)
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-web
              topologyKey: "kubernetes.io/hostname"   # spread per Node

    # Pod affinity: prefer placing frontend Pods on the same Node as backend Pods (lower latency)
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-backend
                topologyKey: "kubernetes.io/hostname"

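With the hard (required) anti-affinity rule on kubernetes.io/hostname, each Node can hold at most one matching Pod, so any replicas beyond the number of eligible Nodes stay Pending. When that is too strict, a soft rule is a common alternative. The Deployment below is a minimal sketch; the replica count, container name, and image are assumptions for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web
  template:
    metadata:
      labels:
        app: my-web
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer different Nodes, but still schedule if that is impossible
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-web
                topologyKey: "kubernetes.io/hostname"
      containers:
        - name: web               # hypothetical container name
          image: myapp:latest
```

The trade-off is that a soft rule can place two replicas on one Node when the cluster is short on capacity.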
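5. Taints and Tolerations

    # Taint a Node (keeps ordinary Pods off special-purpose Nodes)
    kubectl taint nodes gpu-node01 dedicated=gpu:NoSchedule

    # A Pod needs a matching toleration to be scheduled onto the tainted Node
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"

Common uses: dedicating GPU Nodes to AI training jobs; dedicating high-spec Nodes to databases; putting a Node into maintenance mode (a NoExecute taint evicts Pods already running there unless they tolerate it).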
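A toleration only allows a Pod onto a tainted Node; it does not steer the Pod there. To truly dedicate GPU Nodes to a workload, the toleration is usually paired with a node selector. The sketch below assumes the GPU Nodes also carry a label `dedicated=gpu` (the taint alone does not create one) and uses a hypothetical `maintenance` taint key to show how tolerationSeconds delays eviction under NoExecute.

```yaml
spec:
  nodeSelector:
    dedicated: gpu              # assumed Node label; attracts the Pod to GPU Nodes
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"      # permits scheduling onto the tainted GPU Nodes
    - key: "maintenance"        # hypothetical taint key
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300    # stay up to 5 minutes after the taint appears, then be evicted
```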
6. Topology Spread Constraints (topologySpreadConstraints)

    # Spread Pods evenly across availability zones (Kubernetes 1.19+)
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                  # Pod counts per zone differ by at most 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app

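To make maxSkew concrete: suppose three zones currently run 2, 2, and 1 matching Pods. Placing the next Pod in the third zone gives 2, 2, 2 (skew 0), which is allowed. Placing it in the first zone gives 3, 2, 1, where the skew is 3 - 1 = 2, which exceeds maxSkew: 1, so that zone is filtered out. Constraints can also be combined; the sketch below adds a soft per-Node spread on top of the hard per-zone one. The second constraint is an illustrative addition, not a required setting.

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule     # hard: never exceed the zone skew
      labelSelector:
        matchLabels:
          app: my-app
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway    # soft: prefer even spread across Nodes
      labelSelector:
        matchLabels:
          app: my-app
```

When several constraints are listed, a Node must satisfy every DoNotSchedule constraint; ScheduleAnyway constraints only influence scoring.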
7. Priority and Preemption

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000
    preemptionPolicy: PreemptLowerPriority   # may preempt lower-priority Pods

    # Use it in a Pod
    spec:
      priorityClassName: high-priority

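Preemption evicts running Pods, which is not always acceptable. If a workload should jump ahead in the scheduling queue without evicting anything, a non-preempting class is one option. The sketch below is illustrative: the class name, value, and description are assumptions, not settings from this article.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting   # hypothetical name
value: 900000
preemptionPolicy: Never               # queue ahead of lower priorities, but evict nothing
globalDefault: false                  # only Pods that name this class use it
description: "Scheduled ahead of lower-priority Pods without preempting them."
```

Pods using this class wait until enough capacity frees up on its own, rather than forcing lower-priority Pods off a Node.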
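8. Summary
Scheduling policy is key to achieving high availability and efficient resource use in Kubernetes. For production environments, configure at least the following: resource requests and limits (to avoid resource contention); Pod anti-affinity (so replicas of the same service are spread out); and a high priority class for business-critical Pods. As workloads grow more complex, gradually introduce advanced features such as node affinity and topology spread constraints.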