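Kubernetes Troubleshooting in Practice
A Pod that fails to start is one of the most common problems in Kubernetes. This article walks through the complete troubleshooting process for a real case.
Symptoms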
    $ kubectl get pods
    NAME                      READY   STATUS             RESTARTS   AGE
    web-app-7d9f4b8c5-x2v9p   0/1     ImagePullBackOff   0          5m
    # The Pod never starts; its status stays at ImagePullBackOff
Troubleshooting
Step 1: Inspect the Pod details

    $ kubectl describe pod web-app-7d9f4b8c5-x2v9p
    # The key information is in the Events section:
    Events:
      Type     Reason     Age               From               Message
      ----     ------     ----              ----               -------
      Normal   Scheduled  2m                default-scheduler  Successfully assigned default/web-app-7d9f4b8c5-x2v9p to node-1
      Normal   Pulling    2m                kubelet            Pulling image "myregistry.com/web-app:v1.2.3"
      Warning  Failed     1m (x3 over 2m)   kubelet            Failed to pull image "myregistry.com/web-app:v1.2.3": rpc error: code = Unknown desc = Error response from daemon: unauthorized: authentication required
      Warning  Failed     1m (x3 over 2m)   kubelet            Error: ImagePullBackOff

Finding: the image pull fails on authentication. The message "unauthorized: authentication required" means the registry rejected the pull because no valid credentials were presented.
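When the describe output is long, it can help to ask the API server for only the Warning events that belong to the Pod. The following is a minimal sketch, assuming the Pod lives in the default namespace unless one is given; `pod_warnings` is a hypothetical helper name, not part of kubectl.

```shell
# Print only the Warning events recorded for one Pod, sorted by last timestamp.
# pod_warnings is an illustrative helper name, not a kubectl subcommand.
# Usage: pod_warnings <pod-name> [namespace]
pod_warnings() {
  kubectl get events \
    --namespace "${2:-default}" \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by=.lastTimestamp
}
```

For the case above, `pod_warnings web-app-7d9f4b8c5-x2v9p` would be expected to list the two `Failed` events and skip the `Normal` ones.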
Step 2: Check the registry credentials

    # Check whether the Pod has an imagePullSecret
    $ kubectl get pod web-app-7d9f4b8c5-x2v9p -o yaml | grep -A5 imagePullSecrets
    # Empty output: no image pull secret is configured

    # List the existing Secrets
    $ kubectl get secrets
    NAME                TYPE                                  DATA   AGE
    default-token-xxx   kubernetes.io/service-account-token   3      30d
    # No registry-related Secret exists
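The same two checks can be made a little more precise than `grep` by querying the fields directly. This is a sketch under the assumption that the default namespace is used when none is passed; both function names are hypothetical.

```shell
# Print the names of the imagePullSecrets set on a Pod.
# Empty output means the Pod has none configured.
# Usage: pod_pull_secrets <pod-name> [namespace]
pod_pull_secrets() {
  kubectl get pod "$1" --namespace "${2:-default}" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}'
}

# List only the Secrets that hold registry credentials
# (the type produced by "kubectl create secret docker-registry").
# Usage: registry_secrets [namespace]
registry_secrets() {
  kubectl get secrets --namespace "${1:-default}" \
    --field-selector type=kubernetes.io/dockerconfigjson
}
```

In the situation described here, both calls would be expected to print nothing useful: no secret names for the Pod and no matching Secrets in the namespace.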
Step 3: Create the registry Secret

    # Create a Secret of type docker-registry
    $ kubectl create secret docker-registry regcred \
        --docker-server=myregistry.com \
        --docker-username=admin \
        --docker-password=your-password \
        --docker-email=admin@example.com
    secret/regcred created
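Before wiring the Secret into the Deployment, it may be worth confirming that it contains the server and username that were intended. A minimal sketch follows; `show_registry_secret` is a hypothetical helper name, and the output includes the credentials in clear text, so it should only be run somewhere private.

```shell
# Print the decoded .dockerconfigjson payload of a registry Secret.
# WARNING: the output contains the registry credentials in clear text.
# Usage: show_registry_secret <secret-name> [namespace]
show_registry_secret() {
  kubectl get secret "$1" --namespace "${2:-default}" \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
}
```

Calling `show_registry_secret regcred` should print a JSON document whose `auths` object is keyed by the registry host, here `myregistry.com`.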
Step 4: Reference the Secret in the Deployment

    # Edit the Deployment
    $ kubectl edit deployment web-app

    # Add the following under spec.template.spec:
    spec:
      template:
        spec:
          imagePullSecrets:
          - name: regcred   # reference the Secret created in Step 3
          containers:
          - name: web-app
            image: myregistry.com/web-app:v1.2.3
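For scripted or repeatable changes, the same `imagePullSecrets` entry can be added without opening an editor by using `kubectl patch`. This is a sketch rather than the method used in the original case; `add_pull_secret` is a hypothetical helper name, and it relies on kubectl's default strategic merge patch.

```shell
# Add an imagePullSecrets entry to a Deployment's Pod template.
# Uses kubectl's default patch type (strategic merge).
# add_pull_secret is an illustrative helper name.
# Usage: add_pull_secret <deployment> <secret-name> [namespace]
add_pull_secret() {
  kubectl patch deployment "$1" --namespace "${3:-default}" \
    -p "{\"spec\":{\"template\":{\"spec\":{\"imagePullSecrets\":[{\"name\":\"$2\"}]}}}}"
}
```

Here the call would be `add_pull_secret web-app regcred`. Because the Pod template changes, the Deployment rolls out new Pods, just as it does after `kubectl edit`.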
Step 5: Verify the fix

    # Check the Pod status
    $ kubectl get pods
    NAME                      READY   STATUS    RESTARTS   AGE
    web-app-7d9f4b8c5-abc12   1/1     Running   0          30s

    # Check the Events to confirm the image was pulled successfully
    $ kubectl describe pod web-app-7d9f4b8c5-abc12 | tail -10
    Events:
      Type    Reason     Age   From               Message
      ----    ------     ----  ----               -------
      Normal  Scheduled  1m    default-scheduler  Successfully assigned...
      Normal  Pulling    1m    kubelet            Pulling image "myregistry.com/web-app:v1.2.3"
      Normal  Pulled     30s   kubelet            Successfully pulled image
      Normal  Created    30s   kubelet            Created container web-app
      Normal  Started    30s   kubelet            Started container web-app
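Instead of polling `kubectl get pods` by hand, the rollout itself can be waited on. The sketch below wraps `kubectl rollout status`; the function name and the 120s default timeout are assumptions chosen for illustration.

```shell
# Block until the Deployment's rollout finishes or the timeout expires.
# Returns kubectl's exit status, so it can gate later steps in a script.
# wait_for_rollout is an illustrative helper name.
# Usage: wait_for_rollout <deployment> [namespace] [timeout]
wait_for_rollout() {
  kubectl rollout status "deployment/$1" \
    --namespace "${2:-default}" \
    --timeout="${3:-120s}"
}
```

A typical use would be `wait_for_rollout web-app && kubectl get pods`, which only lists the Pods once the new ones are ready.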
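Pod Status Quick Reference
| Status | Meaning | Where to look |
|---|---|---|
| Pending | Waiting to be scheduled | Insufficient resources / node selector / affinity |
| ContainerCreating | Container is being created | Image pull / volume mount / CNI |
| ImagePullBackOff | Image pull failed | Image name / registry authentication / network |
| CrashLoopBackOff | Container crashes repeatedly | Application error / health checks / resource limits |
| Error | Startup error | Check the container logs |
| Completed | Exited normally | Normal state for Job Pods |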
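The table can also be kept at hand as a small lookup function. This is only a sketch that restates the rows above; `triage_hint` is a hypothetical name, and statuses outside the table fall through to a generic suggestion.

```shell
# Map a Pod status (as shown in the STATUS column) to the first things to check.
# The entries restate the quick-reference table; triage_hint is an illustrative name.
# Usage: triage_hint <status>
triage_hint() {
  case "$1" in
    Pending)           echo "insufficient resources, node selector, affinity" ;;
    ContainerCreating) echo "image pull, volume mount, CNI" ;;
    ImagePullBackOff)  echo "image name, registry authentication, network" ;;
    CrashLoopBackOff)  echo "application error, health checks, resource limits" ;;
    Error)             echo "check the container logs" ;;
    Completed)         echo "normal for Job Pods, nothing to fix" ;;
    *)                 echo "not in the table; start with kubectl describe pod" ;;
  esac
}
```

For the case in this article, `triage_hint ImagePullBackOff` points at the image name, registry authentication, and network, which matches where the problem was found.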
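Troubleshooting Command Quick Reference

    # Show Pod details and events
    kubectl describe pod <pod-name>

    # Show container logs
    kubectl logs <pod-name>

    # Show logs from the previous container (after a crash)
    kubectl logs <pod-name> --previous

    # Open a shell in the container for debugging
    kubectl exec -it <pod-name> -- /bin/sh

    # Show node resource usage (requires metrics-server)
    kubectl top node

    # Show Pod resource usage (requires metrics-server)
    kubectl top pod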
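Summary
Kubernetes troubleshooting follows the flow "check status → read events → read logs → enter the container". The describe and logs commands alone are enough to resolve roughly 80% of Pod problems.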
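The flow can be sketched as one helper that runs the first three stages in order and leaves the interactive last stage as a hint. This is an illustration under stated assumptions, not a tool from the original case: `triage_pod` is a hypothetical name, the default namespace is assumed when none is given, and the log tail length of 50 lines is arbitrary.

```shell
# Walk one Pod through "status -> events -> logs" and print each stage.
# triage_pod is an illustrative helper name, not a kubectl subcommand.
# Usage: triage_pod <pod-name> [namespace]
triage_pod() {
  echo "== status =="
  kubectl get pod "$1" --namespace "${2:-default}"

  echo "== details and events =="
  kubectl describe pod "$1" --namespace "${2:-default}"

  echo "== logs (current container) =="
  kubectl logs "$1" --namespace "${2:-default}" --tail=50 \
    || echo "(no logs available; the container may never have started)"

  echo "== logs (previous container) =="
  kubectl logs "$1" --namespace "${2:-default}" --previous --tail=50 \
    || echo "(no previous container to read logs from)"

  # Entering the container is interactive, so it is only printed as a hint.
  echo "next: kubectl exec -it $1 --namespace ${2:-default} -- /bin/sh"
}
```

For an ImagePullBackOff Pod such as the one in this article, the two log stages would be expected to fall back to their placeholder messages, because no container ever started; the events stage is where the cause shows up.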
