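# Kubernetes Operations and Troubleshooting in Detail

## 1. Common Operations Commands

### 1.1 Pod Operations

```bash
# List Pods
kubectl get pods -n default

# Show Pod details (status, conditions, recent events)
kubectl describe pod nginx-xxxx -n default

# Follow Pod logs
kubectl logs -f nginx-xxxx -n default

# Open a shell inside the Pod
kubectl exec -it nginx-xxxx -n default -- /bin/sh
```

### 1.2 Debugging Tools

```bash
# Attach a temporary (ephemeral) busybox debug container to a running Pod
# (add --target=<container-name> to see that container's processes)
kubectl debug -it nginx-xxxx -n default --image=busybox -- /bin/sh

# Forward local port 8080 to port 80 of the Pod
kubectl port-forward pod/nginx-xxxx 8080:80 -n default
```

## 2. Troubleshooting

### 2.1 Pod Fails to Start

```bash
# List events sorted by last occurrence (most recent last)
kubectl get events -n default --sort-by=.lastTimestamp

# Check the Pod's status, node and IP
kubectl get pod nginx-xxxx -o wide -n default

# Show logs from the previous (crashed) container instance
kubectl logs nginx-xxxx -n default --previous

# Check resource quotas and limit ranges, which can block Pod creation
kubectl describe resourcequota -n default
kubectl describe limitrange -n default
```

### 2.2 Network Issues

```bash
# List Services
kubectl get svc -n default

# Start a temporary curl Pod to test Service connectivity (deleted on exit)
kubectl run curl --image=curlimages/curl -it --rm --restart=Never -n default -- sh
# Then, inside that Pod (assuming the default cluster domain cluster.local):
#   curl -sS http://nginx-svc.default.svc.cluster.local

# Check that the Service has backing endpoints
# (an empty ENDPOINTS column usually means the selector matches no ready Pods)
kubectl get endpoints nginx-svc -n default
```

## 3. Resource Tuning

### 3.1 ResourceQuota

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
  namespace: default
spec:
  hard:
    requests.cpu: "10"      # total CPU requests allowed in the namespace
    requests.memory: 20Gi   # total memory requests allowed in the namespace
    pods: "100"             # maximum number of Pods in the namespace
```

Because this quota constrains `requests.cpu` and `requests.memory`, every new Pod must carry CPU and memory requests, either set explicitly or filled in by a LimitRange; otherwise it is rejected. The LimitRange below only supplies memory defaults, so CPU requests still have to be set in each Pod spec.

### 3.2 LimitRange

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
  namespace: default
spec:
  limits:
  - max:
      memory: 1Gi       # no container may set memory above this
    min:
      memory: 64Mi      # no container may set memory below this
    default:
      memory: 256Mi     # memory limit applied to containers that set none
    type: Container
```

## 4. Summary

With these operations commands and troubleshooting techniques (events, `describe`, logs, debug containers, and well-chosen quotas and limit ranges), problems in a Kubernetes cluster can be located and resolved quickly.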
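The Pod-status check from section 2.1 can be partly automated. The sketch below filters the plain-text output of `kubectl get pods --no-headers` and reports Pods that are not Running, or that are Running without all containers ready. `list_unhealthy_pods` is a hypothetical helper, and it assumes the default column layout (NAME READY STATUS RESTARTS AGE); a sturdier version would parse `-o json` instead of whitespace-separated columns.

```bash
#!/usr/bin/env bash
# list_unhealthy_pods (hypothetical helper): reads the output of
#   kubectl get pods --no-headers
# on stdin and prints "<name> <reason>" for every Pod that looks unhealthy.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
list_unhealthy_pods() {
  awk '
    # Pods that finished successfully are not a problem.
    $3 == "Completed" || $3 == "Succeeded" { next }
    # Any status other than Running (Pending, CrashLoopBackOff, ...) is reported as-is.
    $3 != "Running" { print $1, $3; next }
    # Running, but not every container is ready (e.g. READY = 0/1).
    {
      split($2, ready, "/")
      if (ready[1] != ready[2]) print $1, "NotReady(" $2 ")"
    }
  '
}
```

Typical use: `kubectl get pods -n default --no-headers | list_unhealthy_pods`, then run `kubectl describe pod` on each name it prints.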
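The LimitRange in section 3.2 is only coherent if `min <= default <= max` for memory, and the API server is expected to reject a LimitRange that breaks this ordering. A quick local check before `kubectl apply` can catch such typos early. `to_bytes` and `check_limit_range` are hypothetical helpers written for this sketch; they handle only integer quantities with `Ki`/`Mi`/`Gi`/`Ti` or `k`/`M`/`G`/`T` suffixes, which is a subset of the full Kubernetes quantity syntax.

```bash
#!/usr/bin/env bash
# to_bytes (hypothetical helper): convert a memory quantity such as 64Mi, 1Gi or 500M
# to bytes. Only plain integers with Ki/Mi/Gi/Ti or k/M/G/T suffixes are handled;
# fractional values (1.5Gi) and exponent forms (1e9) are rejected on purpose.
to_bytes() {
  local q=$1 num suffix mult
  if [[ $q =~ ^([0-9]+)([A-Za-z]*)$ ]]; then
    num=${BASH_REMATCH[1]}
    suffix=${BASH_REMATCH[2]}
  else
    echo "unsupported quantity: $q" >&2
    return 1
  fi
  case $suffix in
    "") mult=1 ;;
    Ki) mult=1024 ;;
    Mi) mult=$((1024 ** 2)) ;;
    Gi) mult=$((1024 ** 3)) ;;
    Ti) mult=$((1024 ** 4)) ;;
    k)  mult=1000 ;;
    M)  mult=$((1000 ** 2)) ;;
    G)  mult=$((1000 ** 3)) ;;
    T)  mult=$((1000 ** 4)) ;;
    *)  echo "unsupported suffix: $suffix" >&2; return 1 ;;
  esac
  # 10# forces base 10 so a value such as 064 is not read as octal.
  echo $(( 10#$num * mult ))
}

# check_limit_range MIN DEFAULT MAX (hypothetical helper): succeeds only when
# MIN <= DEFAULT <= MAX, mirroring the memory min/default/max fields of a LimitRange.
check_limit_range() {
  local min def max
  min=$(to_bytes "$1") || return 1
  def=$(to_bytes "$2") || return 1
  max=$(to_bytes "$3") || return 1
  (( min <= def && def <= max ))
}
```

For the manifest above, `check_limit_range 64Mi 256Mi 1Gi && echo "LimitRange values are consistent"` succeeds, while a default of `2Gi` would fail the check.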