CCE Usage Checklist

Updated: 2024-05-30

Overview

CCE provides container management services based on native Kubernetes. To help users get more out of CCE, we have compiled checklists of typical practices in three areas: clusters, applications, and troubleshooting. We strongly recommend that CCE users review these checklists before starting to use CCE or launching services. Doing so helps you migrate services to CCE smoothly and reduces the risk of application exceptions or cluster rebuilds caused by improper use.

Cluster Checklist

Type	Item	Recommendation	Reference
Cluster	Number of nodes	However small the service, a cluster that runs online services should have more than one node, with some spare resource capacity reserved, so that a single point of failure does not disrupt the business.
	Node password	Set a strong root password on every node.
	Node network	If the cluster needs public network access, do not bind EIPs directly to nodes, because this exposes the nodes to external security risks. Instead, create the cluster's node subnet as a NAT subnet.
	VPC routing	The CCE container network is implemented with VPC routes. Routes created by CCE carry the description "auto generated by cce". When you add new routes, make sure they do not conflict with CCE's routes; if a conflict is unavoidable, submit a ticket for advice.
	Security group	If you configure a security group, it must allow traffic from the node network, the container network, and the 100.64.230.0/24 segment, and must open ports 22, 6443, and 30000-32768. Otherwise, the container network may be blocked.
	Disk capacity	When creating a cluster, attach a CDS disk of at least 100 GB to each node (CCE selects this option by default).
	Node specification changes	Upgrading a node's specification requires restarting the machine, which may temporarily leave the cluster short of capacity, so CCE does not support changing VM specifications directly. You can change them on the BCC page, but we recommend scaling the cluster out first and then scaling it in, to reduce the impact on the business.
	VM monitoring	Excessive CPU, memory, or disk usage on a VM affects cluster stability. CCE has an eviction mechanism that moves some instances off a node when its load is too high, so we strongly recommend adding monitoring alarms for nodes in BCM.	BCM Add Alarm
	Baidu AI Cloud resources created by CCE	Do not directly modify the configuration of resources created by CCE, including their names, on the BCC, DCC, VPC, BLB, EIP, or other product pages. Doing so may lead to unexpected results.

Application Checklist

Type	Item	Recommendation	Reference
Application	Image	When building a Docker image, consider installing common debugging tools such as ping, telnet, curl, and vim; the tool set can be customized.
	Private image	A container that pulls a private image needs an image pull Secret.	Practice of using private images in CCE clusters
	Number of replicas	For stateless services whose replicas do not conflict with each other, running more than 2 replicas is recommended, so that the service does not become temporarily unavailable when an instance is migrated after a single point of failure.
	Resource limits	Setting resources.limits for all online services is strongly recommended.	Kubernetes resource restrictions
	Health checks	Configure liveness and readiness probes for all online services so that failures are detected and the service recovers automatically.	Kubernetes health check
	Service exposure	Access within the cluster: ClusterIP Service; access from outside the cluster: LoadBalancer Service; access from outside the cluster over HTTP/HTTPS: Ingress.	LoadBalancer access network traffic; Ingress access network traffic
	Data persistence	For services that need persistent data, use PV and PVC. CCE currently supports file storage (CFS), block storage (CDS), and object storage (BOS) through PV/PVC.	Using CFS through PV/PVC; Using CDS through PV/PVC; Using BOS through PV/PVC
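The private image, replica, resource limit, and health check recommendations can all be expressed in a single Deployment. The sketch below is illustrative only: the name demo-app, the image path, the Secret name my-registry-secret (assumed to have been created beforehand as a docker-registry Secret), port 8080, the /healthz endpoint, and all resource values are hypothetical placeholders to adapt to your own service.

```yaml
# Hypothetical example: every name, path, port, and number here is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 3                      # more than 2 replicas, per the checklist
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      imagePullSecrets:
        - name: my-registry-secret # assumed pre-created Secret for the private registry
      containers:
        - name: demo-app
          image: registry.example.com/team/demo-app:1.0   # hypothetical private image
          ports:
            - containerPort: 8080
          resources:
            requests:              # what the scheduler reserves for the container
              cpu: 250m
              memory: 256Mi
            limits:                # hard ceiling the container may not exceed
              cpu: 500m
              memory: 512Mi
          readinessProbe:          # the Pod receives Service traffic only while this succeeds
            httpGet:
              path: /healthz       # assumed health endpoint exposed by the app
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:           # the kubelet restarts the container if this keeps failing
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
```

Readiness and liveness are kept separate on purpose: a failing readiness probe only takes the Pod out of Service endpoints, while a failing liveness probe restarts the container, so the liveness delay is set longer to avoid restarting a container that is still starting up.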

Troubleshooting Common Problems

1. Failed to start the container?

You can usually view error messages in the following two ways:

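```shell
kubectl describe pod podName
kubectl logs podName
kubectl logs podName --previous   # logs of the previous container instance, useful after a crash restart
```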

If these commands show no obvious error, temporarily change the container's startup command in the YAML so that the container stays running, for example by setting it to sleep 3600:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: hub.baidubce.com/cce/nginx-alpine-go:latest
          # Override the startup command so the container stays up for debugging
          command: ["/bin/sh", "-c", "sleep 3600"]
```

After the Pod starts, enter the container with kubectl exec -it podName -- /bin/sh, run the original startup command manually, and read the error message it prints.

2. LoadBalancer Service creation failed?

Run kubectl describe service serviceName and check the Events section for the cause of the failure. The most common cause is that the EIP or BLB quota has been exceeded; in that case, submit a ticket to request a quota increase.
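For reference, a minimal LoadBalancer Service looks like the following sketch. The name demo-app-lb, the selector app: demo-app, and the port numbers are hypothetical. Creating a Service of this type provisions load-balancer resources on the cloud side, which is why EIP and BLB quotas are the usual cause of failure.

```yaml
# Hypothetical example: name, selector, and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: demo-app-lb
spec:
  type: LoadBalancer      # exposes the Service outside the cluster via a cloud load balancer
  selector:
    app: demo-app         # must match the labels of the target Pods
  ports:
    - protocol: TCP
      port: 80            # port exposed by the load balancer
      targetPort: 8080    # port the container listens on
```

If creation fails, kubectl describe service demo-app-lb lists the related events.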

Note: the number of EIP instances a user can purchase ≤ the number of existing BCC instances + the number of existing BLB instances + 2.
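Written as a formula, with a worked example that uses hypothetical instance counts (3 BCC instances and 1 BLB instance):

```latex
N_{\mathrm{EIP}} \le N_{\mathrm{BCC}} + N_{\mathrm{BLB}} + 2
% Example with N_BCC = 3 and N_BLB = 1:
N_{\mathrm{EIP}} \le 3 + 1 + 2 = 6
```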

3. Failed to access the container network?

Container network access failures fall into several cases, for example:

The Service EIP cannot be accessed;
The Service name cannot be accessed from inside a container;
The Service ClusterIP cannot be accessed within the cluster;
The PodIP cannot be accessed within the cluster;
...

Container network problems are usually caused by the PodIP being unreachable, which in turn breaks the various ways of accessing the Service. First check whether the PodIP can be pinged from a node and from inside a Pod. If it cannot, check the following two places:

Check the VPC route table for rules that conflict with the routes created by CCE;
Check the VPC security group for any rule that blocks the requests.

If neither is the cause, submit a ticket so that an administrator can investigate.

Note: a Service ClusterIP cannot be pinged directly; access it as ip:port instead. A PodIP can be pinged.
