CCE provides container management services based on native Kubernetes. To help users get the most out of CCE, we have compiled checklists of typical practices covering three areas: cluster, application, and troubleshooting. CCE users are strongly recommended to review these checklists before they start using CCE or launch services on it. Doing so helps migrate services to CCE successfully and reduces the risk of application exceptions or cluster rebuilds caused by improper use.
Cluster checklist
Type
Item
Recommendation
Reference documents
Cluster
Number of nodes
However small the service is, for online services it is strongly recommended that the cluster have more than 1 node and that some resource buffer be reserved, to avoid business damage caused by a single point of failure.
Node Password
The root password of the node must be set to a strong password
Node network
If the cluster needs external network access, binding an EIP directly to the nodes is not recommended, because it exposes the nodes to external security risks. Instead, the cluster's node subnet can be created as a NAT subnet.
VPC routing
The CCE container network is implemented on top of VPC routing. Routing rules created by CCE carry the description "auto generated by cce". When creating a new route, make sure it does not conflict with the CCE routing rules. If a conflict is unavoidable, submit a ticket for consultation.
Security Group
If a security group is required, it must allow the node network, the container network, the network segment 100.64.230.0/24, and ports 22, 6443, and 30000-32768. Otherwise, the container network may be blocked.
Disk capacity
When creating a cluster, it is strongly recommended to attach a CDS disk of at least 100 GB to each node (CCE selects this option by default).
Node configuration upgrade
Because upgrading a node's configuration requires restarting the machine, which may leave the cluster short of capacity, CCE does not directly support upgrading the configuration of virtual machines. Users can perform the upgrade on the BCC page, but it is recommended to scale the cluster out first and then scale it in, to reduce the impact on services.
Virtual machine monitoring
Excessive utilization of virtual machine CPU, memory, disk, and so on affects the stability of the cluster. CCE has an eviction mechanism that migrates some instances off a node when the node load is too high. It is therefore strongly recommended that users add monitoring alarms for the nodes in BCM.
CCE users are strongly advised not to modify resources created by CCE, including their names, directly on the product pages of BCC, DCC, VPC, BLB, EIP, and so on. Doing so may lead to unexpected results.
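The security group item in the cluster checklist can be verified mechanically before a rule set is applied. The following is a minimal Python sketch, not a CCE or VPC API. The rule representation (a list of CIDRs plus a list of port ranges), the function names, and the node and container CIDRs are all hypothetical placeholders. Only 100.64.230.0/24 and the ports come from the checklist.

```python
import ipaddress

# 100.64.230.0/24 and the ports come from the checklist. The first two CIDRs
# are placeholders for the cluster's real node and container networks.
REQUIRED_CIDRS = ["192.168.0.0/16", "172.16.0.0/16", "100.64.230.0/24"]
REQUIRED_PORTS = [(22, 22), (6443, 6443), (30000, 32768)]


def uncovered_cidrs(required, allowed):
    """Return the required CIDRs that are not contained in any allowed CIDR."""
    allowed_nets = [ipaddress.ip_network(a) for a in allowed]
    missing = []
    for cidr in required:
        net = ipaddress.ip_network(cidr)
        if not any(net.subnet_of(a) for a in allowed_nets):
            missing.append(cidr)
    return missing


def uncovered_ports(required, allowed):
    """Return the required (low, high) port ranges not inside any allowed range."""
    return [
        (lo, hi)
        for lo, hi in required
        if not any(a_lo <= lo and hi <= a_hi for a_lo, a_hi in allowed)
    ]


if __name__ == "__main__":
    # Hypothetical rule set that forgets the container network and
    # truncates the high port range.
    allowed_cidrs = ["192.168.0.0/16", "100.64.0.0/10"]
    allowed_ports = [(22, 22), (6443, 6443), (30000, 32000)]
    print("CIDRs still blocked:", uncovered_cidrs(REQUIRED_CIDRS, allowed_cidrs))
    print("Ports still blocked:", uncovered_ports(REQUIRED_PORTS, allowed_ports))
```

The sketch checks sources and ports independently, which is a simplification: a real security group ties each port range to a source and a protocol.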
Application checklist
Type
Item
Recommendation
Reference documents
Application
Image
When building a Docker image, it is recommended to install some common debugging tools in the image, such as ping, telnet, curl, and vim. The selection can be customized as needed.
Private image
When a container uses a private image, a secret for pulling the image must be configured.
If the service is stateless and its instances do not conflict with each other, a replica count > 2 is recommended, so that an instance migration caused by a single point of failure does not make the service temporarily unavailable.
Resource constraints
It is strongly recommended that resources.limits be set for all online services.
If the service requires data persistence, PV and PVC are recommended. CCE currently supports file storage (CFS), block storage (CDS), and object storage (BOS) through PV/PVC.
After starting the service, enter the container with kubectl exec -it podName -- /bin/sh, run the startup command manually, and check the service's error messages.
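The application items above (image pull secret, replica count, resource limits, and PV/PVC) each correspond to a field of a Kubernetes Deployment. The following minimal Python sketch builds such a manifest as a dictionary and prints it as JSON, which kubectl accepts in the same way as YAML. Every name (`web`, `registry-secret`, `data-pvc`, the image address) and every resource value is a hypothetical placeholder, and the function is illustrative rather than part of any CCE tooling.

```python
import json


def build_deployment(name, image, replicas=3, pull_secret=None, pvc_name=None):
    """Build an apps/v1 Deployment manifest as a plain dict."""
    container = {
        "name": name,
        "image": image,
        # Limits cap what the container may consume; the values are examples.
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": "500m", "memory": "512Mi"},
        },
    }
    pod_spec = {"containers": [container]}
    if pull_secret:
        # Name of an existing registry secret in the same namespace.
        pod_spec["imagePullSecrets"] = [{"name": pull_secret}]
    if pvc_name:
        # Mount an existing PersistentVolumeClaim at /data (path is an example).
        container["volumeMounts"] = [{"name": "data", "mountPath": "/data"}]
        pod_spec["volumes"] = [
            {"name": "data", "persistentVolumeClaim": {"claimName": pvc_name}}
        ]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {"metadata": {"labels": {"app": name}}, "spec": pod_spec},
        },
    }


if __name__ == "__main__":
    manifest = build_deployment(
        "web",
        "registry.example.com/team/web:1.0",
        pull_secret="registry-secret",
        pvc_name="data-pvc",
    )
    print(json.dumps(manifest, indent=2))
```

Whether a single claim can be mounted by several replicas at once depends on the access mode of the underlying storage, so the `pvc_name` argument should be omitted where the volume does not support shared access.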
2. LoadBalancer Service creation failed?
Run kubectl describe service serviceName and check the events to find the reason for the failure. The usual cause is that the EIP or BLB quota has been exceeded, in which case you can submit a ticket to request a quota increase.
Note: The number of EIP instances a user can purchase <= the number of existing BCC instances + the number of existing BLB instances + 2.
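The quota rule in the note is simple arithmetic, and a short Python sketch makes the boundary case explicit. The function names are hypothetical, and the instance counts have to be read from the console or billing pages by hand. Nothing here queries a real API.

```python
def max_eip_instances(bcc_count, blb_count):
    """Upper bound from the note: EIPs <= BCC instances + BLB instances + 2."""
    return bcc_count + blb_count + 2


def can_create_eip(existing_eips, bcc_count, blb_count):
    """Return True if one more EIP would still satisfy the bound."""
    return existing_eips + 1 <= max_eip_instances(bcc_count, blb_count)


if __name__ == "__main__":
    # Example account with 3 BCC instances and 1 BLB instance.
    print(max_eip_instances(3, 1))
    print(can_create_eip(5, 3, 1))
    print(can_create_eip(6, 3, 1))
```

With 3 BCC instances and 1 BLB instance the bound is 6, so a sixth EIP can still be created but a seventh cannot.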
3. Failed to access the container network?
Container network access failures fall into several cases:
The Service EIP cannot be accessed;
The ServiceName cannot be accessed from inside a container;
The Service ClusterIP cannot be accessed in the cluster;
The PodIP cannot be accessed in the cluster;
...
Container network problems are generally caused by the PodIP being unreachable, which in turn causes the various Service access problems. First check whether the PodIP can be pinged from the node and from inside a Pod. If it cannot, check two places:
Check the VPC routing table to confirm whether any routing rule conflicts with the rules created by CCE;
Check the VPC security group to see whether any policy blocks the requests.
If neither check finds the cause, submit a ticket to contact the administrator for troubleshooting.
Note: The Service ClusterIP cannot be pinged directly; it must be accessed through ip:port. The PodIP can be pinged.
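The routing table check above can be sketched in Python with the standard `ipaddress` module. The sketch assumes the destinations of the routes described as "auto generated by cce" have been copied out of the VPC routing table by hand. The function name and the CIDR values are hypothetical. Any overlap is reported, which is deliberately conservative: an overlapping route is a candidate for closer inspection, not proof of a conflict.

```python
import ipaddress


def overlapping_cce_routes(candidate, cce_route_destinations):
    """Return the CCE route destinations that overlap the candidate route."""
    candidate_net = ipaddress.ip_network(candidate)
    return [
        dest
        for dest in cce_route_destinations
        if ipaddress.ip_network(dest).overlaps(candidate_net)
    ]


if __name__ == "__main__":
    # Placeholder destinations standing in for routes created by CCE.
    cce_routes = ["172.16.0.0/24", "172.16.1.0/24"]
    for new_route in ["172.16.1.128/25", "10.0.0.0/8"]:
        hits = overlapping_cce_routes(new_route, cce_routes)
        print(new_route, "overlaps", hits if hits else "nothing")
```

Running the same check against each manually added route in the table narrows down which rule to inspect before opening a ticket.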