Use rsync to synchronize from the old file system to the CFS file system
Operation steps
1. Log in to BCC
- Bind an EIP to the BCC instance so that it has a public IP address.
- Ensure that the remote server can log in to the public IP address of the BCC through ssh.
2. Configure public key
Log in to the remote server and execute the following commands:
    ssh-keygen -t rsa        # Create a key pair. You will be prompted for information; you can press Enter each time to accept the defaults.
    cat ~/.ssh/id_rsa.pub    # View the generated public key and copy its content for the next step.
Log in to the BCC, edit the file ~/.ssh/authorized_keys, and paste in the public key copied in the previous step. The remote server can now log in to the BCC over ssh without entering a password.
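The key installation on the BCC side can also be scripted. The following is a minimal sketch rather than part of the official procedure: append_pubkey is a hypothetical helper name, and it assumes the public key has already been copied into a file on the BCC. It appends the key only if it is not already present and applies the restrictive permissions that sshd commonly expects.

```shell
# Hypothetical helper: append a public key to an authorized_keys file once.
# Usage: append_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
append_pubkey() {
  pubkey_file=$1
  auth_file=$2
  # sshd may refuse keys when the directory or file is too permissive,
  # so create both with owner-only access.
  mkdir -p "$(dirname "$auth_file")"
  chmod 700 "$(dirname "$auth_file")"
  touch "$auth_file"
  chmod 600 "$auth_file"
  # -F fixed string, -x whole-line match, -q quiet: skip keys already present.
  if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
}
```

For example, append_pubkey /tmp/id_rsa.pub ~/.ssh/authorized_keys (the /tmp path is illustrative). Where OpenSSH's ssh-copy-id is installed on the remote server, running ssh-copy-id user@bcc_ip should achieve the same result in one step.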
3. Strategy evaluation
Users need to evaluate the following factors to choose an appropriate synchronization method:
- Whether writes to the existing file system can be paused during synchronization. If they can, a single full synchronization is enough (synchronization scheme 4.1). If not, estimate the amount of data newly written during synchronization. If that amount is small, use one full synchronization followed by one incremental synchronization (synchronization scheme 4.2); otherwise, use the periodic synchronization of scheme 4.3.
- The amount of data in the file system. If the existing file system holds a large amount of data, synchronize it directory by directory. The subdirectories are synchronized independently of each other, so the transfers can run in parallel or one after another. CFS can also be mounted on multiple virtual machines at the same time, so adding virtual machines and dividing the subdirectories among them can increase the synchronization speed.
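The directory-by-directory approach can be sketched as a small generator of rsync commands. This is an illustration under assumptions, not a prescribed tool: list_sync_cmds is a hypothetical helper name, directory names are assumed to contain no whitespace, and files stored directly under the source root are not covered and would need a separate rsync.

```shell
# Hypothetical helper: print one rsync command per immediate subdirectory.
# Usage: list_sync_cmds SRC_ROOT DEST_ROOT
#   DEST_ROOT has the form user@bcc_ip:/mnt/cfs
list_sync_cmds() {
  src_root=${1%/}     # drop a trailing slash, if any
  dest_root=${2%/}
  for dir in "$src_root"/*/; do
    [ -d "$dir" ] || continue    # the glob stays literal when nothing matches
    name=$(basename "$dir")
    printf 'rsync -zvr %s/%s/ %s/%s\n' "$src_root" "$name" "$dest_root" "$name"
  done
}

# Example (not executed here): run up to 4 transfers at a time.
#   list_sync_cmds /old_fs user@bcc_ip:/mnt/cfs | xargs -P 4 -I{} sh -c '{}'
```

Printing the commands first makes it possible to review them, or to split the list among several virtual machines, before anything is transferred. The -P option used in the example comment is available in GNU and BSD xargs but is not part of POSIX.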
4. Synchronization scheme
The synchronization process varies slightly depending on whether the user writes to the old file system during synchronization. The following cases are discussed separately.

4.1 The user does not write to the original file system
First, confirm that the CFS file system has been mounted at /mnt/cfs/ (or another directory) on the BCC machine, and then execute the following commands on the remote server:
    rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
    rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
    rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
    ...
Notes:
    -z: compress data during transfer
    -v: verbose output
    -r: recurse into directories
    user: login account on the BCC
    bcc_ip: public IP address of the BCC

4.2 The user has a small number of write requests to the original file system
While the application is still writing, a single rsync run cannot guarantee that the original file system and the CFS file system are fully synchronized, although it does bring most of the data across. After the first rsync run, the user can pause the application's writes to the original file system and run rsync again. The specific steps are as follows:
- Leave the user's writes to the old file system running, and execute the following commands on the remote server:
    rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
    rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
    rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
    ...
- Pause the user's writes to the old file system, and execute the following commands on the remote server:
    rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
    rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
    rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
    ...
Because the first run has already copied the data, the second rsync run only computes the differences between the two sides and synchronizes those. The second run is therefore much faster.

4.3 The user has a large number of write requests to the original file system
When the user's application writes heavily and the writes cannot be paused, synchronization alone cannot, even in theory, guarantee strong consistency between the old file system and CFS. In this case, consider running rsync periodically to keep the two file systems as consistent as possible. Linux offers several ways to run commands periodically; we recommend crontab, which makes it easy to call rsync on a schedule. For example:
    crontab -e
Then enter the following content, which runs the rsync command once every hour:
    0 * * * * rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
You can check whether the crontab entry was configured successfully with the following command:
    crontab -l
After synchronizing from the old file system to Baidu CFS for a period of time, the user chooses an appropriate time to pause the application's writes to the old file system and deletes the crontab task at the same time:
    crontab -e    # Delete the rsync entry, then save and exit
Finally, manually execute the following rsync commands to ensure that the two file systems are fully synchronized:
    rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
    rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
    rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
    ...
At this point, users can migrate their applications from the old server to the Baidu BCC server.
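With the periodic scheme in 4.3, a run that takes longer than the scheduling interval could overlap with the next one. One possible guard, shown here as a sketch under assumptions, is a lock directory: run_locked is a hypothetical wrapper name, and the lock path in the example comment is illustrative. The wrapper relies on mkdir failing when the directory already exists.

```shell
# Hypothetical wrapper: run a command only if no earlier run holds the lock.
# Usage: run_locked LOCK_DIR COMMAND [ARGS...]
run_locked() {
  lock_dir=$1
  shift
  # mkdir is atomic: it fails if the lock directory already exists.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "previous sync still running, skipping this run" >&2
    return 1
  fi
  "$@"
  status=$?
  rmdir "$lock_dir"    # release the lock whether or not the command succeeded
  return "$status"
}

# Example (not executed here), as the body of a script called from crontab:
#   run_locked /tmp/cfs_sync.lock \
#     rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
```

If the process is killed before rmdir runs, the lock directory remains and must be removed by hand. On Linux systems that ship util-linux, flock -n with a lock file is an alternative that releases the lock automatically when the process exits.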
5. Statistical results
After rsync finishes, it displays the results of the synchronization, including:
- Time spent on the synchronization;
- Number of bytes transferred;
- Transfer speed;
- Number of files synchronized.
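When the output of several runs is saved to log files, the transfer speed can be pulled out for comparison. The sketch below assumes the summary line that rsync -v usually prints, of the form "sent ... bytes  received ... bytes  N bytes/sec"; transfer_rate is a hypothetical helper name, and the exact output may differ between rsync versions.

```shell
# Hypothetical helper: read rsync output on stdin and print the transfer rate
# taken from the "sent ... bytes  received ... bytes  N bytes/sec" line.
transfer_rate() {
  # The rate is the field just before "bytes/sec"; strip thousands separators.
  awk '/bytes\/sec/ { gsub(",", "", $(NF-1)); print $(NF-1) }'
}

# Example (not executed here):
#   rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 | tee sync.log
#   transfer_rate < sync.log
```

Adding rsync's --stats option to the command prints further figures, such as file counts, that can be logged in the same way.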