After Qiniu Cloud's dynamic/static separation is configured, the Qiniu mirror automatically creates a robots.txt (stored in your Qiniu Cloud storage space) that forbids all search engines from crawling. This keeps Baidu from indexing the mirror site and penalizing the main site for duplicate content. However, with that robots.txt in place, the thumbnails in Baidu and 360 search results will disappear, because your images can no longer be crawled. So we need to improve the robots.txt so that spiders can still fetch images. Of course, using the default robots.txt as-is is also fine for SEO, since it keeps search engines away from duplicate content; whether to change it is up to you.
Improved robots.txt
Below is the improved robots.txt
```
# robots.txt generated at http://portal.qiniu.com
User-agent: Baiduspider
Allow: /wp-content/uploads/*.jpg$
Allow: /wp-content/uploads/*.png$
Disallow: /

User-agent: 360Spider
Allow: /wp-content/uploads/*.jpg$
Allow: /wp-content/uploads/*.png$
Disallow: /

User-agent: Baiduspider-image
Allow: /wp-content/uploads/*.jpg$
Allow: /wp-content/uploads/*.png$
Disallow: /

User-agent: 360Spider-Image
Allow: /wp-content/uploads/*.jpg$
Allow: /wp-content/uploads/*.png$
Disallow: /

User-agent: *
Disallow: /
```
These rules allow Baidu and 360 (including their image spiders) to crawl images ending in .jpg/.png under /wp-content/uploads/, while all other paths, and all other search engines, are blocked.
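Note that `*` and `$` are wildcard extensions to the original robots.txt syntax (Baidu's and Google's matchers understand them; Python's stdlib robotparser, for instance, does not), and the longest matching pattern wins, with Allow breaking ties. A minimal sketch of that matching logic, useful for sanity-checking the rules above (the helper names are ours):

```python
import re

def pattern_to_regex(pattern):
    # Translate a robots.txt path pattern ('*' wildcard, '$' end anchor)
    # into an anchored regular expression.
    regex = re.escape(pattern).replace(r'\*', '.*')
    if regex.endswith(r'\$'):
        regex = regex[:-2] + '$'
    return re.compile('^' + regex)

def allowed(path, rules):
    # rules: list of ('allow' | 'disallow', pattern) pairs.
    # The longest matching pattern decides; no match means allowed.
    best_kind, best_pat = 'allow', ''
    for kind, pat in rules:
        if pattern_to_regex(pat).match(path) and len(pat) > len(best_pat):
            best_kind, best_pat = kind, pat
    return best_kind == 'allow'

# The Baiduspider section from the robots.txt above:
baidu_rules = [
    ('allow', '/wp-content/uploads/*.jpg$'),
    ('allow', '/wp-content/uploads/*.png$'),
    ('disallow', '/'),
]

print(allowed('/wp-content/uploads/2024/logo.jpg', baidu_rules))  # True
print(allowed('/some-post/', baidu_rules))                        # False
```

The image patterns (26 characters) are longer than `Disallow: /` (1 character), so they win for matching image URLs, which is exactly why the thumbnails survive.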
Advantages:
- ① While still keeping search engines away from the Qiniu mirror's duplicate content, it allows the featured images and in-article images to be indexed;
- ② Blocking crawling greatly saves GET requests against your Qiniu space: every crawler hit generates a GET, and the free Qiniu tier has a limited GET quota, so there is no need to waste it.
Next, log in to the Qiniu Cloud console you use, go to Object Storage -> select your storage space -> Content Management, find and delete the existing robots.txt, then click "Upload File" above to upload the improved robots.txt.
Refresh and prefetch the file
If you stop here, refreshing the page may still show the old robots.txt: Qiniu Cloud caches files on every CDN node, so you need to clear the cache by refreshing and prefetching the file.
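Whether a node still serves the old file depends on how long the cached copy has lived versus the lifetime it was granted. A rough sketch of that freshness rule (the header values below are made up for illustration):

```python
def is_fresh(headers: dict) -> bool:
    # A cached response is still "fresh" while its Age stays below the
    # max-age granted by Cache-Control. Until it goes stale, the edge
    # node keeps serving the old copy, which is why a manual
    # refresh/prefetch is needed to force the new file out immediately.
    age = int(headers.get('Age', '0'))
    for directive in headers.get('Cache-Control', '').split(','):
        directive = directive.strip()
        if directive.startswith('max-age='):
            return age < int(directive.split('=', 1)[1])
    return False  # no explicit lifetime: assume it must be refetched

print(is_fresh({'Cache-Control': 'max-age=3600', 'Age': '120'}))   # True
print(is_fresh({'Cache-Control': 'max-age=3600', 'Age': '86400'})) # False
```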
When this blogger first started using Qiniu Cloud, he tried to replace his site's logo image. After uploading and deleting it several times, the change still did not take effect: the new logo had been uploaded to the Qiniu space, yet the front end kept showing the old one. It was frustrating, and we only later learned that the CDN cache was the cause.
Now check again whether your robots.txt has been updated; for this site the file is at http://cdn.wosn.net/robots.txt. Open that link to see what is actually being served, and compare it against the cached copy.
With that, the correct way to configure the robots.txt of a Qiniu Cloud storage space is covered. If you want to go a step further, read on.
Other ways to keep the Qiniu Cloud mirror from getting your site penalized:
Blocking the Qiniu mirror crawler by its User-Agent is also a sound solution. Add the following code to the index.php in your site's root directory, or to the functions.php in your theme directory.
```php
if (strpos($_SERVER['HTTP_USER_AGENT'], 'qiniu-imgstg-spider') !== false) {
    header('HTTP/1.1 503 Service Temporarily Unavailable');
    echo 'Qiniu mirror blocked';
    exit;
}
```
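For reference, the same User-Agent check expressed outside PHP, for example as a WSGI middleware in Python (a sketch; the class name and response text are ours):

```python
class BlockQiniuMirror:
    """WSGI middleware that answers 503 to the Qiniu mirror crawler."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        ua = environ.get('HTTP_USER_AGENT', '')
        if 'qiniu-imgstg-spider' in ua:
            # Same logic as the PHP snippet: refuse the mirror crawler.
            start_response('503 Service Temporarily Unavailable',
                           [('Content-Type', 'text/plain')])
            return [b'Qiniu mirror blocked']
        return self.app(environ, start_response)
```

Wrap your WSGI application with `BlockQiniuMirror(app)` and ordinary visitors pass through untouched while the mirror crawler gets a 503.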
If your WordPress site uses the WP Super Cache plugin, also add the Qiniu UA to the list of user agents excluded from caching: Settings -> WP Super Cache -> Advanced -> find "Rejected User Agents" -> add qiniu-imgstg-spider, as shown in the screenshot below.
After applying this method, delete the cached files on the Qiniu side; when the mirror crawler requests them again it will receive a 503, which means the block is working.
Ps: Part of this article comes from Zhang Ge's blog and Mingtai Network.