How to correctly configure the robots.txt file of Qiniu Cloud storage

Watson Blog · September 13, 2017 10:44:36 · WordPress

After dynamic/static separation is configured with Qiniu Cloud, the Qiniu mirror automatically creates a robots.txt (stored in the Qiniu Cloud storage space) that forbids all search engines from crawling the mirror. This prevents Baidu from crawling the mirror site as duplicate content and penalizing the main site. However, with this default robots.txt, the thumbnails in Baidu and 360 search results disappear, because your images are forbidden from being crawled! So we need to improve the robots.txt so that spiders can still crawl the images. Of course, using the default robots.txt as-is is also fine for SEO, since it stops search engines from indexing duplicate content; whether to change it is up to you.


Improved robots.txt

Below is the improved robots.txt

    # robots.txt generated at http://portal.qiniu.com
    User-agent: Baiduspider
    Allow: /wp-content/uploads/*.jpg$
    Allow: /wp-content/uploads/*.png$
    Disallow: /

    User-agent: 360Spider
    Allow: /wp-content/uploads/*.jpg$
    Allow: /wp-content/uploads/*.png$
    Disallow: /

    User-agent: Baiduspider-image
    Allow: /wp-content/uploads/*.jpg$
    Allow: /wp-content/uploads/*.png$
    Disallow: /

    User-agent: 360Spider-Image
    Allow: /wp-content/uploads/*.jpg$
    Allow: /wp-content/uploads/*.png$
    Disallow: /

    User-agent: *
    Disallow: /

The rules above allow Baidu and 360 to crawl the article images ending in .jpg/.png (under the uploads folder), while all other search engines, and everything else on the mirror, are blocked from crawling.

Advantages:

  • ① While still preventing search engines from indexing the duplicate content on the Qiniu mirror, it allows the featured images and in-article images to be indexed;
  • ② Blocking everything else greatly saves GET requests against the Qiniu space: every crawl by a search engine counts as one GET, and the free Qiniu tier has a limited GET quota, so there is no need to waste it.
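To see how these Allow/Disallow patterns resolve for a given URL, here is a minimal sketch in Python of the longest-match rule that Baidu and Google document for wildcard patterns (`*` matches any characters, a trailing `$` anchors the end). This is my own illustrative matcher, not any search engine's actual implementation, and the function names are made up:

```python
import re

def robots_pattern_match(pattern: str, path: str) -> bool:
    # Translate a robots.txt path pattern ('*' = any run of characters,
    # trailing '$' = end-of-URL anchor) into a regex and test the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match("^" + regex + ("$" if anchored else ""), path) is not None

def allowed_for_baiduspider(path: str) -> bool:
    # Among all matching rules, the longest pattern wins (the documented
    # precedence rule for Google/Baidu wildcard robots rules).
    rules = [
        ("allow", "/wp-content/uploads/*.jpg$"),
        ("allow", "/wp-content/uploads/*.png$"),
        ("disallow", "/"),
    ]
    best = ("allow", "")  # no matching rule means the URL is allowed
    for kind, pat in rules:
        if robots_pattern_match(pat, path) and len(pat) > len(best[1]):
            best = (kind, pat)
    return best[0] == "allow"

print(allowed_for_baiduspider("/wp-content/uploads/2017/09/logo.png"))  # True
print(allowed_for_baiduspider("/2017/09/some-post.html"))               # False
```

An article image matches both the long `Allow: /wp-content/uploads/*.png$` pattern and the short `Disallow: /`, and the longer pattern wins, which is exactly why the thumbnails come back while the rest of the mirror stays blocked.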

Then log in to your Qiniu Cloud console, go to Object Storage -> select your storage space -> Content Management, find the existing robots.txt and delete it, then click "Upload File" above and upload the improved robots.txt.


 

Refresh prefetch file

If you stop here, you may still see the old robots.txt when you check it. Since Qiniu Cloud caches files on each CDN node, you also need to clear the cache by refreshing and prefetching the file.

When the blogger first started with Qiniu Cloud, he tried to change his site's logo image. After uploading and deleting it several times, the change never took effect: the new logo had just been uploaded to the Qiniu space, yet the old one was still displayed on the front end. Depressing. It later turned out the CDN cache was the cause.


After refreshing, check again whether your robots.txt has been updated. For this site it is http://cdn.wosn.net/robots.txt

 

Check on the Baidu webmaster platform whether the robots rules are effective

Use the robots checking tool there to verify:


After the refresh, confirm that the cached copy of the image has been updated as well.

That covers how to correctly configure the robots.txt file for Qiniu Cloud storage. If you want to go a step further, read on.

Other ways to avoid the site being penalized because of the Qiniu mirror:

Blocking the Qiniu mirror crawler by User-Agent is another reasonable solution. Add the following code to index.php in the site root, or to functions.php in your theme directory.

    if (strpos($_SERVER['HTTP_USER_AGENT'], 'qiniu-imgstg-spider') !== false) {
        header('HTTP/1.1 503 Service Temporarily Unavailable');
        echo 'Blocking the Qiniu mirror spider';
        exit;
    }

If your WordPress site uses the WP Super Cache plugin, also add the Qiniu UA to the list of user agents that are never served cached pages: Settings -> WP Super Cache -> Advanced -> Rejected User Agents -> add qiniu-imgstg-spider.


After this is done, delete the Qiniu cache; if accessing the mirror then returns 503, the block is working.
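The user-agent check above is simple enough to unit-test. Here is the same logic as a standalone Python function, a sketch for illustration only (the function name is made up; the live site would use the PHP snippet above):

```python
def mirror_spider_response(user_agent: str):
    """Mimic the PHP snippet: return (status, body) for the Qiniu mirror
    spider, or None to let a normal request continue."""
    if "qiniu-imgstg-spider" in user_agent:
        return 503, "Blocking the Qiniu mirror spider"
    return None

print(mirror_spider_response("qiniu-imgstg-spider/1.0"))  # blocked with 503
print(mirror_spider_response("Mozilla/5.0"))              # None (served normally)
```

A substring check is enough here because crawler UA strings usually carry a version suffix, so an exact string comparison would silently stop matching.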

P.S.: Parts of this article come from Zhang Ge's blog and Mingtai Network.

  • This article was collected and organized by the Mutual Benefit website; the email address for feedback is wosnnet@foxmail.com. Please keep this link when reprinting: https://wosn.net/541.html
