
An effective way to completely block Baidu and other search engines from indexing your site


How do you completely prevent Baidu and other search engines from indexing a site?


Many bloggers who exchange friendship links ask me why Baidu has not indexed my blog. In fact, I have blocked Baidu's spider in robots.txt and added some technical restrictions on top of that. Here is the ultimate, effective way to keep Baidu and other search engines from indexing a site.

1、robots.txt can only block crawling, not indexing

Many people think robots.txt alone can keep Baidu from indexing a site; this is a serious misunderstanding. robots.txt tells search engines which directories and files may be crawled and which may not. In practice, even disallowing the entire root directory does not stop Baidu and other search engines from indexing the site. If a site has a large number of external links, it is basically impossible to keep it out of Baidu's index by normal means. Taobao is a classic example: its robots.txt disallows crawling of the root directory, yet its home page is still indexed.
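
For reference, a typical "disallow everything" robots.txt looks like the snippet below. As explained above, it only asks crawlers not to fetch pages; it does not guarantee that URLs discovered through external links stay out of the index.

User-agent: *
Disallow: /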

2、Check the user_agent in Nginx and block Baidu and other search engines from accessing the site, which prevents indexing

Since indexing cannot be blocked directly, the next best thing is to block Baidu Spider and the other search engine crawlers from accessing the site at all. The idea: inspect the user_agent, and if it belongs to Baiduspider, Googlebot, or another search engine crawler, return 403 or 404. Baidu and the others will then conclude that the site cannot be opened or does not exist, and naturally will not index it.

The Nginx configuration is as follows:

# Place inside a server {} or location {} block: return 403 when the user agent matches a known search engine crawler
if ($http_user_agent ~* "qihoobot|Baiduspider|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|Feedfetcher-Google|Yahoo! Slurp|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot|ia_archiver|Tomato Bot") {
    return 403;
}
I tested the configuration above by using curl to simulate a search engine crawler's request; it works as expected, and Baidu indexing is completely blocked.
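
A minimal sketch of that check, assuming the configuration is deployed on your own domain (example.com is a placeholder here): send a request with Baiduspider's user agent and confirm the server answers 403, while a normal request still returns 200.

curl -I -A "Baiduspider" https://example.com/    # expect: HTTP/1.1 403 Forbidden
curl -I https://example.com/                     # normal clients still get 200 OK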




Search "Hua'an Notes" to follow me