Robots rules

What is robots.txt? The correct way to write and generate a robots.txt file for WordPress

As a new webmaster, you may not know what the robots.txt file is for, let alone the relationship between robots.txt and website SEO.

Today, Daddy's Site-Building Notes will show you how to write a correct robots.txt file to help your website's SEO.

What is robots.txt

Robots.txt, also known as the robots protocol, is a widely observed code of conduct in the Internet community.

Robots.txt is a text file located in the root directory of your website. It tells search engines which pages may be crawled and which may not. You can use it to block large files such as images, music, and videos to save server bandwidth; to block a site's dead links so that search engines can crawl its content more easily; and to set a sitemap link that guides spiders through your pages.
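For instance, a minimal robots.txt combining these uses might look like the sketch below (the paths and the sitemap URL are placeholders, not taken from any real site):

    User-agent: *
    Disallow: /videos/          # hypothetical folder of large media files, blocked to save bandwidth
    Disallow: /old-page.html    # hypothetical dead link, hidden from crawlers
    Sitemap: https://example.com/sitemap.xml   # placeholder sitemap address to guide spiders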

How to create a robots.txt file

You only need a text editor such as Notepad to create a text file named robots.txt, then upload the file to the root directory of your website.

You can also generate one online with a robots.txt generation tool.

How to write robots.txt rules

Creating the robots.txt file is not enough by itself; the key is to write robots rules that suit your website.

Robots.txt supports the following rules:

    User-agent: *              # "*" is a wildcard matching all search engine crawlers
    Disallow: /admin/          # forbid crawling anything under the /admin/ directory
    Disallow: /require/        # forbid crawling anything under the /require/ directory
    Disallow: /ABC/            # forbid crawling anything under the /ABC/ directory
    Disallow: /cgi-bin/*.htm   # forbid all URLs under /cgi-bin/ (including subdirectories) ending in ".htm"
    Disallow: /*?*             # forbid all URLs containing a question mark (?)
    Disallow: /*.jpg$          # forbid fetching any .jpg image on the site
    Disallow: /ab/adc.html     # forbid crawling the file adc.html under the /ab/ folder
    Allow: /cgi-bin/           # allow crawling anything under the /cgi-bin/ directory
    Allow: /tmp                # allow crawling the entire /tmp directory
    Allow: /*.htm$             # allow only URLs ending in ".htm"
    Allow: /*.gif$             # allow fetching web pages and .gif images
    Sitemap: <your sitemap URL>   # tells crawlers where the site map is located

It is recommended that you use the robots generator in a webmaster tools suite to write your rules; it is simpler and clearer.

Robots generation tool

Tip from Daddy: a bare Disallow: with no slash after it allows the whole site to be crawled, as the example below shows.
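These two rules look almost identical but have opposite effects (a minimal illustration):

    User-agent: *
    Disallow:        # empty value: nothing is blocked, the whole site may be crawled

    User-agent: *
    Disallow: /      # a single slash blocks the entire site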

Recommended WordPress robots.txt rules

After WordPress is installed, a virtual robots.txt file is created by default (you won't see it in the website directory, but you can view it by visiting yoursite.com/robots.txt).

The default rules are as follows:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

This rule tells all search engines not to crawl anything under the /wp-admin/ folder, except the file /wp-admin/admin-ajax.php, which may still be fetched.

But for the sake of SEO and security, Daddy suggests improving these rules. The following is the current robots.txt of Daddy's Site-Building Notes.

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-content/plugins/
    Disallow: /?s=*
    Allow: /wp-admin/admin-ajax.php

    User-agent: YandexBot
    Disallow: /

    User-agent: DotBot
    Disallow: /

    User-agent: BLEXBot
    Disallow: /

    User-agent: YaK
    Disallow: /

    Sitemap: https://blog.naibabiji.com/sitemap_index.xml

The above rules add the following two lines to the default rules:

    Disallow: /wp-content/plugins/
    Disallow: /?s=*

These stop crawlers from fetching the /wp-content/plugins/ folder and any page whose URL matches /?s=*.

/wp-content/plugins/ is the directory for WordPress plugins. Blocking it avoids privacy risks (for example, some plugins have information-leak bugs, and search engines could index the leaked content).

Blocking the search results page prevents others from abusing it to manipulate your site's SEO weight:

URLs matching /?s=* relate to a loophole Daddy recently discovered being exploited by grey-market SEO operations.

A URL matching /?s=* is the default search results page of a WordPress site, as shown below:

(Screenshot: a WordPress search results URL)

In most WordPress themes, the title of the search results page is a combination of the search keyword and the site title.

But there is a problem: Baidu still has a chance of crawling this page. For example, one of Daddy's sites was exploited by others in exactly this way:

(Screenshot: Baidu web snapshot)
The remaining rules block a few specific search engine bots from crawling anything and add a sitemap address link (see: Several Ways to Generate a Sitemap in WordPress: Sitemap Plugin Recommendations).

How to check whether robots.txt is effective

After creating and writing your robots.txt rules, you can use Baidu Webmaster's robots detection tool to check whether they are effective.

Baidu robots detection

However, Baidu's tool does not support https websites; those can be checked with Aizhan's tool instead.

Aizhan robots detection
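If you prefer a quick local check, Python's standard library also ships a robots.txt parser, urllib.robotparser. The sketch below assumes a site at example.com; replace the URLs with your own. Note that this parser applies rules in file order and does not understand * wildcards, so for rules like Disallow: /?s=* the online testers above are more reliable.

    # A quick local robots.txt check using only the Python standard library.
    # Caveat: urllib.robotparser matches rules in file order and ignores
    # "*" wildcards, so complex rules may evaluate differently than they
    # do in Baidu's or Aizhan's testers.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # placeholder: use your own site
    rp.read()  # download and parse the live file

    # Ask whether a given crawler may fetch a given URL.
    print(rp.can_fetch("*", "https://example.com/wp-admin/"))    # False if /wp-admin/ is disallowed
    print(rp.can_fetch("Baiduspider", "https://example.com/"))   # True unless Baiduspider is blocked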

Related articles:

Limit the crawl frequency of Bing and other search engines to reduce server load
