Description: all search engines are allowed, except for some system files. There are also two sitemaps: one lists the personal home page addresses of verified microblog members, the other lists microblog messages. The XML sitemap format has limits: a single sitemap file can list at most 50,000 URLs and must not exceed 10 MB, so a site with more URLs has to create multiple sitemap files. The first XML sitemap that Solitary Rattan checked for Tencent micro-blog contained around 41,000 URLs and was a little over 2 MB. Checking again some time later, Tencent had refreshed the sitemap with new URLs.
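The per-file limit mentioned above can be sketched in Python roughly as follows. This is only an illustration, not how Tencent builds its sitemaps: the `example.com` URLs, the file names, and the helper names `chunk`, `build_sitemap`, and `build_index` are all assumptions made for the example.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit described in the text


def chunk(urls, size=MAX_URLS):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return one <urlset> document (bytes) listing the given URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_index(file_names, base="https://example.com/"):
    """Return a <sitemapindex> document (bytes) pointing at each sitemap file.

    `base` is a hypothetical site root used only for this example.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in file_names:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = base + name
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# 120,000 made-up URLs need three files: 50,000 + 50,000 + 20,000.
urls = [f"https://example.com/u/{i}" for i in range(120_000)]
parts = list(chunk(urls))
names = [f"sitemap-{n}.xml" for n in range(1, len(parts) + 1)]
index_xml = build_index(names)
```

A real generator would also check the size of each generated file against the size limit before publishing it.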
Solitary Rattan (Quanzhou SEO) read teacher Zan Hui's book "Se real code", which covers robots.txt in what I personally feel is great detail, but I had never studied how large websites set theirs up. Today I remembered this and decided to analyze the robots.txt files of the four domestic micro-blog platforms, Sina, Tencent, Sohu, and NetEase, to see how each one is written.
Another point: Solitary Rattan found that Sohu micro-blog's robots.txt changed around June to block every search engine other than Baidu and Sogou from crawling. Even so, the number of pages indexed by the other search engines has kept increasing. The only difference is that Google, Youdao, and Bing list the pages in their index without fully including them. One search engine does not seem to honor the robots file at all: it still includes the pages with a snapshot and extracted text. Yahoo also includes the pages, but the snapshot cannot be viewed, so it cannot be determined whether…
1. Sina micro-blog
Baidu's official documentation says: note in particular that the order of the Disallow and Allow lines is significant. The robot decides whether to access a URL according to the first Allow or Disallow line that matches it.
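The first-match rule quoted above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: rules are plain path prefixes, wildcards are not handled, lines with an empty value are skipped, and the function name `first_match_allows` and the sample paths are invented for the example. Other crawlers may resolve conflicting rules differently.

```python
def first_match_allows(rules, path):
    """Decide access using the first rule whose prefix matches `path`.

    `rules` is a list of (directive, prefix) pairs in file order,
    e.g. [("Allow", "/public/"), ("Disallow", "/")].
    """
    for directive, prefix in rules:
        if not prefix:
            continue  # simplification: ignore lines with an empty value
        if path.startswith(prefix):
            return directive.lower() == "allow"
    return True  # no rule matched, so the URL may be crawled


# The same two rules in opposite order give opposite answers.
allow_first = [("Allow", "/public/"), ("Disallow", "/")]
disallow_first = [("Disallow", "/"), ("Allow", "/public/")]

print(first_match_allows(allow_first, "/public/page"))     # True
print(first_match_allows(disallow_first, "/public/page"))  # False
```

Because the first matching line wins, the more specific Allow line has to come before the broad Disallow line for it to have any effect.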
Description: allow all search engines
Micro-blog According to the Shanghai
Sohu is one of the most interesting cases. A few months earlier, thanks to its high domain weight, Sohu micro-blog pages rose quickly in keyword rankings, and then it was rumored that Sohu micro-blog had blocked Baidu's spider. Let's look at the robots.txt file. The first block of statements allows Baidu's spider, the second block allows Sogou to crawl, and the third block bans all search engines from crawling.
Therefore the last block of statements has no effect on Baidu and Sogou. That is to say, only Baidu and Sogou may crawl Sohu micro-blog pages.
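The three-block layout described above can be checked with Python's standard `urllib.robotparser`. The robots.txt text below is a reconstruction from the description, not a copy of Sohu's actual file; the user-agent tokens, the `example.com` URL, and the helper `may_crawl` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction: two named crawlers are allowed,
# every other crawler falls through to the catch-all block.
ROBOTS_TXT = """\
User-agent: Baiduspider
Allow: /

User-agent: Sogou
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(agent, url="http://example.com/some-page"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


print(may_crawl("Baiduspider"))       # True
print(may_crawl("Sogou web spider"))  # True
print(may_crawl("Googlebot"))         # False
```

A crawler that finds a block naming it uses that block and ignores the `User-agent: *` block, which is why the final Disallow does not apply to the two named crawlers.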
In fact, before the
2. Tencent micro-blog
3. Sohu micro-blog