
Do we have to have a robots.txt? - Google (Technics)

by Alfie, Vienna, Austria, Tuesday, June 23, 2009, 01:03 @ Auge
edited by Alfie, Tuesday, June 23, 2009, 13:46

Hi Auge and Göran!

Everything Auge said about indexing is correct.

That is actually causing us a problem. We have a database with 155,000 entries, and when Googlebot starts crawling through it, the forum can't be used by other users for about two hours. This happens every day.

I don't know how to prevent this daily access by search robots. Maybe Google Webmaster Tools (in the case of Googlebot) offer some way to control it?

First get a Google account and go to Google's Webmaster Tools. After verifying that you own the site (you get a 2-byte file [containing just LF/CR] named googleXXXXXXXXXXXXXXXX.html, where XXXXXXXXXXXXXXXX is a unique ID; upload this file to your root directory), you can change the settings:
Dashboard > Site configuration > Settings > Crawl rate > [o] Set custom crawl rate (the slowest rate is 0.002 requests/second = 1 per 500 seconds).
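The arithmetic behind the slowest setting can be checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, and treating each of the 155,000 database entries as exactly one page fetched by one request is a simplifying assumption (a real forum usually exposes more URLs than entries).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_between_requests(rate_per_second: float) -> float:
    """Interval between two crawler requests at a given crawl rate (hypothetical helper)."""
    if rate_per_second <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate_per_second


def requests_per_day(rate_per_second: float) -> float:
    """Upper bound on requests per day at a constant crawl rate."""
    return rate_per_second * SECONDS_PER_DAY


def days_to_crawl(pages: int, rate_per_second: float) -> float:
    """Days needed to fetch every page once, assuming one request per page."""
    return pages * seconds_between_requests(rate_per_second) / SECONDS_PER_DAY


if __name__ == "__main__":
    slowest = 0.002  # requests/second, the slowest custom crawl rate
    print(seconds_between_requests(slowest))        # 500.0 seconds
    print(requests_per_day(slowest))                # about 173 requests
    print(round(days_to_crawl(155_000, slowest)))   # 897 days
```

At the slowest rate, fetching 155,000 pages once would take about 897 days, which shows how strongly this setting throttles the bot.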

If this measure doesn't help, you will have to use an XML sitemap. For an example, see the one for my main site (I don't need one for my forum with just 3000+ posts). In a sitemap you can tell crawlers how often each resource is likely to change (<changefreq>), using one of the following values: always, hourly, daily, weekly, monthly, yearly, never. Crawlers treat this value as a hint, not as a command.
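A sitemap following the sitemaps.org protocol can be generated with Python's standard library. This is a sketch under stated assumptions, not the sitemap of the main site mentioned above: the build_sitemap helper and the example.com URLs are hypothetical, and only <loc> and <changefreq> are written (the protocol also defines optional <lastmod> and <priority> elements).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Values allowed for <changefreq> by the sitemaps.org protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Return sitemap XML for (url, changefreq) pairs (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        if changefreq not in CHANGEFREQ_VALUES:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs; replace them with the real pages of your site.
    print(build_sitemap([
        ("http://www.example.com/", "daily"),
        ("http://www.example.com/about.html", "yearly"),
    ]))
```

Rejecting unknown values early keeps a typo such as "dayly" from silently producing an invalid sitemap.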

Another hint: avoid duplicate content (see this thread in the 1.x forum).

Other search engines probably offer comparable tools for controlling this.

Yahoo! Slurp is a nasty bot. The only way I found to reduce its access rate is these two lines in robots.txt:
User-agent: Slurp
Crawl-delay: 10

According to Yahoo!, 10 is the slowest rate; in my experience, higher values are ignored.

For msnbot (the crawler of MS Live Search, now Bing Beta):
User-agent: msnbot
Crawl-delay: XXX

where XXX is the delay in seconds between requests.
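To see how a crawler that honours Crawl-delay would read these lines, Python's urllib.robotparser can parse them. This is a sketch only: 30 seconds is a hypothetical stand-in for the XXX placeholder, the delays helper is a made-up name, and the standard-library parser is of course not the code Slurp or msnbot actually run.

```python
from urllib.robotparser import RobotFileParser

# The two groups from above; 30 replaces the XXX placeholder (hypothetical value).
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 30
"""


def delays(robots_txt, agents):
    """Map each user-agent token to its Crawl-delay, or None if no group sets one."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


if __name__ == "__main__":
    for agent, delay in delays(ROBOTS_TXT, ["Slurp", "msnbot", "Googlebot"]).items():
        if delay is None:
            print(f"{agent}: no Crawl-delay")
        else:
            # A bot that waits `delay` seconds per request can make at most this many per day.
            per_day = 24 * 60 * 60 // delay
            print(f"{agent}: {delay} s between requests, at most {per_day} per day")
```

With a delay of 10 seconds, Slurp can make at most 8,640 requests per day; Googlebot gets None here simply because no group in this file matches it, and its rate is set in Webmaster Tools instead.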

@Alex: I would suggest modifying the scripts so that links to the contact form (whether to the admin or to a user) get the attribute rel="nofollow", e.g. change

<a href="index.php?mode=contact" title="foo">bar</a>

to

<a href="index.php?mode=contact" title="foo" rel="nofollow">bar</a>
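The forum scripts are PHP (note index.php), so the following Python helper is only a language-neutral sketch of the change: contact_link is a hypothetical name, and escaping the attribute values is an extra precaution that is not part of the suggestion above.

```python
from html import escape


def contact_link(href, title, text, nofollow=True):
    """Build an <a> element (hypothetical helper); rel="nofollow" asks crawlers not to follow it."""
    attrs = f'href="{escape(href)}" title="{escape(title)}"'
    if nofollow:
        attrs += ' rel="nofollow"'
    return f"<a {attrs}>{escape(text)}</a>"


if __name__ == "__main__":
    print(contact_link("index.php?mode=contact", "foo", "bar"))
    # <a href="index.php?mode=contact" title="foo" rel="nofollow">bar</a>
```

Generating every contact link through one function means the attribute only has to be added in a single place.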

--
Cheers,
Alfie (Helmut Schütz)
BEBA-Forum (v1.8β)

