RSS2.0

Technical guidelines

http://tbn0.google.com/images?q=tbn:kprOabpxwNQlVM:http://www.masternewmedia.org/images/open_door_free_access.jpg

1. Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.

2. Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.

3. Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since we last crawled your site. Supporting this feature saves you bandwidth and overhead.

4. Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler. Visit http://www.robotstxt.org/wc/faq.html to learn how to instruct robots when they visit your site. You can test your robots.txt file to make sure you're using it correctly with the robots.txt analysis tool available in Google webmaster tools.

5. If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site.

6. Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.