Google updates its Robots.txt and Meta Tags features

Aug. 16, 2007

Today, Google improved its robots.txt analysis tool to better recognize sitemap declarations and relative URLs. Earlier versions of the tool were not aware of sitemaps at all and understood only absolute URLs; anything else was reported as 'Syntax not Understood'. The improved version of the tool now tells a site owner or webmaster whether the sitemap's URL and scope are valid. According to Google, testing can also be done against relative URLs, with a lot less typing.

Line reporting is better too. Webmasters are now told of multiple problems on a line if they exist, unlike earlier versions, which reported only the first problem encountered. Google has also made other general improvements to the tool's analysis and validation.

For instance, suppose a webmaster or developer managing the domain www.site.com wants search engines to index everything on the site except the /images folder. The webmaster also wants to make sure the sitemap gets noticed, so he or she saves the following code as the new robots.txt file:
disalow images

Webmasters can visit Google's Webmaster Central to test their site against the robots.txt analysis tool, using test URLs such as http://www.site.com.

Google also wants to make sure webmasters understand the new "unavailable_after" META tag, announced on the Google Blog on August 2. The tag allows for a more dynamic relationship between a website and Googlebot. On a site like www.site.com, any time there is a temporarily available news story, a limited-offer sale or a promotion page, site owners can now specify the exact date and time at which they want specific pages to stop being crawled and indexed by Google. For instance, for a site promotion that expires at the end of 2007, a site manager can add an unavailable_after META tag carrying that expiry date to the head section of the page www.site.com/2007promotion.html.

Another new feature is the "X-Robots-Tag" directive, which adds Robots Exclusion Protocol (REP) META tag support for non-HTML pages. Webmasters and site owners can now have the same control over videos, spreadsheets and other indexed file types. Using the example above, if the promotion page is in PDF format at a URL such as www.site.com/2007promotion.pdf, a developer would send the following in the file's HTTP response headers:

X-Robots-Tag: unavailable_after: 31 Dec 2007 23:59:59 EST

Google underlines that REP META tags are useful for implementing noarchive, nosnippet and now unavailable_after as page-level instructions. This is in contrast to the robots.txt file, which is controlled at the root of the domain. Google says it has received requests from bloggers and webmasters for these new features. To learn more, visit Google's Webmaster Help Group.
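The robots.txt scenario described earlier (index everything on www.site.com except /images, and declare a sitemap) can also be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module, not Google's analysis tool. The two file bodies, the sitemap URL http://www.site.com/sitemap.xml, the image path and the helper name load_rules are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.site.com: allow everything except /images/
# and declare a sitemap. The sitemap URL is an assumption for illustration.
WELL_FORMED = """\
User-agent: *
Disallow: /images/
Sitemap: http://www.site.com/sitemap.xml
"""

# The same file with the rule misspelled, as in "disalow images" above.
MISSPELLED = """\
User-agent: *
disalow images
Sitemap: http://www.site.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


good = load_rules(WELL_FORMED)
bad = load_rules(MISSPELLED)

# Check the site root and a (hypothetical) file inside the /images/ folder.
home_allowed = good.can_fetch("Googlebot", "http://www.site.com/")
image_allowed = good.can_fetch("Googlebot", "http://www.site.com/images/logo.png")

# Sitemap declarations found in the file (site_maps() needs Python 3.8+).
sitemaps = good.site_maps()

# The misspelled line is not a recognised directive, so this parser skips it
# and the /images/ folder stays crawlable.
image_allowed_with_typo = bad.can_fetch(
    "Googlebot", "http://www.site.com/images/logo.png"
)

print(home_allowed, image_allowed, sitemaps, image_allowed_with_typo)
```

Under these assumptions, the well-formed file blocks the image URL while the misspelled one does not, which is the kind of silent failure a syntax checker helps catch.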
Source: Microsoft
Copyright © Search Engines Today. All rights reserved.