Keep sections of web pages out of Yahoo! with class="robots-nocontent"


There are occasions when some content on a web page just shouldn’t appear in search engines. The most frequent example is repeating header and footer details, such as site copyright information. This site uses the hCard format to provide contact information which site visitors can save as a vCard for use with a PIM such as Thunderbird or Outlook. Yet some of the information required for a detailed vCard is not really appropriate for a search engine’s index. Historically, the best solution was to place such content on a page using JavaScript, as search engines have avoided indexing JavaScript (though they probably do analyze it). A JavaScript approach to keeping some page information out of search engines isn’t perfect: not all visitors will have JavaScript enabled.

The same folks behind the hCard format proposed placing robots instructions in the HTML class attribute, giving search engine crawlers detailed handling information for the tagged sections of a page.

Yahoo! introduces class="robots-nocontent"

In May 2007 Yahoo! introduced class="robots-nocontent", which is similar to, but distinct from, the microformat class="robots-noindex" proposal. It is not clear why Yahoo! didn’t embrace this part of the microformat proposal.

Adding class="robots-nocontent" to an HTML element, such as a <p> paragraph, a <div> section or a <span>, tells Yahoo! to avoid using that element’s content when calculating and composing search results. Yahoo! will still crawl and presumably index this content; the content will just be flagged as “off-limits”. Yahoo! will also crawl links in robots-nocontent sections, so if you do not want those links followed, the rel="nofollow" mechanism still needs to be applied to them.
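To make the effect of the class concrete, here is a minimal Python sketch that splits a page's text into the parts inside robots-nocontent elements and everything else. It only approximates the behavior described above; how Yahoo! actually parses pages is not documented here. The class name `NoContentSplitter`, the `SAMPLE` markup and its contents are made up for illustration, and the sketch assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentSplitter(HTMLParser):
    """Separate text inside class="robots-nocontent" elements from the rest.

    Hypothetical helper; assumes every non-void element is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside a robots-nocontent element
        self.flagged = []   # text inside robots-nocontent sections
        self.regular = []   # all other text

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a flagged element, every nested element deepens the count.
        if self.depth or "robots-nocontent" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            (self.flagged if self.depth else self.regular).append(text)


# Illustrative markup only; the names and paths are invented.
SAMPLE = """
<div id="main">
  <p>Article text that should appear in search results.</p>
  <div class="vcard robots-nocontent">
    <span class="fn">Example Person</span>
    <a href="/private" rel="nofollow">Private page</a>
  </div>
  <p class="robots-nocontent">Copyright notice</p>
</div>
"""

splitter = NoContentSplitter()
splitter.feed(SAMPLE)
print("flagged:", splitter.flagged)
print("regular:", splitter.regular)
```

Note that the class can sit alongside other class values, as with `vcard` in the sample, because the class attribute holds a space-separated list.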

Yahoo! suggests using class="robots-nocontent" on page template sections that repeat throughout a site, such as headers and footers. Should you? For most sites, the answer is: don’t bother, unless the site’s header, footer or other template information is actually appearing in search results. Search engine indexers are already very good at identifying header, footer, navigation and other repeating template information, so a site probably doesn’t need extra code to do this.

Lack of universal search engine support

Another consideration is the lack of universal search engine support for the robots-nocontent class. Google, Microsoft’s Live and Ask will all ignore it. It remains to be seen whether they adopt Yahoo!’s class="robots-nocontent" or coalesce around the microformat proposal.

Test implementation with userContent.css stylesheet in Firefox

Sites choosing to deploy pages with the robots-nocontent class would be wise to test their implementation. Many browsers let users define their own CSS rules, which can be used to highlight the tagged sections.

In Firefox, create a userContent.css file in your Firefox profile directory. Add a line similar to the following:

.robots-nocontent {background:#ff6677 !important;}

You will have to restart your browser after modifying your stylesheet. This text should appear with a reddish background if you have the above line in your browser stylesheet.

You may want to add nofollow link highlighting as well:

/* ~= also matches multi-value rel attributes such as rel="external nofollow" */
a[rel~="nofollow"] {color:#520F64 !important;background:#049D86 !important;}
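Visual highlighting works well for spot checks. For a whole page, a small script can complement it. The following Python sketch lists links inside robots-nocontent sections that lack rel="nofollow", since Yahoo! will still crawl them. The names `NoContentLinkChecker` and `links_missing_nofollow` and the `SAMPLE` markup are hypothetical, and the sketch assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentLinkChecker(HTMLParser):
    """Collect hrefs of links in robots-nocontent sections without nofollow."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # nesting depth inside a robots-nocontent element
        self.followed_links = []  # hrefs that crawlers may still follow

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.depth or "robots-nocontent" in classes:
            self.depth += 1
            # rel is a space-separated list, e.g. rel="external nofollow".
            rel = (attr.get("rel") or "").lower().split()
            if tag == "a" and attr.get("href") and "nofollow" not in rel:
                self.followed_links.append(attr["href"])

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1


def links_missing_nofollow(html):
    """Return hrefs inside robots-nocontent sections that lack rel="nofollow"."""
    checker = NoContentLinkChecker()
    checker.feed(html)
    return checker.followed_links


# Illustrative markup only; the paths are invented.
SAMPLE = """
<p><a href="/about">About</a></p>
<div class="robots-nocontent">
  <a href="/terms">Terms</a>
  <a href="/login" rel="nofollow">Log in</a>
</div>
"""

print(links_missing_nofollow(SAMPLE))
```

Links outside robots-nocontent sections are deliberately ignored here, since the point is only to catch links in flagged sections that you may have meant to exclude as well.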

Pro

  • Site webmasters have more granular control over how a site appears in a search engine’s results

Con

  • Lack of coordination and common support with Google, Bing and Ask

Last Update: June 2009


About Sean Carlos

Sean Carlos is a digital marketing consultant & teacher, assisting companies with their Search (SEO + SEA = SEM), Social Media & Digital Media Measurement strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is Chairman of the SMX Search and Social Media Conference, 13 & 14 November in Milan. He is also a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.
