Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing


Now there are 6 ways to keep website content out of search engines


Several months ago a client inspired me to write a comprehensive guide to keeping website content out of search engines. Website owners usually focus on the opposite side of search engine optimization: ensuring web content is well indexed. Yet, as many can attest, search engines can be all too efficient at finding documents they shouldn’t. Hence the need to understand what options exist, how they work and which search engines support them.

One problem with the techniques available up until now is that options for digital media have been limited. The official way to keep video, audio and PDF files out of search engines was through the robots.txt protocol, not a very efficient tool for setting indexing options at the file level.
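To make the limitation concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, paths and the `www.example.com` host are invented for illustration. It shows that robots.txt works on path prefixes kept in one central file, so every individual file that needs special treatment has to be listed there by hand.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one directory rule plus one rule
# written out for a single file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/video/
Disallow: /docs/private-report.pdf
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the parsed rules let `user_agent` fetch `path`."""
    return parser.can_fetch(user_agent, "https://www.example.com" + path)


for path in ("/media/video/demo.mov",
             "/docs/private-report.pdf",
             "/docs/public-report.pdf"):
    print(path, "crawlable" if is_crawlable(path) else "blocked")
```

Every additional PDF or video that should be kept out needs its own `Disallow` line, which is what makes the approach clumsy at the file level.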

Google, acutely aware of the growing popularity of video, image and other non-HTML file types, has responded to the gap by introducing a way to add indexing instructions to the HTTP headers via an “X-Robots-Tag” directive. Any of the Google-supported meta robots values may be specified. While the “X-Robots-Tag” directive is an excellent tool, I suspect usage will be limited: most website administrators are probably not too familiar with Apache’s mod_headers or Microsoft’s custom headers.
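For sites served by a Python application rather than configured through Apache or IIS, the same header can be added in code. What follows is only a sketch under assumptions: the extension list, the function name `x_robots_headers` and the class name `XRobotsMiddleware` are hypothetical, and the only thing taken from Google's announcement is the header name and the use of meta robots values such as `noindex`.

```python
import posixpath

# Hypothetical policy: file types that should stay out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".mp3", ".mov"}


def x_robots_headers(path, directives=("noindex",)):
    """Return extra HTTP headers for `path` as a list of (name, value) tuples."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in NOINDEX_EXTENSIONS:
        return [("X-Robots-Tag", ", ".join(directives))]
    return []


class XRobotsMiddleware:
    """WSGI middleware that appends X-Robots-Tag to matching responses."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        extra = x_robots_headers(environ.get("PATH_INFO", ""))

        def patched_start_response(status, headers, exc_info=None):
            # Pass the original headers through, plus any robots header.
            return start_response(status, list(headers) + extra, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping an existing WSGI application as `XRobotsMiddleware(app)` would leave HTML pages untouched and tag only the listed file types.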

A second common problem with search engine indexing has been the delay between when a page is removed from a website and when it is finally removed from a search engine’s index. Google is addressing this problem as well by introducing a meta tag attribute called unavailable_after. With this meta tag, sites can specify when a page should be removed from search results. Unfortunately, Google says this tag is currently limited to “web search”, which is a bit strange as they also said that web search has become “universal search”, integrating images, video and maps into the standard document search.
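A page template could generate such a tag from an expiry date stored with the content. The sketch below is an assumption-laden illustration: the function name `unavailable_after_meta` is hypothetical, and the `DD-Mon-YYYY HH:MM:SS GMT` layout is my reading of the style of date Google's announcement uses, not a confirmed specification.

```python
from datetime import datetime, timezone

# English month abbreviations, spelled out so the output does not
# depend on the server's locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_meta(expires):
    """Build a googlebot meta tag from a timezone-aware datetime."""
    if expires.tzinfo is None:
        raise ValueError("expires must be timezone-aware")
    utc = expires.astimezone(timezone.utc)
    # Assumed layout: DD-Mon-YYYY HH:MM:SS GMT
    stamp = f"{utc.day:02d}-{MONTHS[utc.month - 1]}-{utc.year} {utc:%H:%M:%S} GMT"
    return f'<meta name="googlebot" content="unavailable_after: {stamp}">'


print(unavailable_after_meta(datetime(2007, 8, 25, 15, 0, tzinfo=timezone.utc)))
```

Converting to GMT before formatting avoids relying on time zone abbreviations, which are themselves a source of ambiguity.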

There are a few enhancements I’d like to see:

  • Google’s X-Robots-Tag and unavailable_after entry refers to the ambiguous and obsolete RFC 850 date format. I’d like to see Google refer to a current specification, such as the IETF’s RFC 3339, to ensure proper date parsing.
  • Google’s webmaster tools console should show site pages and their expiration dates, offering confirmation that unavailable_after has been properly set.
  • Google’s online documentation does not yet seem to reflect these new indexing options.
  • Yahoo!, Microsoft and Ask: please continue the cooperation you’ve shown with sitemaps and the noodp meta tag. Please adopt the X-Robots-Tag and unavailable_after indexing directives.
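To illustrate the date format point in the first item, the sketch below renders the same instant in the RFC 850 style (weekday name, two-digit year, time zone abbreviation) and in RFC 3339. The helper names `rfc850` and `rfc3339` are hypothetical, and both functions simply normalize to GMT/UTC.

```python
from datetime import datetime, timezone

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc850(moment):
    """Format an aware datetime in the RFC 850 style, normalized to GMT."""
    utc = moment.astimezone(timezone.utc)
    return (f"{WEEKDAYS[utc.weekday()]}, {utc.day:02d}-"
            f"{MONTHS[utc.month - 1]}-{utc.year % 100:02d} "
            f"{utc:%H:%M:%S} GMT")


def rfc3339(moment):
    """Format an aware datetime as an RFC 3339 timestamp in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


expiry = datetime(2007, 8, 25, 15, 0, tzinfo=timezone.utc)
print(rfc850(expiry))   # Saturday, 25-Aug-07 15:00:00 GMT
print(rfc3339(expiry))  # 2007-08-25T15:00:00Z
```

The two-digit year in the first form leaves the century to the parser's guesswork, while the RFC 3339 form carries a four-digit year and an explicit offset.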

Related article in this site: How to prevent Search Engines from indexing parts of your website: 6 ways.


Originally published July 28th, 2007

  • Sean Carlos is a web marketing consultant & teacher, assisting companies with their Search (SEO + PPC = SEM), Social Media & Digital Media Measurement strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Web Analytics Association and collaborates with the Bocconi University. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.

