Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing

Tracking Search Engine Cache Page Views with Web Analytics

A small percentage of search engine users may view a web site through a search engine’s saved copy of its pages, the cached version. The cached copy the search engine serves to the user usually contains links to the embedded objects present in the original site (images, stylesheets, JavaScript, etc.), so the user’s browser still requests those objects from the original web server. Organizations focusing on web marketing activities, such as search engine optimization, will want to track all search engine activity, including cached page views.

Referrers from the search engine’s cached copy will show up in the site’s web server log files, including the keywords and keyword phrases used to find the cached copy. In some cases, the user will click through from the cached copy to the original website; the resulting real page view is recorded in the web server log file with the cache URL as its referrer.

Cache views are more difficult for Web Analytics software to recognize, but it can be done.

A tool must dissect the search engine referring URL as in this example:

http://64.233.179.104/search?q=cache:l5D4yOKeZaYJ:www.antezeta.com/search-engines-site-localization-duplicate-content.html+google+dialect&hl=en&ct=clnk&cd=9

| Item | Description |
| --- | --- |
| http://64.233.179.104/ | A known Google IP address. |
| search | The Google Service. Others you may see include translate_c |
| q=cache:l5D4yOKeZaYJ: | Indicates a query, made to an item in cache. The cache ID is a 12 character alphanumeric string. |
| www.antezeta.com | Domain containing item matching query terms |
| search-engines-site-localization-duplicate-content.html | Object matching query terms (html page, pdf…) |
| google dialect | Query words entered by user |
| hl=en | Google Interface Human Language code (English) |
| ct=clnk | Not needed |
| cd=9 | Not needed |
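
The breakdown above can be sketched in a few lines of Python. This is only an illustration of the parsing steps, not the code used in the analytics tool: the function name `parse_cache_referrer` and the keys of the returned dictionary are hypothetical, and the sketch assumes the referrer has the `cache:<ID>:<domain>/<object>` form of the example.

```python
from urllib.parse import urlsplit, parse_qs


def parse_cache_referrer(referrer):
    """Split a Google cache referrer into cache ID, page and keywords.

    Hypothetical helper: assumes the "cache:<ID>:<domain>/<object>" form.
    Returns None when the q parameter is not a cache query.
    """
    parts = urlsplit(referrer)
    # parse_qs decodes %XX escapes and turns '+' into spaces.
    params = parse_qs(parts.query)
    q = params.get("q", [""])[0]
    if not q.startswith("cache:"):
        return None
    body = q[len("cache:"):]
    # The first space-separated token is "<ID>:<domain>/<object>";
    # everything after it is what the user typed.
    target, _, keywords = body.partition(" ")
    cache_id, _, page = target.partition(":")
    host, _, path = page.partition("/")
    return {
        "service": parts.path.lstrip("/"),   # "search", "translate_c", ...
        "cache_id": cache_id,
        "host": host,
        "path": path,
        "keywords": keywords.split(),
        "language": params.get("hl", [None])[0],
    }
```

For the example referrer, this yields the cache ID l5D4yOKeZaYJ, the host www.antezeta.com and the keywords google and dialect. The ct and cd parameters are simply ignored.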

In some cases, a user may view a search engine’s cached copy of a page without entering search words in a search engine. How? Through a search engine browser toolbar. Such a referrer will look like this example:

http://72.14.207.104/search?sourceid=navclient&ie=UTF-8&rls=GGLG,GGLG:2005-50,GGLG:en&q=cache:http%3A%2F%2Fwww.antezeta.com%2Fawstats.html
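
A toolbar referrer like this one carries a URL-encoded page address instead of a cache ID and keywords, so it needs slightly different handling. The following Python sketch shows one way to do it; the function name `parse_toolbar_cache_referrer` and the returned keys are hypothetical, and the check for an `http://` or `https://` prefix is an assumption based on this single example.

```python
from urllib.parse import urlsplit, parse_qs


def parse_toolbar_cache_referrer(referrer):
    """Extract the cached page from a toolbar-style cache referrer.

    Hypothetical helper: assumes q has the form "cache:<full page URL>".
    Returns None for any other kind of referrer.
    """
    # parse_qs also decodes %3A and %2F back into ':' and '/'.
    params = parse_qs(urlsplit(referrer).query)
    q = params.get("q", [""])[0]
    if not q.startswith("cache:"):
        return None
    target = q[len("cache:"):]
    if not target.startswith(("http://", "https://")):
        return None  # a cache ID form, not the toolbar form
    page = urlsplit(target)
    return {
        "source": params.get("sourceid", [None])[0],  # "navclient" in the example
        "host": page.netloc,
        "path": page.path,
        "keywords": [],  # the user typed no search words
    }
```

For the example referrer, the result names the host www.antezeta.com and the path /awstats.html, with an empty keyword list.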

We have added logic to the Search Engine Recognition Module of the AWStats Web Analytics application to better recognize search engine cache query terms, page views and click-throughs to a site.

  1. The list of Google service IP addresses has been extended. To do: find a definitive list.
  2. Logic to parse search keywords has been introduced. It currently works only for Google cache IDs without numbers; the main AWStats program will probably have to be modified to recognize alphanumeric cache IDs.
  3. Google Translate traffic is currently counted as Google Cache traffic. Ideally, it would be separated out, but this too appears to require a change to the main AWStats program.
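
To illustrate the two open points, here is a standalone Python sketch, independent of AWStats, of how alphanumeric cache IDs could be matched and how translate_c traffic could be told apart from cache traffic. All names are hypothetical. The host set contains only the two addresses seen in the examples in this article, not a definitive list, and the pattern assumes the 12 character alphanumeric cache ID described earlier.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Placeholder: only the two addresses from the examples in this article.
KNOWN_GOOGLE_HOSTS = {"64.233.179.104", "72.14.207.104"}

# Assumption: a cache ID is exactly 12 letters or digits, followed by ':'.
CACHE_ID = re.compile(r"cache:([A-Za-z0-9]{12}):")


def extract_cache_id(q):
    """Return the cache ID at the start of a decoded q value, or None."""
    match = CACHE_ID.match(q)
    return match.group(1) if match else None


def classify_referrer(referrer):
    """Label a referrer as cache, translate, other Google, or other."""
    parts = urlsplit(referrer)
    if parts.hostname not in KNOWN_GOOGLE_HOSTS:
        return "other"
    service = parts.path.lstrip("/")
    if service == "translate_c":
        return "google-translate"
    q = parse_qs(parts.query).get("q", [""])[0]
    if service == "search" and q.startswith("cache:"):
        return "google-cache"
    return "google-other"
```

Because the service name is checked before the query string, translate_c requests never fall into the cache bucket, which is the separation item 3 asks for.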

Yahoo!, Ask and MSN

Originally published June 24th, 2006

  • Sean Carlos is a web marketing consultant & teacher, assisting companies with their Search (SEO + PPC = SEM), Social Media & Digital Media Measurement strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.

