Improve search engine and keyword reporting in Google Analytics, an SEO strategy


In three short years[1], Google Analytics has become an important tool for many companies looking to get more out of their presence on the web. Its wide range of website reports, from traffic sources to conversion rates, provides invaluable insight into a site's business performance at an initial cost that is difficult to beat.

One report in particular, the Search Engine report, is of interest to companies looking to optimize their organic search engine marketing. This report identifies the sources of search traffic that brought visitors to the website.

For each search engine source, a drill-down feature shows the keywords people used – the very keywords which express a visitor’s intent as they came to your website.

Updated 2013-08-26: The examples now refer to the asynchronous 3rd version of Google’s tracking code, released in 2009. Universal Analytics users should consult this updated article.

Just to clarify: for the purpose of this article, by search engine or search engine source we mean search-driven traffic, whether it comes from a pure search engine like Google or from an ISP portal which offers a search function, such as Earthlink or Virgin Media.

The default search engine list

Google Analytics has a built-in list of about 40 search engine match strings (counting duplicate entries for AOL and mamma/mama). For many sites, Google's default list is probably good enough. That said, as users become more familiar with the Search Engine report, two limitations to its usefulness become apparent.

The first issue is that many important sources of search traffic aren't tracked as accurately as one might like. Google Analytics processes search traffic from many ISP portals, such as AT&T or Verizon in the United States, and from many important regional search engines, such as Korea's Naver and Russia's Yandex and Rambler, as simple referral traffic: all valuable keyword information, i.e. visitor intent, is lost. The same is true of new search engines such as Cuil.

The second issue is a tendency to aggregate, or lump together, search sources, making it difficult, if not impossible, to understand where a site is really performing well in search. Instead of traffic from Google search sites worldwide appearing simply as “google”, it would be useful to know how many visitors came from google.com vs. google.ca and google.co.uk (you can break down traffic from “google” by the dimension “Country/Territory”, but you then lose the keyword list for that country). In other cases, such as search-driven traffic from ISP portals like Comcast and Earthlink in the US and Orange and Virgin Media in the UK, Google Analytics uses the generic label “search”.

Microsoft's renaming of Live Search to Bing has raised another issue: Google seems inexplicably slow to update its standard search engine recognition list, even when the need to do so is compelling. (2009-06-05)

Is our site performing better in Google.com than in Google.ca? What about Google.co.uk?

The official list of tracked search engines appears in Google Analytics help[2], but it doesn't always reflect what is actually being tracked. At the time of writing, Google Analytics also tracks kvasir, sesam, ozu, terra, nostrum, mynet, ekolay and ilse[3], in addition to the officially documented search engines. Fortunately, Google makes it relatively easy to modify how Google Analytics detects search-driven web traffic.

How to add a search engine to Google Analytics

The Google Analytics documentation notes that you just need to insert an _addOrganic entry in your Google Analytics tracking code for each additional engine[4]. In the tracking code below, the added line is the one calling _addOrganic:

<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-xxxxx-z']);
  _gaq.push(['_addOrganic','name_of_searchengine','q_var']);  // added line: recognize an additional search engine
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

where name_of_searchengine is one to four components of the search engine domain name, e.g. www.google.com or it.search.yahoo.com, and q_var is the name of the URL parameter that contains the visitor's search keywords. This line can be placed anywhere after the _setAccount call and before the _trackPageview call.
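To make the two arguments concrete, the sketch below splits a search referrer into the host name that the first argument is compared with and the query parameter that the second argument names. It uses the standard WHATWG URL class (built into browsers and Node.js 10+) and is only an illustration: it is not Google Analytics' own parsing code, and the describeReferrer function is hypothetical.

```javascript
// Illustration only (not ga.js's own parser): split a search referrer into the
// host name that the first _addOrganic argument is compared with, and the
// query parameter that the second argument names.
function describeReferrer(referrer, engineName, keywordParam) {
  const url = new URL(referrer); // WHATWG URL: built into browsers and Node.js 10+
  return {
    host: url.hostname,
    engineFound: url.hostname.includes(engineName),
    keywords: url.searchParams.get(keywordParam), // null if the parameter is absent
  };
}

const info = describeReferrer('http://www.google.com/search?q=motorcycle', 'google', 'q');
console.log(info.host);        // www.google.com
console.log(info.engineFound); // true
console.log(info.keywords);    // motorcycle
```

Passing 'query' instead of 'q' for the same referrer returns null keywords, which is why the second argument has to match the parameter each engine actually uses.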

In Google’s example search referrer, http://www.google.com/search?q=motorcycle, our search engine recognition string should be

_gaq.push(['_addOrganic','google','q']);

or, if we want to be very specific, we could add a separate entry for each Google domain variant, e.g.:

_gaq.push(['_addOrganic','google.com','q']);
_gaq.push(['_addOrganic','google.co.uk','q']);
_gaq.push(['_addOrganic','google.de','q']);
_gaq.push(['_addOrganic','google.it','q']);

If we want to cover our bases and make sure we haven't forgotten a local variant, we should follow the specific search engine domains with a generic version, e.g.

_gaq.push(['_addOrganic','google.com','q']);
_gaq.push(['_addOrganic','google.co.uk','q']);
_gaq.push(['_addOrganic','google.de','q']);
_gaq.push(['_addOrganic','google.it','q']);
_gaq.push(['_addOrganic','google','q']);

Google can be excruciatingly slow in updating its standard search engine recognition list. Five days after Bing's launch, Google Analytics still treats Bing like any other referring site. The solution for now is to add _gaq.push(['_addOrganic','bing','q']); to your Google Analytics tracking code. (2009-06-05)

While Google's default list isn't in any particular order (the generic “google” appears before the more specific “google.interia”, and “search” appears before “search.ilse”), my testing indicates that order is important (Google implies this as well[5]): Google Analytics stops processing the search engine list once it finds a match. This has several implications. This line

_gaq.push(['_addOrganic','google.com','q']);

will match search traffic from google.com.mx if google.com.mx isn't specified earlier in the list. Thus, you really should specify all known regional variants of a search engine, ahead of the shorter domain they contain, to avoid misleading matches:

_gaq.push(['_addOrganic','google.com.mx','q']);
// …other google.com local variants here…
_gaq.push(['_addOrganic','google.com','q']);
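With many regional variants, ordering entries by hand is error-prone. The hypothetical helper below (not part of Google Analytics) sorts the variants so that a longer domain such as google.com.mx always precedes a shorter one it contains, such as google.com, and appends an optional generic entry at the end. It assumes the substring-matching, first-match-wins behaviour described above.

```javascript
// Hypothetical helper: build _addOrganic commands for one engine so that
// longer (more specific) domains are listed before shorter ones they contain,
// e.g. google.com.mx before google.com, with the generic name last.
function organicCommands(domains, genericName, keywordParam) {
  // If domain A contains domain B as a proper substring, A is longer than B,
  // so sorting longest first puts A ahead of B. Ties keep their input order.
  const ordered = [...domains].sort((a, b) => b.length - a.length);
  const commands = ordered.map((d) => ['_addOrganic', d, keywordParam]);
  if (genericName) commands.push(['_addOrganic', genericName, keywordParam]);
  return commands;
}

const cmds = organicCommands(['google.com', 'google.com.mx', 'google.co.uk'], 'google', 'q');
// In a page you would then queue each command: cmds.forEach((c) => _gaq.push(c));
console.log(cmds.map((c) => c[1]).join(', ')); // google.com.mx, google.co.uk, google.com, google
```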

A very generic entry

_gaq.push(['_addOrganic','google','q']);

will also match non-google properties which have used google as a subdomain, e.g. http://google.isp-portal-example.com/. The only way to avoid these false positives is to avoid using generic search engine matching. Any missed search engines will still be found in the referring sites report, albeit without keywords.

In the case of Google, the search string parameter is always q. Some companies, such as the Italian Tiscali, are less consistent: depending on the domain, Tiscali uses either q or query:

_gaq.push(['_addOrganic','tiscali.cz','query']);
_gaq.push(['_addOrganic','tiscali.it','q']);
_gaq.push(['_addOrganic','tiscali.nl','q']);
_gaq.push(['_addOrganic','tiscali.co.uk','query']);

If a single domain uses more than one keyword parameter, as tiscali.co.uk does here, we need a separate entry for each:

_gaq.push(['_addOrganic','tiscali.cz','query']);
_gaq.push(['_addOrganic','tiscali.it','q']);
_gaq.push(['_addOrganic','tiscali.nl','q']);
_gaq.push(['_addOrganic','tiscali.co.uk','query']);
_gaq.push(['_addOrganic','tiscali.co.uk','q']);
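The matching rules described in the last few paragraphs can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Google's code: entries are checked in order, an entry matches when its name occurs anywhere in the referring host name and (an assumption consistent with the need for one entry per parameter) its keyword parameter is present, and the first match wins. The portal and Tiscali URLs are hypothetical examples.

```javascript
// Simplified model of the matching behaviour described above. This is an
// assumption based on observed behaviour, not the ga.js source: entries are
// checked in order, an entry matches when its name occurs anywhere in the
// referring host name AND its keyword parameter is present, and the first
// matching entry wins.
function matchOrganic(referrer, entries) {
  const url = new URL(referrer);
  for (const [name, param] of entries) {
    if (url.hostname.includes(name) && url.searchParams.has(param)) {
      return { engine: name, keywords: url.searchParams.get(param) };
    }
  }
  return null; // no match: the visit is reported as a plain referral
}

// Order matters: the shorter entry wins if it is listed first.
const mxRef = 'http://www.google.com.mx/search?q=motos';
console.log(matchOrganic(mxRef, [['google.com', 'q'], ['google.com.mx', 'q']]).engine); // google.com
console.log(matchOrganic(mxRef, [['google.com.mx', 'q'], ['google.com', 'q']]).engine); // google.com.mx

// A generic entry also catches a non-Google host that contains "google".
const portalRef = 'http://google.isp-portal-example.com/results?q=motos';
console.log(matchOrganic(portalRef, [['google', 'q']]).engine); // google

// One domain, two parameters: a single "query" entry misses a q= referrer.
const tiscaliRef = 'http://search.tiscali.co.uk/?q=pizza';
console.log(matchOrganic(tiscaliRef, [['tiscali.co.uk', 'query']])); // null
console.log(matchOrganic(tiscaliRef, [['tiscali.co.uk', 'query'], ['tiscali.co.uk', 'q']]).keywords); // pizza
```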

If we want to add specific regional variants of search engines Google already recognizes, we must stop Google's generic version from being evaluated before our custom list, since our custom entries are evaluated after Google's defaults. Unfortunately, the only way to do that is to tell Google Analytics to ignore its entire default search engine list. Thus we need to insert[6]

_gaq.push(['_clearOrganic']);

at the beginning of the search engine list, and we need to add any of Google's predefined search engines that we're still interested in to the bottom of our custom list. Naturally, if we decide to ignore Google Analytics' default search engine list, it becomes our responsibility to add new search engines and otherwise keep our list up to date with the right domain names and query parameters used by search engines and other sources of search traffic.

A custom search engine list can become rather long, adding bloat to our web pages. The solution is to put all of the custom search engines in an external JavaScript file, so that the overhead of loading the list occurs only once during a user's navigation session (and even less often for repeat visitors, depending on the web server's HTTP cache directives).
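A minimal sketch of what the external file (/js/gasea.js in the example) might contain when replacing the default list. The selection of engines here is only an illustrative assumption, not a recommended list. It clears the defaults, lists specific variants before generic names, and re-adds the default engines you still want at the end. Because it only queues commands on _gaq, it has to be loaded after _gaq is declared and before _trackPageview is pushed.

```javascript
// gasea.js: example contents (hypothetical selection of engines).
// Commands are only queued here; ga.js processes the queue in order once loaded.
var _gaq = _gaq || [];

// 1. Drop Google Analytics' built-in search engine list.
_gaq.push(['_clearOrganic']);

// 2. Custom entries: specific regional variants before generic names.
_gaq.push(['_addOrganic', 'google.co.uk', 'q']);
_gaq.push(['_addOrganic', 'google.com', 'q']);
_gaq.push(['_addOrganic', 'bing', 'q']);
_gaq.push(['_addOrganic', 'tiscali.co.uk', 'query']);
_gaq.push(['_addOrganic', 'tiscali.co.uk', 'q']);

// 3. Re-add the default engines you still want (_clearOrganic removed them all).
_gaq.push(['_addOrganic', 'google', 'q']);
_gaq.push(['_addOrganic', 'yahoo', 'p']);
```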

Returning to our original JavaScript code, we need to close the script block that sets up the tracking parameters. After this, we include our external list of search engines. Once this is done, we add any additional customization and call _gaq.push(['_trackPageview']);.

<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-xxxxx-z']);
</script>
<script src="/js/gasea.js" type="text/javascript"></script>
<script type="text/javascript">
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

Note: change /js/gasea.js to match the directory and file name of your JavaScript include file, and replace UA-xxxxx-z with your Google Analytics account and profile ID.

Search Engine Cache Page Views

An often overlooked part of tracking search engine traffic is tracking search engine cache page views. When a search engine displays results, there is often a link next to the URL of the original page, the cache link, which leads to a copy of the page as captured by the search engine's crawler (use of the noarchive robots meta tag is the primary reason a cache link won't appear).

While the number of cache page views is probably low for most sites, adding search engine cache detection can improve search report accuracy and readability with minimal effort. If a web user decides to view a page from your website using Google’s cache view link, a numeric IP address will show up in your referral report. Currently Google’s cached page view IP addresses end in 104, such as 72.14.205.104. Yahoo! uses the domain *.wrs.yahoo.com. Microsoft’s cache views are from the domain cc.msnscache.com. Ask uses the domain *.askcache.com, e.g. www.askcache.com, uk.askcache.com, although cache results don’t seem to be available for all geographies.
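The host patterns listed above can be turned into a simple classifier, for example to label cache views among the rows of an exported referral report. This is a sketch that assumes those patterns still hold; the ".104" rule for Google is a heuristic that can also match unrelated IP addresses, so it is labelled as probable. The function name and sample hosts are hypothetical.

```javascript
// Sketch: label referring hosts that look like search engine cache views,
// using the patterns described above (they may change; treat as assumptions).
function cacheSource(host) {
  // Google: numeric IP address ending in .104 (heuristic, may give false positives).
  if (/^\d{1,3}\.\d{1,3}\.\d{1,3}\.104$/.test(host)) return 'google cache (probable)';
  if (host.endsWith('.wrs.yahoo.com')) return 'yahoo cache';
  if (host === 'cc.msnscache.com') return 'microsoft cache';
  if (host === 'askcache.com' || host.endsWith('.askcache.com')) return 'ask cache';
  return null; // not recognized as a cache view
}

['72.14.205.104', 'a.wrs.yahoo.com', 'cc.msnscache.com', 'uk.askcache.com', 'www.example.com']
  .forEach((h) => console.log(h, '->', cacheSource(h)));
```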

Blog, image and product searches

The initial list of search engines focuses mostly on traditional web search. Google's blog search and rudimentary tracking for Google's image search are also included. Note that Google's image search uses a parameter encoded within a parameter[7], rendering this tracking only partially useful. Google doesn't currently use a dedicated subdomain for product searches (formerly known as Froogle); they should show up as normal web searches. More attention will be dedicated to blog and product search as this project evolves.
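The "parameter within a parameter" problem can be sketched as follows. The example assumes the older Google Images referrer format, in which the search terms sat URL-encoded inside a prev parameter (e.g. prev=/images?q=...); treat that format, and the nestedKeywords helper, as assumptions and check them against your own referrer data.

```javascript
// Sketch: pull keywords out of a parameter nested inside another parameter.
// The "prev" parameter name and URL layout are assumptions about the old
// Google Images referrer format.
function nestedKeywords(referrer, outerParam, innerParam) {
  // Decoded once by URLSearchParams, e.g. "/images?q=red+car&hl=en".
  const outer = new URL(referrer).searchParams.get(outerParam);
  if (outer === null) return null;
  // The inner value is a relative URL: resolve it against a dummy base so it
  // can be parsed, then read the inner parameter ("+" decodes to a space).
  return new URL(outer, 'http://placeholder.invalid').searchParams.get(innerParam);
}

const imageRef = 'http://images.google.com/imgres?imgurl=http://example.com/car.jpg' +
  '&prev=/images%3Fq%3Dred%2Bcar%26hl%3Den';
console.log(nestedKeywords(imageRef, 'prev', 'q')); // red car
```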

Performance considerations

An exhaustive list of search engines could degrade your website's performance when a new user requests their first page: an extensive listing is currently more than 30 KB. Good web server cache directives can ensure that successive page views load the external JavaScript file from the browser cache. The impact can also be mitigated by loading the analytics code at the bottom of your HTML page rather than at the top. This puts the user experience first (at the risk of reducing the number of pages tracked), and I highly recommend it. If the size of the JavaScript include file is a concern and you are still using the version 2 tracking code, consider renaming the pageTracker variable (inside and outside the file, from the declaration to the _trackPageview call!) to something shorter. Also consider using just a subset of the complete search engine list presented here. The table below can be reordered by geography or search engine to help you choose the most useful search engines and sources of search traffic for your market.
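If you keep your own engine list as data, choosing a subset for your market can be automated. The sketch below assumes a small hypothetical table of entries tagged by region, reusing only engines and parameters mentioned in this article (it is not the full list), and emits the _gaq.push lines for the selected regions. Rows are kept in the order they must appear, so filtering preserves the specific-before-generic ordering.

```javascript
// Hypothetical engine table: [region, engine name, keyword parameter].
// Rows are stored in the order they must appear (specific before generic).
const ENGINES = [
  ['uk', 'google.co.uk', 'q'],
  ['it', 'google.it', 'q'],
  ['de', 'google.de', 'q'],
  ['it', 'tiscali.it', 'q'],
  ['uk', 'tiscali.co.uk', 'query'],
  ['uk', 'tiscali.co.uk', 'q'],
  ['all', 'bing', 'q'],
  ['all', 'google', 'q'],
];

// Emit the _gaq.push lines for the chosen regions (plus "all"), preserving order.
function buildList(regions) {
  return ENGINES
    .filter(([region]) => region === 'all' || regions.includes(region))
    .map(([, name, param]) => `_gaq.push(['_addOrganic','${name}','${param}']);`)
    .join('\n');
}

console.log(buildList(['it']));
```

The printed lines can be pasted into the external include file, keeping it smaller than the complete list.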

A few parting caveats

  1. Note that modification of Google Analytics search engine recognition is not retroactive – it impacts data at the moment it is collected. This may make comparison of old and new data more difficult.
  2. Breaking out local variants of international search engines such as Google and Yahoo! will allow detailed analysis of traffic and keyword performance in each local variant, but makes it difficult to have an overall view of worldwide performance for a given search engine.
  3. This article assumes you’re using the current ga.js tracking code rather than the legacy urchin.js. See Brian Clifton’s excellent article on search engine detection with urchin.js.
  4. Should you decide to replace Google's standard search engine list, you will have to maintain your list, changing search parameters and/or adding new search engines as appropriate. You lose the benefit of Google doing this for you! Don't underestimate this point: the web is a very dynamic place. And, as noted above, incorrectly collected data cannot be reprocessed.
  5. Some search engines don’t provide URL parameters in their referrals (they often rewrite parameters using directory slash notation). Google Analytics is unable to parse keywords from these search engines; you’ll find these in your referrer report. Examples include Blekko, Excite.com, the meta search engine dogpile.com, and Google’s own search laboratory, Searchmash.com (now closed).
  6. The information here is provided without any guarantee – use this information and make changes to Google Analytics at your own risk. Use of this information may result in data loss and/or corruption.
    1. It is easy to lose data with browser-based tracking systems due to configuration issues.
    2. Google may make changes to Google Analytics at any time which could invalidate custom search engine lists as presented in this article.
    3. I don't have access to the Google Analytics source code and thus cannot be certain exactly how Google Analytics works, either today or tomorrow.
    4. Etc.!
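Caveat 2 can be softened outside Google Analytics: if you export the search engine report, local variants can be rolled up again for a worldwide view. A minimal sketch, assuming rows of [source, visits] and a hypothetical rule that an engine's "family" is the first dot-separated label of the source name (google.co.uk becomes google). That rule would need adjusting for names such as it.search.yahoo.com, and the visit counts below are made-up sample numbers.

```javascript
// Roll up visits from local search engine variants into one total per engine.
// Assumption: the engine "family" is the first dot-separated label of the
// source name, e.g. "google.co.uk" -> "google", "tiscali.it" -> "tiscali".
function rollUp(rows) {
  const totals = new Map();
  for (const [source, visits] of rows) {
    const family = source.split('.')[0];
    totals.set(family, (totals.get(family) || 0) + visits);
  }
  return totals;
}

// Made-up sample rows, in the shape of an exported search engine report.
const rows = [['google.com', 120], ['google.co.uk', 45], ['google.it', 30], ['tiscali.it', 8], ['bing', 12]];
console.log(Object.fromEntries(rollUp(rows))); // { google: 195, tiscali: 8, bing: 12 }
```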

Search Engine Recognition Data

Simply copy our JavaScript search engine listing for Google Analytics version 2 or asynchronous version 3 tracking code to a directory on your site and modify your Google Analytics tracking code as indicated above. While you may link to this file in a post on your site, please don't reference it from your page code so that it is loaded from our server. This would be bad netiquette (using someone else's bandwidth without permission) and a potential security risk for you: we could change the code executing on your site at any time, to display incriminating images or worse. You wouldn't want that now, would you? Feel free to consult the list of search engines used for the JavaScript file.

Microsoft’s adCenter Analytics

The other leading free browser-based web analytics tool is Microsoft's adCenter Analytics, based on DeepMetrix's former LiveSTATS application. Microsoft currently breaks down search visits and keyword traffic for each Google and Yahoo! local domain, e.g. Referrers > Inbound Totals > Referrals – Natural Search > google > phrase > www.google.it. Microsoft's own Live Search provides regional information in a URL parameter rather than in the domain; unfortunately, adCenter Analytics ignores this information. Microsoft's adCenter Analytics online help seems to be silent on the subject of adding organic search engines, and the tracking code doesn't provide any hint either.

Yahoo!’s IndexTools

Unfortunately I haven’t yet received an invite to use IndexTools, so I’m reluctant to propose a potential solution I cannot test.


[1] Google Analytics was briefly opened to the public on November 14, 2005. Due to unexpected demand, new sign-ups were successively limited while additional infrastructure was put into place.
[2] What search engines does Google Analytics identify?
[3] Review the values in the variable d.fa contained in the Google Analytics tracking code for a current list.
[4] How can I make Google Analytics identify additional search engines in the Referral reports?
[5] The documentation for the _clearOrganic() method says: “Use this method when you want to define a customized search engine ordering precedence.”
[6] This is documented in Google Analytics technical help. See Tracking API: Search Engines and Referrers.
[7] This WebmasterWorld thread has a good example of code to solve this problem.



About Sean Carlos

Sean Carlos is a digital marketing consultant & teacher, assisting companies with their Search (SEO + SEA = SEM), Social Media & Digital Media Analytics strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is Chairman of the SMX Search and Social Media Conference, 13 & 14 November in Milan. He is also a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.
