In three short years[1], Google Analytics has become an important tool for many companies looking to get more out of their presence on the web. Google Analytics’ wide range of website reports, from traffic sources to conversion rates, provides invaluable insight into a site’s business performance at an initial cost that is difficult to beat.
One report, the Search Engine report, is of particular interest to companies looking to optimize their organic search engine marketing activity. This report identifies the sources of search traffic that brought visitors to the website.
For each search engine source, a drill-down feature shows the keywords people used – the very keywords which express a visitor’s intent as they came to your website.
Just to clarify, for the purpose of this article, by search engine or search engine source, we mean search driven traffic – whether it be from a pure search engine like Google, or from an ISP portal which offers a search function, such as Earthlink or Virgin Media.
The default search engine list
Google Analytics has a built-in list of about 40 search engine match strings (taking into account duplicate entries for AOL and mamma/mama). For many sites, Google’s default list is probably good enough. That said, two potential limitations to the search engine report’s usefulness become apparent to many users as they grow more familiar with it.
The first issue is that many important sources of search traffic aren’t tracked as accurately as one might want. Google Analytics processes search traffic from many ISP portals, such as ATT or Verizon in the United States, and from many important regional search engines, such as Korea’s Naver and Russia’s Yandex and Rambler, as simple referral traffic: all valuable keyword information, i.e. visitor intent, is lost. The same is true of new search engines such as cuil.
The second issue is a tendency to aggregate, or lump if you will, search sources together, making it difficult, if not impossible, to understand where a site is really performing well in search. Instead of traffic from Google search sites worldwide appearing simply as “google”, it might be nice to know how many visitors came from google.com vs. google.ca and google.co.uk (you can break down traffic from “google” by the dimension “Country/Territory”, but you then lose the keyword list for that country). In other cases, as with search-driven traffic from ISP portals such as Comcast and Earthlink in the US and Orange and Virgin Media in the UK, Google Analytics uses the generic label search.
Microsoft’s renaming of Live Search to Bing has raised another issue: Google seems to be inexplicably slow in updating their standard search engine recognition list even when the need to do so is compelling. (2009-06-05)
Is our site performing better in Google.com than in Google.ca? What about Google.co.uk?
The official list of tracked search engines appears in Google Analytics help[2], but this doesn’t always reflect what is actually being tracked. At the time of this writing, Google Analytics tracks kvasir, sesam, ozu, terra, nostrum, mynet, ekolay and ilse[3] in addition to the search engines officially documented. Fortunately, Google makes it relatively easy to modify how Google Analytics detects search-driven web traffic.
How to add a search engine to Google Analytics
The Google Analytics documentation notes that you should just insert an entry in your Google Analytics tracking code for each engine you want to add[4]. The entry takes this form:
_gaq.push(['_addOrganic','name_of_searchengine','q_var']);
where name_of_searchengine is one to four components of the search engine domain name, e.g. www.google.com or it.search.yahoo.com, and q_var is the parameter which contains the visitor’s keywords. This code can be placed anywhere after _setAccount is set and before _trackPageview is called.
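As a minimal sketch of that placement, assuming the asynchronous _gaq syntax used in this article: the account ID UA-XXXXX-X is a placeholder, and name_of_searchengine and q_var stand in for a real engine string and keyword parameter.

```javascript
// Sketch only: 'UA-XXXXX-X' is a placeholder account ID, and
// 'name_of_searchengine' / 'q_var' stand in for real values.
var _gaq = _gaq || [];

// 1. The account must be set before any custom engine is added.
_gaq.push(['_setAccount', 'UA-XXXXX-X']);

// 2. Custom search engine entries go here, one per engine.
_gaq.push(['_addOrganic', 'name_of_searchengine', 'q_var']);

// 3. The pageview is tracked last, so it sees the entries above.
_gaq.push(['_trackPageview']);
```

Commands queued in _gaq are processed in order, which is why the position of the _addOrganic entry relative to the other two calls matters.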
In Google’s example search referrer, http://www.google.com/search?q=motorcycle, our search engine recognition string should be
_gaq.push(['_addOrganic','google','q']);
or if we want to be very specific we could add separate entries for each Google domain variant, e.g.:
_gaq.push(['_addOrganic','google.com','q']);
_gaq.push(['_addOrganic','google.co.uk','q']);
_gaq.push(['_addOrganic','google.de','q']);
_gaq.push(['_addOrganic','google.it','q']);
If we want to cover our bases and make sure we haven’t forgotten a local variant, we should follow the specific search engine domains with generic versions, e.g.
_gaq.push(['_addOrganic','google.com','q']);
_gaq.push(['_addOrganic','google.co.uk','q']);
_gaq.push(['_addOrganic','google.de','q']);
_gaq.push(['_addOrganic','google.it','q']);
_gaq.push(['_addOrganic','google','q']);
Google can be excruciatingly slow in updating their standard search engine recognition list. Five days after its launch, Google Analytics still treats Bing like any other site. The solution for now is to add
_gaq.push(['_addOrganic','bing','q']);
to your Google Analytics tracking code. (2009-06-05)
While Google’s default listing isn’t in any particular order (the generic “google” appears before the more specific “google.interia”, just as “search” appears before “search.ilse”), my testing indicates that order is important (Google implies this as well[5]), i.e. Google Analytics stops processing the search engine list once it finds a match. There are several implications here. This line
_gaq.push(['_addOrganic','google.com','q']);
will match search traffic from google.com.mx if google.com.mx isn’t already specified. Thus, you really should specify all known regional variants of a search engine to avoid misleading matches.
_gaq.push(['_addOrganic','google.com.mx','q']);
…other google.com local variants here…
A very generic entry
_gaq.push(['_addOrganic','google','q']);
will also match non-google properties which have used google as a subdomain, e.g. http://google.isp-portal-example.com/. The only way to avoid these false positives is to avoid using generic search engine matching. Any missed search engines will still be found in the referring sites report, albeit without keywords.
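The ordering and false-positive behavior described above can be illustrated with a small model. This is a hypothetical sketch, not Google's code: matchOrganic is an invented helper, and it assumes that an engine string is matched as a substring of the referrer hostname, that the keyword parameter must be present, and that the first matching entry wins.

```javascript
// Hypothetical model of the matching behavior described in this article.
// It is NOT the Google Analytics implementation.
function matchOrganic(engines, referrer) {
  var url = new URL(referrer);
  for (var i = 0; i < engines.length; i++) {
    var name = engines[i][0];
    var param = engines[i][1];
    // Assumption: substring match on the hostname, keyword parameter required.
    if (url.hostname.indexOf(name) !== -1 && url.searchParams.has(param)) {
      // Assumption: processing stops at the first match.
      return { source: name, keyword: url.searchParams.get(param) };
    }
  }
  return null; // would be reported as an ordinary referral
}

// google.com also matches google.com.mx when the latter is not listed first.
var misleading = matchOrganic(
  [['google.com', 'q'], ['google', 'q']],
  'http://www.google.com.mx/search?q=motorcycle'
);

// Listing the regional variant first gives the intended source.
var specific = matchOrganic(
  [['google.com.mx', 'q'], ['google.com', 'q']],
  'http://www.google.com.mx/search?q=motorcycle'
);

// A generic entry matches an unrelated site that uses google as a subdomain.
var falsePositive = matchOrganic(
  [['google', 'q']],
  'http://google.isp-portal-example.com/?q=test'
);
```

Under these assumptions, misleading.source is google.com, specific.source is google.com.mx, and falsePositive.source is google even though the referrer is not a Google property.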
In the case of Google, the search string parameter is always q. Some companies, such as the Italian Tiscali, are less consistent: sometimes Tiscali uses q, sometimes query:
_gaq.push(['_addOrganic','tiscali.cz','query']);
_gaq.push(['_addOrganic','tiscali.it','q']);
_gaq.push(['_addOrganic','tiscali.nl','q']);
_gaq.push(['_addOrganic','tiscali.co.uk','query']);
If multiple keyword parameters are currently in use, we need a separate entry for each:
_gaq.push(['_addOrganic','tiscali.cz','query']);
_gaq.push(['_addOrganic','tiscali.it','q']);
_gaq.push(['_addOrganic','tiscali.nl','q']);
_gaq.push(['_addOrganic','tiscali.co.uk','query']);
_gaq.push(['_addOrganic','tiscali.co.uk','q']);
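When the list grows, one option is to keep the domains and their keyword parameters in an ordered table and generate the entries from it. The sketch below is one possible way to do that; the tiscaliEngines variable is a name invented for illustration, and the domains and parameters are simply the Tiscali examples above.

```javascript
var _gaq = _gaq || [];

// Ordered table of [domain, keyword parameters]. An array (not an object)
// is used so that the order of the generated entries is explicit.
var tiscaliEngines = [
  ['tiscali.cz', ['query']],
  ['tiscali.it', ['q']],
  ['tiscali.nl', ['q']],
  ['tiscali.co.uk', ['query', 'q']] // two parameters in use, two entries
];

// Emit one _addOrganic entry per domain and parameter combination.
tiscaliEngines.forEach(function (entry) {
  var domain = entry[0];
  entry[1].forEach(function (param) {
    _gaq.push(['_addOrganic', domain, param]);
  });
});
```

This queues the same five entries as the hand-written list, in the same order.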
If we want to add specific regional variants of search engines Google already recognizes, we must stop Google's generic version from being evaluated before our custom list. Unfortunately, the only way to do that is to tell Google Analytics to ignore their entire default search engine list. Thus we need to insert[6]
_gaq.push(['_clearOrganic']);
at the beginning of the search engine list and we need to add any of Google’s predefined search engines that we’re still interested in to the bottom of our custom list. Naturally if we decide to ignore Google Analytics’ default search engine list, it will be our responsibility to add new search engines or otherwise keep our search engine list up to date with the right domain names and query parameter strings used by search engines and sources of search traffic.
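Put together, a tracking snippet that replaces the default list might look like the sketch below. The account ID is a placeholder, and the selection of engines is only an illustration: the Yahoo entry with parameter p stands in for any default engine you still want to keep.

```javascript
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder account ID

// Drop the built-in list so the custom ordering below takes precedence.
_gaq.push(['_clearOrganic']);

// Specific regional variants first...
_gaq.push(['_addOrganic', 'google.co.uk', 'q']);
_gaq.push(['_addOrganic', 'google.de', 'q']);

// ...then the generic fallback for variants not listed above.
_gaq.push(['_addOrganic', 'google', 'q']);

// Re-add any default engines still of interest (illustrative example).
_gaq.push(['_addOrganic', 'yahoo', 'p']);

_gaq.push(['_trackPageview']);
```

Everything that the default list used to cover and that is not re-added here will show up as plain referral traffic, without keywords.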
Search Engine Cache Page Views
An often overlooked part of tracking traffic from search engines involves tracking search engine cache page views. When a search engine displays results, there is often a link to a copy of the page, the cache link, as captured by the search engine crawler next to the URL for the original page (use of the noarchive robots meta tag is the primary reason a cache link won’t appear).
While the number of cache page views is probably low for most sites, adding search engine cache detection can improve search report accuracy and readability with minimal effort. If a web user decides to view a page from your website using Google’s cache view link, a numeric IP address will show up in your referral report. Currently Google’s cached page view IP addresses end in 104. Yahoo! uses the domain *.wrs.yahoo.com. Microsoft’s cache views are from the domain cc.msnscache.com. Ask uses the domain *.askcache.com, e.g. www.askcache.com, uk.askcache.com, although cache results don’t seem to be available for all geographies.
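A rough sketch of how those referrer hostnames could be classified follows. The detectCacheSource helper and its return labels are invented for illustration, the patterns come only from the hostnames listed above and may be out of date, and treating any IP address ending in .104 as Google's cache is a loose heuristic.

```javascript
// Hypothetical helper: classify a referrer hostname as a search engine cache.
// Patterns reflect the hostnames named in this article and may be outdated.
function detectCacheSource(hostname) {
  // Numeric IPv4 address whose last octet is 104 (loose heuristic).
  if (/^\d{1,3}(\.\d{1,3}){2}\.104$/.test(hostname)) return 'google-cache';
  // Any host under wrs.yahoo.com.
  if (/\.wrs\.yahoo\.com$/.test(hostname)) return 'yahoo-cache';
  // Microsoft's cache host.
  if (hostname === 'cc.msnscache.com') return 'msn-cache';
  // askcache.com itself or any subdomain of it.
  if (/(^|\.)askcache\.com$/.test(hostname)) return 'ask-cache';
  return null; // not a recognized cache host
}

// 192.0.2.104 is a documentation-range address used purely as an example.
var fromIp = detectCacheSource('192.0.2.104');
var fromAsk = detectCacheSource('uk.askcache.com');
var fromSite = detectCacheSource('www.example.com');
```

Here fromIp is 'google-cache', fromAsk is 'ask-cache', and fromSite is null.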
Blog, image and product searches
The initial list of search engines has focused mostly on traditional web search. Google’s blog search and rudimentary tracking for Google’s image search are also included. Note that Google’s image search uses a parameter encoded within a parameter[7], rendering this tracking only partially useful. Google doesn’t currently use a dedicated subdomain for product searches (formerly known as Froogle); they should show up as normal web searches. More attention will be dedicated to blog and product search as this project evolves.
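To show what "a parameter encoded within a parameter" means in practice, here is a hedged sketch of unpacking one. The nestedKeyword helper is invented for illustration, and the referrer shape (an outer prev parameter holding an encoded /images?q=… URL) is an assumption about how image search referrers looked, not something Google Analytics handles for you.

```javascript
// Hypothetical helper: read a keyword from a query string that is itself
// percent-encoded inside another parameter.
function nestedKeyword(referrer, outerParam, innerParam) {
  var url = new URL(referrer);
  // searchParams.get() decodes the outer value once, e.g. "/images?q=...&hl=en".
  var nested = url.searchParams.get(outerParam);
  if (!nested) return null;
  var queryStart = nested.indexOf('?');
  if (queryStart === -1) return null;
  // Parse the inner query string and pull out the keyword parameter.
  return new URLSearchParams(nested.slice(queryStart + 1)).get(innerParam);
}

// Assumed example referrer; the image URL and keyword are made up.
var keyword = nestedKeyword(
  'http://images.google.com/imgres?imgurl=http://example.com/a.jpg' +
    '&prev=/images%3Fq%3Dmotorcycle%26hl%3Den',
  'prev',
  'q'
);
```

With this assumed referrer, keyword is 'motorcycle'. A plain _addOrganic entry keyed on the outer parameter would instead record the whole encoded inner URL as the keyword.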
TrackPage call!) to something with a shorter name. Also consider using just a subset of the complete search engine list presented here. The table below can be reordered by geography or search engine to help you choose the most useful search engines and sources of search traffic for your market.
A few parting caveats
- Note that modification of Google Analytics search engine recognition is not retroactive – it impacts data at the moment it is collected. This may make comparison of old and new data more difficult.
- Breaking out local variants of international search engines such as Google and Yahoo! will allow detailed analysis of traffic and keyword performance in each local variant, but make it difficult to have an overall view of world-wide performance for a given search engine.
- This article assumes you’re using the current ga.js tracking code rather than the legacy urchin.js. See Brian Clifton’s excellent article on search engine detection with
- Should you decide to replace Google’s standard search engine list, you will have to maintain your list, changing search parameters and/or adding new search engines as appropriate. You lose the benefit of Google doing this for you! Don’t underestimate this point: the web is a very dynamic place. And, as noted above, data incorrectly collected cannot be reprocessed.
- Some search engines don’t provide URL parameters in their referrals (they often rewrite parameters using directory slash notation). Google Analytics is unable to parse keywords from these search engines; you’ll find these in your referrer report. Examples include Blekko, Excite.com, the meta search engine dogpile.com, and Google’s own search laboratory, Searchmash.com (now closed).
- The information here is provided without any guarantee – use this information and make changes to Google Analytics at your own risk. Use of this information may result in data loss and/or corruption.
- It is easy to lose data with browser-based tracking systems due to configuration issues.
- Google may make changes to Google Analytics at any time which could invalidate custom search engine lists as presented in this article.
- I don’t have access to the Google Analytics source code and thus cannot be certain how Google Analytics works, today or tomorrow.
Search Engine Recognition Data
Microsoft’s adCenter Analytics
The other leading free browser based web analytics tool is Microsoft’s adCenter Analytics, based on the former DeepMetrix’s LiveSTATS application. Microsoft currently breaks down search visits and keyword traffic for each google and yahoo local domain, e.g. Referrers > Inbound Totals > Referrals – Natural Search > google > phrase > www.google.it. Microsoft’s own live search provides regional information in a parameter rather than in a domain clue; unfortunately adCenter Analytics ignores this information. Microsoft’s adCenter analytics online help seems to be silent on the subject of adding organic search engines. The tracking code doesn’t provide any hint either.
Unfortunately I haven’t yet received an invite to use IndexTools, so I’m reluctant to propose a potential solution I cannot test.
[1] Google Analytics was briefly opened to the public on November 14, 2005. Due to unexpected demand, new sign-ups were successively limited while additional infrastructure was put into place.
[2] What search engines does Google Analytics identify?
[3] Review the values in the variable d.fa contained in the Google Analytics tracking code for a current list.
[4] How can I make Google Analytics identify additional search engines in the Referral reports?
[5] The documentation for the _clearOrganic() method says “Use this method when you want to define a customized search engine ordering precedence.”
[6] This is documented in Google Analytics technical help. See Tracking API: Search Engines and Referrers.
[7] This webmaster world thread has a good example of code to solve this problem.