Search Engines and Site Localization
While there are few differences between the UK and US English dialects which might lead to miscomprehension, Noah Webster’s spelling reform does lead to interesting issues which need to be considered when designing sites for international audiences.
Update: This document was written in 2006 and no longer represents the current state of search affairs. It has been left here as a historical reference. Search engines continually refine their algorithms and that is reflected in how they currently handle regional linguistic differences.
Is it “my favorite color” or “my favourite colour”?
While it may seem like an arcane academic question, how you spell your English language content can determine your site’s visibility in search engines and how your site is perceived by your visitors.
With about two-thirds of native English speakers in the US, American spelling predominates on the web. Not surprisingly, a non-scientific survey of search expressions using both US and UK spellings yields more matches for the US variant:
|Search phrase||Google matches||% of total|
|US: my favorite color||36,100,000||84.9|
|GB: my favourite colour||6,430,000||15.1|
|Search phrase||Google matches||% of total|
|US: search engine optimization||33,300,000||89.9|
|GB: search engine optimisation||3,760,000||10.1|
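The percentage column is each variant's share of the combined match count for its pair. A minimal Python sketch of that arithmetic, using the counts reported above (the function name share_of_total is illustrative, not part of any tool):

```python
def share_of_total(us_matches: int, gb_matches: int) -> tuple[float, float]:
    """Return (US %, GB %) of the combined match count, rounded to one decimal."""
    total = us_matches + gb_matches
    return (
        round(100 * us_matches / total, 1),
        round(100 * gb_matches / total, 1),
    )


# Match counts taken from the two tables above.
print(share_of_total(36_100_000, 6_430_000))  # (84.9, 15.1)
print(share_of_total(33_300_000, 3_760_000))  # (89.9, 10.1)
```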
Several interesting points emerge from these search results:
- Google does not respond with a suggested alternative (“Did you mean…”), as commonly occurs with misspellings.
- Google does not seem to be using an equivalence dictionary: the results are clearly different.
- The British Council may need to up their investments to stem US linguistic hegemony :-).
Note that some pages with US spellings will show up in queries using British spellings and vice versa. This is probably due to the use of UK or US spelling in link text on external sites that point to this content. For example, a link labelled Search Engine Optimisation Resources may point to a document written in American English and titled Search Engine Optimization Resources.
Should I use US or UK English?
If your audience is predominantly limited to the US or the UK, the choice is easy: use the dialect which will most resonate with your audience; this will be the dialect they use to search the Internet and it will be what they expect to see when they browse your site.
For sites targeting an international audience, the choice is a bit more difficult. As US English will resonate more closely with two-thirds of native speakers, this might be the best choice. US English is often taught as the preferred dialect in many parts of Asia. UK English should be considered when targeting primarily Europe and/or Commonwealth countries. Keyword research and analysis will help answer this question. For starters, once you have identified your keywords, check whether they are spelled exactly the same in US and UK English.
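One rough way to see which keywords are dialect-sensitive is to run them through a list of known spelling pairs. The Python sketch below is only an illustration: the US_TO_UK mapping holds just the three word pairs used in this article, and the function names are made up for the example. A real check would need a far more complete word list.

```python
# Hand-picked US -> UK spelling pairs taken from this article.
# Assumption: a real check would use a much fuller word list.
US_TO_UK = {
    "favorite": "favourite",
    "color": "colour",
    "optimization": "optimisation",
}


def uk_variant(phrase: str) -> str:
    """Return the phrase with known US spellings swapped for UK ones."""
    return " ".join(US_TO_UK.get(word, word) for word in phrase.lower().split())


def dialect_sensitive(keywords: list[str]) -> dict[str, str]:
    """Map each keyword phrase that differs between dialects to its UK form."""
    return {
        kw: uk_variant(kw)
        for kw in keywords
        if uk_variant(kw) != kw.lower()
    }


keywords = ["search engine optimization", "my favorite color", "web design"]
for us, uk in dialect_sensitive(keywords).items():
    print(f"{us} -> {uk}")
```

Phrases missing from the result are spelled the same in both dialects as far as the word list knows, so they need no special treatment.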
Should I Mix and Match US and UK English?
A problem inherent in limiting your site’s content to either the US or UK dialect is that you’ll limit your search engine visibility: users searching with terms specific to the other dialect will be less likely to find you. Your content may still appear if links on external sites pointing to it contain the search terms in the same dialect as the search query.
One solution is to use both US and UK terms in your site’s content. You may choose to mix dialects in the same page or to write some pages in US English and others in UK English, all at the risk of appearing inconsistent to a perceptive site visitor.
Site Localization and the Google Duplicate Content Penalty
A user-oriented solution to addressing an international audience is to maintain a .com site for the US dialect and a .co.uk site for the UK dialect. While a sensible choice from a usability point of view, this approach runs afoul of Google’s recently refined duplicate content detection algorithm (changes were made in the so-called Jagger update in the fall of 2005).
In essence, Google attempts to identify and penalize sites which are predominately carbon copies of other sites. Should you decide to offer your content in both US and UK dialects, we suggest you avoid Google’s potential wrath by focusing your marketing efforts on one of the two sites. Tell Google to ignore the other site with a robots exclusion file (at the site level) or a robots exclusion meta tag (at the page level). While a bit draconian, excluding your “duplicate” content from Google’s reach seems to be the only solution currently available to avoid incurring Google’s duplicate content penalty.
Consider blocking all search engine crawlers from your secondary English site. By doing so now, you’ll avoid future problems should Yahoo, MSN and others adopt a duplicate content penalty similar to Google’s. Of course, your alternative English spellings won’t be found….
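As an illustration, the Python sketch below holds two candidate robots exclusion files for the secondary site, one that blocks only Google's crawler and one that blocks every compliant crawler, and checks their effect with the standard library's urllib.robotparser. The example.co.uk address and the helper name allowed are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Blocks only Google's crawler from the whole secondary site.
GOOGLE_ONLY = """\
User-agent: Googlebot
Disallow: /
"""

# Blocks every crawler that honours the robots exclusion standard.
ALL_CRAWLERS = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://www.example.co.uk/page.html"  # hypothetical secondary site

print(allowed(GOOGLE_ONLY, "Googlebot", URL))   # False
print(allowed(GOOGLE_ONLY, "Slurp", URL))       # True
print(allowed(ALL_CRAWLERS, "Slurp", URL))      # False
```

The page-level equivalent is a robots meta tag in the document head, for example <meta name="robots" content="noindex"> for all crawlers or <meta name="googlebot" content="noindex"> for Google alone.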
Tag Your Content with the Language Dialect to Facilitate Proper Search Engine Indexing
You can help a search engine identify the language dialect of a page’s content by setting the lang attribute on the page’s html element, e.g.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
as the case may be. Refer to the W3C Language tags in HTML and XML document for further information.
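To confirm what a page actually declares, you can read the attribute back from the markup. The Python sketch below uses the standard library's html.parser; the class and function names are illustrative, and falling back to xml:lang when lang is missing is an assumption about what you would want for XHTML pages.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Record the language declared on the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            attributes = dict(attrs)
            # Prefer lang; fall back to xml:lang as used in XHTML documents.
            self.lang = attributes.get("lang") or attributes.get("xml:lang")


def declared_language(markup: str):
    """Return the declared language tag, or None if the page declares none."""
    finder = LangFinder()
    finder.feed(markup)
    return finder.lang


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">'
    "<head><title>Colour</title></head><body></body></html>"
)
print(declared_language(page))  # en-GB
```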
While we’re on the topic, note that the language can also be declared at the HTTP header level. This is used mostly by browsers.
For the Apache server, add a line similar to
AddLanguage en-GB .html
to your web server configuration file or .htaccess file. Your server will then include a Content-Language: en-GB header in its responses for .html files.
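The HTTP header that carries a document's language is Content-Language. A small Python sketch for checking which language tags a response declares follows; the function names are illustrative, and fetch_content_languages is shown only as a possible usage that would need a real URL of your own.

```python
from urllib.request import urlopen


def content_languages(headers) -> list[str]:
    """Return the language tags listed in a Content-Language header, if any.

    headers is any object with an items() method yielding (name, value)
    pairs, such as a dict or the headers of a urllib response.
    """
    for name, value in headers.items():
        if name.lower() == "content-language":
            return [tag.strip() for tag in value.split(",") if tag.strip()]
    return []


def fetch_content_languages(url: str) -> list[str]:
    """Fetch url and return the language tags its response declares."""
    with urlopen(url) as response:
        return content_languages(response.headers)


# Offline check against hand-written header sets.
print(content_languages({"Content-Language": "en-GB"}))  # ['en-GB']
print(content_languages({"Content-Type": "text/html"}))  # []
```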
What’s your experience?
What experience have you had resolving internationalization issues?
The use of the term Merit-based™ in conjunction with Search Engine Optimization is a Trademark of Antezeta.