UK and US English Dialect Considerations for Site Internationalization


Search Engines and Site Localization

While there are few differences between the UK and US English dialects which might lead to miscomprehension, Noah Webster’s spelling reform does lead to interesting issues which need to be considered when designing sites for international audiences.

Update: This document was written in 2006 and no longer represents the current state of search. It has been left here as a historical reference. Search engines continually refine their algorithms, including how they handle regional linguistic differences.

Is it “my favorite color” or “my favourite colour”?

While it may seem like an arcane academic question, how you spell your English language content can determine your site’s visibility in search engines and how your site is perceived by your visitors.

With about two-thirds of native English speakers in the US, American spelling predominates on the web. Not surprisingly, a non-scientific survey of search expressions using both US and UK spellings yields more matches for the US variant:

Google search for “my favorite color” (2006-01-17)

Search                            Google Matches   % Total
US: my favorite color             36,100,000       84.9
GB: my favourite colour            6,430,000       15.1
Total                             42,530,000

Google search for “search engine optimization” (2006-01-17)

Search                            Google Matches   % Total
US: search engine optimization    33,300,000       89.9
GB: search engine optimisation     3,760,000       10.1
Total                             37,060,000
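The percentage columns can be reproduced from the raw match counts. The following is only a small sketch of that arithmetic; the function name share_percentages is ours, and the counts are the 2006 figures quoted above.

```python
def share_percentages(counts):
    """Return each label's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Match counts reported by Google on 2006-01-17 (from the tables above).
favorite_color = {"US": 36_100_000, "GB": 6_430_000}
seo = {"US": 33_300_000, "GB": 3_760_000}

print(share_percentages(favorite_color))  # {'US': 84.9, 'GB': 15.1}
print(share_percentages(seo))             # {'US': 89.9, 'GB': 10.1}
```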

Several interesting points emerge from these search results:

  • Google does not respond with suggested alternatives (“Did you mean…”), as commonly occurs with misspellings.
  • Google does not seem to be using an equivalences dictionary — the results are clearly different.
  • The British Council may need to up their investments to stem US linguistic hegemony :-).

Note: Some pages with US spellings will show up in queries using British spellings and vice versa. This is probably due to the use of UK or US spelling in link text on external sites which point to this content, e.g. a link reading “Search Engine Optimisation Resources” may point to a document written in American English and titled “Search Engine Optimization Resources”.

Should I use US or UK English?

If your audience is predominantly limited to the US or the UK, the choice is easy: use the dialect which will most resonate with your audience; this will be the dialect they use to search the Internet and it will be what they expect to see when they browse your site.

For sites targeting an international audience, the choice is a bit more difficult. As US English will resonate more closely with two-thirds of native speakers, this might be the best choice. US English is often taught as the preferred dialect in many parts of Asia. UK English should be considered when targeting primarily Europe and / or Commonwealth countries. Keyword research and analysis will help answer this question. For starters, once identified, are your keywords exactly the same in US and UK English?
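One way to start answering the last question is to run a keyword list through a spelling map and see which phrases change. The sketch below assumes a deliberately tiny, hypothetical US-to-UK map (US_TO_UK) and hypothetical helper names; a real project would need a much fuller word list.

```python
# Hypothetical, deliberately small US -> UK spelling map (an assumption for
# illustration only; it is not a complete dictionary).
US_TO_UK = {
    "color": "colour",
    "favorite": "favourite",
    "optimization": "optimisation",
}


def normalize(phrase):
    """Lower-case a phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def uk_variant(phrase):
    """Return the phrase with every known US spelling swapped for its UK form."""
    return " ".join(US_TO_UK.get(word, word) for word in normalize(phrase).split())


def dialect_sensitive(keywords):
    """Map each keyword whose spelling differs between dialects to its UK variant."""
    return {
        normalize(kw): uk_variant(kw)
        for kw in keywords
        if uk_variant(kw) != normalize(kw)
    }


keywords = ["search engine optimization", "web analytics", "My Favorite Color"]
print(dialect_sensitive(keywords))
```

Keywords missing from the result (here, “web analytics”) are spelled the same in both dialects, at least as far as the map knows, so the dialect choice does not affect them.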

Should I Mix and Match US and UK English?

A problem inherent in limiting your site’s content to either the US or UK dialect is that you’ll limit your search engine visibility — users searching with terms specific to one dialect or the other will be less likely to find you (your content may still appear if links on external sites pointing to your content contain the search terms in the same dialect as the search query).

One solution is to use both US and UK terms in your site’s content. You may choose to mix dialects in the same page or to write some pages in US English and others in UK English, all at the risk of appearing inconsistent to a perceptive site visitor.

Site Localization and the Google Duplicate Content Penalty

A user-oriented solution to addressing an international audience is to maintain a .com site for the US dialect and a .co.uk site for the UK dialect. While a sensitive choice from a usability point of view, this approach runs afoul of Google’s recently refined duplicate content detection algorithm (changes were made in the so-called Jagger update in fall 2005).

In essence, Google attempts to identify and penalize sites which are predominately carbon copies of other sites. Should you decide to offer your content in both US and UK dialects, we suggest you avoid Google’s potential wrath by focusing your marketing efforts on one of the two sites. Tell Google to ignore the other site with a robots exclusion file (at the site level) or a robots exclusion meta tag (at the page level). While a bit draconian, excluding your “duplicate” content from Google’s reach seems to be the only solution currently available to avoid incurring Google’s duplicate content penalty.

Tip: Consider blocking all search engine crawlers from your secondary English site. By doing so now, you’ll avoid future problems should Yahoo, MSN and others adopt a duplicate content penalty similar to Google’s. Of course, your alternative English spellings won’t be found….
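As an illustration of the two exclusion mechanisms, the sketch below holds a site-level robots exclusion file and a page-level meta tag as strings, then checks the file's effect with Python's standard urllib.robotparser module. The host www.example.co.uk, the page name and the helper is_blocked are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Site-level: robots.txt served from the root of the secondary site.
BLOCK_GOOGLE_ONLY = """\
User-agent: Googlebot
Disallow: /
"""

# Variant following the tip above: keep every crawler out.
BLOCK_ALL_CRAWLERS = """\
User-agent: *
Disallow: /
"""

# Page-level alternative: placed inside the <head> of each duplicate page.
META_TAG = '<meta name="googlebot" content="noindex">'


def is_blocked(robots_txt, user_agent, url):
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


page = "http://www.example.co.uk/page.html"  # hypothetical URL
print(is_blocked(BLOCK_GOOGLE_ONLY, "Googlebot", page))
print(is_blocked(BLOCK_GOOGLE_ONLY, "Slurp", page))
print(is_blocked(BLOCK_ALL_CRAWLERS, "Slurp", page))
```

The first file shuts out only Google's crawler, so other engines can still index the alternative spellings. The second trades that visibility for protection against similar penalties elsewhere.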

Tag Your Content with the Language Dialect to Facilitate Proper Search Engine Indexing

You can help a search engine identify the language dialect of a page’s content by setting the lang attribute (and, in XHTML, xml:lang) on the html element, e.g.

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">

or

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">

as the case may be. Refer to the W3C document “Language tags in HTML and XML” for further information.
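To confirm that a page really declares the dialect you intended, you can read the attribute back from the markup. This is a minimal sketch using Python's standard html.parser module; the class LangSniffer and the function declared_language are hypothetical names, and the sample markup simply reuses the declaration shown above.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Remember the language declared on the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "html" and self.lang is None:
            attributes = dict(attrs)
            self.lang = attributes.get("lang") or attributes.get("xml:lang")


def declared_language(markup):
    """Return the lang (or xml:lang) value of the html element, or None."""
    sniffer = LangSniffer()
    sniffer.feed(markup)
    return sniffer.lang


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">'
    "<head><title>Colour charts</title></head><body></body></html>"
)
print(declared_language(sample))
```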

While we’re on the topic, note that the language can also be declared at the HTTP header level. This is used mostly by browsers.

For the Apache server, add a line similar to

AddLanguage en-GB .html

to your web server configuration file or .htaccess file. Your server will then include

Content-Language: en-GB

in its HTTP headers. HTTP headers can be viewed in Firefox using the Live HTTP Headers extension. Microsoft users should consider Microsoft’s wfetch.exe tool.
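Once you have copied a response's headers out of such a tool, a few lines of script can pull out the declared language. The sketch below assumes the headers are pasted in as plain text, optionally preceded by the status line; the function name content_language and the sample response are hypothetical.

```python
from email import message_from_string


def content_language(raw_response_head):
    """Return the Content-Language value from pasted response headers, or None."""
    lines = raw_response_head.strip().splitlines()
    # Drop the status line (e.g. "HTTP/1.1 200 OK"); it is not a header field.
    if lines and lines[0].startswith("HTTP/"):
        lines = lines[1:]
    # HTTP header fields share the "Name: value" syntax of e-mail headers,
    # so the standard e-mail parser can read them; lookups ignore case.
    return message_from_string("\n".join(lines)).get("Content-Language")


pasted = """\
HTTP/1.1 200 OK
Content-Type: text/html
Content-Language: en-GB
"""
print(content_language(pasted))
```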

What’s your experience?

What experience have you had resolving internationalization issues?

Contact Us with feedback on your experience or to let us help you with your Search Engine Optimization and Web Analytics needs.




About Sean Carlos

Sean Carlos is a digital marketing consultant & teacher, assisting companies with their Search (SEO + SEA = SEM), Social Media & Digital Media Analytics strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is Chairman of the SMX Search and Social Media Conference, 12 & 13 November in Milan. He is also a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.
