Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing


Domain & URL Strategies for Multilingual & Multinational Sites


One problem search engines face when indexing a website's content is identifying the geographic and linguistic market a particular page is trying to reach. The World Wide Web really is world wide, and the issue is particularly complicated for websites in languages with a broad geographic reach, such as English and Spanish.

Fortunately for site owners, search engines use clues to match website content with searcher location. By understanding these clues and user behavior, site owners can choose the domain and URL strategy that best fits their needs.

I discussed domain and URL strategies at the SMX West 2010 Search Marketing Expo conference. For the benefit of those who couldn’t attend, the slides and a rough transcript follow. I’d strongly recommend that you attend a future SMX conference in person – from search marketing tips to great networking (and fine food), you won’t regret it.

Location is relative

Mark Twain often cited Benjamin Franklin's remark that there are only two certainties in life: death and taxes. Slides 2-4 outline a few probabilities: the SMX audience can be located at a point on the map, much of the audience is from nearby, and it is probably targeting the domestic US market. All rather straightforward stuff, until one looks at the problem from a search engine's point of view (slide 5). A company's website is potentially in competition with similar English-language websites from Canada to the UK, from South Africa to New Zealand. Even those targeting only a domestic audience should be aware of the basic issues to avoid unpleasant surprises.

The Five Options for Organizing Website Content

There are five ways to use domains and URLs to organize web content along the two dimensions of the market targeting problem: geography and language.

1. Top Level Domains (Domain Suffixes)

The first (and my preferred) way to organize web content is to put each site on a separate domain. Domains come in two flavors: in addition to the generic top level domains (gTLDs), e.g. .com, .net, .org, .biz, there are over 250 country code top level domains (ccTLDs) to choose from. A few refer to a region rather than a country (.eu for the European Union), and some are often used generically (.tv) despite their official designation. Related sponsored domains such as .cat, which serves the Catalan linguistic and cultural community, target a language group rather than a state. A common reason to use multiple domains for the same language is to deliver content tailored to a local audience, e.g. products priced in $ for the US and £ for the UK. Content may be essentially the same but should be tweaked as appropriate, e.g. a shopping cart becomes a trolley.

Generic Top Level Domain (gTLD):
www.mysite.com

Country code Top Level Domains (ccTLD):
www.mysite.ca
www.mysite.mx
www.mysite.it
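To make the ccTLD signal concrete, here is a minimal Python sketch of how a crawler might guess a target market from a hostname's suffix. The mapping is a small hypothetical excerpt, not any search engine's actual list; treating .tv as generic follows the text above, and second-level registries such as .co.nz are handled simply by looking at the last label.

```python
from typing import Optional

# Illustrative excerpt only: real ccTLD lists are far longer, and each search
# engine decides for itself which ccTLDs it treats as generic.
CCTLD_TARGETS = {
    "ca": "Canada",
    "mx": "Mexico",
    "it": "Italy",
    "uk": "United Kingdom",
    "nz": "New Zealand",
    "eu": "European Union",  # regional rather than national
}

# ccTLDs assumed here to be used generically despite their official designation.
GENERIC_CCTLDS = {"tv"}


def target_from_hostname(hostname: str) -> Optional[str]:
    """Return the market implied by the domain suffix, or None for a gTLD.

    Only the last label is inspected, so www.mysite.co.nz maps via "nz".
    """
    suffix = hostname.lower().rstrip(".").rsplit(".", 1)[-1]
    if suffix in GENERIC_CCTLDS:
        return None
    # gTLDs such as .com are simply absent from the table, so .get() yields None.
    return CCTLD_TARGETS.get(suffix)
```

A None result (www.mysite.com, for instance) means the domain alone says nothing about the market, so other signals have to take over.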

2. Subdomains (domain prefixes)

As an alternative to registering a domain for every geography of interest, subdomains of a primary domain can be used to create individual sites, typically under a generic top level domain. While there are no formal naming conventions or requirements, sites generally use two or three letter ISO abbreviations for languages (ISO 639) and countries (ISO 3166). Each subdomain shares the same master domain, yet search engines consider it a separate site with its own identity.

en.mysite.com
uk.mysite.com
es.mysite.com
it.mysite.com
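Because subdomain labels follow no formal convention, code that reads them has to decide whether a label is a language or a country. The Python sketch below checks a label against small hypothetical excerpts of ISO 639-1 (languages) and ISO 3166-1 alpha-2 (countries). It also exposes a real pitfall in the list above: `uk` is the ISO 639-1 code for Ukrainian, while the ISO 3166-1 code for the United Kingdom is `GB`, and labels such as `it` or `es` match both lists.

```python
# Small excerpts for illustration; the full ISO lists are much longer.
ISO_639_1_LANGUAGES = {"en": "English", "es": "Spanish", "it": "Italian", "uk": "Ukrainian"}
ISO_3166_1_COUNTRIES = {"gb": "United Kingdom", "es": "Spain", "it": "Italy", "ca": "Canada"}


def classify_subdomain(hostname: str) -> dict:
    """Report which ISO code lists the first hostname label matches."""
    label = hostname.lower().split(".", 1)[0]
    return {
        "label": label,
        "language": ISO_639_1_LANGUAGES.get(label),
        "country": ISO_3166_1_COUNTRIES.get(label),
    }
```

Under these excerpts, classify_subdomain("uk.mysite.com") reports Ukrainian as the language and no country, which is why a site using uk for the United Kingdom should document its own convention.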

3. Directories (Folders)

Directories (folders) on the web server can be used to organize web content for a geography or language group. Unlike the previous two options, each folder is part of the same website: the directories solution creates one website with multiple identities.

www.mysite.com/en
www.mysite.com/uk
www.mysite.com/es
www.mysite.com/it
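The first three options differ mainly in where the market code sits in the URL. As a minimal Python sketch, the hypothetical helper below builds the URL of the same page under each layout; the function name, the strategy labels and the http scheme are illustrative assumptions, and a real site also needs the DNS and server configuration behind each host.

```python
from urllib.parse import urlunsplit


def market_url(strategy: str, market: str, path: str = "/", base: str = "mysite") -> str:
    """Build a page URL for a market code under one of three layouts.

    strategy: "cctld" (www.mysite.it), "subdomain" (it.mysite.com)
    or "directory" (www.mysite.com/it).
    """
    path = "/" + path.lstrip("/")  # normalize to a single leading slash
    if strategy == "cctld":
        host, full_path = f"www.{base}.{market}", path
    elif strategy == "subdomain":
        host, full_path = f"{market}.{base}.com", path
    elif strategy == "directory":
        host, full_path = f"www.{base}.com", f"/{market}{path}"
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return urlunsplit(("http", host, full_path, "", ""))
```

For example, market_url("directory", "it", "/contact") gives http://www.mysite.com/it/contact, while the "cctld" and "subdomain" variants move the same code into the host name.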

4. URL Parameters

URL parameters are another option for delivering a page with geotargeted content and/or setting the page's language. While Google makes extensive use of top level domains, in many cases it also supports URL parameters, e.g. hl=<code>, to change the language of a page:

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=62399

Bing allows users to change country and language targeting through its preferences menu. Doing so temporarily adds the parameter setmkt=<code> to the Bing URL, e.g.

http://www.bing.com/?setmkt=de-CH

where de stands for the German language and CH for Switzerland.
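Reading such a parameter is straightforward with Python's standard urllib.parse module. This sketch splits a language-region value like de-CH into its two parts; the helper name is hypothetical, and the code only illustrates the format of the values shown above, not how Bing or Google actually process them.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def market_from_url(url: str, param: str = "setmkt") -> Optional[Tuple[str, str]]:
    """Split a language-REGION value such as de-CH into (language, region)."""
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return None
    # partition() leaves region empty when only a language is given (e.g. hl=en).
    language, _, region = values[0].partition("-")
    return language.lower(), region.upper()
```

With the Bing example, market_from_url("http://www.bing.com/?setmkt=de-CH") returns ("de", "CH"); with param="hl" on the Google URL it returns ("en", ""), since hl carries only a language.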

URL parameters are not recommended if a site is looking for search engine traffic, as complex URLs are neither user nor search engine friendly.

5. Chance

The final option for organizing website content targeting a geographic location or a linguistic group is chance: content is deployed on a domain in a haphazard fashion. In the most egregious cases, multiple languages may appear on a single page. No, this is not a good strategy.

Search Engine Signals – Considerations Driving a Domain / URL Strategy

Search Engines Guess “Intended” Target Market

Search engines may associate a website with a geographic market by looking at its country code domain (ccTLD) or, for generic domains (gTLD), at the server's IP address. Google Engineer John Mueller noted in the Google Webmaster Forum:

“Yes, we do try to find context from these two factors (TLD & server IP) … however, if your site has a geographic TLD/ccTLD (like .co.nz) then we will not use the location of the server as well. Doing that would be a bit confusing, we can’t really “average” between New Zealand and the USA… At any rate, if you are using a ccTLD like .co.nz you really don’t have to worry about where you’re hosting your website, the ccTLD is generally a much stronger signal than the server’s location could ever be. ”

In her presentation at SMX West 2010, Google Engineer Maile Ohye stated that in rare cases Google may consider address information found on the website itself.
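The precedence Mueller describes can be written as a small decision function. This Python sketch illustrates the quoted logic and is not Google's implementation: a ccTLD wins outright, otherwise server location is used, and address information found on the site is treated as a last-resort hint. The parameter names are hypothetical, and placing the on-page address after server location is an assumption based only on Ohye's remark that it is used in rare cases.

```python
from typing import Optional


def guess_market(cctld_country: Optional[str],
                 server_country: Optional[str],
                 address_country: Optional[str] = None) -> Optional[str]:
    """Combine geographic signals in a fixed order of precedence (an assumption)."""
    if cctld_country:
        # A ccTLD such as .co.nz is taken at face value and server location is
        # ignored, since there is no sensible "average" between two countries.
        return cctld_country
    if server_country:
        # Generic domain: fall back to where the server is hosted.
        return server_country
    # Hypothetical last resort: address information found on the site itself.
    return address_country
```

So guess_market("New Zealand", "United States") yields New Zealand, matching the quote, while a .com hosted in the US yields United States.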

Google Webmaster Tools Region Setting

If a site uses a generic top level domain (gTLD), Google offers the possibility to assign the site to a target market using Google Webmaster Tools. This can also be done for subdomains and folders, as long as each is set up as a separate “site” in Google Webmaster Tools. As this option is Google-specific, it won't solve your problems with Bing or any of the international search engines (e.g. Baidu, Naver, Yandex, Rambler).

Language Recognition

As multiple languages may be used within a target geography, e.g. Canada, Belgium or Switzerland, search engines analyze each content page to determine its primary language. The HTTP header (Content-Language) and the HTML attribute (lang) exist to specify a web document's language, yet they aren't used very often. Search engines must therefore use linguistic tools, such as n-gram analysis, to figure out the human language of web content.
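To illustrate what n-gram analysis means in practice, here is a toy Python sketch of character-trigram language identification. It is deliberately simplified: the two language "profiles" are built from single hypothetical sentences, and the score just counts shared trigram occurrences, whereas production systems train on large corpora and use more robust distance measures.

```python
from collections import Counter
from typing import Dict


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams, padding with spaces to mark word edges."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# Toy training snippets; real systems build profiles from large corpora.
PROFILES: Dict[str, Counter] = {
    "en": trigrams("the quick brown fox jumps over the lazy dog and the cat sat "
                   "on the mat with the other animals in the house"),
    "it": trigrams("il gatto nero dorme sul divano della casa e il cane gioca "
                   "nel giardino con gli altri animali della famiglia"),
}


def detect_language(text: str, profiles: Dict[str, Counter]) -> str:
    """Pick the profile that shares the most trigram occurrences with the text."""
    sample = trigrams(text)

    def overlap(profile: Counter) -> int:
        return sum(count for gram, count in sample.items() if gram in profile)

    return max(profiles, key=lambda lang: overlap(profiles[lang]))
```

With these profiles, detect_language("il cane e il gatto sono nella casa", PROFILES) returns "it": trigrams such as " il", "lla" and "asa" occur in the Italian sample but not in the English one.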

Inbound Links

SEO practitioners know that incoming links are one of the strongest signals search engines use in determining search results. Yet not all links are equal: links have multiple attributes, and these attributes need to be considered. Search engines will most certainly look at the language and geography of the pages linking to a site's page when considering what that page's target market might be.
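Search engines do not publish how they weigh the origin of links, but the basic idea can be sketched in Python: tally the share of inbound links by the linking page's language and country. The input format, a list of (language, country) pairs per backlink, is a hypothetical one you would have to assemble from a backlink tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def link_market_profile(links: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Share of inbound links per (language, country) of the linking page."""
    counts = Counter(links)
    total = sum(counts.values())
    if not total:
        return {}
    # most_common() orders the result from the dominant source market downwards.
    return {market: n / total for market, n in counts.most_common()}
```

For four hypothetical backlinks, two from Italian pages in Italy, one from an Italian page in Switzerland and one from an English page in the US, the profile is 50% ("it", "IT"), 25% ("it", "CH") and 25% ("en", "US"), which points towards the Italian market.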

URLs and User Behavior

Various studies have shown that when scanning search engine results, users consider not just a result's title and summary but its URL as well. While I haven't seen a study that looked at ccTLDs, I think it is a safe bet that users will prefer a domain from their own country. The exception might be geographies where fraud and corruption are rife. User behavior should be a significant consideration when choosing an appropriate URL strategy.

Specific Considerations for the Three Recommended Options

The following section lists some of the pros and cons for the recommended domain and URL options. While not exhaustive, it should help you in defining the best strategy for your company.

Country Code Top Level Domains (www.mysite.it)

Advantages:

  • It is easy to tailor each site for local content differences, e.g. currency, product availability, legal terms and conditions
  • Sends a strong signal to search engines and users regarding our target market
  • As each domain is a separate site, they are easy to track separately in Bing & Google Webmaster Tools and in various Web Analytics packages

Disadvantages:

  • If a language is spoken in multiple countries, incoming links are diluted, as they will be divided across multiple domains. That said, these backlinks may be more relevant, as presumably “local” (national) links will point to “local” domains. Many seem to worry about a “duplicate content” penalty in this scenario, where essentially similar content is deployed on multiple domains. Search engines know that multinational companies need to tailor their content to local markets, so this isn't really a problem.
  • If multiple languages are used in a country, we still have to manage them using subdomains or folders, e.g. for Canada (French, English), Switzerland (French, German, Italian), etc.
  • Country specific domains often present registration hurdles: the domain may already be taken, or there may be physical presence or other requirements.
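One way to handle multilingual countries is a hybrid: one ccTLD per country, with language folders only where a country needs more than one language. The Python sketch below lists the resulting home page URLs; the layout dictionary and helper name are hypothetical, with the languages taken from the Canada and Switzerland examples above.

```python
from typing import Dict, List

# Hypothetical layout: one ccTLD per country, language folders only where needed.
SITE_LAYOUT: Dict[str, List[str]] = {
    "it": ["it"],              # Italy: single language, no folder needed
    "ca": ["en", "fr"],        # Canada: English and French
    "ch": ["de", "fr", "it"],  # Switzerland: German, French, Italian
}


def localized_urls(base: str = "mysite") -> List[str]:
    """List the home page URL for every country/language combination."""
    urls = []
    for cctld, languages in SITE_LAYOUT.items():
        if len(languages) == 1:
            urls.append(f"http://www.{base}.{cctld}/")
        else:
            urls.extend(f"http://www.{base}.{cctld}/{lang}/" for lang in languages)
    return urls
```

This yields six entry points, from http://www.mysite.it/ to http://www.mysite.ch/it/, keeping the strong ccTLD signal while giving each language its own folder.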

Subdomains for Each Market (it.mysite.com)

Advantages:

  • No registration issues
  • Each subdomain can be hosted in the local target market, sending a strong geolocation signal (but how many IT departments will support this level of complexity?)
  • As each subdomain is a separate site, they are easy to track separately in Bing & Google Webmaster Tools and in various Web Analytics packages

Open issues:

  • Market needs, e.g. currency, product availability, etc. may still require multiple subdomains for some languages

Disadvantages:

  • We still have inbound link dilution (but more relevant links)
  • Subdomains (domain prefixes) provide a weaker URL signal for users scanning search results when compared to top level domains (domain suffixes)

Directories for Each Market (www.mysite.com/it)

Advantages:

  • No registration issues
  • Link aggregation: all backlinks go to the same domain (but are from discordant sources in terms of language, geography)

Open issues:

  • We may still need multiple directories for some languages

Disadvantages:

  • Geolocation targeting is ambiguous. Targeting can be specified in Google Webmaster Tools for generic top level domains if you set each folder up as an authenticated site. As this is a Google-only solution, it isn't really a substitute for the top level domain and subdomain strategies outlined above.
  • Multiple languages are mixed in the same site, potentially sending an ambiguous message to search engines.
  • Directories will most certainly be a weaker URL signal for users scanning search results
  • It may be difficult to track folders as separate sites in Web Analytics tools and in Bing Webmaster Tools
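Most analytics tools can at least filter reports by URL path. As a stand-in for such a filter, this Python sketch counts pageviews per market folder from a hypothetical list of page URLs (for example, exported from server logs); the folder set mirrors the directory example above and is an assumption about how a site might be organized.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit

MARKET_FOLDERS = {"en", "uk", "es", "it"}  # the folders used in the directory example


def pageviews_by_market(urls: Iterable[str]) -> Counter:
    """Count pageviews per market folder; anything else falls under "(root)"."""
    counts: Counter = Counter()
    for url in urls:
        # First path segment, e.g. "it" for /it/prodotti; "" for the home page.
        first = urlsplit(url).path.strip("/").split("/", 1)[0]
        counts[first if first in MARKET_FOLDERS else "(root)"] += 1
    return counts
```

The catch-all "(root)" bucket shows how much traffic lands outside any market folder, one of the ambiguities a directories strategy introduces.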

Defining an Appropriate Domain / URL Strategy

Hopefully the options available to site owners are now clearer! Naturally I’m available to help you in defining the best strategy for your business. Sometimes just having a third party at the table in discussions with IT can be a big help. Alternatively, you might find our SEO course right for you. Feel free to contact me today for more information.

International SEO Domain & URL Strategies
Download the entire international domain selection presentation.


Originally published March 7th, 2010

  • Sean Carlos is a web marketing consultant & teacher, assisting companies with their Search (SEO + PPC = SEM), Social Media & Digital Media Measurement strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.


8 Comments so far

  • Tyler Durden

    I just wanted to say that I found this article very very useful.

    Thanks for the effort

  • mrT

    Hello!

    Well, according to http://tools.ietf.org/html/rfc3986#section-3.3, the language is best postfixed to the url making it unique for search engines, but still understandable to the human eye.

    The great problem is that URLs like “/about_our_strategy” mean DIDDLY in Slovenian, even if you postfix ;si, as in “/about_our_strategy;si”. A much better URL would be “/o_naši_strategiji”

    I am now at a crossroads on how to tackle this, as the same content in multiple languages “should” be accessible at the same URL, but “should” have an understandable, semantic URL as well. Preferably in THAT language.

    Any solutions?

  • sean

    I’ve never seen the language specified in the URL, interesting from a technical point of view.

    From a user point of view, the URL should be in the language of the page content as users will scan URLs in search results in order to decide which result best matches their query.

  • SEO Translator

    I agree with most of what you state, but do NOT agree with the language topic – search engines rarely use the http header (Content-Language) and html attribute tags (lang). Google plainly states in its webmaster forum that it disregards those tags and attributes, though Yahoo DOES use them.

    I recently wrote a series of 4 posts on language recognition by the search engines – the first one is posted at http://www.seo-translator.com/do-search-engines-understand-your-localized-pages-1-%E2%80%93-basic-research/

    Many languages (say, English, Spanish, or Arabic) are spoken in many countries and can therefore not be used for geo-location. This might not be true for languages that are spoken only in one country (say, Japanese), but based on the research I made for my series of posts it will be more likely that such recognition is based on n-grams and character mapping than relying on html headers and language attributes, at least for Google.

  • sean

    @SEO Translator: Regarding my statement,

    http header (Content-Language) and html attribute tags (lang) exist to specify a web document’s language, yet they aren’t used too often.

    I’m not sure what you think you disagree with.

    I didn't provide guidance as to whether they should be used or not. I just said they aren't used often. That said, I would use them as part of a “set and forget” strategy. It is unlikely either Content-Language or lang will ever be important signals, as too often they are improperly configured. Yet as they add very little overhead to a web page and proper configuration isn't too difficult, there really isn't much point in not using them. That way they're in place should the target search engines, be they Naver, Yandex, Rambler, Baidu, Bing or Google, ever choose to include these signals, albeit weak ones, in their basket of tricks.

    Just to be clear, there are some things that I would argue should be avoided, such as using the meta keywords tag, but that is a different story for a different day.

  • SEO Translator

    What I meant is that language is very difficult to use for geo-targeting, unless a specific language corresponds to one region only.

    But I agree that Content-Language or lang should be used, even though Google ignores them because, as you say, they are very often mis-configured. Yahoo however uses them to confirm the detected language.

    Other aspects I have noticed that search engines use for geo-targeting are the server location for non-ccTLDs (at least Google has stated that in its Webmaster Central), and also more devious tricks such as identifying telephone numbers (international dialing codes). One of my customers, with no attempt at geo-targeting, was correctly identified by Google as Spanish, even though the server is in Texas and she uses a .net TLD. She tops her keywords in google.es, but is only on the third page in google.com… And the only things that could give away the location are the word “Madrid” and a Spanish international telephone code (+34….).

  • Jakuza

    I am having dreadful trouble with this, using three TLDs for a client (a .com, .co.uk and .com.au), and they are all showing up in each other's Google engines: the US site is ranking in Google Australia, and vice versa. And everything was set in Webmaster Tools months ago to avoid this. I'm not sure if multiple TLDs are the answer, and if they are, maybe we need Google to think of a different one :-)

    • TheMuffler

      Same problem here. I manage 3 sites (.com, .co.uk and .eu) and we're seeing the same. At the moment we're looking to rebuild our operation from the ground up, but we are struggling to decide whether to consolidate onto a single domain (the .com) or keep the three domains.

      Any insights?
