Tag Archives: Search Engine Crawlers

Why SEO & Usability are like two peas in a pod

Good user experience is fundamental for the success of a website:

On the Internet, it’s survival of the easiest: If customers can’t find a product, they can’t buy it. Give users a good experience and they’re apt to turn into frequent and loyal customers. But the Web also offers low switching costs … Only if a site is extremely easy to use will anybody bother staying around. – Usability guru Jakob Nielsen1

While Nielsen probably had site design and information architecture in mind, his point also encompasses search engine visibility. Without search engine visibility a website is hidden away on a dead-end street instead of being front and center on main street2, where the people are.


Google Crawling and Execution of JavaScript: where are we at today?

For a long time, Google’s advice to website developers was to keep things simple to ensure search engine spiders could successfully crawl and process website content:

Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.1

In reality, Google often found links in Flash objects, and significantly improved this ability as announced last June (an announcement that created much confusion by presenting the change as a new feature rather than an improvement). And despite the hoopla, there are still many good reasons to avoid Flash.


Keep sections of web pages out of Yahoo! with class="robots-nocontent"

There are occasions when some content on a web page just shouldn’t appear in search engines. The most frequent example is repeated header and footer details, such as site copyright information. This site uses the hCard format to provide contact information site visitors can save as a vCard for use with a PIM such as Thunderbird or Outlook. Yet some of the information required for a detailed vCard is not really appropriate for a search engine’s index. Historically, the best solution was to insert such content into the page with JavaScript, since search engines have avoided indexing JavaScript-generated content (though they likely do analyze the scripts themselves). A JavaScript approach to keeping page information out of search engines isn’t perfect, however – not all visitors have JavaScript enabled.

The same folks behind the hCard format proposed placing robots instructions in the HTML class attribute to give search engine crawlers detailed handling instructions for tagged sections of page content.
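As a sketch of how this works in practice (the element contents and IDs are hypothetical), marking a page’s repeated footer with Yahoo!’s robots-nocontent class might look like this:

```html
<!-- Main article content: crawled and indexed normally -->
<div id="content">
  <h1>Widget Installation Guide</h1>
  <p>Step-by-step instructions ...</p>
</div>

<!-- Repeated boilerplate: Yahoo!'s crawler is asked to skip this
     section when indexing. Other engines ignore the class, and the
     content remains fully visible to site visitors. -->
<div class="robots-nocontent">
  <p>Copyright 2007 Example Corp. All rights reserved.</p>
  <p>Contact: 555-0100 | 123 Example Street</p>
</div>
```

Note that only Yahoo! honors this class; the tagged section is still eligible for indexing by other search engines.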


6 methods to control what and how your content appears in search engines

While it may seem paradoxical, there are many occasions where you may want to exclude a website, or a portion of a site, from search engine crawling and indexing. One typical need is to keep duplicate content, such as printer-friendly versions, out of a search engine’s index. The same is true for pages available both in HTML and in PDF or word processor formats. Other examples include site “service pages” such as user-friendly error messages and activity confirmation pages. Special considerations apply for ad campaign landing pages.

There are several ways to prevent Google, Yahoo!, Bing or Ask from indexing a site’s pages. In this article, we look at the different search engine blocking methods, considering each method’s pros and cons.
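To give a flavor of two common methods before diving into the details (the directory path is a hypothetical example), a site-wide robots.txt rule and a per-page robots meta tag look like this:

```
# robots.txt, served from the site root -- asks all crawlers
# to skip the printer-friendly directory
User-agent: *
Disallow: /print/
```

```html
<!-- Per-page alternative, placed in the page <head>: the page may be
     crawled, but engines are asked not to index it or follow its links -->
<meta name="robots" content="noindex, nofollow" />
```

The two approaches differ in scope: robots.txt blocks crawling of whole paths, while the meta tag controls indexing of an individual page that has already been fetched.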



Google Sitemap Standard Adopted by Leading International Search Engines

Each of the three major search engines (Google, Yahoo! and Microsoft’s Live Search, now Bing) has announced joint support of Google’s sitemaps protocol.

2007-04-11: Ask announces support of the sitemap standard, though it is not yet clear whether they actually use sitemaps. While Google and Yahoo! do process sitemaps, Microsoft does not yet use them.

2008-06-02: Yandex supports XML sitemaps. China’s Baidu also supports sitemaps through their Baidu webmaster tools, which are currently (2011) invite-only. For those interested in the Czech Republic, Seznam supports sitemaps.

A new site, www.sitemaps.org, has been created to support the sitemaps protocol. While the Yahoo! blog indicates Yahoo! is already accepting sitemap submissions, there is no mention of this on their Site Explorer submission form. Microsoft is committed to supporting sitemaps after finishing internal testing, which is currently underway.
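For reference, a minimal sitemap following the protocol published at sitemaps.org looks roughly like this (the URL and date are placeholders; only the &lt;loc&gt; element is required for each entry, the others are optional hints):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

The file is placed at the site root (or any location it should describe) and its URL submitted to, or discovered by, the participating search engines.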


Search Engine Crawlers: Who’s visiting my site and why?

Organizations implementing search engine optimization (SEO) strategies will sooner or later consider monitoring search engine crawling activity. Before a web page can appear in search results, the content has to be discovered through a crawling or spidering process. This is done through software which automatically navigates the web, finding and downloading web content for the search engine to parse, index and rank.

search engine spider
A “spider”, also known as a “crawler”, “robot” or simply “bot”, finds and retrieves web pages. Once a search engine finds your site, either through a link from another site or through a submission form, the “spider” will begin to crawl your site.

Search engine crawling activity is an early sign that SEO is functioning or a potential warning sign of site issues impeding content discovery.
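Crawler visits show up in a site’s access logs. A Googlebot request in an Apache-style combined log might look like the following (the IP address, date and response size are made up for illustration; the user-agent string is the one Google documents for its crawler):

```
66.249.66.1 - - [18/Nov/2006:09:15:31 +0000] "GET /robots.txt HTTP/1.1" 200 312
  "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

Well-behaved crawlers typically fetch robots.txt before requesting pages, so requests for that file are often the first sign a search engine has discovered your site.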


The Google Webmaster Dashboard, a.k.a. Google Sitemaps

In order to index and display web content in their search results, search engines need to be able to find the content. The first generation of Internet search engines relied on webmasters to submit a site’s primary URL, the site’s “home page”, to the search engine’s crawler database. The crawler would then follow each link it found on the home page. Problems soon emerged – much site content can be inadvertently hidden from crawlers, such as that behind drop-down lists and forms.

Update: Google Sitemaps was renamed Google Webmaster Tools on 5-Aug-2005 to better reflect its more expansive role.

Fast forward to 2005. Search engine crawlers have improved their ability to find sites through links from other sites – site submission is no longer relevant. Yet many web sites are still coded in ways which impede automatic search engine discovery of the rich content often available in larger, complex web sites.


Creating Search Engine Friendly Drop-down menus using CSS

JavaScript drop-down menus are employed by many medium to large size web sites as the primary navigation tool for site visitors. Drop-down menus offer many advantages. They are already familiar to computer users, who encounter them in almost all mainstream software. By collapsing when not needed, the menus take up little screen space – yet offer a wealth of options when the user hovers over one of the visible categories.

Technically, the use of JavaScript to code drop-down menus is problematic. While some code can be relegated to an external JavaScript file, much JavaScript usually ends up bloating HTML pages. In most cases, search engine crawlers are not able to follow the JavaScript navigation links, leading to poor search engine crawling and visibility.
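A CSS alternative can be sketched with nested lists and the :hover pseudo-class (the menu labels and URLs below are placeholders). Because the links are ordinary anchors present in the HTML itself, crawlers can follow them even though they ignore the styling:

```html
<style>
  /* Basic layout: top-level items side by side, no bullets */
  ul.nav, ul.nav ul { list-style: none; margin: 0; padding: 0; }
  ul.nav > li { float: left; position: relative; margin-right: 1em; }
  ul.nav li ul { position: absolute; }

  /* Hide each sub-menu until its parent item is hovered */
  ul.nav li ul { display: none; }
  ul.nav li:hover ul { display: block; }
</style>

<ul class="nav">
  <li><a href="/products/">Products</a>
    <ul>
      <li><a href="/products/widgets/">Widgets</a></li>
      <li><a href="/products/gadgets/">Gadgets</a></li>
    </ul>
  </li>
  <li><a href="/about/">About</a></li>
</ul>
```

One caveat: older versions of Internet Explorer only supported :hover on anchor elements, which is why the original “Suckerfish” technique paired CSS like this with a small script fallback for those browsers.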


UK and US English Dialect Considerations for Site Internationalization

Search Engines and Site Localization

While there are few differences between the UK and US English dialects which might lead to miscomprehension, Noah Webster‘s spelling reform does lead to interesting issues which need to be considered when designing sites for international audiences.

Update: This document was written in 2006 and no longer represents the current state of search affairs. It has been left here as a historical reference. Search engines continually refine their algorithms, and that is reflected in how they currently handle regional linguistic differences.

Is it “my favorite color” or “my favourite colour”?

While it may seem like an arcane academic question, how you spell your English language content can determine your site’s visibility in search engines and how your site is perceived by your visitors.

With about two-thirds of native English speakers living in the US, American spelling predominates on the web. Not surprisingly, a non-scientific survey of search expressions using both US and UK spellings yields more matches for the US variant: