Tag Archives: Search Engine Crawlers
Google Crawling and Execution of JavaScript: where are we at today?
For a long time, Google’s advice to website developers was to keep things simple to ensure search engine spiders could successfully crawl and process website content:
Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.1
In reality, Google often found links in Flash objects, and it significantly improved this ability, as announced last June (creating much confusion by misrepresenting the improvement as a new feature). And despite the hoopla, there are still many good reasons to avoid Flash.
Keep sections of web pages out of Yahoo! with class=”robots-nocontent”
There are occasions when some content on a web page just shouldn’t appear in search engines. The most frequent example is repeating header and footer details, such as site copyright information. This site uses the hCard format to provide contact information that site visitors can save as a vCard for use with a PIM such as Thunderbird or Outlook. Yet some of the information required for a detailed vCard is not really appropriate for a search engine’s index. Historically, the best solution was to place such content on a page using JavaScript, as search engines have avoided indexing JavaScript-generated content (they probably do analyze it). A JavaScript approach to keeping some page information out of search engines isn’t perfect – not all visitors will have JavaScript enabled.
The same folks behind the hCard format proposed placing robots instructions in the HTML class attribute (the attribute CSS selectors use) to give search engine crawlers detailed handling information for tagged sections of page content.
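As a rough illustration of the idea, the sketch below shows how a crawler might drop tagged sections before indexing a page. It is not a description of Yahoo!'s actual processing, which the post does not detail; the `NoContentFilter` class, the `indexable_text` function and the sample markup are all hypothetical, and the code assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class NoContentFilter(HTMLParser):
    """Collects page text, skipping elements tagged class="robots-nocontent"."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside an excluded element, every nested element deepens the skip.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text a crawler honouring robots-nocontent might index."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page fragment: the contact card is tagged for exclusion.
SAMPLE = """
<div><p>Widget catalogue</p>
<div class="vcard robots-nocontent"><span class="tel">555-0100</span></div>
<p>Order today</p></div>
"""

print(indexable_text(SAMPLE))
```

Because the instruction lives in the class attribute, the tagged content stays visible to every visitor, including those with JavaScript disabled.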
Google Sitemap Standard Adopted by Leading International Search Engines
The three major search engines (Google, Yahoo! and Microsoft’s Live Search, now Bing) have jointly announced support for Google’s sitemaps protocol.
2007-04-11: Ask announces support of the sitemap standard. It is not yet clear if they actually use sitemaps. While Google and Yahoo do process sitemaps, Microsoft does not yet use them.
2008-06-02: Yandex supports XML sitemaps. China’s Baidu also supports sitemaps through its Baidu webmaster tools, which are currently (2011) invite-only. For those interested in the Czech Republic, Seznam supports sitemaps.
A new site, www.sitemaps.org, has been created to support the sitemaps protocol. While the Yahoo blog indicates Yahoo is apparently already accepting submissions, there is no mention of this on their Site Explorer submission form. Microsoft is committed to supporting sitemaps after finishing internal testing, which is currently underway.
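For readers who have not seen a sitemap file, here is a minimal sketch of generating one with Python's standard library. The namespace and the `urlset`, `url`, `loc` and `lastmod` element names come from the protocol published at sitemaps.org; the `build_sitemap` function and the example URLs are hypothetical, and a production file would normally also carry an XML declaration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is a YYYY-MM-DD string, or None to omit the element.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped on output
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for illustration only.
PAGES = [
    ("https://www.example.com/", "2007-04-11"),
    ("https://www.example.com/catalog?item=1&lang=en", None),
]

print(build_sitemap(PAGES))
```

The resulting file is typically placed at the site root and submitted through each search engine's webmaster interface.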
Search Engine Crawlers: Who’s visiting my site and why?
Organizations implementing search engine optimization (SEO) strategies will sooner or later consider monitoring search engine crawling activity. Before a web page can appear in search results, the content has to be discovered through a crawling or spidering process. This is done through software which automatically navigates the web, finding and downloading web content for the search engine to parse, index and rank.
- search engine spider
- A “spider”, also known as a “crawler”, “robot” or simply “bot”, finds and retrieves web pages. Once a search engine finds your site, either through a link from another site or through a submission form, the “spider” will begin to crawl your site.
Search engine crawling activity is an early sign that SEO is functioning or a potential warning sign of site issues impeding content discovery.
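One simple way to monitor this activity is to count crawler visits in the web server's access log. The sketch below assumes the log uses the common "combined" format, where the user agent is the last quoted field; the `count_crawler_hits` function, the token table and the sample log lines are illustrative assumptions, not a complete list of crawlers. User-agent strings can also be spoofed, so treat the counts as an indication only.

```python
import re
from collections import Counter

# Substrings that commonly appear in crawler user-agent strings (not exhaustive).
CRAWLER_TOKENS = {
    "Googlebot": "Google",
    "Slurp": "Yahoo!",
    "msnbot": "Microsoft",
    "bingbot": "Microsoft",
    "YandexBot": "Yandex",
    "Baiduspider": "Baidu",
}

# In the combined log format the user agent is the last quoted field on the line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(log_lines):
    """Count requests per search engine, based on the user-agent field."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue
        agent = match.group(1).lower()
        for token, engine in CRAWLER_TOKENS.items():
            if token.lower() in agent:
                hits[engine] += 1
                break
    return hits


# Hypothetical log lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2008:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2008:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (Windows NT 6.1) Firefox/3.0"',
    '192.0.2.20 - - [10/Oct/2008:13:57:12 +0000] "GET /robots.txt HTTP/1.1" 200 68 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
]

for engine, count in sorted(count_crawler_hits(SAMPLE_LOG).items()):
    print(engine, count)
```

A sudden drop in these counts, or a crawler that only ever requests the home page, can point to the kind of discovery problems described above.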
The Google Webmaster Dashboard, a.k.a. Google Sitemaps
In order to index and display web content in their search results, search engines need to be able to find the content. The first generation of Internet search engines relied on webmasters to submit a site’s primary URL, the site’s “home page”, to the search engine’s crawler database. The crawler would then follow each link it found on the home page. Problems soon emerged – much site content can be inadvertently hidden from crawlers, such as that behind drop-down lists and forms.
Update: Google Sitemaps was renamed Google Webmaster Tools on 5-Aug-2005 to better reflect its more expansive role.
Fast forward to 2005. Search engine crawlers have improved their ability to find sites through links from other sites – site submission is no longer relevant. Yet many web sites are still coded in ways which impede automatic search engine discovery of the rich content often available in larger, complex web sites.
Creating Search Engine Friendly Drop-down menus using CSS
JavaScript drop-down menus are employed by many medium to large size web sites as the primary navigation tool for site visitors. Drop-down menus offer many advantages. They are already familiar to computer users, who encounter them in almost all mainstream software. By collapsing when not needed, the menus take up little screen space – yet offer a wealth of options when the user hovers over one of the visible categories.
Technically, the use of JavaScript to code drop-down menus is problematic. While some code can be relegated to an external JavaScript file, much JavaScript usually ends up bloating HTML pages. In most cases, search engine crawlers are not able to follow the JavaScript navigation links, leading to poor search engine crawling and visibility.
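The difference can be sketched with a toy link extractor that, like a crawler that does not execute JavaScript, only sees ordinary anchor tags. Everything here is hypothetical: the `LinkCollector` class, the `crawlable_links` function, both menu fragments, and the `Menu` script object stand in for typical markup rather than any particular menu library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags, as a non-JavaScript crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # javascript: pseudo-links give a crawler nothing to fetch.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(href)


def crawlable_links(html):
    """Return the links discoverable without executing any script."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# A menu built from nested lists; CSS alone can show and hide the inner list.
CSS_MENU = """
<ul id="nav">
  <li><a href="/products/">Products</a>
    <ul>
      <li><a href="/products/widgets/">Widgets</a></li>
      <li><a href="/products/gadgets/">Gadgets</a></li>
    </ul>
  </li>
</ul>
"""

# A menu assembled by script; the URLs exist only inside JavaScript code.
JS_MENU = """
<script>
  var menu = new Menu();
  menu.addItem("Products", "/products/");
  menu.addItem("Widgets", "/products/widgets/");
</script>
<a href="javascript:openMenu()">Products</a>
"""

print(crawlable_links(CSS_MENU))
print(crawlable_links(JS_MENU))
```

The list-based menu exposes every destination as a plain link, while the scripted menu exposes none, which is the visibility gap the paragraph above describes.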
UK and US English Dialect Considerations for Site Internationalization
Search Engines and Site Localization
While there are few differences between the UK and US English dialects which might lead to miscomprehension, Noah Webster’s spelling reform does lead to interesting issues which need to be considered when designing sites for international audiences.
Update: This document was written in 2006 and no longer represents the current state of search affairs. It has been left here as a historical reference. Search engines continually refine their algorithms and that is reflected in how they currently handle regional linguistic differences.
Is it “my favorite color” or “my favourite colour”?
While it may seem like an arcane academic question, how you spell your English language content can determine your site’s visibility in search engines and how your site is perceived by your visitors.
With about two-thirds of native English speakers in the US, American spelling predominates on the web. Not surprisingly, a non-scientific survey of search expressions using both US and UK spellings yields more matches for the US variant:
