Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing


Say It Isn’t So: Marketing Resource Site Marketing Profs Seems To Be Cloaking Search Engines – Inadvertently?


Years ago, savvy webmasters realized they could achieve better search engine rankings by creating two copies of a web page. One, text rich and graphics poor, would be seen by search engine robots, such as Googlebot, Yahoo Slurp and Microsoft Bing’s msnbot/bingbot. Everyday web users, surfing with Internet Explorer, Firefox, Chrome or Safari, would see a different version, often graphics rich and text poor.

The process of providing different web content to search engines and site visitors is often called cloaking, although some may prefer terms such as conditional content delivery. Cloaking is expressly prohibited by Google, Yahoo and Microsoft’s Bing.

The real-world problem is that cloaking works, and if you’re important enough, you can get away with it until you get caught. At that point you’ll probably get a slap on the wrist, but little more. As an SEO consultant, I find this leads to many frustrating discussions with clients (and their webmasters) who can’t understand why they shouldn’t cloak too. My official answer is that if your site isn’t a throwaway site, you shouldn’t take the risk. Yet this discussion happens too often and, needless to say, clients don’t really like my answer.

Today I searched Google for “web analytics errors configuration” because I want to properly credit the source of a citation I’m going to use in a web analytics course I’m updating. One particular result really bothered me when I clicked on it. What I saw was an invitation to pay for an article which I wasn’t even sure would answer my question: a very frustrating user experience. What also surprised me was that the page ranked despite having very little real content on it. So, as is often the case, I decided to take a closer look. What I found was striking enough that I decided to document the case, as I suspect it may be helpful to a reader or two.

This is the Marketing Profs page I expected to see:

[Screenshot: Marketing Profs cloaked page]

This is what I saw instead:

[Screenshot: Marketing Profs page served to user]

Now, there is always the slight chance that Marketing Profs changed their site logic at some point between when Googlebot last visited the specific page (25 Jul 2010 05:24:42 GMT) and today, the 29th of July. So I decided to download the page, first providing Marketing Profs my true identity – a Firefox user on Fedora – then downloading the page telling Marketing Profs that I was Googlebot:

wget http://www.marketingprofs.com/4/sterne16.asp --server-response --save-headers --user-agent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -O googlebot.html

wget http://www.marketingprofs.com/4/sterne16.asp --server-response --save-headers --user-agent="Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100720 Fedora/3.5.11-1.fc12 Firefox/3.5.11 GTB7.1" -O linux-user.html

In the above commands, the only difference was the browser I specified, the so-called user-agent.

Guess what? The page sizes, 78k vs 11k, were rather different:

$ ls -l
total 92
-rw-rw-r-- 1 sean sean 78846 2010-07-29 11:53 googlebot.html
-rw-rw-r-- 1 sean sean 11121 2010-07-29 11:53 linux-user.html

Houston, we have a problem: this type of behavior is exactly what search engines call cloaking. But perhaps there is a logical explanation after all.

Out of professional curiosity, I also checked the page using msnbot as the user agent – Marketing Profs is cloaking both msnbot versions but is not yet cloaking calls from bingbot.
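The size comparison above can also be scripted. What follows is only a minimal sketch, not a cloaking detector: the function names are my own, and the "less than half the size" threshold is an arbitrary assumption chosen for illustration. It works on files already downloaded with the wget commands shown earlier.

```shell
#!/bin/sh
# size_of FILE: print the size of FILE in bytes.
size_of() {
    wc -c < "$1" | tr -d ' '
}

# looks_cloaked SIZE_A SIZE_B: succeed when the smaller size is less than
# half of the larger one. The one-half threshold is an assumption made
# for this sketch, not a rule used by any search engine.
looks_cloaked() {
    if [ "$1" -lt "$2" ]; then
        small=$1; large=$2
    else
        small=$2; large=$1
    fi
    [ "$large" -gt 0 ] && [ $((small * 2)) -lt "$large" ]
}

# compare_downloads FILE_A FILE_B: report whether two saved copies of the
# same URL differ suspiciously in size.
compare_downloads() {
    size_a=$(size_of "$1")
    size_b=$(size_of "$2")
    if looks_cloaked "$size_a" "$size_b"; then
        echo "different: $1=$size_a bytes, $2=$size_b bytes"
    else
        echo "similar: $1=$size_a bytes, $2=$size_b bytes"
    fi
}
```

With the two files from the listing above, `compare_downloads googlebot.html linux-user.html` would report them as different, since 11121 bytes is well under half of 78846. A size gap is only a hint; comparing the actual text of the two files is what confirms different content.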

Why a Subscription Site Might Want to Cloak Google: the Content Paywall Problem

Quality content is expensive to produce. I know, and I sympathize with newspapers and sites like Marketing Profs which want to publish great content but need to cover their expenses in the process. Just researching and writing this article on cloaking took several hours. It often happens that others may then publish their own versions of a story like this without even the courtesy of citing the source.

To cover their expenses, many sites turn to a subscription model, perhaps providing a few sample articles, along with article abstracts, to non-subscribers.

Unfortunately these types of sites face a myriad of problems. They have to convince their visitors that the site does indeed offer original content worth paying for. In many cases, such as many newspapers, there is the risk of realizing that your “product” is really just a commodity freely available elsewhere. The solution is of course to differentiate the product, perhaps by insightful reporting and a dose of investigative journalism, but that is a different story.

There still remains the problem of how to attract potential subscribers in the first place. As Andrei Broder et al. pointed out in their seminal A taxonomy of web search, many web surfers use search engines to navigate to web sites rather than typing in a site URL directly. Love them or loathe them, a website must be in search engines if it wants traffic. So how can a site get protected content, e.g. that behind a paywall, in a search engine index?

Google’s First Click Free cloaking exception

Google does allow subscription sites, such as Marketing Profs, to cloak their content for paywall purposes as long as users who navigate to a page from Google can see the entire text of the page as Googlebot sees it. This rule, called first click free, is usually implemented by checking the URL referrer sent by the user. I did confirm that my browser correctly sent the Google referrer to Marketing Profs:

Referer: http://www.google.com/url?sa=t&source=web&cd=5&ved=0CCwQFjAE&url=http%3A%2F%2Fwww.marketingprofs.com%2F4%2Fsterne16.asp&rct=j&q=web%20analytics%20errors%20configuration&ei=BmRRTIykHtqTsQbgusm6AQ&usg=AFQjCNF_i6OkmPL-TnFULAMrSWLthlxBoA

wget users can try this at home:

wget http://www.marketingprofs.com/4/sterne16.asp --server-response --save-headers --referer="http://www.google.com/url?sa=t&source=web&cd=5&ved=0CCwQFjAE&url=http%3A%2F%2Fwww.marketingprofs.com%2F4%2Fsterne16.asp&rct=j&q=web%20analytics%20errors%20configuration&ei=BmRRTIykHtqTsQbgusm6AQ&usg=AFQjCNF_i6OkmPL-TnFULAMrSWLthlxBoA" --user-agent="Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100720 Fedora/3.5.11-1.fc12 Firefox/3.5.11 GTB7.1" -O linux-user-with-google-referer.html
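To make the referrer check concrete, here is a rough sketch of the decision a first click free site has to make for each request. It is written as shell functions purely for illustration. I have no knowledge of how Marketing Profs or any other site actually implements this, and the function names, the "yes"/"no" subscriber flag and the restriction to google.com hosts (country domains are left out) are all my own assumptions.

```shell
#!/bin/sh
# referer_host URL: print the lower-cased host part of a referrer URL.
referer_host() {
    host=${1#*://}           # drop the scheme, e.g. "http://"
    host=${host%%[/?:]*}     # drop path, query string and port
    printf '%s\n' "$host" | tr '[:upper:]' '[:lower:]'
}

# is_google_host HOST: succeed for google.com and its subdomains.
# Simplification: country domains such as google.it are not handled.
is_google_host() {
    case "$1" in
        google.com|*.google.com) return 0 ;;
        *) return 1 ;;
    esac
}

# page_variant USER_AGENT REFERER SUBSCRIBER: print "full" or "teaser".
# SUBSCRIBER is "yes" or "no" (a hypothetical flag for this sketch).
page_variant() {
    user_agent=$1; referer=$2; subscriber=$3
    if [ "$subscriber" = "yes" ]; then
        echo full; return
    fi
    case "$user_agent" in
        *Googlebot*) echo full; return ;;   # the crawler gets the article
    esac
    if is_google_host "$(referer_host "$referer")"; then
        echo full; return                   # first click free
    fi
    echo teaser
}
```

Under these assumptions, a non-subscriber arriving with the Google referrer shown above gets the same "full" page as Googlebot, while the same visitor with no referrer gets the "teaser". Both headers are trivially spoofed, as the wget commands demonstrate, so this is a policy sketch and not an access control mechanism.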

There has to be an Innocent Explanation Somewhere

I’m going to go out on a limb and assume that Marketing Profs is well aware of Google’s Webmaster Guidelines. I’ll also assume that Marketing Profs is familiar with Google’s First Click Free requirement and that Marketing Profs has absolutely no intention of manipulating search engine results (I’m in a generous, and trusting, mood; of course, I could be missing something else and I trust a reader will enlighten us all).

Thus what has clearly happened here is that Marketing Profs has suffered a temporary technical hiccup :-) . As they say, stuff happens. Some poor web server administrator probably updated a configuration file, “breaking” Marketing Profs’ adherence to Google’s first click free rule.

To be clear, the purpose of this article is NOT to call out Marketing Profs for improper conduct. By the time you read this article, I’m sure Marketing Profs’ site will have been fixed, and will again conform to Google’s Webmaster Guidelines.

What I do want to do is use this real-world case to illustrate:

  1. a user’s frustration
  2. a business need (pay for content development)
  3. the options that do exist for a site owner

What Marketing Profs should know about their options

First and foremost, Marketing Profs needs to either conform to Google’s first click free policy, as noted above, or to provide exactly the same pages to end users as it does to search engine robots such as Googlebot and msnbot (soon bingbot).

Should Marketing Profs choose the first click free option, they don’t necessarily have to make it easy for the end user (internal meeting code word: freeloader). Marketing Profs could engage an SEO consultant for ideas, helping to subsidize this or a similar site in the process. An SEO consultant worth his or her salt would then point to first click free implementations which adhere to Google’s requirements without directly giving away the store. Experts Exchange immediately comes to mind. They do provide the full text of an article to end users who arrive from Google, but the user does have to do lots (and I mean lots) of scrolling past subscription invites to get to the desired information.

Geo IP Delivery – legal cloaking by another name

Web sites can legally deliver different content to different users – they just need to ensure that search engine robots are treated exactly like a human user accessing the site with Firefox or a similar browser. The most common example is Geo IP detection and site redirection. As an example, if a user accesses Google.com outside the US, Google will automatically redirect them to the local country version based on their current IP. Such conditional content delivery is allowed and should not be confused with search engine cloaking bans.
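The difference from cloaking comes down to which inputs the decision uses. As a small sketch, with hypothetical example.* sites and assuming the visitor's country code has already been looked up from the IP address elsewhere, the redirect target depends on the country alone and never on the user agent:

```shell
#!/bin/sh
# site_for_country CODE: map a two-letter country code to a local site.
# The example.* URLs and the country list are placeholders.
site_for_country() {
    case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
        IT) echo "https://www.example.it/" ;;
        DE) echo "https://www.example.de/" ;;
        *)  echo "https://www.example.com/" ;;
    esac
}

# site_for_visitor COUNTRY USER_AGENT: the user agent ($2) is accepted but
# deliberately ignored, so a robot and a browser coming from the same
# country are always sent to the same site.
site_for_visitor() {
    site_for_country "$1"
}
```

Here a crawler and a Firefox user whose IP addresses both resolve to Italy are sent to the same place, which is what keeps this kind of conditional delivery on the permitted side of the line.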


Originally published July 29th, 2010

  • Sean Carlos is a web marketing consultant & teacher, assisting companies with their Search (SEO + PPC = SEM), Social Media & Digital Media Measurement strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.


6 Comments so far ↓

  • Andy Nattan

    Strange. If you get to the bottom of this, let us know. It’s strange that Google would fight so hard for first click free only to not insist it’s implemented.

  • Curios

    Hmmm, this isn’t cloaking.

    It appears to just be that they require membership to access their content. What you see is the sign up page, because you are not a member. Google sees the actual content page, since there’s no reason to redirect Google to a sign up page.

    This is pretty common for subscription sites. Most subscription model sites do the same thing, that’s how they get people to sign up. All you have to do is sign up for free to see the same thing Google indexed.

  • sean

    @Curious – If you read Google’s official documentation on both cloaking and First Click Free (the links are in the article), the official message is that you shouldn’t serve different content to users and Googlebot, yet an exception is made for subscription sites via the First Click Free program. In that case, Google says a subscription site should:

    allow all users who find a document on your site via Google search to see the full text of that document, even if they have not registered or subscribed to see that content

    That message does not seem to allow much wiggle room.

  • Andrew Clark

    Well, it’s approaching a year on and Marketing Profs is STILL taking this approach, seemingly without penalty. For all of the huff and puff Google kicks up about this practice there are very few houses that are blown down.

  • MikeG

    Actually, Google fully supports cloaking in the manner described above:

    http://www.google.com/support/news_pub/bin/answer.py?hl=en&answer=40543

  • chris

    Good find Sean, I still see the cloaking as well. From a user’s point of view, it is wrong that I click on a result expecting to see all the text and am instead shown a cut-down version.
