Antezeta Web Marketing

Reflections on search engine optimization, web analytics and web marketing


Unofficial documentation of Ask’s Web Search API


In part one of this article, we set out to document the little-known Ask web search API by providing background information. In this continuation, we’ll look at the actual API details.

Update: Ask disabled access to their API on 6 March 2007. We are working on obtaining additional information. Write us if you would like to be notified of further developments.

Note: The following information was determined by observation and conjecture. Write us if you want to be notified when we update this page with more complete information. We assume the reader has already worked with REST queries and is familiar with parsing data.

Request URL

The request URL is formed by appending query parameters and their values to a base URL, using the format parameter=value. Successive parameters are separated by an ampersand (&).

Base URL: http://xml..com/e?

Request URL parameter values should be URL-encoded.

Before considering the data elements in detail, try a query using the Ask Web Search API. This example will return the first 20 results for “Italy”. If you have trouble viewing the XML results in your browser, try Firefox.
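Since Ask has disabled the API, the live example no longer works, but the URL-building rule above is easy to sketch. The following Python function is a minimal sketch. It assumes the parameters behave as described in the Request Parameters table below. The host in BASE_URL is a hypothetical placeholder and not Ask’s server, and build_request_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Hypothetical placeholder host; substitute the real base URL (ending in "e?").
BASE_URL = "http://xml.example.invalid/e?"


def build_request_url(base_url, terms, count=10, offset=0, phrase=False):
    """Append Ask request parameters to a base URL.

    t = query terms (required), u = maximum results (1 to 200),
    f = 0-based offset of the first result, p = 1 to treat terms as a phrase.
    """
    if not terms:
        raise ValueError("t (query terms) is required")
    if not 1 <= count <= 200:
        raise ValueError("u must be between 1 and 200")
    params = {"t": terms, "u": count}
    # f and p default to 0, so we only send them when they differ.
    if offset:
        params["f"] = offset
    if phrase:
        params["p"] = 1
    # urlencode percent-encodes each value (spaces become "+") and joins
    # the pairs with "&".
    return base_url + urlencode(params)


# First 20 results for "Italy".
print(build_request_url(BASE_URL, "Italy", count=20))
# http://xml.example.invalid/e?t=Italy&u=20
```

Advanced modifiers such as site: travel inside the t value, so they are percent-encoded along with the query terms.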

Note: Unless indicated otherwise, Ask parameters, tag attributes and the tags themselves are suppressed when their values are 0 or empty.

Request Parameters

| Parameter | Value | Description |
|---|---|---|
| a | integer | ? |
| f | integer; default 0 | First result number (offset) to return. Used to “page” through results. |
| i | integer | ? |
| p | integer; default 0 | Treat query words as a phrase. |
| t | string; required | Query terms. |
| u | integer; default 10; values 1 to 200 | Maximum number of results to return. |
| y | integer | ? |

Ask advanced query parameter modifiers

The following advanced query modifiers are documented for general use with Ask search queries. They appear to work with the web services API as well.

| Parameter | Value | Description |
|---|---|---|
| site: | string, e.g. domain name | Restrict query to a specific site. Must be used with another query term. To query an entire site, use site:www.mydomain.com inurl:www.mydomain.com. |
| intitle: | string | Limit query to pages whose titles contain the text. |
| inurl: | string | Limit query to pages whose URLs contain the text. |
| lang: | language code (NL Dutch; EN English; FR French; DE German; IT Italian; PT Portuguese; ES Spanish) | Limit query to a specific language. |
| geoloc: | region code (CA Central America; EU Europe; IA India / Asia; NA North America; OC Oceania; SA South America) | Limit query to a specific geographic region. We noticed that Ask’s spelling system is not aware of geoloc as a query parameter: Ask suggests an alternate spelling! |
| inlink: | string | Limit query to pages with links containing a string. |
| last: | week, 2weeks, month, 3months, 6months, year, 2years | Limit query to pages indexed in a specific time frame. |
| afterdate: | afterdate:yyyymmdd | Limit query to pages modified after a date, e.g. afterdate:20061015. |
| beforedate: | beforedate:yyyymmdd | Limit query to pages modified before a date, e.g. beforedate:20060312. |
| betweendate: | betweendate:yyyymmdd,yyyymmdd | Limit query to pages modified during a date range, e.g. betweendate:19960115,20030412. |
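The modifiers are appended to the query terms, so composing a query is a matter of string work. The following minimal sketch assumes the syntax shown in the table above. build_query and whole_site_query are hypothetical helper names. The returned string is the value of the t parameter and still needs URL encoding.

```python
from datetime import date


def date_token(d):
    # Ask's date modifiers take yyyymmdd, e.g. 20061015.
    return d.strftime("%Y%m%d")


def build_query(terms, site=None, lang=None, geoloc=None,
                modified_after=None, modified_before=None):
    """Combine plain query terms with Ask advanced modifiers."""
    # site: must be combined with another term, so plain terms come first.
    parts = [terms]
    if site:
        parts.append("site:" + site)
    if lang:
        parts.append("lang:" + lang)      # e.g. IT for Italian
    if geoloc:
        parts.append("geoloc:" + geoloc)  # e.g. EU for Europe
    if modified_after and modified_before:
        parts.append("betweendate:%s,%s" % (date_token(modified_after),
                                            date_token(modified_before)))
    elif modified_after:
        parts.append("afterdate:" + date_token(modified_after))
    elif modified_before:
        parts.append("beforedate:" + date_token(modified_before))
    return " ".join(parts)


def whole_site_query(host):
    # Workaround from the table for querying an entire site.
    return "site:%s inurl:%s" % (host, host)


print(build_query("rolling stones", site="bbc.co.uk",
                  modified_after=date(1996, 1, 15),
                  modified_before=date(2003, 4, 12)))
# rolling stones site:bbc.co.uk betweendate:19960115,20030412
```

When both dates are given, the function emits a single betweendate: modifier instead of combining afterdate: and beforedate:. The table documents betweendate: for exactly this purpose.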

Response Fields

The response is delivered as an XML version 1.0 document using UTF-8 encoding. The entire response is wrapped in a SEARCHRESULTS tag. Tags may contain data, attributes or both.

| Tag | Attributes | Value | Description |
|---|---|---|---|
| RESULTSET | | | Wrapper tag for both topic and web search result sets. |
| QUERY | | string | Contains the processed query string. |
| ESTIMATERESULT | | integer | Estimate of total matching results in Ask. |
| TOTALRESULT | | integer | Total results available for the query. Maximum of 200. |
| FIRSTRESULT | | integer | 0-based offset, used for looping through “pages” of results. |
| NUMRESULTS | | integer | Count of results in the result set. Default is 10; maximum is TOTALRESULT, i.e. 200. |
| SORT | | string; default: rank | Sort order. Other values? |
| MORERESULTS | | string; true or false | |
| STOPWORDS | | string | List of stop words (e.g. the, a, for) filtered from the query. |
| COUNT | | integer | Number of stop words excluded from the query. |
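Because zero and empty values are suppressed, a parser should treat a missing tag or attribute as 0 or empty. The sketch below uses Python’s standard xml.etree.ElementTree. The sample XML is invented for illustration and is not a captured Ask response. The flattened table does not show which names are child tags and which are attributes. For that reason, the hypothetical field helper accepts either form, and the nesting under SEARCHRESULTS is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented sample (not real Ask output). FIRSTRESULT is absent because its
# value, 0, would be suppressed.
SAMPLE = """<SEARCHRESULTS>
  <RESULTSET>
    <QUERY>rolling stones</QUERY>
    <ESTIMATERESULT>5000</ESTIMATERESULT>
    <TOTALRESULT>200</TOTALRESULT>
    <NUMRESULTS>10</NUMRESULTS>
    <MORERESULTS>true</MORERESULTS>
    <STOPWORDS COUNT="1">the</STOPWORDS>
  </RESULTSET>
</SEARCHRESULTS>"""


def field(elem, name, default=""):
    """Read `name` as an attribute or a child tag of `elem`.

    Missing values fall back to `default`, mirroring Ask's suppression of
    zero and empty values.
    """
    if elem is None:
        return default
    if name in elem.attrib:
        return elem.attrib[name]
    text = elem.findtext(name)
    return text if text else default


def parse_header(xml_text):
    root = ET.fromstring(xml_text)
    rs = root.find("RESULTSET")
    stop = rs.find("STOPWORDS") if rs is not None else None
    # Assumption: stop words are separated by whitespace.
    stop_text = stop.text if stop is not None and stop.text else ""
    return {
        "query": field(rs, "QUERY"),
        "estimate": int(field(rs, "ESTIMATERESULT", "0")),
        "total": int(field(rs, "TOTALRESULT", "0")),
        "first": int(field(rs, "FIRSTRESULT", "0")),
        "count": int(field(rs, "NUMRESULTS", "0")),
        "more": field(rs, "MORERESULTS", "false") == "true",
        "stopwords": stop_text.split(),
        "stopword_count": int(field(stop, "COUNT", "0")),
    }


header = parse_header(SAMPLE)
print(header["first"], header["more"], header["stopwords"])
# 0 True ['the']
```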

Topics

In the relentless pursuit of relevant results, search engines strive to find patterns and relationships in the relatively unstructured data that is the web. Most search engines are able to group similar results from the same site, displaying just one or two results with an option to see more results from the same site. This grouping is called site-level topic clustering.


The Teoma search engine, integrated into the current Ask, was one of the first to apply topic clustering to the greater web. Teoma determined clusters by identifying communities (web pages that link to each other) and automatically named the clusters based on common phrases appearing in each group. The clustering was dynamic, performed for each query (although search engines presumably cache popular queries).
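The description above suggests a simple illustration. Treat pages connected by links as a community, then name the community after a phrase its pages share. The sketch below is a deliberately crude stand-in for Teoma’s algorithm, whose details we do not know. It uses connected components of the undirected link graph instead of Teoma’s communities, and the most frequent two-word phrase in page titles as the name. The function names and sample data are hypothetical.

```python
from collections import Counter


def link_communities(links):
    """Return connected components of the link graph, ignoring link direction.

    `links` maps each page to the pages it links to.
    """
    neighbours = {}
    for page, targets in links.items():
        neighbours.setdefault(page, set())
        for target in targets:
            neighbours[page].add(target)
            neighbours.setdefault(target, set()).add(page)
    seen, communities = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first search collects every page reachable from `start`.
        stack, members = [start], []
        seen.add(start)
        while stack:
            page = stack.pop()
            members.append(page)
            for other in neighbours[page]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        communities.append(sorted(members))
    return communities


def name_community(titles):
    """Name a community after the two-word phrase found in the most titles."""
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        # Count each phrase once per title.
        counts.update(set(zip(words, words[1:])))
    if not counts:
        return ""
    # Highest count wins; ties are broken alphabetically for repeatability.
    best = min(counts, key=lambda phrase: (-counts[phrase], phrase))
    return " ".join(best)


print(link_communities({"a": ["b"], "b": ["a"], "c": ["d"], "e": []}))
# [['a', 'b'], ['c', 'd'], ['e']]
print(name_community(["Rolling Stones tour dates",
                      "The Rolling Stones official site",
                      "Rolling Stones news"]))
# rolling stones
```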

At one point, Teoma displayed topic clusters as folders above the main search results under the heading “Web Pages Grouped by Topic”; later they were labeled “Refine”. Users could click on a topic to refine their search, organizing results into specific sub-topic categories based on the search query.

Topics are not currently displayed in Ask search query results. Fortunately, this information is still available via the web services API.

At most 10 topics are returned.

Topic Response Header

| Tag | Attributes | Value | Description |
|---|---|---|---|
| TOPICGROUPS | NUMGROUPS | integer; 1 to 10 | Number of topic clusters returned. |
| | MORERESULTS | string; true or false | More results are available than those returned? |

Topic Response Details

| Tag | Attributes | Value | Description |
|---|---|---|---|
| TOPICGROUP | ID | integer, 0-based | Detail item wrapper tag. Contains the item id, an integer starting at 0. |
| NAME | | string | Contains the keyword or keyword phrase used to name the topic group. |
| URL | | string; partial query URL | A concatenation of t= and the words appearing in NAME, each separated with “+” as required to create a URL. |
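Because the URL field is simply t= followed by the words of NAME, it can be rebuilt if it is ever missing. Appending it to the base URL gives the refined query. The following minimal sketch assumes that TOPICGROUP elements sit inside TOPICGROUPS, with NAME and URL as child tags. The sample is invented, and the function names are hypothetical. Python’s urllib.parse.quote_plus performs the “+” joining and also percent-encodes any other reserved characters.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Invented sample (not real Ask output). The first TOPICGROUP has no ID
# attribute because an ID of 0 would be suppressed.
SAMPLE = """<TOPICGROUPS NUMGROUPS="2">
  <TOPICGROUP><NAME>Mick Jagger</NAME></TOPICGROUP>
  <TOPICGROUP ID="1"><NAME>Tour Dates</NAME></TOPICGROUP>
</TOPICGROUPS>"""


def topic_url(name):
    """Rebuild the partial query URL: t= plus the words of NAME joined by "+"."""
    return "t=" + quote_plus(name)


def parse_topics(xml_text):
    root = ET.fromstring(xml_text)
    topics = []
    for group in root.iter("TOPICGROUP"):
        name = group.findtext("NAME", default="")
        topics.append({
            "id": int(group.get("ID", "0")),
            "name": name,
            # Prefer Ask's own URL; fall back to rebuilding it from NAME.
            "url": group.findtext("URL") or topic_url(name),
        })
    return topics[:10]  # Ask returns at most 10 topics


for topic in parse_topics(SAMPLE):
    print(topic["id"], topic["url"])
# 0 t=Mick+Jagger
# 1 t=Tour+Dates
```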

Web Search

Web Search Response Header

| Tag | Attributes | Value | Description |
|---|---|---|---|
| PREVWEBPAGE | | string | Tag appears if prior “pages” of results are available. Contains the query string to append to the base URL to return the previous results. |
| | A | integer; default 1 | ? |
| | F | integer | First result number (offset) of the previous result set; presumably the current first result ID minus the number of records requested. |
| | P | integer; default 0; 1 if phrase | Treat query words as a phrase for proximity searching. |
| | U | integer; default 10; values 1 to 200 | Number of records requested / to request. |
| MOREWEBPAGE | | string | Tag appears if more “pages” of results are available. Contains the query string to append to the base URL to return the next results. |
| | A | integer; default 1 | ? |
| | F | integer | First result number (offset) to return in the next result set: the sum of the current first result ID and the number of records requested. |
| | P | integer; default 0; 1 if phrase | Treat query words as a phrase for proximity searching. |
| | U | integer; default 10; values 1 to 200 | Number of records requested / to request. |
| RELATED | | string | Query string to perform a related query. |
| | A | integer; default 1 | ? |
| | I | integer; default 0; not suppressed, unlike other attributes | ? |
| | Y | integer; default 1 | ? |
| WEBPAGE | | | Wrapper tag for web search result detail records. |
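The F arithmetic for MOREWEBPAGE makes it possible to list every page offset in advance. The following minimal sketch assumes that result sets are capped at 200 results, as TOTALRESULT suggests. It also assumes that the query string carried by MOREWEBPAGE uses the lowercase request parameter names (f, u). The function names and the sample query string are hypothetical.

```python
from urllib.parse import parse_qs

MAX_RESULTS = 200  # TOTALRESULT never exceeds 200


def next_offset(first, requested, total=MAX_RESULTS):
    """F of the next page (current first result + records requested), or None."""
    nxt = first + requested
    return nxt if nxt < min(total, MAX_RESULTS) else None


def page_offsets(requested, total=MAX_RESULTS):
    """Yield every f value needed to page through a result set."""
    if not 1 <= requested <= MAX_RESULTS:
        raise ValueError("u must be between 1 and 200")
    first = 0
    while first is not None:
        yield first
        first = next_offset(first, requested, total)


def offset_in_link(query_string):
    """Read f from a MOREWEBPAGE/PREVWEBPAGE query string; 0 if suppressed."""
    return int(parse_qs(query_string).get("f", ["0"])[0])


print(list(page_offsets(50)))
# [0, 50, 100, 150]
print(offset_in_link("t=italy&u=20&f=40"))
# 40
```

In practice, the simplest loop is to keep appending the MOREWEBPAGE string to the base URL until that tag no longer appears. Computing offsets up front is useful only for scheduling requests politely.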

Web Search Response Details

| Tag | Attributes | Value | Description |
|---|---|---|---|
| RESPONSE | ID | integer | 0-based offset. |
| SCORE | | decimal; values 0.01 to 1.00 | Site rank for the query. |
| PARTITION | | x,y integer pair | ? …worth some reflection. |
| IND | | integer; tag not present or 1 | Appears to be a presentation tag meaning “indent”. Appears when a site cluster result is present. |
| DOCTYPE | | string; default: tag not present. PDF appears to be the only value currently used. | MIME type. Ask officially supports Flash (swf) and PDF documents in addition to standard text. In reality, Flash documents are not highlighted in Ask search results, nor is the DOCTYPE tag populated for them. Ask does not yet support a doctype search filter, although one was promised in the past. As a workaround, inurl:pdf or inurl:swf works to a degree. |
| TITLE | | string (truncated after the first ~65 characters) | Document title. |
| URL | | string | Document URL. |
| ABSTRACT | | string (up to ~140 characters) | Document abstract, based on the user query. |
| SITE | | string; default: not present. Value is a URL. | More results from the current site. Present when IND is 1; a URL to query the site for the given keywords. |
| CACHED | KEY | string; e.g. 00*knpldsckmkez | Teoma 3.0 introduced cached versions of “popular” sites. A document may not be cached if the site is not popular or the document uses the “noarchive” meta tag. |
| URL | | string | Complete URL to access the cached document. |
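The sketch below combines the detail fields and uses ElementTree to extract one record per RESPONSE. It assumes that RESPONSE elements sit inside WEBPAGE, that the fields are child tags, and that KEY is an attribute of CACHED. The sample is invented and is not real Ask output, and parse_results is a hypothetical name. findtext("URL") only looks at direct children, so the document URL is not confused with the URL nested inside CACHED.

```python
import xml.etree.ElementTree as ET

# Invented sample (not real Ask output). The first RESPONSE has no ID
# attribute because an ID of 0 would be suppressed.
SAMPLE = """<WEBPAGE>
  <RESPONSE>
    <SCORE>0.98</SCORE>
    <TITLE>Annual report</TITLE>
    <URL>http://www.example.com/report.pdf</URL>
    <DOCTYPE>PDF</DOCTYPE>
    <CACHED KEY="00*example"><URL>http://cache.example.com/?k=1</URL></CACHED>
  </RESPONSE>
  <RESPONSE ID="1">
    <SCORE>0.75</SCORE>
    <IND>1</IND>
    <TITLE>Another page</TITLE>
    <URL>http://www.example.com/page.html</URL>
    <SITE>http://www.example.com/more</SITE>
  </RESPONSE>
</WEBPAGE>"""


def parse_results(xml_text):
    """Return one dictionary per RESPONSE record, with suppressed fields defaulted."""
    root = ET.fromstring(xml_text)
    results = []
    for resp in root.iter("RESPONSE"):
        cached = resp.find("CACHED")
        results.append({
            "id": int(resp.get("ID", "0")),
            "score": float(resp.findtext("SCORE", default="0")),
            "title": resp.findtext("TITLE", default=""),
            # findtext searches direct children only, so this is the
            # document URL, not the URL nested inside CACHED.
            "url": resp.findtext("URL", default=""),
            "doctype": resp.findtext("DOCTYPE", default=""),
            "indented": resp.findtext("IND") == "1",
            "site_url": resp.findtext("SITE", default=""),
            "cache_key": cached.get("KEY", "") if cached is not None else "",
            "cached_url": (cached.findtext("URL", default="")
                           if cached is not None else ""),
        })
    return results


for r in parse_results(SAMPLE):
    print(r["id"], r["score"], r["doctype"] or "-", r["indented"])
# 0 0.98 PDF False
# 1 0.75 - True
```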

Teoma Experts’ Links / Resources

Teoma offered Expert Links, later called Resources, which we have not encountered in the XML API.

Expert links are Web sites created by individual enthusiasts, or “fans”, containing lists of resources relevant to the search topic. For example, an amateur golfer might have created a page devoted to his personal collection of favorite golfing sites. Teoma’s expert identification technology discovers and presents these pages as “Expert Links.”

Query the Ask Web Search API with our example program

We have created a sample Perl program to query Ask’s Web Search API, saving the returned XML file and placing the results in a spreadsheet file for analysis. Note that this program is only a sample; it is not intended for production use. Be kind to Ask: don’t kill their servers with incessant automated queries.

Download and uncompress ask-search-1.0.pl.tgz. Read the licence terms and warnings at the beginning of the program. Note that the name of this file will change. Link to this page, not the file! To use, type

perl ask-search.pl <query terms>

For example:

perl ask-search.pl the rolling stones site:bbc.co.uk

In this case, Ask will ignore “the”, a stop word, and all sites with the domain suffix bbc.co.uk will be searched for “rolling stones”. Three files will be created:

  • ask-query-response.xml – contains the raw results from Ask, in XML.
  • ask-query-results.txt – contains a processed view of the XML, as parsed by the Perl module XML::Simple.
  • ask-search.xls – contains the query results in an Excel format spreadsheet. The first tab contains the search results. The second tab contains up to 10 topics.
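For readers who prefer Python, the spreadsheet step can be approximated with the standard csv module. This is not a port of our Perl program. It writes two CSV files with hypothetical names ending in -results.csv and -topics.csv, which stand in for the two tabs of ask-search.xls. It also assumes that the results and topics have already been parsed into dictionaries.

```python
import csv
import io

RESULT_COLUMNS = ["id", "score", "title", "url"]
TOPIC_COLUMNS = ["id", "name", "url"]


def rows_to_csv(rows, columns):
    """Render parsed records as CSV text with a header row.

    Missing fields become empty cells; extra fields are ignored.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def save_tables(results, topics, prefix="ask-search"):
    """Write two CSV files standing in for the two tabs of ask-search.xls."""
    with open(prefix + "-results.csv", "w", newline="", encoding="utf-8") as out:
        out.write(rows_to_csv(results, RESULT_COLUMNS))
    with open(prefix + "-topics.csv", "w", newline="", encoding="utf-8") as out:
        out.write(rows_to_csv(topics[:10], TOPIC_COLUMNS))


print(rows_to_csv([{"id": 0, "score": 0.98, "title": "Report, 2006",
                    "url": "http://www.example.com/"}], RESULT_COLUMNS))
```

CSV quoting takes care of commas inside titles, which would otherwise shift columns when the file is opened in a spreadsheet.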

Known Limitations

  • “Encyclopedia” entries, such as those from Wikipedia, are not integrated in this data.
  • “Narrow your search” options, as in the current Ask.com interface, are not the same as the topic group options.
  • Sponsored ads are not present in the xml data.
  • The binocular preview option URL is not present.

Note: Unfortunately, Ask’s current crawl frequency appears to be much lower than that of its competitors, Google and Microsoft (MSN Search and Windows Live). Consequently, the base data processed by Ask’s algorithms is more likely to be out of date, clearly impacting the quality of search results.

Ask Maintains Separate Regional Search Engines?

Our initial analysis indicates that the results available from xml.teoma.com are the same as those from the US version of Ask, www.ask.com. Localized versions of Ask, such as Ask Italia, appear to use a different underlying database: we see richer results for Italian sites in Ask Italia than in Ask.com. With other search engines, a simple modification of the language and/or region code (in this case lang and geoloc) usually solves the problem. Not with Ask. We also note the lack of an advanced query options page for Ask Italia.


Last updated: 2009-06-17



Originally published July 26th, 2006

  • Sean Carlos is a web marketing consultant & teacher, assisting companies with their Search (SEO + PPC = SEM), Social Media & Digital Media Measurement strategies. Sean first worked with text indexing in 1990 in a project for the Los Angeles County Museum of Art. Since then he worked for Hewlett-Packard Consulting and later as IT Manager of a real estate website before founding Antezeta in 2006. Sean is an official instructor of the Digital Analytics Association and collaborates with the Bocconi University. He is a co-author of the Treccani encyclopedic dictionary of computer science, ICT & digital media. Born in Providence, RI, USA, Sean received Honors in Physics from Bates College, Maine. He speaks English, Italian and German.

