Evaluating Google Search Results

Posted by Evul Bot! on September 10th, 2006

Google uses many testers to evaluate the quality of search results. This post gives some details about their work and speculates that Google will allow everyone to be a tester.

{ Information from this post is based on Henk van Ess' Search Bistro. }

Google has great search results because it has good algorithms, great data centers and (secret) evaluation tests. Many people are paid (some say about $20 per hour) just to test the relevance of the search results. Google doesn't manually adjust search results; instead, it tries to find the problem that generated the low-quality results and tweaks its algorithms.

So what qualifications does a tester need? Here's a job ad from 2005:

"You would work at your own pace, and the time and length of any particular work session would be up to you. Candidates will evaluate search results and rate their relevance. Thus, all candidates must be web-savvy and analytical, have excellent web research skills and a broad range of interests. Specific areas of expertise are highly desirable. We are looking for smart people who read voraciously and have a wide variety of interests.

Raters should have all the following qualifications:

* Native-level fluency in Dutch, Italian, Spanish, or French

* In-depth, up-to-date familiarity with the web culture of at least one predominantly Dutch, Italian, Spanish, or French-speaking country.

* Excellent web research skills and analytical abilities.

* A high-speed internet connection.

* Perfect English is not necessary; however, you must be able to read and write English well enough to use software with an English interface, understand fairly complicated instructions written in English, and make yourself understood in informal written communication.

* The job involves frequent written communication with fellow Quality Raters."


Search Bistro found out more about this evaluation. Google selects a number of random queries and sends the list to a group of quality raters. The raters evaluate the results in a CommQuest Evaluation Interface.

"During random-query evaluation, each result URL for every randomly selected query is rated independently by a group of raters using the options given in a pull-down menu on the Quest interface. The rating results are subsequently analyzed. That's where CommQuest comes in. When you, the raters, disagree with each other by a wide margin, the result URL will be presented to you again in the uniquely interactive CommQuest interface until a certain level of agreement among you is reached. CommQuest allows you to share your comments on queries and/or URLs with each other, explain the reasoning behind your initial ratings, and revise the ratings based on what you're learning from each other."

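The disagreement loop described in the quote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's system: the numeric scale in `MERIT_SCORES`, the `max_spread` threshold standing in for "a wide margin", and all function names are hypothetical, since neither the post nor Search Bistro says how disagreement is measured. The sketch only handles the four merit labels.

```python
# Hypothetical numeric scale for the merit labels; the real analysis is not public.
MERIT_SCORES = {"Vital": 3, "Useful": 2, "Relevant": 1, "Off Topic": 0}


def needs_commquest_review(ratings, max_spread=1):
    """True when independent ratings for one URL differ by more than max_spread.

    max_spread is a hypothetical stand-in for "a wide margin".
    """
    scores = [MERIT_SCORES[label] for label in ratings]
    return max(scores) - min(scores) > max_spread


def review_queue(ratings_by_url, max_spread=1):
    """URLs that would be presented to the raters again for discussion."""
    return [url for url, ratings in ratings_by_url.items()
            if needs_commquest_review(ratings, max_spread)]


def rounds_until_agreement(rating_rounds, max_spread=1):
    """1-based round in which the raters first agree, or None if they never do."""
    for round_no, ratings in enumerate(rating_rounds, start=1):
        if not needs_commquest_review(ratings, max_spread):
            return round_no
    return None


ratings_by_url = {
    "example.com/a": ["Useful", "Useful", "Relevant"],
    "example.com/b": ["Vital", "Off Topic", "Useful"],
}
print(review_queue(ratings_by_url))  # ['example.com/b']

# Round 1 disagrees widely; after discussion, round 2 is within the threshold.
print(rounds_until_agreement([["Vital", "Off Topic", "Useful"],
                              ["Useful", "Relevant", "Useful"]]))  # 2
```

Using the spread between the highest and lowest rating (rather than, say, a variance) keeps the sketch close to the quote's wording: a single outlying rater is enough to send the URL back for discussion.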

Relevance is hard to define, but Google evaluates results "based on relevance not to a specific person who actually posed the query, but to an imaginary rational mind behind the query. Oftentimes, a query may have more than one meaning, or interpretation. In such cases we will have to look at the hypothetical set of rational search engine users behind an ambiguous query, and deduce, or roughly estimate, the make-up of that set; for instance, we will consider the relative presence of zoology enthusiasts and car shoppers in a hypothetical representative sample of the users who could have queried [jaguar]."
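One way to read the "hypothetical set of rational users" is as a weighted average: each interpretation counts in proportion to its estimated share of users. The sketch below assumes made-up shares for [jaguar] (30% animal, 70% car) and made-up usefulness scores for three imaginary results; the guideline gives no numbers, and how Google actually combines interpretations is not public. All names are hypothetical.

```python
def expected_usefulness(shares, usefulness):
    """Weight a result's usefulness for each interpretation by that
    interpretation's (hypothetical) share of users, normalizing the shares.
    An interpretation the result doesn't serve contributes 0."""
    total = sum(shares.values())
    return sum(share / total * usefulness.get(intent, 0.0)
               for intent, share in shares.items())


# Hypothetical make-up of the users who could have queried [jaguar].
JAGUAR_SHARES = {"animal": 0.3, "car": 0.7}

# Hypothetical usefulness (0.0 to 1.0) of three results for each interpretation.
RESULTS = {
    "car maker home page": {"car": 1.0},
    "big-cat encyclopedia entry": {"animal": 1.0},
    "overview of both meanings": {"animal": 0.6, "car": 0.6},
}

ranked = sorted(RESULTS,
                key=lambda name: expected_usefulness(JAGUAR_SHARES, RESULTS[name]),
                reverse=True)
for name in ranked:
    print(f"{expected_usefulness(JAGUAR_SHARES, RESULTS[name]):.2f}  {name}")
```

With these assumed numbers the car page scores 0.70, the overview 0.60 and the encyclopedia entry 0.30, which shows why a page serving the minority meaning can still be a reasonable result, just a less important one.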

Google distinguishes three types of queries:

* navigational queries, which have one correct result (like "BMW" or "MSN")

* informational queries, with more than one possible result (like "renaissance paintings", "what is a shark")

* transactional queries, where the user wants to make an acquisition ("download text editor", "buy blackberry")

There are also nine possible ratings for each result: Vital, Useful, Relevant, Off Topic, Offensive, Erroneous, Didn't Load, Foreign Language, Unrated.

Here are the tasks that should be performed by each tester:

* Understanding the meaning of the query and its type: is it navigational, informational, transactional, or a mixture of two or three?

* If you come to the realization that the query could have been posted by different users with different intentions, crudely assigning possibilities for each interpretation and/or intent

* Researching the query coverage on the web using search engines other than Google, directories, specialized databases, and other sites, or offline resources

* Examining each result for attributes that would call for assigning an applicable special category rather than a merit-based assessment, and, in the absence of those attributes

* Determining the merit rating in light of the query coverage and considering various utility dimensions, as well as taking into account evidence of deceitful web design where appropriate.
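The last two steps imply an order: check for a special category first, and assign a merit rating only in its absence. The sketch below encodes that order as a decision function. Several parts are hypothetical assumptions, not taken from the post: which labels count as "special categories", the fields of `Observation`, treating Vital as the label for the official target of a navigational query, using Unrated when no merit judgment was made, and the numeric cut-offs between Useful, Relevant and Off Topic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rating(Enum):
    """The nine labels listed in the post."""
    VITAL = "Vital"
    USEFUL = "Useful"
    RELEVANT = "Relevant"
    OFF_TOPIC = "Off Topic"
    OFFENSIVE = "Offensive"
    ERRONEOUS = "Erroneous"
    DIDNT_LOAD = "Didn't Load"
    FOREIGN_LANGUAGE = "Foreign Language"
    UNRATED = "Unrated"


@dataclass
class Observation:
    """Hypothetical notes a rater has taken about one result."""
    loaded: bool = True
    in_query_language: bool = True
    offensive: bool = False
    erroneous: bool = False
    # Assumption: the result is the one official target of a navigational query.
    navigational_target: bool = False
    # Assumption: 0.0-1.0 judgment made against the query coverage; None = not judged.
    usefulness: Optional[float] = None


def rate(obs: Observation) -> Rating:
    # Step 1: look for attributes that call for a special category.
    if not obs.loaded:
        return Rating.DIDNT_LOAD
    if not obs.in_query_language:
        return Rating.FOREIGN_LANGUAGE
    if obs.offensive:
        return Rating.OFFENSIVE
    if obs.erroneous:
        return Rating.ERRONEOUS
    # Step 2: only in the absence of those attributes, determine a merit rating.
    if obs.navigational_target:
        return Rating.VITAL
    if obs.usefulness is None:
        return Rating.UNRATED  # assumption: no merit judgment could be made
    # Hypothetical cut-offs between the remaining merit labels.
    if obs.usefulness >= 0.6:
        return Rating.USEFUL
    if obs.usefulness >= 0.3:
        return Rating.RELEVANT
    return Rating.OFF_TOPIC


print(rate(Observation(usefulness=0.7)).value)                  # Useful
print(rate(Observation(offensive=True, usefulness=0.9)).value)  # Offensive
```

The second call shows the point of the ordering: a page that would otherwise score well still gets the special category if it has one of the disqualifying attributes.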

As you can see, evaluating search results is not an easy task, and this work influences much of what you see in Google search today.

You may be wondering what the point of this post is. Gary Price from Resource Shelf found that Google has recently registered some interesting domain names:

* indexbench.com (and .org, .info, .net) and similar domains
* Google-testing.com (and .net, .org) and similar domains

After the experience with Google Image Labeler, I think Google will try to recruit more quality testers, but this time unpaid. If the process is fun and the system is good enough to deal with spam and low-quality raters, the whole world could rate search results. This is just speculation, but it wouldn't be the first time users actively influenced the order of search results (if you click directly on the third result of a search, Google can infer that the first two probably weren't relevant).
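The click example above resembles a heuristic from click-feedback research often called "skip above": a clicked result is treated as preferable to every unclicked result ranked above it, since the user presumably saw those and passed them over. The sketch below implements that heuristic; the post does not say whether Google uses anything like it, and the function name is hypothetical.

```python
def skip_above_preferences(ranked_urls, clicked):
    """Derive (preferred, less_preferred) pairs from one result page.

    Heuristic: each clicked result is assumed preferable to every
    unclicked result ranked above it.
    """
    clicked = set(clicked)
    pairs = []
    for position, url in enumerate(ranked_urls):
        if url in clicked:
            for skipped in ranked_urls[:position]:
                if skipped not in clicked:
                    pairs.append((url, skipped))
    return pairs


# The post's example: the user clicks directly on the third result.
print(skip_above_preferences(["r1", "r2", "r3", "r4"], clicked=["r3"]))
# [('r3', 'r1'), ('r3', 'r2')]
```

Note that the heuristic only produces relative preferences, not absolute judgments: it says r3 looked better than r1 and r2 to this user, not that r1 and r2 are irrelevant, which is why it is a softer signal than a rater's label.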