Google Is All About Large Amounts of Data - KingPin's Forum
Posted by Evul Bot! - December 16th, 2007


In a very interesting interview from October, Google's VP Marissa Mayer confessed that having access to large amounts of data is in many instances more important than creating great algorithms.
"Right now Google is really good with keywords, and that's a limitation we think the search engine should be able to overcome with time. People should be able to ask questions, and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions -- not about what words will appear on the page but more like 'what is this about?' A lot of people will turn to things like the semantic Web as a possible answer to that. But what we're seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they're done through brute force.

"When you type in 'GM' into Google, we know it's 'General Motors.' If you type in 'GM foods' we answer with 'genetically modified foods.' Because we're processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart like it achieved that semantic understanding, but it hasn't really. It has to do with brute force. That said, I think the best algorithm for search is a mix of both brute-force computation and sheer comprehensiveness and also the qualitative human component."
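
To make the "GM" example concrete, here is a minimal sketch of how plain counting over observed queries can pick an acronym's expansion from its context. Everything in it is an assumption for illustration: the tiny OBSERVATIONS log is invented, the function names are hypothetical, and the acronym is assumed to be the first word of the query. It is not how Google's search engine is actually implemented.

```python
from collections import Counter, defaultdict

# Invented toy log: (query, meaning the searcher turned out to want).
# A real system would learn from vastly more data than this.
OBSERVATIONS = [
    ("gm", "General Motors"),
    ("gm stock price", "General Motors"),
    ("gm trucks", "General Motors"),
    ("gm dealership", "General Motors"),
    ("gm foods", "genetically modified"),
    ("gm foods safety", "genetically modified"),
    ("gm crops", "genetically modified"),
]


def build_counts(observations):
    """Count how often each meaning appears overall and next to each context word."""
    by_context = defaultdict(Counter)
    overall = Counter()
    for query, meaning in observations:
        overall[meaning] += 1
        # Assumption: the acronym is the first word; the rest is context.
        for word in query.lower().split()[1:]:
            by_context[word][meaning] += 1
    return by_context, overall


def expand(query, by_context, overall):
    """Pick the meaning seen most often with the query's context words."""
    votes = Counter()
    for word in query.lower().split()[1:]:
        votes.update(by_context.get(word, {}))
    if not votes:
        # No known context words: fall back to the most common meaning overall.
        votes = overall
    return votes.most_common(1)[0][0]


by_context, overall = build_counts(OBSERVATIONS)
print(expand("GM", by_context, overall))        # General Motors
print(expand("GM foods", by_context, overall))  # genetically modified
```

Nothing here understands what "GM" means. The answer changes only because the counts change, which is the brute-force effect the quote describes.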
Marissa Mayer admitted that the main reason Google launched its free 411 service was to collect the large amount of data needed to train speech recognition algorithms.
"You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model ... that we can use for all kinds of different things, including video search.

"The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy."
Peter Norvig, director of research at Google, seems to agree. "I have always believed (well, at least for the past 15 years) that the way to get better understanding of text is through statistics rather than through hand-crafted grammars and lexicons. The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective." Google uses statistics for machine translation, question answering, spell checking and more, as you can see in this video. The same video explains that the more data you have, the better your algorithm will perform, even if it is not the best algorithm available.
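
As a rough illustration of the statistical approach applied to spell checking, the sketch below corrects a word by choosing the known word one edit away that appears most often in some training text. The corpus string and the function names are assumptions made up for this example, and a one-edit search over lowercase ASCII letters is a deliberate simplification. It is a sketch of the general idea, not Google's spell checker.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text):
    """Count word frequencies in the training text (the 'data')."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the word itself if known, else the most frequent known neighbor."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing known nearby; leave the word unchanged
    return max(candidates, key=counts.get)


# Hypothetical training text; a real corrector would use a far larger corpus.
CORPUS = "the statistical approach is cheaper faster and more robust the more data the better the search"
counts = train(CORPUS)
print(correct("teh", counts))    # the
print(correct("serch", counts))  # search
```

There is no grammar or hand-built lexicon here, only counts. Feeding `train` more text widens the vocabulary and sharpens the frequency estimates without touching the algorithm, which is the point about data made above.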

Peter Norvig says that Google developed its own speech recognition technology. "We wanted speech technology that could serve as an interface for phones and also index audio text. After looking at the existing technology, we decided to build our own. We thought that, having the data and computational resources that we do, we could help advance the field. Currently, we are up to state-of-the-art with what we built on our own, and we have the computational infrastructure to improve further. As we get more data from more interaction with users and from uploaded videos, our systems will improve because the data trains the algorithms over time."

Google is in a privileged position to gain access to large amounts of data that could be used to improve other services.

 