Mining the Web - KingPin's Forum



Posted By: Evul Bot!
Mining the Web - February 18th, 2007

When you have a lot of indexed web pages and information about user queries, you can extract a lot of meaningful data. A simple way to do that is to exploit the markup structure of the documents.

Reworking the navigation links for a site
Google shows four internal links for the top result when they are available and relevant. Usually, they are the site's most popular links, so this is a great way to compress a big list of navigation links. Google detects navigation links by looking at groups of links that belong to a phrase.


Finding definitions
Google mines the web to find glossaries. Most of them use the DD tag, so it's pretty easy to detect them. The result: you find definitions that aren't available in traditional dictionaries or encyclopedias. You can find definitions by adding define: in front of your query.
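
To make the DD idea concrete, here is a minimal sketch in Python of pulling term/definition pairs out of a definition list. It only uses the standard library; the names GlossaryParser and extract_definitions are made up for this example, and it is an illustration of the markup trick, not a description of how Google actually does it.

```python
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Collects (term, definition) pairs from <dt>/<dd> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []          # finished (term, definition) tuples
        self._current = None     # "dt", "dd", or None
        self._term = []          # text chunks of the current term
        self._definition = []    # text chunks of the current definition

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            # A new term starts: store the previous pair, if any.
            self._flush()
            self._current = "dt"
        elif tag == "dd":
            self._current = "dd"

    def handle_endtag(self, tag):
        if tag in ("dt", "dd") and tag == self._current:
            self._current = None
        elif tag == "dl":
            self._flush()

    def handle_data(self, data):
        if self._current == "dt":
            self._term.append(data)
        elif self._current == "dd":
            self._definition.append(data)

    def _flush(self):
        # Collapse runs of whitespace and keep only complete pairs.
        term = " ".join("".join(self._term).split())
        definition = " ".join("".join(self._definition).split())
        if term and definition:
            self.pairs.append((term, definition))
        self._term, self._definition = [], []

    def close(self):
        super().close()
        self._flush()


def extract_definitions(html):
    """Return a list of (term, definition) tuples found in `html`."""
    parser = GlossaryParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Feeding it a page that contains a DL block returns the pairs in document order. A real crawler would also have to decide which pages are glossaries in the first place, which this sketch does not attempt.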


Spell checking
"Google's spell checking software automatically looks at your query and checks to see if you are using the most common version of a word's spelling. If it calculates that you're likely to generate more relevant search results with an alternative spelling, it will ask 'Did you mean: (more common spelling)?'. Clicking on the suggested spelling will launch a Google search for that term." Google's spell checker recognizes frequent typos and common misspellings, but also terms that are often confused. Because it looks at how words are actually spelled in queries rather than at a fixed word list, Google is also good at detecting misspellings of words that aren't included in dictionaries.

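A rough sketch of the "most common spelling" idea, in Python. Everything here is an assumption made for illustration: the counts are a made-up query log, candidates are limited to spellings one edit away, and the ratio threshold is arbitrary. Google has not published its method in the text quoted above, so don't read this as their algorithm.

```python
import string
from collections import Counter


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def did_you_mean(word, counts, ratio=10):
    """Return a much more common nearby spelling, or None.

    `counts` maps each spelling seen in the (hypothetical) query log to
    how often it was used. A suggestion is made only if the best candidate
    is at least `ratio` times more frequent than `word` itself.
    """
    word = word.lower()
    candidates = [w for w in edits1(word) if w in counts and w != word]
    if not candidates:
        return None
    best = max(candidates, key=lambda w: counts[w])
    if counts[best] >= ratio * max(counts.get(word, 0), 1):
        return best
    return None


# Made-up query frequencies, for illustration only.
query_counts = Counter({"britney": 1000, "brittney": 40, "britny": 5})
print(did_you_mean("britny", query_counts))   # a rarer spelling gets a suggestion
print(did_you_mean("britney", query_counts))  # the common spelling does not
```

Note that no dictionary is involved anywhere: a name or a brand gets corrected as long as enough people type it the usual way.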

Lists of related terms
Google Sets lets you enter a list of terms and generates related terms. It's a good way to find a list of US presidents, similar illnesses, competitors or movie recommendations. There's no description of the algorithm on the Google Sets site, but Google could look at phrases that frequently appear together on the same web page, for example in a list.
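
Since the real algorithm isn't documented, here is only a guess at the list idea, sketched in Python: take lists already extracted from web pages, and score every term by how many seed terms it shares a list with. The function name related_terms and the sample lists are invented for the example.

```python
from collections import Counter


def related_terms(seeds, lists, top_n=5):
    """Rank terms that share lists with the seed terms.

    `lists` is an iterable of lists of strings, e.g. the items of each
    HTML list found on crawled pages. A term earns one point per seed
    term that appears in the same list.
    """
    seeds = {s.lower() for s in seeds}
    scores = Counter()
    for items in lists:
        items = {i.lower() for i in items}
        overlap = len(seeds & items)
        if overlap == 0:
            continue  # this list has nothing to do with the seeds
        for item in items - seeds:
            scores[item] += overlap
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Made-up lists, as if scraped from three different pages.
page_lists = [
    ["Washington", "Adams", "Jefferson", "Madison"],
    ["Adams", "Jefferson", "Monroe"],
    ["Apple", "Banana", "Cherry"],
]
print(related_terms(["Washington", "Adams"], page_lists, top_n=2))
```

Terms that show up next to several seeds in several lists float to the top, while lists with no seed in them are ignored.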

Universal autocomplete
By looking at popular queries, Google Suggest autocompletes your query, so you type less and also use better queries. This might be extended to a general autocomplete for input boxes that could be restricted to a domain (for example, music artists).
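
The basic mechanism can be sketched in a few lines of Python: keep a count for every query, and for a given prefix return the most popular queries that start with it. The suggest function and the sample queries are hypothetical, and a real service would use an index rather than scanning every query, but the ranking idea is the same.

```python
from collections import Counter


def suggest(prefix, query_counts, limit=3):
    """Return up to `limit` queries starting with `prefix`, most popular first."""
    prefix = prefix.lower()
    matches = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    # Sort by descending popularity, then alphabetically for stable output.
    matches.sort(key=lambda qn: (-qn[1], qn[0]))
    return [q for q, _ in matches[:limit]]


# Made-up query log, for illustration only.
log = ["google maps", "google maps", "google mail", "gmail",
       "google maps", "google mail", "google earth"]
counts = Counter(log)
print(suggest("goo", counts, limit=2))
```

Restricting suggestions to a domain such as music artists would just mean building query_counts from that domain's names instead of from all queries.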

Google could also mine FAQs (lists of frequently asked questions), create a search engine for files by listing different mirrors and context from the web pages that linked to the files, show related images by mining photo albums, show what sites embed a YouTube video or have frequently updated feeds, or create summaries for web pages by looking at the anchors. When you have hundreds of terabytes of information, the possibilities are endless.
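
As one example from that list, summarizing a page by its anchors could look roughly like this in Python: collect the text of every link pointing at a URL and keep the most frequent phrases. AnchorCollector and anchor_summaries are names made up for this sketch, and the URLs are placeholders; it is an illustration of the idea, not a known implementation.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the link being read, if any
        self._text = []     # text chunks inside the current link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            if text:
                self.anchors.append((self._href, text))
            self._href = None


def anchor_summaries(pages, top_n=2):
    """Map each link target to its `top_n` most common anchor texts."""
    by_target = defaultdict(Counter)
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.anchors:
            by_target[href][text.lower()] += 1
    return {
        href: [t for t, _ in
               sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for href, c in by_target.items()
    }
```

The more pages link to a target with the same words, the more likely those words describe what the target is about.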

 
Posted By: tux
February 19th, 2007

good shit right here
 