Does someone know of a list of the most frequently linked domain names from Wikipedia (preferably just En)?
On 4/30/07, Anthony wikilegal@inbox.org wrote:
Does someone know of a list of the most frequently linked domain names from Wikipedia (preferably just En)?
Gregory Kohs had posted on 4/24/07 a link on Wikipedia Review to this site:
http://www.online-utility.org/wikipedia/top_500_websites_wikipedia.jsp
Which is just such a breakdown. It's in the Articles section of the boards, under "Top 500..." Alkivar posted a bit of a conversation he had with Greg Maxwell about the listing--it's not perfect, as they show--but I suppose it's close enough to start digging. The Top 10:
14410 www.findagrave.com
12394 en.wikipedia.org
8844 www.geocities.com
8117 news.bbc.co.uk
7228 www.imdb.com
6861 www.myspace.com
5743 www.bbc.co.uk
3641 www.youtube.com
3614 maps.google.com
2707 www.parl.gc.ca
On 4/30/07, Joe Szilagyi szilagyi@gmail.com wrote:
On 4/30/07, Anthony wikilegal@inbox.org wrote:
Does someone know of a list of the most frequently linked domain names from Wikipedia (preferably just En)?
Gregory Kohs had posted on 4/24/07 a link on Wikipedia Review to this site:
http://www.online-utility.org/wikipedia/top_500_websites_wikipedia.jsp
Thanks. I also found that it's possible to download just the external links for the en.wiki from download.wikimedia.org. It's still nearly a gig if not more uncompressed, but I should be able to write a quick parser for it.
Anthony
http://www.online-utility.org/wikipedia/top_500_websites_wikipedia.jsp
Unless I'm misreading something, that's not even close to accurate. They claim Amazon had only 1721 links in Wikipedia in November 2006, but according to Special:Linksearch they had over 19000 links a month before that and they have over 25000 now.
Angela
On 5/1/07, Angela beesley@gmail.com wrote:
http://www.online-utility.org/wikipedia/top_500_websites_wikipedia.jsp
Unless I'm misreading something, that's not even close to accurate. They claim Amazon had only 1721 links in Wikipedia in November 2006, but according to Special:Linksearch they had over 19000 links a month before that and they have over 25000 now.
How do you get Special:Linksearch to give you the count? Just increase the view size until it fits?
I parsed the external links table (using zcat, as it's over 2 gigs uncompressed), and managed to extract 14181297 links before my script broke. I realized a problem though: I'm really only interested in links from the article namespace, so I've gotta download and parse another table to get that. Anyway, from my count www.amazon.com had 22985 links, so I guess my script broke before I got them all. If you want any more information contact me. Here's my table of the top 20:
6122405 | en.wikipedia.org
0642644 | www.google.com
0349654 | wikimediafoundation.org
0322938 | tools.wikimedia.de
0155488 | www.britannica.com
0121251 | www.bartleby.com
0110458 | encarta.msn.com
0108980 | scienceworld.wolfram.com
0095577 | www.imdb.com
0073350 | maps.google.com
0064746 | creativecommons.org
0061350 | www.rhaworth.myby.co.uk
0057467 | news.bbc.co.uk
0056602 | local.live.com
0051487 | www.nlm.nih.gov
0044527 | www.findagrave.com
0038422 | babelfish.altavista.com
0034436 | www.wikimapia.org
0033752 | terraserver-usa.com
0031066 | topozone.com
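
For anyone who wants to repeat this kind of count, here is a rough sketch in Python. It is not the script used above. It assumes the dump is a gzipped MySQL-style file of INSERT statements in which each URL appears as a single-quoted string, and it assumes that any value whose hostname ends in a dot is a reversed-host index copy of the same link and should be skipped. The function names and the file name in the usage note are made up for illustration. Like the count above, it covers every namespace, not just articles.

```python
import gzip
import re
from collections import Counter
from urllib.parse import urlsplit

# Single-quoted SQL string literal, allowing backslash escapes such as \'.
SQL_STRING = re.compile(r"'((?:[^'\\]|\\.)*)'")


def hosts_in_line(line):
    """Yield the hostname of each http(s) URL found in quoted strings on one line."""
    for match in SQL_STRING.finditer(line):
        value = match.group(1)
        if not value.startswith(("http://", "https://")):
            continue
        try:
            host = urlsplit(value).hostname
        except ValueError:
            continue  # malformed URL, e.g. unbalanced brackets
        # Assumption: a trailing dot marks a reversed-host index value, not a real link.
        if host and not host.endswith("."):
            yield host


def count_hosts(lines):
    """Count hostnames over an iterable of dump lines, reading only INSERT statements."""
    counts = Counter()
    for line in lines:
        if line.startswith("INSERT INTO"):
            counts.update(hosts_in_line(line))
    return counts


def count_hosts_in_dump(path):
    """Stream a gzipped dump from disk so it never has to be fully uncompressed."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_hosts(handle)
```

Something like `count_hosts_in_dump("enwiki-externallinks.sql.gz").most_common(20)` would then give a top 20 (the file name is hypothetical). Restricting the count to the article namespace would still need a join against the page table, which this sketch does not attempt.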
On 5/1/07, Anthony wikilegal@inbox.org wrote:
How do you get Special:Linksearch to give you the count? Just increase the view size until it fits?
Make a rough guess and change &offset= until you're close enough.
Angela
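
The guess-and-adjust approach can be made systematic with a doubling search followed by a binary search. The sketch below is only an illustration: `has_results` is a hypothetical callback that should return True when the result page at a given offset still lists at least one link, and how that page is actually fetched is left out entirely.

```python
def estimate_count(has_results, start=1000):
    """Return the number of results, given has_results(offset) -> bool.

    Assumes has_results(offset) is True exactly when offset is smaller
    than the total number of results. start must be at least 1.
    """
    if not has_results(0):
        return 0
    low, high = 0, start  # invariant: has_results(low) is True
    # Double the guess until we overshoot the end of the results.
    while has_results(high):
        low, high = high, high * 2
    # Now has_results(low) is True and has_results(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if has_results(mid):
            low = mid
        else:
            high = mid
    return high
```

For a domain with around 25000 links this takes roughly twenty page checks; stopping the second loop early gives the "close enough" version.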
On 5/1/07, Angela beesley@gmail.com wrote:
Unless I'm misreading something, that's not even close to accurate. They claim Amazon had only 1721 links in Wikipedia in November 2006, but according to Special:Linksearch they had over 19000 links a month before that and they have over 25000 now.
Rereading that thread, apparently they parsed the data specially, looking only for "citations" in some way and excluding external links that weren't citations (it seems). But yeah, in hindsight, it looks flawed.