Ed Summers has done some nice analysis of the top hosts referenced in article space, based on SQL dumps: http://inkdroid.org/journal/2010/08/25/top-hosts-referenced-in-wikipedia-par...
People with more in-depth knowledge might make something of this -- for instance the importance of bots in external links, or the prevalence of certain types of information.
For instance, why/where/how is edwardbetts.com used? (doesn't seem to be postcode data, which was my first guess)
See also his linkypedia code: http://github.com/edsu/linkypedia
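Ed's own method is described in his post and may differ. As a rough sketch only: once the external-link URLs have been extracted from a dump and are available as plain strings (extraction not shown), ranking hosts takes a few lines of Python. The function name `top_hosts` and the sample links are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_hosts(urls, n=20):
    """Return the n most common hosts among an iterable of URL strings."""
    counts = Counter()
    for url in urls:
        try:
            # .hostname is lower-cased and has any port number removed
            host = urlsplit(url.strip()).hostname
        except ValueError:
            continue  # skip URLs that urlsplit cannot parse
        if host:  # relative or malformed URLs have no host
            counts[host] += 1
    return counts.most_common(n)


# Hypothetical sample links, invented for illustration.
links = [
    "http://news.bbc.co.uk/2/hi/europe/1.stm",
    "http://News.BBC.co.uk:80/2/hi/asia/2.stm",
    "http://www.nytimes.com/2010/08/25/a.html",
    "not a url",
]
for rank, (host, count) in enumerate(top_hosts(links), start=1):
    print(rank, count, host)
```

Normalising through `.hostname` means that differently cased spellings of the same host, or the same host with an explicit port, are counted together.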
-Jodi
On 25/08/10 13:41, Jodi Schneider wrote:
> Ed Summers has done some nice analysis of the top hosts referenced in article space, based on SQL dumps: http://inkdroid.org/journal/2010/08/25/top-hosts-referenced-in-wikipedia-par...
> People with more in-depth knowledge might make something of this -- for instance the importance of bots in external links, or the prevalence of certain types of information.
> For instance, why/where/how is edwardbetts.com used? (doesn't seem to be postcode data, which was my first guess)
That would be this:
http://edwardbetts.com/findlink
Enter the title of an article and it will find other articles that could link to it.
For an example see this:
http://edwardbetts.com/findlink?q=planned+community
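How the findlink tool actually works is not described in this thread. Purely as a hypothetical sketch of the idea, one way to find articles that "could link to" a title is to look for pages whose wikitext mentions the title as plain text but contains no [[wikilink]] to it. The function name, the toy data, and the regex-based matching are all assumptions.

```python
import re


def find_link_candidates(title, articles):
    """Return titles of articles whose wikitext mentions `title` as plain text
    but does not already contain a [[wikilink]] to it.

    `articles` maps article title -> wikitext. Hypothetical helper, not the
    findlink implementation.
    """
    escaped = re.escape(title)
    # The phrase anywhere in the text, as whole words, ignoring case.
    mention = re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)
    # [[Title]] or [[Title|label]]. MediaWiki only ignores the case of the
    # first letter; ignoring all case here is a simplification.
    wikilink = re.compile(r"\[\[\s*" + escaped + r"\s*(?:\||\]\])", re.IGNORECASE)
    candidates = []
    for name, text in articles.items():
        if name.lower() == title.lower():
            continue  # an article should not link to itself
        if mention.search(text) and not wikilink.search(text):
            candidates.append(name)
    return candidates


# Toy data, invented for illustration.
articles = {
    "Town A": "Town A is a planned community.",
    "Town B": "Town B is a [[planned community|planned]] town.",
    "Town C": "Town C grew up around a railway station.",
    "Planned community": "A planned community is ...",
}
print(find_link_candidates("Planned community", articles))  # ['Town A']
```

A real tool would work from a search index over the whole dump rather than scanning every article on each query, and would also need to handle redirects and alternative titles.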
On Wed, Aug 25, 2010 at 5:03 PM, Edward Betts edward@archive.org wrote:
> That would be this:
> http://edwardbetts.com/findlink
> Enter the title of an article and it will find other articles that could link to it.
> For an example see this:
I imagine that the "suggestions are available" link in the orphan template [1] might be the source of a lot of these?
Thanks for bringing up this question on the list, Jodi. I was actually wondering about all the values for page_namespace in the page table dump [2]. I assumed that "0" was the article namespace, but wasn't sure about the many other integer values. Is there a lookup available in some other dump, or elsewhere?
//Ed
[1] http://en.wikipedia.org/wiki/Template:Orphan
[2] http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
On 25/08/10 14:13, Ed Summers wrote:
> On Wed, Aug 25, 2010 at 5:03 PM, Edward Betts edward@archive.org wrote:
> I imagine that the "suggestions are available" link in the orphan template [1] might be the source of a lot of these?
I had no idea that the Template:Orphan linked to my 'find link' prototype. I should fix the bugs and update the content of my search index.
On 25 Aug 2010, at 22:13, Ed Summers wrote:
> On Wed, Aug 25, 2010 at 5:03 PM, Edward Betts edward@archive.org wrote:
>> That would be this:
>> http://edwardbetts.com/findlink
>> Enter the title of an article and it will find other articles that could link to it.
>> For an example see this:
> I imagine that the "suggestions are available" link in the orphan template [1] might be the source of a lot of these?
Ah, thanks! I did a search, but only found one page. I guess the template obscures them!
> Thanks for bringing up this question on the list, Jodi. I was actually wondering about all the values for page_namespace in the page table dump [2]. I assumed that "0" was the article namespace, but wasn't sure about the many other integer values. Is there a lookup available in some other dump, or elsewhere?
The namespace codes are listed here: http://en.wikipedia.org/wiki/Wikipedia:Namespace
They are based on the built-in MediaWiki namespaces: http://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces
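For anyone decoding page_namespace in the page table dump, a small lookup along these lines may help. It is a sketch that covers only the built-in MediaWiki namespaces; the English Wikipedia adds extra namespaces (such as Portal) with higher numbers, which this falls back to labelling by number. It relies on the MediaWiki convention that each talk namespace is numbered one higher than its subject namespace, so odd non-negative numbers are talk namespaces. The names `BUILTIN_NAMESPACES`, `namespace_name` and `is_talk` are hypothetical.

```python
# Built-in MediaWiki namespace numbers (see Manual:Namespace). On the English
# Wikipedia, namespace 4 ("Project") is displayed as "Wikipedia", and
# namespace 6 ("File") was historically called "Image".
# -1 and -2 are virtual namespaces and never appear in the page table.
BUILTIN_NAMESPACES = {
    -2: "Media",
    -1: "Special",
    0: "(Main)",  # articles
    1: "Talk",
    2: "User",
    3: "User talk",
    4: "Project",
    5: "Project talk",
    6: "File",
    7: "File talk",
    8: "MediaWiki",
    9: "MediaWiki talk",
    10: "Template",
    11: "Template talk",
    12: "Help",
    13: "Help talk",
    14: "Category",
    15: "Category talk",
}


def is_talk(ns):
    """Talk namespaces are the odd, non-negative numbers."""
    return ns >= 0 and ns % 2 == 1


def namespace_name(ns):
    """Map a page_namespace value to a readable name."""
    if ns in BUILTIN_NAMESPACES:
        return BUILTIN_NAMESPACES[ns]
    # Wiki-specific namespaces: report the number, flagging talk namespaces.
    if is_talk(ns):
        return f"talk namespace {ns}"
    return f"namespace {ns}"


print(namespace_name(0), namespace_name(10), namespace_name(101))
```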
-Jodi
> //Ed
> [1] http://en.wikipedia.org/wiki/Template:Orphan
> [2] http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
On Wed, 25 Aug 2010, Jodi Schneider wrote:
> Ed Summers has done some nice analysis of the top hosts referenced in article space, based on SQL dumps: http://inkdroid.org/journal/2010/08/25/top-hosts-referenced-in-wikipedia-par...
> People with more in-depth knowledge might make something of this -- for instance the importance of bots in external links, or the prevalence of certain types of information.
A few weeks ago I looked at the hosts in the cite news template, and got results in line with Ed Summers's, with news.bbc.co.uk at the top and the New York Times second:

Rank  Count  Host
   1  14443  news.bbc.co.uk
   2   3224  www.nytimes.com
   3   2729  query.nytimes.com
   4   2675  www.washingtonpost.com
   5   1838  www.cnn.com
   6   1781  www.guardian.co.uk
   7   1584  www.time.com
   8   1443  www.telegraph.co.uk
   9   1420  www.smh.com.au
  10   1278  www.usatoday.com
  11   1198  www.abc.net.au
  12   1119  www.variety.com
  13   1026  select.nytimes.com
  14   1006  www.theage.com.au
  15   1005  www.timesonline.co.uk
  16    987  www.sfgate.com
  17    975  sports.espn.go.com
  18    969  www.msnbc.msn.com
  19    913  findarticles.com
  20    904  www.news.com.au
There is a short commentary here:
http://fnielsen.posterous.com/top-news-cites-referenced-from-wikipedia
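The thread does not say how these counts were produced. A hypothetical sketch, assuming access to article wikitext, would pull the url= parameter out of each {{cite news}} template and tally the hosts. The regular expressions are simplifications (an assumption, not a full wikitext parser): they stop at the first "}}", so a cite news template containing a nested template is cut short, and redirects to the template are not recognised. The function name and sample text are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Body of a {{cite news ...}} template, up to the first "}}". Accepts
# "Cite news" and "cite_news"; nested templates inside it are not handled.
CITE_NEWS = re.compile(r"\{\{\s*cite[ _]news\b(.*?)\}\}", re.IGNORECASE | re.DOTALL)
# The value of the url= parameter (archiveurl= and similar do not match).
URL_PARAM = re.compile(r"\|\s*url\s*=\s*([^|\s}]+)", re.IGNORECASE)


def cite_news_hosts(wikitext):
    """Count the hosts of url= values in {{cite news}} templates."""
    counts = Counter()
    for body in CITE_NEWS.findall(wikitext):
        match = URL_PARAM.search(body)
        if match:
            host = urlsplit(match.group(1)).hostname
            if host:
                counts[host] += 1
    return counts


# Invented sample wikitext: two cite news templates and one cite web.
text = (
    "A.<ref>{{cite news |title=X |url=http://news.bbc.co.uk/2/hi/1.stm |work=BBC}}</ref> "
    "B.<ref>{{Cite news|url=http://www.nytimes.com/x.html|title=Y}}</ref> "
    "C.<ref>{{cite web|url=http://example.org/}}</ref>"
)
print(cite_news_hosts(text).most_common())
```

Only the two cite news URLs are counted; the cite web reference is ignored. Summing the per-article Counters over a whole dump would give a table like the one above.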
/Finn
___________________________________________________________________
Finn Aarup Nielsen, DTU Informatics, Denmark
Lundbeck Foundation Center for Integrated Molecular Brain Imaging
http://www.imm.dtu.dk/~fn/
http://nru.dk/staff/fnielsen/
___________________________________________________________________