Guillaume,
Here are a bunch of references off the top of my head to get you started:
Getting to the source: where does Wikipedia get its information from? http://www.opensym.org/wsos2013/proceedings/p0203-ford.pdf H Ford, S Sen, DR Musicant, N Miller - Proceedings of the 9th international symposium on Wikis, 2013
{{Citation needed}}: The dynamics of referencing in Wikipedia http://dl.acm.org/citation.cfm?id=2462943 Chih-Chun Chen, Camille Roth. WikiSym 8th Intl Symposium on Wikis, Linz, Austria, Aug 2012
Top hosts referenced in English Wikipedia http://inkdroid.org/journal/2010/08/21/top-hosts-referenced-in-english-wikipedia/ Ed Summers, inkdroid (2010)
Top news cites referenced from Wikipedia https://finnaarupnielsen.wordpress.com/2010/08/25/top-news-cites-referenced-from-wikipedia/ F.A. Nielsen (2010)
There’s also a growing body of data on citations of scholarly references in Wikipedia:
Scientific citations in Wikipedia http://arxiv.org/pdf/0705.2106.pdf F. A. Nielsen, arXiv (2008)
Wikipedia Cite-o-Meter http://tools.wmflabs.org/cite-o-meter/
Crossref’s Chronograph http://chronograph.labs.crossref.org/domains/wikipedia.org
Scholarly article citations in Wikipedia http://dx.doi.org/10.6084/m9.figshare.1299540
Dario
On Apr 17, 2015, at 11:34 AM, Guillaume Paumier <gpaumier@wikimedia.org> wrote:
Hello, fellow researchers,
I'm looking to see if any research has been done recently around external links in Wikipedia, and more specifically external links contained in references. My main goal is to identify the most cited domains, ideally with their count.
Labs tools or similar that could help in this regard are also welcome. I haven't been able to find much so far, and before I dive into the database myself, I'd like to check I haven't missed anything obvious :)
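For the counting step itself, one rough approach is sketched below in Python. It assumes the page wikitext has already been fetched (for example from an XML dump). It pulls URLs out of <ref>...</ref> bodies and tallies their hosts. The function names and regexes are hypothetical simplifications, not an existing tool. Reused named references (<ref name="x"/>) are not counted again, a leading "www." is folded into the bare domain, and references whose URL is produced by a template rather than written out will be missed.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Body of <ref>...</ref> or <ref name="...">...</ref>; the (?<!/) lookbehind
# rejects self-closing reuses like <ref name="x"/>, and <references/> never matches.
REF_RE = re.compile(r"<ref(?:\s[^>]*?)?(?<!/)>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
# Rough URL matcher: stops at whitespace, pipes, brackets, braces, angle brackets and quotes.
URL_RE = re.compile(r"https?://[^\s|\[\]{}<>\"']+", re.IGNORECASE)


def normalize_host(url):
    """Return the lowercased host of url without a leading 'www.', or None."""
    host = urlsplit(url).hostname  # urllib already lowercases the hostname
    if not host:
        return None
    host = host.rstrip(".")  # drop a trailing dot left by sentence punctuation
    if host.startswith("www."):
        host = host[4:]
    return host or None


def count_reference_domains(wikitexts):
    """Count hosts of URLs found inside <ref> bodies across an iterable of wikitext strings."""
    counts = Counter()
    for text in wikitexts:
        for ref_body in REF_RE.findall(text):
            for url in URL_RE.findall(ref_body):
                host = normalize_host(url)
                if host:
                    counts[host] += 1
    return counts
```

Calling count_reference_domains(...).most_common(50) on a set of pages would then give the top 50 hosts with their counts. URLs outside <ref> tags are deliberately ignored, since only links used as references matter here.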
The context for this work is "citoid" [1], the new Zotero-based citation tool used by VisualEditor to automatically fetch and format references using only their URL. Zotero works well for many scientific journals, but is weaker with regard to newspapers and non-English sources.
By looking into the most used URLs/domains in citations, I'm hoping to identify those that are not yet properly supported by Zotero and, by extension, citoid. This would then help developers focus their efforts on adding support for the highest-value websites.
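Turning per-domain citation counts into a priority list could look like the minimal Python sketch below. It assumes someone has compiled a set of domains already known to work with citoid. No such list is implied to exist; it would have to be built, for example, by testing citoid by hand. The function names are hypothetical. The sketch returns the most-cited domains not in that set, along with each domain's share of all counted citations.

```python
from collections import Counter


def is_covered(host, supported):
    """True if host equals a supported domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in supported)


def top_unsupported(domain_counts, supported, n=10):
    """Return up to n (domain, count, share) tuples for the most-cited uncovered domains.

    domain_counts maps domain -> citation count; share is the domain's fraction of
    all counted citations, i.e. how much of the total fixing that domain would address.
    """
    if n <= 0:
        return []
    total = sum(domain_counts.values())
    out = []
    for host, count in Counter(domain_counts).most_common():
        if not is_covered(host, supported):
            out.append((host, count, count / total if total else 0.0))
            if len(out) == n:
                break
    return out
```

The subdomain rule means listing bbc.co.uk also counts news.bbc.co.uk as covered. That may be too generous if a Zotero translator only handles one section of a site.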
Any pointers you might have are very welcome :)
Thanks,
[1] https://www.mediawiki.org/wiki/Citoid
-- Guillaume Paumier
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l