Greetings everyone,
Now that the WMF summer research program in the Community Department has come to a close, I wanted to point interested parties to the body of findings we've produced.
We covered a lot of territory, so to save you the trouble if you just want to browse, we collected our most salient results into one wiki page.
- Relevant blog post here: http://blog.wikimedia.org/2011/09/06/summer-research-findings/
- Summary of findings on Meta, with links to further documentation: https://secure.wikimedia.org/wikipedia/meta/wiki/Research:Wikimedia_Summer_o...
Next steps are twofold for this program:
1. We'll be working with the Global Development team and some volunteers from the local community to extend these analyses to cover Portuguese Wikipedia, specifically to support Global Dev's work in Brazil.
2. We're choosing and implementing a platform to release not just our code, but the datasets we compiled over the summer. You'll hear more about this soon, but we're taking our time in order to decide on a solution that will work in the long term for sharing open data beyond the dumps.
Last but not least, if anyone would like to have a more in-depth discussion about these findings and the research that produced them, I'm definitely open to hosting an IRC office hours with some members of the team. Just let me know if you're interested (on or offlist) and I'll set something up soon.
Thanks Steven, and the Community Department.
I am instantly drawn to the analysis of redlinks. Can we please have this data!! Article writers are on standby, ready to kill red links ;-)
The special page for this is dead.
http://en.wikipedia.org/wiki/Special:WantedPages
-- John Vandenberg
Thanks for the interest, John! I put the list of the top 250 up at http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles -- but I didn't exactly publicize it. I guess this is my chance to do so now! Also, a list of the top 1000 redlinked articles is up on a separate page at http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles/July_2011 and the entire dataset is up at http://toolserver.org/~swalker/redlink_list.csv -- note that it is 42.8 MB!
If you have any other questions about the redlinks/bluelinks dataset, feel free to ask me. And you can check out the meta page for more fun links data, such as how many more links we added between 2009 and 2011, or incoming links to articles about countries / each country's population: http://meta.wikimedia.org/wiki/Research:One_Link,_Two_Links,_Red_Links,_Blue...
Stuart
---- Stuart Geiger User:Staeiou / @staeiou Ph.D student, UC-Berkeley School of Information
On Tue, Sep 6, 2011 at 10:19 AM, John Vandenberg jayvdb@gmail.com wrote:
Thanks Steven, and the Community Department.
I am instantly drawn to the analysis of redlinks. Can we please have this data!! Article writers are on standby, ready to kill red links ;-)
The special page for this is dead.
http://en.wikipedia.org/wiki/Special:WantedPages
-- John Vandenberg
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
On Tue, 6 Sep 2011 10:45:41 -0700, "R.Stuart Geiger" sgeiger@gmail.com wrote:
Thanks for the interest, John! I put the list of the top 250 up at http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles -- but I didn't exactly publicize it. I guess this is my chance to do so now! Also, a list of the top 1000 redlinked articles is up on a separate page at http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles/July_2011 and the entire dataset is up at http://toolserver.org/~swalker/redlink_list.csv -- note that it is 42.8 MB!
If you have any other questions about the redlinks/bluelinks dataset, feel free to ask me. And you can check out the meta page for more fun links data, such as how many more links we added between 2009 and 2011, or incoming links to articles about countries / each country's population:
http://meta.wikimedia.org/wiki/Research:One_Link,_Two_Links,_Red_Links,_Blue...
Stuart
Stuart Geiger User:Staeiou / @staeiou Ph.D student, UC-Berkeley School of Information
From what I see, the page http://en.wikipedia.org/wiki/Special:WantedPages is just misleading. For instance, one of the highest-ranking missing articles, [[Alison Campbell]], has all 5000+ of its links coming not from other articles but from article talk pages, where the link is not explicitly present. That means someone put this red link into one of the heavily used templates for project evaluations (I did not investigate which one). I actually doubt that the person is even notable, though there is a short stub on Dutch Wikipedia. There is no way that this is really one of the most wanted articles. Others I tried from the first page share the same problem.
Cheers Yaroslav
On 8 September 2011 10:58, Yaroslav M. Blanter putevod@mccme.ru wrote:
From what I see, the page http://en.wikipedia.org/wiki/Special:WantedPages is just misleading. For instance, one of the highest-ranking missing articles, [[Alison Campbell]], has all 5000+ of its links coming not from other articles but from article talk pages, where the link is not explicitly present. That means someone put this red link into one of the heavily used templates for project evaluations (I did not investigate which one). I actually doubt that the person is even notable, though there is a short stub on Dutch Wikipedia. There is no way that this is really one of the most wanted articles. Others I tried from the first page share the same problem.
It's in a project-specific to-do list - for a fairly minor project, as these things go, but even a smallish project on enwiki has a lot of articles!
http://en.wikipedia.org/wiki/Template:Northern_Ireland_tasks
(If anyone's wondering, Alison Clarke is the former Miss Northern Ireland, engaged to marry a prominent sportsman, and thus presumably something of a minor local celebrity. I make no comment on notability.)
For future research on redlinks, it would definitely be worth distinguishing between "links in article text" and "links from projectspace / inline templates". Technically more difficult to figure out, of course, but that's why we call them researchers ;-)
I would be interested to know what the most wanted pages would be if all links from templates were excluded. If I introduce a redlink into a template that's transcluded on 2000 pages, it immediately becomes a most wanted article. I'd also be very interested in seeing this data for other Wikipedias, particularly Spanish (es) and Serbo-Croatian (sh).
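The filtering suggested above (count only links written in article text, and ignore links that arrive through transcluded templates or from talk and project pages) could be sketched roughly as follows. This is only an illustration under assumptions: the `Link` record and its `via_template` flag are hypothetical stand-ins for whatever a real link extraction would provide, the example rows are made up, and none of this is the method used to build the summer research dataset. The one fixed fact relied on is that MediaWiki numbers the main (article) namespace 0.

```python
from collections import Counter
from dataclasses import dataclass

ARTICLE_NAMESPACE = 0  # MediaWiki main (article) namespace


@dataclass(frozen=True)
class Link:
    """One link pointing at a missing page (hypothetical record layout)."""
    source_title: str      # page that contains the link
    source_namespace: int  # namespace number of the source page
    target_title: str      # missing page being linked to
    via_template: bool     # True if the link arrives through a transcluded template


def most_wanted(links, top_n=10):
    """Rank missing pages by the number of distinct articles linking to them.

    Links from non-article namespaces and links that come in through
    templates are skipped, so one template edit cannot inflate the ranking.
    """
    sources_per_target = {}
    for link in links:
        if link.source_namespace != ARTICLE_NAMESPACE or link.via_template:
            continue
        sources_per_target.setdefault(link.target_title, set()).add(link.source_title)
    counts = Counter({target: len(sources) for target, sources in sources_per_target.items()})
    return counts.most_common(top_n)


# Made-up rows, purely for illustration.
example_links = [
    Link("Talk:Example A", 1, "Name from a to-do template", True),
    Link("Talk:Example B", 1, "Name from a to-do template", True),
    Link("Example A", 0, "Missing topic one", False),
    Link("Example B", 0, "Missing topic one", False),
    Link("Example C", 0, "Missing topic two", False),
]
ranking = most_wanted(example_links)
print(ranking)
```

With these rows the template-driven entry drops out of the ranking entirely, which is the effect being asked for. The hard part in practice is producing the `via_template` flag at all, since that requires knowing where each link originates.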
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
The interesting thing here is: 4.8M unique red links in 2009, and 5.6M unique red links in 2011. *The more articles are created, the more articles are missing*.
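For a sense of scale, the change between those two figures can be worked out as below. This is just arithmetic on the rounded numbers quoted above; the helper name `growth` is made up for the example.

```python
def growth(before, after):
    """Return (absolute increase, increase as a fraction of `before`)."""
    increase = after - before
    return increase, increase / before


redlinks_2009 = 4_800_000  # rounded figure quoted above
redlinks_2011 = 5_600_000  # rounded figure quoted above

increase, relative = growth(redlinks_2009, redlinks_2011)
print(f"{increase:,} more unique red links, a rise of {relative:.1%}")
```

Note that this only describes the raw count. Whether red links grew faster or slower than the encyclopedia itself would need the article counts for the same two dates, which are not given here.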
2011/9/6 Steven Walling swalling@wikimedia.org
Greetings everyone,
Now that the WMF summer research program in the Community Department has come to a close, I wanted to point interested parties to the body of findings we've produced.
We covered a lot of territory, so to save you the trouble if you just want to browse, we collected our most salient results into one wiki page.
- Relevant blog post here:
http://blog.wikimedia.org/2011/09/06/summer-research-findings/
- Summary of findings on Meta, with links to further documentation:
https://secure.wikimedia.org/wikipedia/meta/wiki/Research:Wikimedia_Summer_o...
Next steps are twofold for this program:
1. We'll be working with the Global Development team and some volunteers from the local community to extend these analyses to cover Portuguese Wikipedia, specifically to support Global Dev's work in Brazil.
2. We're choosing and implementing a platform to release not just our code, but the datasets we compiled over the summer. You'll hear more about this soon, but we're taking our time in order to decide on a solution that will work in the long term for sharing open data beyond the dumps.
Last but not least, if anyone would like to have a more in-depth discussion about these findings and the research that produced them, I'm definitely open to hosting an IRC office hours with some members of the team. Just let me know if you're interested (on or offlist) and I'll set something up soon.
-- Steven Walling Fellow at Wikimedia Foundation wikimediafoundation.org
On 10/09/2011, at 23:04, emijrp emijrp@gmail.com wrote:
The interesting thing here is: 4.8M unique red links in 2009, and 5.6M unique red links in 2011. *The more articles are created, the more articles are missing*.
Along those lines, I recall seeing (at least three years ago) some research that said the proportion of redlinks was remaining stable even as the number of articles grew. They hypothesised that if the proportion decreased, that would imply that we would eventually stop and "finish" the encyclopedia. On the other hand, if the proportion of redlinks increased, it would imply that the project would eventually decay through too much entropy. Instead of either extreme, the research said that, a bit like Goldilocks, the growth was "just right" and could continue indefinitely. Does anyone else remember this research or its name/author?
-Liam
Wittylama.com/blog Peace, love & metadata
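The three cases in the argument above can be made concrete with a small sketch. Everything here is an assumption for illustration: the function names, the 5% tolerance, and the snapshot counts are invented, and this is not the analysis from the research being recalled. It simply compares the redlink-to-article ratio at the start and end of a series.

```python
def redlink_ratio(red_links, articles):
    """Red links per existing article at one point in time."""
    return red_links / articles


def classify_trend(ratios, tolerance=0.05):
    """Label the change between the first and last ratio in a series.

    'shrinking' corresponds to the "encyclopedia eventually gets finished" case,
    'growing' to the "decay through too much entropy" case, and
    'stable' to the "just right" case. The tolerance is an arbitrary choice.
    """
    if len(ratios) < 2:
        raise ValueError("need at least two snapshots to describe a trend")
    change = (ratios[-1] - ratios[0]) / ratios[0]
    if change < -tolerance:
        return "shrinking"
    if change > tolerance:
        return "growing"
    return "stable"


# Made-up (red_links, articles) snapshots, not real Wikipedia figures.
snapshots = [(1000, 2000), (1500, 3000), (2040, 4000)]
ratios = [redlink_ratio(red, total) for red, total in snapshots]
trend = classify_trend(ratios)
print(trend)
```

In this invented series both counts roughly double while their ratio barely moves, so the label is "stable" even though the absolute number of red links keeps rising.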
On Sun, Sep 11, 2011 at 3:04 AM, Liam Wyatt liamwyatt@gmail.com wrote:
Along those lines, I recall seeing (at least three years ago) some research that said the proportion of redlinks was remaining stable even as the number of articles grew. They hypothesised that if the proportion decreased, that would imply that we would eventually stop and "finish" the encyclopedia. On the other hand, if the proportion of redlinks increased, it would imply that the project would eventually decay through too much entropy. Instead of either extreme, the research said that, a bit like Goldilocks, the growth was "just right" and could continue indefinitely. Does anyone else remember this research or its name/author?
http://dl.acm.org/citation.cfm?id=1378720 Spinellis/Louridas, "The collaborative organization of knowledge", complemented in http://www.spinellis.gr/blog/20080808/ and summarized in https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Wikipedia_Signpost/...
wikimedia-l@lists.wikimedia.org