Greetings everyone,
Now that the WMF summer research program in the Community Department has come to a close, I wanted to point interested parties to the body of findings we've produced.
We covered a lot of territory, so to save you the trouble if you just want to browse, we collected our most salient results on one wiki page.
- Relevant blog post here: http://blog.wikimedia.org/2011/09/06/summer-research-findings/
- Summary of findings on Meta, with links to further documentation: https://secure.wikimedia.org/wikipedia/meta/wiki/Research:Wikimedia_Summer_o...
Next steps are twofold for this program:
1. We'll be working with the Global Development team and some volunteers from the local community to extend these analyses to cover Portuguese Wikipedia, specifically to support Global Dev's work in Brazil.
2. We're choosing and implementing a platform to release not just our code, but the datasets we compiled over the summer. You'll hear more about this soon, but we're taking our time in order to decide on a solution that will work in the long term for sharing open data beyond the dumps.
Last but not least, if anyone would like to have a more in-depth discussion about these findings and the research that produced them, I'm definitely open to hosting IRC office hours with some members of the team. Just let me know if you're interested (on- or off-list) and I'll set something up soon.
Thanks Steven, and the Community Department.
I am instantly drawn to the analysis of redlinks. Can we please have this data!! Article writers are on standby, ready to kill red links ;-)
The special page for this is dead.
http://en.wikipedia.org/wiki/Special:WantedPages
-- John Vandenberg
Thanks for the interest, John! I put the list of the top 250 up at http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles -- but I didn't exactly publicize it. I guess this is my chance to do so now! Also, a list of the top 1000 redlinked articles is up on a separate page at http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles/July_2011 and the entire dataset is up at http://toolserver.org/~swalker/redlink_list.csv -- note that it is 42.8 MB!
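A file of that size is easier to stream than to open in a spreadsheet. Below is a minimal Python sketch of one way to pull the most-linked titles out of it. The column layout of redlink_list.csv is not described in this thread, so the sketch assumes each row holds a page title and an incoming-link count; the column positions, the UTF-8 encoding, and the function name top_redlinks are all hypothetical and should be adjusted to the real file.

```python
import csv
import heapq


def _parsed_rows(lines, title_col, count_col):
    """Yield (count, title) pairs, skipping rows that do not fit the assumed layout."""
    for row in csv.reader(lines):
        try:
            yield int(row[count_col]), row[title_col]
        except (IndexError, ValueError):
            # Header line or malformed row: ignore it.
            continue


def top_redlinks(lines, n=10, title_col=0, count_col=1):
    """Return the n (count, title) pairs with the highest counts, largest first.

    lines: any iterable of CSV text lines (an open file object works).
    title_col, count_col: ASSUMED column positions, not taken from the dataset.
    """
    # heapq.nlargest consumes the generator lazily and keeps only n items,
    # so the whole file never has to sit in memory at once.
    return heapq.nlargest(n, _parsed_rows(lines, title_col, count_col))


# Example use against a local copy of the file (path and encoding are assumptions):
# with open("redlink_list.csv", newline="", encoding="utf-8") as f:
#     for count, title in top_redlinks(f, n=250):
#         print(count, title)
```

If the real file orders its columns differently, only the title_col and count_col arguments need to change.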
If you have any other questions about the redlinks/bluelinks dataset, feel free to ask me. And you can check out the meta page for more fun links data, such as how many more links we added between 2009 and 2011, or incoming links to articles about countries / each country's population: http://meta.wikimedia.org/wiki/Research:One_Link,_Two_Links,_Red_Links,_Blue...
Stuart
---- Stuart Geiger User:Staeiou / @staeiou Ph.D student, UC-Berkeley School of Information
On Tue, Sep 6, 2011 at 10:19 AM, John Vandenberg jayvdb@gmail.com wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
The interesting thing here is that there were 4.8M unique red links in 2009 and 5.6M unique red links in 2011. *The more articles are created, the more articles are missing*.
2011/9/6 Steven Walling swalling@wikimedia.org
-- Steven Walling Fellow at Wikimedia Foundation wikimediafoundation.org
On 9/10/2011 5:04 PM, emijrp wrote:
The interesting thing here is that there were 4.8M unique red links in 2009 and 5.6M unique red links in 2011. /The more articles are created, the more articles are missing/.
Doesn't surprise me; my rough calculations (http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialize...) suggest Wikipedia is not even one-tenth complete at this point (counting only whether articles exist, not their quality).
Interesting, Piotr. I'm working on a similar approach: http://en.wikipedia.org/wiki/User:Emijrp/All_human_knowledge
2011/9/12 Piotr Konieczny pik1@pitt.edu
-- Piotr Konieczny PhD Candidate Dept of Sociology Uni of Pittsburgh http://pittsburgh.academia.edu/PiotrKonieczny/ http://en.wikipedia.org/wiki/U...