Wiki-research-l February 2013

wiki-research-l@lists.wikimedia.org

20 participants
15 discussions

by song＠cs.umn.edu

Pursuant to prior discussions about the need for a research policy on Wikipedia, WikiProject Research is drafting a policy regarding the recruitment of Wikipedia users to participate in studies.

At this time, we have a proposed policy, and an accompanying group that would facilitate recruitment of subjects in much the same way that the Bot Approvals Group approves bots.

The policy proposal can be found at: http://en.wikipedia.org/wiki/Wikipedia:Research

The Subject Recruitment Approvals Group mentioned in the proposal is being described at: http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group

Before we move forward with seeking approval from the Wikipedia community, we would like additional input about the proposal, and would welcome additional help improving it. Also, please consider participating in WikiProject Research at: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research

--
Bryan Song
GroupLens Research
University of Minnesota

9 months, 2 weeks

Statistics on wiki page editing/creation by country?

by alina ostling

Hi!

I am doing a PhD on online civic participation projects (e-participation). As part of my research, I have carried out a user survey in which I asked how many people have ever edited or created a page on a wiki. Now I would like to compare the results with the overall rate of wiki editing/creation at country level.

I've found some country-level statistics on Wikipedia Statistics (e.g. 3,000 editors of Wikipedia articles in Italy), but data for the UK and France are not available, since Wikipedia provides statistics by language, not by country. I'm thus looking for statistics on the UK and France (but am also interested in alternative ways of measuring wiki editing/creation in Sweden and Italy).

I would be grateful for any tips!

Sunny regards,
Alina

--
Alina ÖSTLING
PhD Candidate
European University Institute
www.eui.eu

9 years, 7 months

WikiPapers has now over 1,000 publications

by emijrp

Hi all;

WikiPapers has recently reached the 1,000 publications milestone.[1] It looks like the publication rate peaked in 2009 and has plateaued over the last 3 years.

I continue adding more data... but with little help. Don't you like editing wikis? ; )

Regards,
emijrp

[1] http://wikipapers.referata.com/wiki/List_of_publications

--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> | StatMediaWiki <http://statmediawiki.forja.rediris.es> | WikiEvidens <http://code.google.com/p/wikievidens/> | WikiPapers <http://wikipapers.referata.com> | WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/

10 years, 6 months

A wiki search engine

by emijrp

Hi all;

I'm starting a new project, a wiki search engine. It uses MediaWiki, Semantic MediaWiki and other minor extensions, and some tricky templates and bots.

I remember Wikia Search and how it failed. It had the mini-article thingy for the introduction, and then a lot of links compiled by a crawler. Also something similar to a social network.

My project idea (which still needs a cool name) is different. Although it uses an introduction and images copied from Wikipedia, and some links from the "External links" sections, that is only a start. The aim is for the community to add, remove and order the results for each term, and to create redirects for similar terms to avoid duplicates.

Why this? I think that Google PageRank isn't enough. It is frequently abused by link farms, SEOs and other people trying to push their websites higher.

Search "Shakira" in Google for example. You see 1) Official site, 2) Wikipedia, 3) Twitter, 4) Facebook, then some videos, some news, some images, Myspace. It wastes 3 or more results on obvious sites (WP, TW, FB). The wiki search engine puts these sites at the top, together with an introduction and related terms, leaving all the space below for not so obvious but interesting websites. Also, if you search for "semantic queries" like "right-wing newspapers" in Google, you won't find real newspapers but "people and sites discussing right-wing newspapers". Or latex and LaTeX being shown in the same results pages. These issues can be resolved with disambiguation result pages.

How do we choose which results rank above or below? The rules are not fully designed yet, but we can put official sites in first place, then .gov or .edu domains, which are important ones, and later unofficial websites and blogs, giving priority to the local language, etc. And reaching consensus.

We can control aggressive spam with spam blacklists, semi-protect or protect highly visible pages, and use bots or tools to check changes.

It obviously has a CC BY-SA license and results can be exported. I think that this approach is the opposite of Google's today.

For weird queries like "Albert Einstein birthplace" we can redirect to the most obvious results page (in this case Albert Einstein) using a hand-made redirect or by software (some little change in MediaWiki).

You can check a pretty alpha version here http://www.todogratix.es (only in Spanish for now, sorry) which I'm feeding with some bots.

I think that it is an interesting experiment. I'm open to your questions and feedback.

Regards,
emijrp

--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> | StatMediaWiki <http://statmediawiki.forja.rediris.es> | WikiEvidens <http://code.google.com/p/wikievidens/> | WikiPapers <http://wikipapers.referata.com> | WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
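
The ordering rules mentioned in this message (official sites first, then .gov or .edu domains, then everything else, with priority for the local language) could be sketched roughly as follows. This is only an illustration, not the project's actual code: the `Result` class, its `official` and `language` fields, and the `tier` and `rank` functions are all hypothetical names, and the message itself says the real rules were not fully designed yet.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    official: bool = False  # hypothetical flag, assumed to be set by the community
    language: str = "en"    # hypothetical language tag for the linked site


def tier(result: Result) -> int:
    """Lower tier sorts first: official sites, then .gov/.edu, then the rest."""
    if result.official:
        return 0
    host = urlparse(result.url).hostname or ""
    if host.endswith((".gov", ".edu")):
        return 1
    return 2


def rank(results, local_language):
    """Order by tier, then prefer the local language within a tier.

    sorted() is stable, so results that tie keep the order in which
    they were given (for example, an order agreed by consensus).
    """
    return sorted(results, key=lambda r: (tier(r), r.language != local_language))


if __name__ == "__main__":
    candidates = [
        Result("http://blog.example.com/shakira", language="en"),
        Result("http://fans.example.es/", language="es"),
        Result("http://www.example.edu/music"),
        Result("http://www.shakira.com/", official=True),
    ]
    for r in rank(candidates, local_language="es"):
        print(r.url)
```

The suffix check is deliberately naive: it would miss country variants such as .gov.uk, which a real rule set would have to decide on.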

10 years, 8 months

Inventory of articles with video?

by Andrew Lih

Hi all, I'm wondering if anyone has done any research into identifying which articles in Wikipedia have associated video? There is this category, which only has 280 or so articles: http://en.wikipedia.org/wiki/Category:Articles_containing_video_clips It seems far from complete. Appreciate any advice or previous work in this area. The background: I'm working with some grad students on staging a Wiki Makes Video contest in April, and we'd like to do some measurement of the current state of video in Wikipedia. Thanks, and email me if you'd like to know more about the video project for April. -Andrew

11 years, 1 month

"whitepaper" tool to find PDFs of articles

by Sumana Harihareswara

https://github.com/wilkie/whitepaper

"This gem will perform a whitepaper lookup on major scholarly databases. Its purpose is to easily find related papers and organize your paper collection. With this application, you can easily download pdfs or use it as a library to automatically assign metadata.

"Currently, CiteSeerX, ACM and IEEE are the only databases it uses along with a google pdf/ps search to find other pdf or ps links to download."

The author says it is just for personal use.

--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation

11 years, 1 month

Documenting WMF-related data sources: some questions to help me do it better.

by Maria Miteva

Hi everyone,

I am working on creating a single entry page describing all the data about Wikipedia and WMF projects available for researchers. The idea is to have a single location that introduces all possible sources of data and makes it easy for a newbie to understand what suits his/her needs and how to get and work with the data.

This is meant to be useful to the users (which is you), so I have a few questions to help me make it better:

1. Have any of you used data from sources other than those listed below, and if so, which?
   • XML dumps
   • the API
   • the Toolserver (or its future replacement on WMF Labs)
   • our live IRC feeds
   • our raw hourly pageview data dumps (and the rudimentary API that you can use to query them at stats.grok.se)
   • the sources listed on our (experimental) open data registry on the DataHub http://datahub.io/group/wikimedia (includes DBpedia)
2. Is there any specific information that you wish you had known when you started using WMF data but that is not documented online?
3. Do you have any datasets or tools for parsing/manipulating/visualizing data that you think can be reused and that you want to share? (Could be something you built or something you found and liked.)
4. What information should be included about each source? I am thinking about:
   1. description of the data: content, format, method of collection or how you can collect it, how often it is collected, for what period
   2. skills required to get and work with the data (PHP, SQL, etc.)
   3. short sample
   4. existing tools for parsing, importing, etc.
   5. maybe examples of projects where it was used?

Any other comments/suggestions will be appreciated. Thank you in advance.

Mariya
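
As an illustration of the kind of "short sample" item 4 asks for, here is a minimal sketch of streaming through an XML dump in Python. It makes several assumptions: `SAMPLE` is a tiny hand-written stand-in rather than real dump data, the namespace version in its `xmlns` attribute varies between real dumps, and `local` and `revisions_per_page` are hypothetical helper names, not part of any WMF tool.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written stand-in shaped like a MediaWiki XML export (illustrative only).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <revision><id>1</id><timestamp>2013-02-01T00:00:00Z</timestamp></revision>
    <revision><id>2</id><timestamp>2013-02-02T00:00:00Z</timestamp></revision>
  </page>
  <page>
    <title>Talk:Example</title>
    <ns>1</ns>
    <revision><id>3</id><timestamp>2013-02-03T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def revisions_per_page(stream):
    """Yield (title, namespace, revision_count) for each <page> element."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = ns = None
        count = 0
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = int(child.text)
            elif name == "revision":
                count += 1
        yield title, ns, count
        elem.clear()  # drop the page's children so memory use stays small


if __name__ == "__main__":
    for row in revisions_per_page(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(row)
```

Streaming with `iterparse` rather than loading the whole tree is the main design choice here, since full-history dumps are far too large to hold in memory at once.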

11 years, 1 month

Looking for mirrors for Data dumps

by Maria Miteva

Hi everyone,

As you can see at the top of https://meta.wikimedia.org/wiki/Data_dumps, WMF is actively looking for help archiving and distributing data dumps. It would be great if you could check with the institutions you are associated with whether they have storage and bandwidth available to donate. It would make it easier to keep better dump archives and improve download speed.

You can see more about the requirements here <http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Requir…>.

Ariel Glenn (in CC) will be happy to help anyone willing to host a mirror.

Thank you.

Mariya

11 years, 1 month

visualization tool sought (FLOSS)

by koltzenburg＠w4w.net

hi everybody,

can you recommend any FLOSS tool for visualizing the following for a certain sample from namespaces 0 and 1

* language linking revisions for a sample of topically related pages ("how has interwiki linking been revised over time in this and that article in this and that version?")
* individual user activity in more than one WP version, in a sample group of about 30 languages ("which other version has user x contributed to on this topic?")
* temporal aspects of possible relatedness in revision frequency ("when was revision activity highest in certain language versions of this sample?")

thanks & cheers,
Claudia
koltzenburg(a)w4w.net

11 years, 1 month

Modeling Wikipedia admin elections using multidimensional behavioral social networks

by Everton Zanella Alvarenga

Abstract: Wikipedia admins are editors entrusted with special privileges and duties, responsible for the community management of Wikipedia. They are elected using a special procedure defined by the Wikipedia community, called Request for Adminship (RfA). Because of the growing amount of management work (quality control, coordination, maintenance) on the Wikipedia, the importance of admins is growing. At the same time, there exists evidence that the admin community is growing more slowly than expected. We present an analysis of the RfA procedure in the Polish-language Wikipedia, since the procedure’s introduction in 2005. With the goal of discovering good candidates for new admins that could be accepted by the community, we model the admin elections using multidimensional behavioral social networks derived from the Wikipedia edit history. We find that we can classify the votes in the RfA procedures using this model with an accuracy level that should be sufficient to recommend candidates. We also propose and verify interpretations of the dimensions of the social network. We find that one of the dimensions, based on discussion on Wikipedia talk pages, can be validly interpreted as acquaintance among editors, and discuss the relevance of this dimension to the admin elections.

Link: http://link.springer.com/article/10.1007/s13278-012-0092-6

From the conclusion:

"[...] We have noticed the decreasing amount of successful admin elections and have formulated two hypotheses that could explain this phenomenon. Hypothesis A stated that new admins are elected on the basis of acquaintance of the voter and candidate. If this would be a valid explanation, we could conclude that the community of admins is becoming increasingly closed, which would be detrimental to the sustainable development of the Wikipedia. Hypothesis B stated that new admins are elected on the basis of similarity of experience in editing various topics of the voter and candidate. Since voters are other active admins whose experience increases with time, their thresholds of accepting a candidate are likely to increase (as has been observed from the simple statistics of RfA votings)."

I would love to see this research on other Wikipedias.

Tom

--
Everton Zanella Alvarenga (also Tom)
"A life spent making mistakes is not only more honorable, but more useful than a life spent doing nothing."

11 years, 2 months
