Wiki-research-l April 2015

wiki-research-l@lists.wikimedia.org

47 participants
34 discussions

Re: [Wiki-research-l] Grant Proposal: Request for Feedback
by aaron shaw 12 Apr '15

Greetings Christina!

Thanks for sharing this and notifying us on the list. Overall, I am very supportive of additional attempts to do more rigorous survey research on Wikipedians. Some questions that I think you could try to address in the proposal:

- *Sampling*: You mention that you plan to stratify your sample based on past edit history and recruit via talk page messages. However, beyond this you say nothing about the logistics of subject sampling, recruitment, or any approaches you will take to address the fact that conducting representative surveys in online communities is very, very difficult. Can you elaborate on this aspect of your study? In particular, how will your approach address shortcomings in data and sample quality that have affected previous surveys <http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0065782> of Wikipedia contributors?

- *Self-report measures of edit history*: Why ask the respondents to self-report their edit histories (this kind of thing is notoriously hard to do accurately) when you could ask them to provide their usernames or at least link their usernames to their survey responses (since you're recruiting via talk page messages anyway)?

- *Collaboration with related studies*: There are several other ongoing efforts to survey Wikipedians -- including at least one other <http://dl.acm.org/citation.cfm?id=2145264> (link to "official" publication is gated, but other versions are available for free) focused on social psychological concerns. Also, my impression is that the WMF is involved in planning another editor survey in the near future. How will your approach complement/extend/overlap with these other efforts? Will you make any effort to collaborate with these ongoing studies? How will your study avoid subject exhaustion -- especially among more active Wikipedians who may find themselves invited to participate in many surveys?

- *Missing measures and missing people*: Previous studies have shown that a variety of additional factors may figure in shaping the participation practices of Wikipedians as well as those who might edit Wikipedia but choose not to do so. For example, in a recent paper <http://www.tandfonline.com/doi/abs/10.1080/1369118X.2014.957711#.VSqb244adE4> (again, gated link, but I am also happy to provide copies to those who would like access) that I co-wrote with Eszter Hargittai, we find that web use skills are, in some ways, even more robust predictors of Wikipedia contribution than gender. There are many other examples of important measures that predict participation in various ways as well, whether it be individuals' trust/caution attitudes, newcomer experiences, etc. Which of these measures will you include? How will you ensure that you have included the most important measures in this survey study, since survey results are otherwise quite prone to omitted variable bias?

- *Missing people and sampling on the dependent variable*: Maybe most importantly, insofar as you say that you are interested in understanding factors that determine who edits, you are selecting on the dependent variable (Wikipedia editing) by limiting your study to individuals who have accounts on the encyclopedia and edit already. It strikes me as especially egregious that you are requiring survey respondents to read and reply to the survey recruitment materials via talk page message. This means that precisely those individuals who participate least (and who would provide your study with necessary variation on the outcome of interest) are the least likely to respond and to be included in the study. As a result, I fear that your findings will not speak to these questions effectively unless you find an alternative method of sampling and recruitment.

I hope that these comments are helpful for you as you continue to refine the study design.
I really think you're pursuing a critical set of concerns in this study and I am eager to see it succeed in the most effective way possible!

yours,
Aaron

On Sun, Apr 12, 2015 at 8:44 AM, Christina Shane-Simpson <christinam.shane(a)gmail.com> wrote:

> Hello Fellow Wiki Researchers,
>
> I’ve recently posted a project proposal under the Inspire Campaign and
> would love feedback from this community on the research proposal, *Characterization
> of Editors on Wikipedia*:
>
> In order to accurately explore the main goals of the Inspire Campaign,
> we must be able to effectively characterize our community. Any
> interventions that we develop should reflect and match the needs of the
> target population, requiring a thorough understanding of the traits and
> behaviors of our community of editors. As a direct extension of the recent
> gender gap research on Wikipedia and to explore other potential areas of
> inequality, we’d like to conduct another study that compares the traits of
> the super-editor, the active editor (moderate editing), and the inactive
> editor (infrequent edits).
>
> The proposed project would use an online self-report survey that is posted
> on editor talk pages. The research team has experience conducting online
> surveys and will monitor responses on this survey to identify any potential
> misuse of the survey (i.e. vandalism) and/or outliers in the data. This
> entire project would only be implemented after an IRB approval from the
> lead researcher's academic institution.
>
> Full proposal:
> https://meta.wikimedia.org/wiki/Grants:IdeaLab/Characterization_of_Editors_…
> Thank you in advance for your assistance in developing this proposal!
>
> Christina Shane-Simpson
> Psychology Department
> The College of Staten Island and
> The Graduate Center, CUNY

1 0

Grant Proposal: Request for Feedback
by Christina Shane-Simpson 12 Apr '15

Hello Fellow Wiki Researchers,

I’ve recently posted a project proposal under the Inspire Campaign and would love feedback from this community on the research proposal, Characterization of Editors on Wikipedia:

In order to accurately explore the main goals of the Inspire Campaign, we must be able to effectively characterize our community. Any interventions that we develop should reflect and match the needs of the target population, requiring a thorough understanding of the traits and behaviors of our community of editors. As a direct extension of the recent gender gap research on Wikipedia and to explore other potential areas of inequality, we’d like to conduct another study that compares the traits of the super-editor, the active editor (moderate editing), and the inactive editor (infrequent edits).

The proposed project would use an online self-report survey that is posted on editor talk pages. The research team has experience conducting online surveys and will monitor responses on this survey to identify any potential misuse of the survey (i.e. vandalism) and/or outliers in the data. This entire project would only be implemented after an IRB approval from the lead researcher's academic institution.

Full proposal:
https://meta.wikimedia.org/wiki/Grants:IdeaLab/Characterization_of_Editors_…

Thank you in advance for your assistance in developing this proposal!

Christina Shane-Simpson
Psychology Department
The College of Staten Island and
The Graduate Center, CUNY

1 0

IdeaLab Grant Call for Feedback and Participants
by Jason Radford 09 Apr '15

Hi Everyone,

I wanted to share two projects that are currently under consideration for IdeaLab funding and may be of direct interest to the Wiki research community. Note that one purpose of the consciousness-raising repository is to create a collection of stories for use by researchers studying marginalized identities on Wikipedia. If you are interested or know someone who is interested, let me know. If you have feedback on these projects, please submit it on their discussion pages. Thanks!

*Consciousness-Raising Repository Call for Working Group Participants*

*Purpose*: We're recruiting a group of diverse Wikipedians to help put together a repository of stories from users experiencing marginalization on Wikipedia. The repository is meant to serve both as a database of knowledge about the forms marginalization can take, for researchers studying marginalized identities, and as an outlet for users experiencing marginalization.

*Requirements*: Experience working with marginalization and marginalized groups. Interest in the Wiki community. Willingness to attend an hour-long biweekly meeting.

*For more information*: https://meta.wikimedia.org/wiki/Grants:IdeaLab/A_Consciousness_Raising_Repo…

*Wiki Controversy Monitoring Engine Call for Developers*

*Purpose*: The controversy monitoring engine maintains a real-time rating of the controversiality of Wikipedia articles by listening to the live stream of edits from Wikipedia. We need someone who is interested in building the web interface and interactive visualizations around these controversies to enable administrators to monitor, investigate, and, if need be, intervene to deescalate controversies. The goal is to create a site like stats.wikimedia.org.

*Requirements*: Knowledge of web development, web-based visualization, and/or data analysis using Wikipedia's API or WikiData.

*For more information*: see https://meta.wikimedia.org/wiki/Grants:IdeaLab/Controversy_Monitoring_Engine

--
Jason Radford
Doctoral Student, Sociology, University of Chicago
Visiting Researcher, Lazer Lab, Northeastern University
*Connect*: LinkedIn <http://www.linkedin.com/in/jsradford>, Twitter <http://www.twitter.com/jsradford>, University of Chicago <http://home.uchicago.edu/%7Ejsradford/>
*Play Games for Science at Volunteer Science <http://www.volunteerscience.com>*

2 1

Does anyone have access to this research..?
by Shani 09 Apr '15

Hello fellow wiki-researchers,

I've started working on research for a seminar paper (which will probably evolve into an M.A. dissertation) on Wikipedia in Higher Education, as part of my MA in the "Technology in Learning" program at the School of Education, Tel-Aviv University. The research will focus on an elective course on Wikipedia for Med students, which I've developed and run at TAU.

*I was wondering if any of you have access to this research --*

Aibar, E., Lerga, M., Lladós, J., Meseguer, A., & Minguillón, J. (2014). Wikipedia in Higher Education: an Empirical Study. RUSC Vol. 11 No 2, Special Issue | Universitat Oberta de Catalunya and University of New England | Barcelona.

Other than an abstract, I wasn't able to find it online (nor is it available through the university library) and I would really appreciate a copy (Word, PDF, whatever you can get your hands on).

Any help getting this article would be much appreciated, as would any pointers to other relevant existing research on the topic.

Cheers,
Shani.

2 2

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 116, Issue 16
by Ed H. Chi 09 Apr '15

Oliver,

Here is one paper on mapping topic coverage in Wikipedia from 2009:

Kittur, A., Chi, E. H., and Suh, B. 2009. What's in Wikipedia?: Mapping Topics and Conflict using Socially Annotated Category Structure <http://www-users.cs.umn.edu/~echi/papers/2009-CHI2009/p1509.pdf>. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (Boston, MA, USA, April 04 - 09, 2009). CHI '09. ACM, New York, NY, 1509-1512. ACM Link <http://doi.acm.org/10.1145/1518701.1518930> (24% acceptance rate)

--Ed (on my tablet)

On Apr 8, 2015 05:01, <wiki-research-l-request(a)lists.wikimedia.org> wrote:

> Send Wiki-research-l mailing list submissions to
> wiki-research-l(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
> wiki-research-l-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> wiki-research-l-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
> Today's Topics:
>
> 1. Re: Gender-specific page titles (Flöck)
> 2. Re: Research on Wikidata's content coverage (Flöck)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 8 Apr 2015 11:09:00 +0000
> From: Flöck, Fabian <Fabian.Floeck(a)gesis.org>
> To: Research into Wikimedia content and communities
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] Gender-specific page titles
> Message-ID: <7ECBA035-F388-4F89-9B65-A7C1C956AA2C(a)gesis.org>
> Content-Type: text/plain; charset="utf-8"
>
> Interesting, thanks Mark!
>
> - Fabian
>
> On 07.04.2015, at 16:38, Mark J.Nelson <mjn(a)anadrome.org> wrote:
>
> Flöck, Fabian <Fabian.Floeck(a)gesis.org> writes:
>
> Does anyone know about a study that looks at how often for example
> articles about a profession use the male instead of the female form as
> the name (female form doesn't exist or is just a redirect)?
>
> It would probably not be so much of an issue for English, but rather
> Spanish, German, Russian etc. Concrete example:
> https://de.wikipedia.org/wiki/Professor exists in German, but
> https://de.wikipedia.org/wiki/Professorin is just a redirect.
>
> One thing to be careful of in such a study (though I would also like to
> see it!) is that the politics and preferences in this area vary widely
> across languages, and sometimes within a language, so a purely
> data-driven study has to be careful about its assumptions and
> generalizations.
>
> Below a long-ish discussion of Greek that you may skip if not interested
> (it ended up longer-winded than I had expected):
>
> For example in Greek it is very profession-specific whether the trend is
> towards using a slashed form of both genders, or towards convergence on
> a single form that applies to both genders (sometimes with atypical
> morphology). Sometimes it depends on the specific word form and
> historical usage. In fields that historically had both men and women,
> both forms are very well established and tend to persist, e.g. a male
> teacher is a δάσκαλος and a female one is a δασκάλα. But in fields that
> were typically so male-dominated that only the masculine version has
> been in common use, there's disagreement over whether it's more
> progressive to "revive" a feminine form, or to generalize the masculine
> form to cover both genders. For example a female president would
> universally be called by the historically masculine form πρόεδρος,
> but with a feminine article (i.e. πρόεδρος can now be either a masculine
> or feminine noun, depending on context, even though it's morphologically
> irregular as a feminine noun). There is in Byzantine Greek a feminine
> analog, προέδρισσα (referring to a different position), but it isn't
> used today outside humorous contexts (roughly where you might use
> "Presidentess" in English). The same applies for a number of other more
> common professions, but for some it's more disputed which form should be
> used (for President there isn't any usage variance).
>
> In short it's complex, so I hope any data set is careful about what it's
> counting as data, and why. :)
>
> -Mark
>
> --
> Mark J. Nelson
> Anadrome Research
> http://www.anadrome.org
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
> Cheers,
> Fabian
>
> --
> Fabian Flöck
> Research Associate
> Computational Social Science department @GESIS
> Unter Sachsenhausen 6-8, 50667 Cologne, Germany
> Tel: + 49 (0) 221-47694-208
> fabian.floeck(a)gesis.org
>
> www.gesis.org
> www.facebook.com/gesis.org
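
The concrete example in the quoted message (Professor exists as an article, Professorin is only a redirect) is the kind of thing that could be checked programmatically. The following is only a minimal sketch, assuming the standard MediaWiki Action API (`action=query` with the `redirects` parameter, which reports resolved redirects as `from`/`to` pairs). The helper names `build_redirect_query` and `redirected_titles` and the sample response are hypothetical and hand-written for illustration; they do not come from the thread.

```python
from urllib.parse import urlencode


def build_redirect_query(titles, lang="de"):
    """Build an Action API URL that asks the wiki to resolve redirects.

    The function name and defaults are illustrative assumptions.
    """
    params = {
        "action": "query",
        "titles": "|".join(titles),  # several titles are separated by "|"
        "redirects": 1,              # ask the API to follow redirects
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def redirected_titles(api_response):
    """Map every requested title that is a redirect to its target title."""
    redirects = api_response.get("query", {}).get("redirects", [])
    return {entry["from"]: entry["to"] for entry in redirects}


# Hand-written sample shaped like an Action API reply; not real fetched data.
sample_response = {
    "query": {
        "redirects": [{"from": "Professorin", "to": "Professor"}],
    }
}

print(build_redirect_query(["Professor", "Professorin"]))
print(redirected_titles(sample_response))
```

As Mark's caveat suggests, a redirect count of this kind only records which title form a wiki chose; interpreting it still requires language-specific knowledge.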

2 1

Research on Wikidata's content coverage
by Oliver Keyes 08 Apr '15

Hey all, Is anyone aware of research on the completeness of Wikidata, in terms of coverage and systemic bias? This seems like the sort of thing Max Klein might know ;). Papers, blog posts, anything. -- Oliver Keyes Research Analyst Wikimedia Foundation

3 4

Anyone have access to this article?
by Jonathan Morgan 08 Apr '15

http://onlinelibrary.wiley.com/doi/10.1111/jcom.12123/abstract *"What Creates Interactivity in Online News Discussions? An Exploratory Analysis of Discussion Factors in User Comments on News Items"* If you have access, and can send me a PDF offline, I would be very grateful :) Cheers, Jonathan -- Jonathan T. Morgan Community Research Lead Wikimedia Foundation User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)> jmorgan(a)wikimedia.org

10 17

whoVIS editor-editor interaction visualization prototype, API for word provenance
by Flöck, Fabian 08 Apr '15

Hi all,

we produced a prototype of an editor-editor interaction network visualization for individual articles, based on the words/tokens deleted and reintroduced by editors. It will be presented as a demo at the WWW conference this year [1], but we would love to also get some feedback on it from this list. It's at an early stage and pretty slow when loading, so have patience when you try it out here: http://km.aifb.kit.edu/sites/whovis/index.html, and be sure to read the "how to" section on the site. Alternatively you can watch the (semi-professional) screencast I did :P, it explains most of the functions.

The (disagreement) interactions are based on an extended version of the extraction of authorship we do with wikiwho [2], and the graph drawing is done almost exactly after the nice method proposed by Brandes et al. [3]. The code can be found on GitHub, both for the interaction-extraction extension of wikiwho [4] and the visualization itself [5], which basically produces JSON output for feeding the D3 visualization libraries we use. We have yet to generate output for more articles; so far we only show a handful for demonstration purposes.

The whole thing also fits nicely (and was supposed to go along) with the IEG proposal that Pine had started on editor interaction [6].

word provenance/authorship API prototype:

Also, we have worked a bit on our early prototype for an API for word provenance/authorship. You can get word/token-wise information on which revision each piece of content originated from (and thereby which editor originally authored the word) at

http://193.175.238.123/wikiwho/wikiwho_api.py?revid=<REV_ID>&name=<ARTICLENAME>&format=json

(<ARTICLENAME> -> name of the article in ns:0, in the English Wikipedia; <REV_ID> -> rev_id of that article for which you want the authorship information; format is currently only json)

Example: http://193.175.238.123/wikiwho/wikiwho_api.py?revid=649876382&name=Laura_Bu…

Output format is currently:

{"tokens": [{"token": "<FIRST TOKEN IN THE WIKI MARKUP TEXT>", "author_name": "<NAME OF AUTHOR OF THE TOKEN>", "rev_id": "<REV_ID WHEN TOKEN WAS FIRST ADDED>"},
{"token": "<SECOND TOKEN IN THE WIKI MARKUP TEXT>", "author_name": "<NAME OF AUTHOR OF THE TOKEN>", "rev_id": "<REV_ID WHEN TOKEN WAS FIRST ADDED>"},
{"token": "<THIRD TOKEN … … ],
"message": null, "success": "true",
"revision": {"article": "<NAME OF REQUESTED ARTICLE>", "time": "<TIMESTAMP OF REQUESTED REV_ID>", "reviid": <REQUESTED REV_ID>, "author": "<AUTHOR OF REQUESTED REV_ID>"}}

DISCLAIMER: there are problems with getting/processing the XML for larger articles right now, so don't be surprised if that gives you an error sometimes (e.g. querying "Barack Obama" and articles of similar size will *not* succeed for higher revision numbers). Also, we are working on the speed and on providing more precomputed articles (right now almost all are computed on request, although we save intermediary results). Still, for most articles it works fine and the output has been tested for accuracy (cf. [2]).

At some point in the future, this API will also be able to deliver the interaction data that the visualization is built on.

I'm looking forward to your feedback :)

Cheers,
Fabian

[1] http://f-squared.org/wikiwho/demo32.pdf
[2] http://f-squared.org/wikiwho/
[3] http://dl.acm.org/citation.cfm?id=1526808
[4] https://github.com/maribelacosta/wikiwho
[5] https://github.com/wikiwho/whovis
[6] https://meta.wikimedia.org/wiki/Grants:IEG/Editor_Interaction_Data_Extracti…

--
Fabian Flöck
Research Associate
Computational Social Science department @GESIS
Unter Sachsenhausen 6-8, 50667 Cologne, Germany
Tel: + 49 (0) 221-47694-208
fabian.floeck(a)gesis.org
www.gesis.org
www.facebook.com/gesis.org
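
For readers who want to try the word provenance API described above, here is a minimal sketch of how a request URL might be assembled and how the documented output could be summarised per author. It assumes only the endpoint, the three query parameters, and the JSON fields named in the message. The helper names `build_query_url` and `tokens_per_author`, the revision id 12345, the article name "Example", and the sample response are hypothetical, and the prototype server may not respond.

```python
from collections import Counter
from urllib.parse import urlencode

# Endpoint exactly as given in the message; an early prototype that may be offline.
API_ENDPOINT = "http://193.175.238.123/wikiwho/wikiwho_api.py"


def build_query_url(rev_id, article_name):
    """Build a request URL from the three parameters named in the message."""
    params = {"revid": rev_id, "name": article_name, "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def tokens_per_author(response):
    """Count how many tokens of the revision each author originally added."""
    # The documented format reports success as the string "true".
    if response.get("success") != "true":
        raise ValueError(response.get("message") or "request did not succeed")
    return Counter(token["author_name"] for token in response["tokens"])


# Hand-written response following the documented output format; not real data.
sample = {
    "tokens": [
        {"token": "wiki", "author_name": "Alice", "rev_id": "100"},
        {"token": "research", "author_name": "Bob", "rev_id": "101"},
        {"token": "list", "author_name": "Alice", "rev_id": "100"},
    ],
    "message": None,
    "success": "true",
}

print(build_query_url(12345, "Example"))
print(tokens_per_author(sample).most_common())
```

Keeping URL construction separate from parsing means the summary step can be run on saved responses, which may help given the disclaimer that larger articles can fail on request.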

2 3

Re: [Wiki-research-l] Research on Wikidata's content coverage
by Finn Aarup Nielsen 07 Apr '15

Hi Oliver

On 07-04-2015 at 21:50, Oliver Keyes wrote:
> Hey all,
>
> Is anyone aware of research on the completeness of Wikidata, in terms
> of coverage and systemic bias? This seems like the sort of thing Max
> Klein might know ;). Papers, blog posts, anything.
>

"The sum of all human knowledge": A systematic review of scholarly research on the content of Wikipedia
Journal of the Association for Information Science and Technology, 66(2):219–245, 2015 February.
http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/6784/pdf/imm6784.p…
"Comprehensiveness" starts on page 4

Wikipedia research and tools: Review and comments.
http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/6012/pdf/imm6012.p…
"Coverage" starts on page 9. See also Table 4. For example:
"recency was to a certain extent a predictor for coverage"
"Astronomy and banksia somewhat overcited"

best
Finn Årup Nielsen

1 0

rc stream
by Ed Summers 07 Apr '15

A few people have mentioned [1] to me that the Recent Changes IRC channels are going to be shut down in favor of the new Socket.IO based stream [2]. Does anyone know if this is anything more than a rumor? //Ed [1] https://github.com/edsu/wikistream/issues/42 [2] https://wikitech.wikimedia.org/wiki/RCStream

5 13
