Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
University of Minnesota
I am thrilled to announce our speaker lineup for this month’s research showcase <https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase#April_2…>.
Jeff Nickerson (Stevens Institute of Technology) will talk about remix and reuse in collaborative communities; Heather Ford (Oxford Internet Institute) will present an overview of the oral citations debate in the English Wikipedia.
The showcase will be recorded and publicly streamed at 11.30 PT on Thursday, April 30 (livestream link will follow). We’ll hold a discussion and take questions from remote attendees via the Wikimedia Research IRC channel (#wikimedia-research <http://webchat.freenode.net/?channels=wikimedia-research> on freenode) as usual.
Looking forward to seeing you there.
Creating, remixing, and planning in open online communities
Paradoxically, users in remixing communities don’t remix very much. But an analysis of one remix community, Thingiverse, shows that those who actively remix end up producing work that is in turn more likely to be remixed. What does this suggest about Wikipedia editing? Wikipedia allows more types of contribution, because creating and editing pages are done in a planning context: plans are discussed on particular loci, including project talk pages. Plans on project talk pages lead to both creation and editing; some editors specialize in making article changes and others, who tend to have more experience, focus on planning rather than acting. Contributions can happen at the level of the article and also at a series of meta levels. Some patterns of behavior – with respect to creating versus editing and acting versus planning – are likely to lead to more sustained engagement and to higher quality work. Experiments are proposed to test these conjectures.
Authority, power and culture on Wikipedia: The oral citations debate
In 2011, Wikimedia Foundation Advisory Board member Achal Prabhala was funded by the WMF to run a project called 'People are knowledge' or the Oral citations project <https://meta.wikimedia.org/wiki/Research:Oral_Citations>. The goal of the project was to respond to the dearth of published material about topics of relevance to communities in the developing world. Although the majority of articles in languages other than English remain intact, the English editions of these articles have had their oral citations removed. I ask why this happened, what the policy implications are for oral citations generally, and what steps can be taken in the future to respond to the problem that this project (and more recent versions of it <https://meta.wikimedia.org/wiki/Research:Indigenous_Knowledge>) set out to solve. This talk comes out of an ethnographic project in which I have interviewed some of the actors involved in the original oral citations project, including the majority of editors of the surr <https://en.wikipedia.org/wiki/surr> article that I trace in a chapter of my PhD <http://www.oii.ox.ac.uk/people/?id=286>.
thank you for pointing out this data set. It's what I was looking for.
From: Physikerwelt <wiki(a)physikerwelt.de>
Date: Wednesday, 29 April 2015 17:55
To: Malte Schwarzer <ms(a)mieo.de>, Norman Meuschke <n(a)meuschke.org>
Subject: Fwd: [Wiki-research-l] Wikilink referral statistics
could you please respond to the request?
---------- Forwarded message ----------
From: Aaron Halfaker <aaron.halfaker(a)gmail.com>
Date: Wed, Apr 29, 2015 at 4:54 PM
Subject: Re: [Wiki-research-l] Wikilink referral statistics
To: Research into Wikimedia content and communities
Indeed Andrew. Upon re-reading, I think you're right. Thanks for pointing
to that dataset.
Also, primary credit for that dataset should go to Ellery Wulczyn. :)
Credit where it is due.
> Wulczyn, Ellery; Taraborelli, Dario (2015): Wikipedia Clickstream. figshare.
On Wed, Apr 29, 2015 at 9:47 AM, Andrew Gray <andrew.gray(a)dunelm.org.uk>
wrote:
> Hi Aaron,
> I may be misreading the request but I think what's being looked at
> here is Wikipedia -> Wikipedia links - so the referring server + the
> referred server are both ours.
> Given that, I *think* this data Dario put out earlier in the year
> would be what's needed - http://dx.doi.org/10.6084/m9.figshare.1305770
> - but with the caveat that it's only enwiki and only for two months.
> It won't identify which link on a page was used (if it appears
> multiple times), but most "see also" links are unique within the page
> and so this shouldn't pose a problem.
> On 29 April 2015 at 14:47, Aaron Halfaker <aaron.halfaker(a)gmail.com> wrote:
>> > Hi Physikerwelt,
>> > I'm not sure how we'd collect that data. You'd need to gather it from
>> > whatever server the user's browser made a request to after clicking one of
>> > those links. That's how referrers work. Also, clicks to non-https links
>> > from https Wikipedia will not contain referrers. See
>> > https://meta.wikimedia.org/wiki/Research:Wikimedia_referrer_policy for a
>> > proposal to update our policy.
>> > -Aaron
>> > On Wed, Apr 29, 2015 at 6:33 AM, Physikerwelt <wiki(a)physikerwelt.de> wrote:
>>> >> Hi,
>>> >> is there information about referrals within enwiki?
>>> >> We are investigating the quality of the "See also" links and are looking
>>> >> for estimates how often the see also links were used.
>>> >> If so can we access the information from eqiad.wmflabs?
>>> >> Best
>>> >> Physikerwelt
>>> >> _______________________________________________
>>> >> Wiki-research-l mailing list
>>> >> Wiki-research-l(a)lists.wikimedia.org
>>> >> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> - Andrew Gray
Further to a short paper I recently wrote with some colleagues (
http://ssrn.com/abstract=2592528), I started working on Vis-à-Wik, a simple
online visual analytics tool for Wikipedia analysis.
Vis-à-Wik retrieves data from the MediaWiki Wikipedia API, and uses D3js to
visualize the links between Wikipedia articles as a network diagram. This
simple tool allows users to search for Wikipedia articles in a selected
language edition, and visualize the articles selected by the user as a set
of nodes, along with the related articles in a second language edition, and
the links and language-links between them.
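
As a rough illustration of the retrieval step described above, here is a minimal Python sketch. It is not the Vis-à-Wik code itself (which runs in the browser with D3js); the function names `build_links_query` and `edges_from_response` are hypothetical, and the sketch assumes the standard MediaWiki `action=query` module with `prop=links|langlinks` and its default JSON layout.

```python
from urllib.parse import urlencode


def build_links_query(titles, lang="en", limit=50):
    """Build a MediaWiki API URL requesting links and language links.

    Hypothetical helper: only the query parameters are real API names.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "links|langlinks",
        "titles": "|".join(titles),
        "pllimit": limit,  # max outgoing links returned per request
        "lllimit": limit,  # max language links returned per request
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def edges_from_response(data, selected):
    """Extract (source, target) edges between the selected articles.

    Assumes the default JSON layout, where data["query"]["pages"] maps
    page ids to objects carrying a "title" and a list of "links".
    """
    wanted = set(selected)
    edges = set()
    for page in data.get("query", {}).get("pages", {}).values():
        source = page.get("title")
        for link in page.get("links", []):
            target = link.get("title")
            if source in wanted and target in wanted and source != target:
                edges.add((source, target))
    return sorted(edges)
```

The resulting edge list is the kind of structure a network-diagram library would consume as links between nodes. Long link lists are paginated by the API, so a real client would also have to follow the continuation values in each response.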
The aim is to facilitate mid-scale analysis of Wikipedia content, that is,
somewhere between a single-page analysis (that editors do routinely) and
large-scale analyses (e.g., academic research projects).
Vis-à-Wik is available for testing at sdesabbata.github.io/vis-a-wik, while
the code is available on GitHub (github.com/sdesabbata/vis-a-wik) under the
GPLv3 licence. This is not a collaborative visualization tool, and
currently implements only one of the visualization methods, but it is a
first step (hopefully) of a larger endeavour.
This is just an alpha release, and I really look forward to any comment,
feedback, or suggestion! :)
Dr. Stefano De Sabbata
Oxford Internet Institute
University of Oxford
Junior Research Fellow
University of Oxford
Information Geographies <http://geography.oii.ox.ac.uk/>
Connectivity, Inclusion, and Inequality <http://cii.oii.ox.ac.uk/>
We've just released a count of pageviews to the English-language
Wikipedia from 2015-03-16T00:00:00 to 2015-04-25T15:59:59, grouped by
timestamp (down to a one-second resolution level) and site (mobile or desktop).
The smallest number of events in a group is 645; because of this, we
are confident there should not be privacy implications of releasing
this data. We checked with legal first ;p. If you're interested in
getting your mitts on it, you can find it at DataHub
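
For anyone wanting to work with counts of this shape, here is a minimal Python sketch. The file layout is an assumption: it supposes a CSV with hypothetical columns `timestamp`, `site` and `pageviews`, which may not match the released file. It shows how to find the smallest group and how to roll the per-second counts up to hourly totals.

```python
import csv
import io
from collections import defaultdict


def load_counts(text):
    """Parse CSV text into (timestamp, site, count) tuples.

    The column names are assumptions, not the documented schema.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(r["timestamp"], r["site"], int(r["pageviews"])) for r in reader]


def smallest_group(rows):
    """Return the (timestamp, site, count) row with the fewest events."""
    return min(rows, key=lambda r: r[2])


def hourly_totals(rows):
    """Sum per-second counts into (hour, site) buckets.

    Relies on ISO 8601 timestamps such as 2015-03-16T00:00:00, where the
    first 13 characters identify the hour.
    """
    totals = defaultdict(int)
    for ts, site, n in rows:
        totals[(ts[:13], site)] += n
    return dict(totals)
```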
I just wanted to let everyone know that a new Crowd Science track is taking place at HICSS 2016.
Given that I've personally benefited tremendously over the years from the conversations that take place here, and that we consider Wikipedia and Peer Production research to be a core pillar of Crowd Science, we would very much welcome your new and emerging Crowd-research to our track.
See here for specific CFP details:- https://www.dropbox.com/s/9dm8adspbp1jut7/HICSS%20Crowdsourcing%20Minitrack…
Also, if you're interested in further endeavours in this respect, please feel free to participate in this Doodle poll that we've set up:- http://doodle.com/6u5zpyiqdkhhkmun#table
Don't hesitate to get in touch should you have any questions, and we'll see you in beautiful Kauai in the new year!
I wanted to share results from a fun project that explored *where*
information in Wikipedia comes from. We studied the geography of editors,
the spatial articles they edit, and the sources they cite. Overall, we
found evidence for both language and socioeconomic barriers. This was work
with a ridiculously amazing and fun group of people: Heather Ford, Dave
Musicant, Oliver Keyes, Mark Graham, and Brent Hecht.
You can find an interactive version of our CHI paper at:
If you're interested in the sourcing datasets, drop me a line. I'm happy to
Feedback and questions are welcome!
Shilad W. Sen
Mathematics, Statistics, and Computer Science Dept.
Hello, fellow researchers,
I'm looking to see if any research has been done recently around external
links in Wikipedia, and more specifically external links contained in
references. My main goal is to identify the most cited domains, ideally with
Labs tools or similar that could help in this regard are also welcome. I
haven't been able to find much so far, and before I dive into the database
myself, I'd like to check I haven't missed anything obvious :)
The context for this work is "citoid", the new Zotero-based citation tool
used by VisualEditor to automatically fetch and format references using only
their URL. Zotero works well for many scientific journals, but is weaker with
regard to newspapers and non-English sources.
By looking into the most used URLs/domains in citations, I'm hoping to
identify those that are not yet properly supported by Zotero and, by
extension, citoid. This would then help developers focus their efforts on
adding support for URLs to the most high-value websites.
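
To make the counting step concrete, here is a minimal Python sketch of what I have in mind. It assumes the citation URLs have already been extracted into a plain list (how to get them out of the database is the open question); the helper names `domain_of` and `top_domains` are hypothetical, and stripping a leading "www." is a simplification rather than a proper registered-domain lookup.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'.

    Returns an empty string when the input has no recognisable host.
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_domains(urls, n=10):
    """Count hosts across the URLs and return the n most common."""
    counts = Counter(d for d in map(domain_of, urls) if d)
    return counts.most_common(n)
```

Calling `top_domains(urls, 100)` on the extracted list would give a first ranking to compare against the sites Zotero already supports.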
Any pointers you might have are very welcome :)