I am doing a PhD on online civic participation (e-participation). Within my
research, I have carried out a user survey in which I asked whether
respondents had ever edited or created a page on a wiki. Now I would like to
compare the results with the overall rate of wiki editing/creation at the
country level.
I've found some country-level statistics on Wikipedia Statistics (e.g.
3,000 editors of Wikipedia articles in Italy), but data for the UK and
France are not available, since Wikipedia provides statistics by
language, not by country. I'm thus looking for statistics on the UK and
France (but I am also interested in alternative ways of measuring wiki
editing/creation in Sweden and Italy).
I would be grateful for any tips!
Sunny regards, Alina
European University Institute
For the last week or so I have been getting the following error when trying to
use the http://wikidashboard.appspot.com/ tool: "403: User account
expired. The page you requested is hosted by the Toolserver user
wiki_researcher, whose account has expired. Toolserver user accounts are
automatically expired if the user is inactive for over six months. To
prevent stale pages remaining accessible, we automatically block
requests to expired content. If you think you are receiving this page in
error, or you have a question, please contact the owner of this
document: wiki_researcher [at] toolserver [dot] org. (Please do not
contact Toolserver administrators about this problem, as we cannot fix
it---only the Toolserver account owner may renew their account.)"
I've tried contacting the owner, and sent an email to PARC
<http://en.wikipedia.org/wiki/PARC_%28company%29> (it's their project,
per the logo seen on the project page) through their web form, but so
far, nothing. Can anyone help to contact them?
The tool is useful not only for research (I've used it, and I am sure so
have others here); it is also one of the tools used by Good Article
reviewers (and linked from
Why we allow toolserver tools used by the community to expire in such a
confusing way is beyond me.
Piotr Konieczny, PhD
Is it possible to query for the watchers of a page? It does not seem to be in the API, nor is the "watchers" or "wl_user" table in the database replicas (where I thought MediaWiki stores it). I imagine this is for privacy reasons, correct? If so, how would one gain access?
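For context: the aggregate watcher *count*, as opposed to the list of who is watching, does appear to be retrievable. A minimal sketch, assuming prop=info with inprop=watchers is available on the wiki, and noting that counts below a privacy threshold are withheld:

import requests

API = "https://en.wikipedia.org/w/api.php"

def watcher_count(title):
    # Queries the aggregate number of watchers of a page; the per-user
    # watchlist rows (wl_user) themselves are not exposed.
    params = {
        "action": "query",
        "prop": "info",
        "inprop": "watchers",
        "titles": title,
        "format": "json",
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    page = next(iter(pages.values()))
    # The 'watchers' key is omitted when the count is below the privacy threshold.
    return page.get("watchers")

print(watcher_count("Walt Whitman"))  # an integer, or None if withheld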
I have been talking with an "econophysicist" who thinks that we could apply a "contagion" algorithm to see which edits are "contagious". (I met this econophysicist at the Berkeley Data Science Faire at which Wikimedia Analytics presented, so it was worth it in the end.)
Wikipedian in Residence, OCLC
The December 2013 issue of the Wikimedia Research Newsletter is out:
1 Cohort of cross-language Wikipedia editors analyzed
2 Attempt to use Wikipedia pageviews to predict election results in
Iran, Germany and the UK
3 Integrity of Wikipedia and Wikipedia research
4.1 "How we found a million style and grammar errors in the English Wikipedia"
4.2 "Evaluation of gastroenterology and hepatology articles on Wikipedia"
4.3 Overview of research on FLOSS and Wikipedia
4.4 In battle over Walt Whitman's sexuality, Wikipedia policies "tamed
the mass into producing a good encyclopedia entry"
4.5 Elinor Ostrom's theories applied to Wikipedia
4.6 New dissertation on Wiktionary
••• 9 publications were covered in this issue •••
Thanks to Daniel Mietchen, Maximilian Klein and Piotr Konieczny for their
contributions.
Tilman Bayer and Dario Taraborelli
Wikimedia Research Newsletter
* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail:
* Subscribe to the RSS feed:
Senior Operations Analyst (Movement Communications)
IRC (Freenode): HaeB
*** Apologies for multiple postings ***
CALL FOR PAPERS & CALL FOR WORKSHOPS AND TUTORIAL PROPOSALS
ACM Web Science Conference (WebSci'14), June 23-26, 2014
Bloomington, Indiana, USA
websci14.org / @WebSciConf / #WebSci14
Deadline for papers: Feb. 23rd 2014
Deadline for workshop & tutorial proposals: Jan. 17th 2014
Web Science is the emergent science of the people, organizations,
applications, and policies that shape and are shaped by the Web,
the largest informational artifact constructed by humans in history.
Web Science embraces the study of the Web as a vast universal
information network of people and communities. As such, Web Science
includes the study of social networks whose work, expression, and play
take place on the Web. The social sciences and computational sciences
meet in Web Science and complement one another: Studying human
behavior and social interaction contributes to our understanding of
the Web, while Web data is transforming how social science is
conducted. The Web presents us with a great opportunity as well as an
obligation: If we are to ensure the Web benefits humanity we must do
our best to understand it.
Call for Papers
The Web Science conference is inherently interdisciplinary, as it
attempts to integrate computer and information sciences,
communication, linguistics, sociology, psychology, economics, law,
political science, philosophy, digital humanities, and other
disciplines in pursuit of an understanding of the Web. This
conference is unique in the manner in which it brings these
disciplines together in creative and critical dialogue, and we invite
papers from all the above disciplines, and in particular those that
cross traditional disciplinary boundaries.
Following the success of WebSci'09 in Athens, WebSci'10 in Raleigh,
WebSci'11 in Koblenz, WebSci '12 in Evanston, and WebSci'13 in Paris,
for the 2014 conference we are seeking papers and posters that
describe original research, analysis, and practice in the field of Web
Science, as well as work that discusses novel and thought-provoking
ideas and works-in-progress.
Possible topics for submissions include, but are not limited to, the
following:
* Analysis of human behavior using social media, mobile devices, and
online communities
* Methodological challenges of analyzing Web-based large-scale social
interaction
* Data-mining and network analysis of the Web and human communities on
the Web
* Detailed studies of micro-level processes and interactions on the Web
* Collective intelligence, collaborative production, and social computing
* Theories and methods for computational social science on the Web
* Studies of public health and health-related behavior on the Web
* The architecture and philosophy of the Web
* The intersection of design and human interaction on the Web
* Economics and social innovation on the Web
* Governance, democracy, intellectual property, and the commons
* Personal data, trust, and privacy
* Web and social media research ethics
* Studies of Linked Data, the Cloud, and digital eco-systems
* Big data and the study of the Web
* Web access, literacy, and development
* Knowledge, education, and scholarship on and through the Web
* People-driven Web technologies, including crowd-sourcing, open data,
and new interfaces
* Digital humanities
* Arts & culture on the Web or engaging audiences using Web resources
* Web archiving techniques and scholarly uses of Web archives
* New research questions and thought-provoking ideas
A separate Call for Workshop and Tutorial Proposals is on the
conference website at:
Web Science is necessarily a very selective single track conference
with a rigorous review process. To accommodate the distinct traditions
of its many disciplines, we provide three different submission
formats: full papers, short papers, and posters. For all types of
submissions, inclusion in the ACM DL proceedings will be by default,
but not mandatory (opt-out via EasyChair). All accepted research
papers (full and short papers) will be presented during the
single-track conference. All accepted posters will be given a spot in
the single-track lightning talk session, and room to present their
papers during a dedicated poster session.
Full research papers (5 to 10 pages, ACM double column, 20 mins
presentation including Q&A)
Full research papers should present new results and original work that
has not been previously published. Research papers should present
substantial theoretical, empirical, methodological, or policy-oriented
contributions to research and/or practice.
Short research papers (up to 5 pages, ACM double column, 15 mins
presentation including Q&A)
Short research papers should present new results and original work
that has not been previously published. Research papers can present
preliminary theoretical, empirical, methodological, or policy-oriented
contributions to research and/or practice.
Posters (up to 2 pages, ACM double column, lightning talk + poster
presentation)
Extended abstracts for posters, which should be in English, can be up
to 2 pages.
Full and short paper and poster submissions should be formatted
according to the official ACM SIG proceedings template (WebSci archive
format at http://www.acm.org/sigs/publications/proceedings-templates).
Please submit papers using EasyChair at
Other creative submission formats (flexible formats)
Other types of creative submissions are also encouraged, and the exact
format and style of presentation are open. Examples might include
artistic performances or installations, interactive exhibits,
demonstrations, or other creative formats. For these submissions, the
proposers should make clear both what they propose to do, and any
special requirements they would need to successfully do it (in terms
of space, time, technology, etc.)
The Web Science program committee covers all relevant areas of Web
Science. Each submission will be refereed by three PC members and will
receive a short meta-review written by a co-PC chair, covering both the
research background of the submission and the necessary
interdisciplinary aspects.
(Optional) Archival Proceedings in the ACM Digital Library
All accepted papers and posters will by default appear in the Web
Science 2014 Conference Proceedings and can also be made available
through the ACM Digital Library, in the same length and format as the
submission unless indicated otherwise (those wishing not to be indexed
and archived can "opt out" of the proceedings).
Call for Workshops and Tutorial Proposals
The Web Science conference will start with tutorials and workshops
that will promote in-depth training and discussions with the goal of
understanding how people, organizations, applications, and policies
shape and are shaped by the Web. In agreement with the spirit of the
conference, the tutorials and workshops are intended to create
opportunities for interdisciplinary discussion around themes and
methods that are central to the study of the Web. The list of themes
includes, but is not restricted to,
1. Methods for data mining and network research;
2. The study of social dynamics (e.g. political campaigns, censorship)
using Web data;
3. The relationship between technical design and individual behaviour
(e.g. the impact of by-default design on privacy);
4. The future of the Web in an era of increasing mobile applications;
5. The incentives and limits of regulation;
6. Participatory systems and crowdsourcing;
7. The dynamics of information creation (supply) and consumption
(demand) and its relation to real world events.
We will give priority to proposals that approach their topic from the
perspective of various disciplines, spanning the divide between the
social and computer sciences. Tutorials and workshops can be designed
as half or full day events. Workshops can have a mixture of panel
presentations and invited speakers, but presentations should reflect
the diversity of approaches that characterize the multidisciplinary
nature of Web Science.
For more information about chairs, submission, review, deadlines, etc.,
please see the full call at
Full & Short Papers:
* 23 February 2014: Submissions of full and short papers
* 13 April 2014: Notification of acceptance for papers
* 11 May 2014: Camera-ready version of papers and posters due
Late Breaking Posters:
* 23 March 2014: Submissions of posters
* 13 April 2014: Notification of acceptance for posters
* 11 May 2014: Camera-ready version of posters due
Workshops and tutorial proposals:
* January 17th 2014: Proposal Submissions
Authors take note: The official publication date is the date the
proceedings are made available in the ACM Digital Library. This date
may be up to two weeks prior to the first day of the conference. The
official publication date affects the deadline for any patent filings
related to published work. (If proceedings are published in the ACM
Digital Library after the conference is over, the official publication
date is the first day of the conference.)
Conference calendar and rough program
* 23 June 2014: workshops, opening reception and keynote
* 24 June 2014: keynote(s), technical program, poster reception
* 25 June 2014: keynote(s), technical program, social event
* 26 June 2014: keynote, technical program, closing
* Fil Menczer, Indiana University
* Jim Hendler, Rensselaer Polytechnic Institute
* Bill Dutton, Oxford Internet Institute, University of Oxford
* Markus Strohmaier, University of Koblenz and GESIS (Computing)
* Ciro Cattuto, ISI Foundation (Physics)
* Eric T. Meyer, Oxford Internet Institute, University of Oxford
* Yong-Yeol Ahn, Indiana University
* Luca Maria Aiello, Yahoo! Research
* William Allen, University of Oxford
* Sitaram Asur, HP Labs
* Alain Barrat, CNRS
* Fabricio Benevenuto, Federal University of Minas Gerais
* Mark Bernstein, Eastgate Systems, Inc
* Paolo Boldi, Universita degli Studi di Milano
* Niels Brugger, Aarhus Universitet
* Licia Capra, University College London
* Carlos Castillo, Qatar Computing Research Institute
* Lu Chen, Wright State University
* Cristobal Cobo, Oxford Internet Institute
* David Crandall, Indiana University
* Pasquale De Meo, VU University, Amsterdam
* David De Roure, Oxford e-Research Centre
* Pnina Fichman, Indiana University
* Alessandro Flammini, Indiana University
* Matteo Gagliolo, Universite libre de Bruxelles
* Laetitia Gauvin, ISI Foundation, Turin
* Daniel Gayo Avello, University of Oviedo
* Scott Golder, Cornell University
* Bruno Goncalves, Aix-Marseille Universite
* Andrew Gordon, University of Southern California
* Scott Hale, Oxford Internet Institute
* Noriko Hara, Indiana University
* Bernhard Haslhofer, University of Vienna
* Andreas Hotho, University of Wuerzburg
* Geert-Jan Houben, TU Delft
* Jeremy Hunsinger, Wilfrid Laurier University
* Ajita John, Avaya Labs
* Robert Jaschke, L3S Research Center
* Haewoon Kwak, Telefonica Research
* Renaud Lambiotte, University of Namur
* Matthieu Latapy, CNRS
* Silvio Lattanzi, Google
* Vili Lehdonvirta, Oxford Internet Institute
* Sune Lehmann, Technical University of Denmark
* Kristina Lerman, University of Southern California
* David Liben-Nowell, Carleton College
* Yu-Ru Lin, University of Pittsburgh
* Huan Liu, Arizona State University
* Jared Lorince, Indiana University
* Mathias Lux, Klagenfurt University
* Massimo Marchiori, University of Padova and UTILABS
* Yutaka Matsuo, University of Tokyo
* Jaimie Murdock, Indiana University
* Mirco Musolesi, University of Birmingham
* Eni Mustafaraj, Wellesley College
* Wolfgang Nejdl, L3S and University of Hannover
* Andre Panisson, ISI Foundation, Turin
* Hanwoo Park, Yeungnam University
* Fernando Pedone, University of Lugano
* Leto Peel, University of Colorado, Boulder
* Orion Penner, IMT Lucca
* Nicola Perra, Northeastern University
* Rob Procter, University of Warwick
* Cornelius Puschmann, Alexander von Humboldt Institute for Internet
and Society
* Daniele Quercia, Yahoo! Labs
* Carlos P. Roca, Universitat Rovira i Virgili
* Richard Rogers, University of Amsterdam
* Daniel Romero, Northwestern University
* Matthew Rowe, Lancaster University
* Giancarlo Ruffo, Universita di Torino
* Derek Ruths, McGill University
* Rossano Schifanella, Universita di Torino
* Ralph Schroeder, Oxford Internet Institute
* Kalpana Shankar, University College Dublin
* Xiaolin Shi, Microsoft
* Elena Simperl, University of Southampton
* Philipp Singer, Knowledge Management Institute
* Marc Smith, Connected Action Consulting Group
* Steffen Staab, University of Koblenz-Landau
* Burkhard Stiller, University of Zurich
* Lei Tang, @WalmartLabs
* Loren Terveen, University of Minnesota
* Sebastiano Vigna, Universita degli Studi di Milano
* Claudia Wagner, GESIS-Leibniz Institute for the Social Sciences
* Jillian Wallis, UC Los Angeles
* Stan Wasserman, Indiana University
* Ingmar Weber, Qatar Computing Research Institute
* Matthew Weber, Rutgers University
* Lilian Weng, Indiana University
* Christopher Wienberg, University of Southern California
* Ben Zhao, UC Santa Barbara
* Arkaitz Zubiaga, Dublin Institute of Technology
Wiki Research Junkies,
I am investigating the comparative quality of articles about Cote d'Ivoire and Uganda versus other countries. I want to answer the question: what makes a high-quality article? Can anyone point me to existing research on heuristics of article quality, that is, determining an article's quality from its wikitext properties alone, without human rating? I would also consider using data from the Article Feedback Tool, if dumps were available for each article in the English, French, and Swahili Wikipedias. This is all the raw data I can seem to find: http://toolserver.org/~dartar/aft5/dumps/
The heuristic technique that I am currently using is training a naive Bayesian filter on the following features (a rough sketch follows below):
* Per-section features:
  * Text length in each section
  * Infoboxes in each section
  * Filled parameters in each infobox
  * Images in each section
* Good Article / Featured Article status
* Then normalize on page views per population / speakers of the native language
Can you also think of any other dimensions or heuristics to rate programmatically?
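For concreteness, a minimal sketch of the feature extraction and naive Bayes setup described above. The regex-based section/infobox counting, the toy corpus, and the labels are illustrative assumptions, not a working quality model:

import re
from sklearn.naive_bayes import GaussianNB

def features(wikitext, is_ga_or_fa, pageviews, speakers):
    # Rough per-section and per-page counts; a real pipeline would use a
    # proper wikitext parser (e.g. mwparserfromhell) instead of these regexes.
    sections = re.split(r"^==+[^=]+==+\s*$", wikitext, flags=re.M)
    return [
        len(sections),                                           # section count
        sum(len(s) for s in sections) / max(len(sections), 1),   # mean section length
        wikitext.count("{{Infobox"),                             # rough infobox count
        wikitext.count("[[File:") + wikitext.count("[[Image:"),  # image links
        1 if is_ga_or_fa else 0,                                 # GA/FA status
        pageviews / max(speakers, 1),                            # views per speaker
    ]

# Toy training set: (wikitext, GA/FA flag, page views, speakers, quality label).
corpus = [
    ("== History ==\nLong text... {{Infobox country}} [[File:Map.png]]", True, 5000, 1e6, 1),
    ("Stub text with no sections.", False, 50, 1e6, 0),
]
X = [features(t, g, v, s) for t, g, v, s, _ in corpus]
y = [label for *_, label in corpus]

clf = GaussianNB().fit(X, y)
print(clf.predict([features("== Geography ==\nSome text.", False, 200, 1e6)]))

In practice the toy labels would be replaced by human assessments (e.g. WikiProject ratings) for training.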
Wikipedian in Residence, OCLC
Of possible interest -- most useful for someone needing library access. The
"Wikipedia Affiliate" is an unpaid, offsite, year-long position aimed at
"significantly improving the accuracy and reliability of at least 25 Wikipedia
articles on historical topics, preferably articles within a particular
historical scope (for example: modern Russian and Soviet history, U.S. Civil
War history, the history of late imperial China)."
See below -- and contact Amanda French (cc'd) with applications/questions.
---------- Forwarded message ----------
Date: Fri, Dec 20, 2013 at 8:07 PM
Subject: [CODE4LIB] Job: Wikipedia Affiliate at George Mason University
George Mason University
In conjunction with The Wikipedia Library project, the Roy
Rosenzweig Center for History and New Media (RRCHNM) at George Mason
University is seeking applicants for a "Wikipedia Affiliate." This is an
unpaid, year-long, remote research position beginning March 1, 2014 and
ending February 28, 2015 that entitles the affiliate to full library privileges at
George Mason University, including proxied access to all online materials to
which the GMU Libraries subscribe: more than 400 databases, thousands of
scholarly journals and mainstream periodicals, and hundreds of ebooks. The
position is designed to give research library access to a Wikipedia editor who
does not currently have such access or who has only limited access to
scholarly resources: the purpose of the position is to help improve
Wikipedia's reliability and accuracy by providing Wikipedia editors with
access to the best scholarly information resources while providing a model for
other universities to do likewise.
The affiliate will be an experienced Wikipedia editor with at least one year
of regular activity contributing to Wikipedia on historical topics in any
field, region, or period. The affiliate will also be a thorough researcher who
is committed to improving Wikipedia articles by consulting and citing
reliable, scholarly sources and who is a lucid writer of text for Wikipedia
encyclopedia articles on historical topics. An undergraduate or graduate
degree in History, Art History, or a related discipline is desirable but not
required.
Position Description and Duties
During the affiliate year, the affiliate will conduct scholarly research using
the library resources of George Mason University with the aim of significantly
improving the accuracy and reliability of at least 25 Wikipedia articles on
historical topics, preferably articles within a particular historical scope
(for example: modern Russian and Soviet history, U.S. Civil War history, the
history of late imperial China). Near the end of the affiliate year, the
affiliate will write a brief report listing the Wikipedia articles he or she
has contributed to and improved over the course of the year, describing how
his or her access to GMU library resources has helped increase the reliability
and accuracy of Wikipedia on this topic, and analyzing whether the affiliate program could
serve as a model for other universities. The affiliate will also be asked to
give a brief talk on the same subject to RRCHNM, either in person or via a
remote technology such as Skype.
To apply, please send the following documents to Dr. Amanda French at
afrench5(a)gmu.edu by January 20, 2014:
1. A standard resume or curriculum vitae that also includes
a link to your Wikipedia profile and
at least three links to Wikipedia articles on historical topics that you
have contributed to;
2. A cover letter that includes
a description of your background, including why you contribute to Wikipedia
and what level of historical expertise and interest you have in which
fields, regions, or periods;
a summary of what access you currently have (or don't have) to research
materials such as databases and scholarly journals;
an explanation of why you want to become a Wikipedia Affiliate to RRCHNM;
a brief outline of the historical topic(s) and/or specific Wikipedia articles
you would focus on during your affiliate year.
All applicants will be notified of the outcome of the search by the end of
February 2014. The affiliate year will begin March 1, 2014.
About the Roy Rosenzweig Center for History and New Media
Since 1994 under the founding direction of Roy Rosenzweig, the Center for
History and New Media (RRCHNM) at George Mason University has used digital
media and computer technology to democratize history--to incorporate multiple
voices, reach diverse audiences, and encourage popular participation in
presenting and preserving the past. The center itself is a democratic,
collaborative space where over fifty scholars, technologists, and researchers
work together to advance the state of the art.
RRCHNM uses digital media and technology to preserve and present history
online, transform scholarship across the humanities, and advance historical
education and understanding. Each year RRCHNM's many project websites receive
over 20 million visitors, and over a million people rely on its digital tools
to teach, learn, and conduct research.
George Mason University is a public research university located
14 miles from Washington, D.C., with over 30,000 students. Global education
and research are a fundamental part of the university's mission to serve its
diverse and international student body. RRCHNM is part of the Department of
History and Art History.
About The Wikipedia Library
The Wikipedia Library connects Wikipedia editors with libraries, open access
resources, paywalled databases, and research experts. We are working
towards 5 big goals that create an open hub for conducting research:
* Connect editors with their local library and freely accessible resources
* Partner to provide free access to paywalled publications, databases,
  universities, and libraries
* Build relationships among our community of editors, libraries, and librarians
* Facilitate research for Wikipedians, helping editors to find and use sources
* Promote broader open access in publishing and research
The Wikipedia Affiliate to RRCHNM position is based on the Wikipedia
Scholar idea suggested by Peter Suber at the Harvard Open Access Project.
Brought to you by code4lib jobs: http://jobs.code4lib.org/job/11416/
For the English Wikipedia there is a page where you can find the most often
failed searches. We have asked for the code for this software and we have
received it. What we want is to expand the functionality and use it for any
Wikipedia.
When we do, we want to differentiate between failed searches that do not
exist in Wikidata either and failed searches that do exist in Wikidata.
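For illustration, a minimal sketch of that differentiation using the standard Wikidata search API (action=wbsearchentities); the failed_searches list stands in for the failed-search log data we hope to get access to:

import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def in_wikidata(term, language="en"):
    # True if at least one Wikidata item matches the term in the given language.
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "format": "json",
    }
    results = requests.get(WIKIDATA_API, params=params).json().get("search", [])
    return len(results) > 0

failed_searches = ["example failed query", "another missing topic"]  # assumed input
have_item = [t for t in failed_searches if in_wikidata(t)]
missing_everywhere = [t for t in failed_searches if t not in have_item]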
As you may know, searching Wikidata as well has been enabled on several
Wikipedias, among them the Italian and the Polish Wikipedia. This allows us
to provide access to articles in other languages; it allows for finding
images on Commons because of a link to a Commons category; and obviously it
gives access to Wikidata and, in a more visual way, to the "Reasonator".
NB this functionality would add value to the en.wp as well because, as you
may know, 51% of the de.wp articles are linked to the en.wp...
What I am looking for is to have the developer who will modify the software
have access to the data. Magnus Manske is a well-known and trusted
developer. He is the one who started MediaWiki. It is for him that I ask for
access.
Our theory is that when we add Wikidata items, we will more quickly reach
the tipping point where search is actually useful (remember, there are 280+
Wikipedias). We expect that when we advertise what subjects do not have a
Wikipedia article but do have a Wikidata item, we will stimulate the
writing of articles that prove popular.
In effect we expect to engage in data driven user participation.
If you have any questions or suggestions I am happy to hear them. If
someone can get access to the data for Magnus please let me know.
(Cross-posting Sebastiano’s post from the analytics list; this may be of interest to both the wikidata and wiki-research-l communities.)
Begin forwarded message:
> From: Sebastiano Vigna <vigna(a)di.unimi.it>
> Subject: [Analytics] Distributing an official graph
> Date: December 9, 2013 at 10:09:31 PM PST
> [Reposted from private discussion after Dario's request]
> My problem is that of exploring the graph structure of Wikipedia
> 1) easily;
> 2) reproducibly;
> 3) in a way that does not depend on parsing artifacts.
> Presently, when people want to do this they either do their own parsing of the dumps, or they use the SQL data, or they download a dataset like
> which has everything "cooked up".
> My frustration in the last few days was when trying to add the category links. I didn't realize (well, it's not very well documented) that bliki extracts all links and renders them in HTML *except* for the category links, which are instead accessible programmatically. Once I got there, I was able to make some progress.
> Nonetheless, I think that the graph of Wikipedia connections (hyperlinks and category links) is really a mine of information and it is a pity that a lot of huffing and puffing is necessary to do something as simple as a reverse visit of the category links from "People" to get, actually, all people pages (this is a bit more complicated--there are many false positives, but after a couple of fixes it worked quite well).
> Moreover, one continuously has this feeling of walking on eggshells: a small change in bliki, a small change in the XML format, and everything might stop working in such a subtle manner that you realize it only after a long time.
> I was wondering if Wikimedia would be interested in distributing in compressed form the Wikipedia graph. That would be the "official" Wikipedia graph--the benefits, in particular for people working on leveraging semantic information from Wikipedia, would be really significant.
> I would (obviously) propose to use our Java framework, WebGraph, which is actually quite standard in distributing large (well, actually much larger) graphs, such as ClueWeb09 http://lemurproject.org/clueweb09/, ClueWeb12 http://lemurproject.org/clueweb12/ and the recent Common Web Crawl http://webdatacommons.org/hyperlinkgraph/index.html. But any format is OK, even a pair of integers per line. The advantage of a binary compressed form is reduced network utilization, instantaneous availability of the information, etc.
> Probably it would be useful to actually distribute several graphs with the same dataset--e.g., the category links, the content links, etc. It is immediate, using WebGraph, to build a union (i.e., a superposition) of any set of such graphs and use it transparently as a single graph.
> In my mind the distributed graph should have a contiguous ID space, say, induced by the lexicographical order of the titles (possibly placing template pages at the start or at the end of the ID space). We should provide graphs, and a bidirectional node<->title map. All such information would use about 300M of space for the current English Wikipedia. People could then associate pages to nodes using the title as a key.
> But this last part is just rambling. :)
> Let me know if you people are interested. We can of course take care of the process of cooking up the information once it is out of the SQL database.
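(As a rough illustration of how such a distribution could be consumed, here is a minimal sketch assuming the plain-text fallback format Sebastiano mentions -- one "source target" pair of integers per line, plus a node-ID-to-title map with one title per line; both file names are hypothetical.)

from collections import defaultdict

def load_titles(path="enwiki-titles.txt"):
    # Line number = node ID, in the proposed lexicographic title order.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def load_graph(path="enwiki-links.txt"):
    # Adjacency lists built from "source target" integer pairs, one edge per line.
    succ = defaultdict(list)
    with open(path) as f:
        for line in f:
            src, dst = map(int, line.split())
            succ[src].append(dst)
    return succ

titles = load_titles()
succ = load_graph()
title_to_id = {t: i for i, t in enumerate(titles)}

# Example: out-links of a page, resolved back to titles (the title is hypothetical).
node = title_to_id["Walt Whitman"]
print([titles[t] for t in succ[node]])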