Cross-posting this request to wiki-research-l. Anyone have data on
frequently used section titles in articles (any language), or know of
datasets/publications that examined this?
I'm not aware of any off the top of my head, Amir.
---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Did anybody ever try to collect statistics about frequent section titles in
For Wikipedia, for example, titles such as "Biography", "Early life",
"Bibliography", "External links", "References", "History", etc., appear in
a lot of articles, and their counterparts appear in a lot of languages.
There are probably similar things in Wikivoyage, Wiktionary and possibly
Did anybody ever try to collect statistics of the most frequent section
titles in each language and project?
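One way such statistics could be gathered from article wikitext is sketched below. It is only a minimal illustration: the function name count_section_titles, the regular expression, and the sample strings are assumptions made for this example, not an existing tool, and real wikitext (templates, comments, nowiki blocks) would need more careful parsing.

```python
import re
from collections import Counter

# Matches a wikitext heading line such as "== History ==" and captures
# the run of opening equals signs and the title text between them.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def count_section_titles(wikitexts, level=2):
    """Count heading titles of one level across an iterable of wikitext strings."""
    counts = Counter()
    for text in wikitexts:
        for equals, title in HEADING_RE.findall(text):
            # The number of equals signs is the heading level ("==" is level 2).
            if len(equals) == level:
                counts[title] += 1
    return counts


# Hypothetical sample input standing in for article text from a dump.
sample_articles = [
    "Intro.\n== Early life ==\nText.\n=== Childhood ===\nMore.\n== References ==\n",
    "Intro.\n== History ==\nText.\n== References ==\n",
]

if __name__ == "__main__":
    for title, n in count_section_titles(sample_articles).most_common():
        print(n, title)
```

Running the same function over each language's dump separately would give per-language and per-project rankings.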
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
“We're living in pieces,
I want to live in peace.” – T. Moore
Wikitech-l mailing list
Jonathan T. Morgan
Senior Design Researcher
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
I've been working on graphs to visualize the entire edit activity of a wiki
for some time now. I'm documenting all of it at
The graphs can be viewed at
https://cosmiclattes.github.io/wikigraphs/data/wikis.html. Currently only
graphs for 'en' have been put up; I'll add the graphs for the other wikis soon.
- The editors are split into groups based on the month in which they
made their first edit.
- The active edit sessions (value or percentage etc) for the groups are
then plotted as stacked bars or as a matrix. I've used the canonical
definition of an active edit session. The values are within ±0.1% of the
values on https://stats.wikimedia.org/
- There is a selector on each graph that lets you filter the data in the
graph. On moving the cursor to the left end of the selector you will get a
resize cursor. The selection can then be moved or redrawn.
- In graphs 1,2 the selector filters by percentage.
- In graphs 3,4,5 the selector filters by the age of the cohort.
- Longevity of editors fell drastically starting Jan 06 and has since
stabilized at levels from Jan 07.
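The cohort grouping described in the list above can be sketched roughly as follows. This is not the code behind the graphs: the input format (one (editor, 'YYYY-MM') pair per edit), the function name cohort_activity, and the threshold of 5 edits per month used to call an editor active are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cohort_activity(edits, min_edits=5):
    """Group active editors by the month of their first edit.

    edits: iterable of (editor, 'YYYY-MM') pairs, one pair per edit.
    Returns {activity_month: {cohort_month: number_of_active_editors}}.
    """
    first_month = {}
    edits_per_month = Counter()
    for editor, month in edits:
        # 'YYYY-MM' strings sort chronologically, so min() keeps the earliest month.
        first_month[editor] = min(month, first_month.get(editor, month))
        edits_per_month[(editor, month)] += 1

    result = defaultdict(Counter)
    for (editor, month), n in edits_per_month.items():
        # Assumed threshold: an editor is active in a month with >= min_edits edits.
        if n >= min_edits:
            result[month][first_month[editor]] += 1
    return {month: dict(cohorts) for month, cohorts in result.items()}


# Hypothetical sample data: A starts in 2006-01, B and C start in 2006-02.
sample_edits = (
    [("A", "2006-01")] * 5
    + [("A", "2006-02")] * 6
    + [("B", "2006-02")] * 5
    + [("C", "2006-02")] * 2
)

if __name__ == "__main__":
    print(cohort_activity(sample_edits))
```

Each inner dictionary corresponds to one stacked bar: the bar for a month is split by the cohort its active editors belong to.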
Would love to hear what you think of the graphs and any ideas you
have for me.
The next Research showcase will be live-streamed this Wednesday, July 29 at
11:30 PT. The streaming link will be posted on the lists a few minutes
before the showcase starts (sorry, we haven't been able to solve this yet
:-( ) and, as usual, you can join the conversation on IRC at #wikimedia
We look forward to seeing you!
*VisualEditor's effect on newly registered users*
By *Aaron Halfaker*
It's been nearly two years since we ran an initial study
of VisualEditor's effect on newly registered editors. While most of the
results of this study were positive (e.g. workload on Wikipedians did not
increase), we still saw a significant decrease in newcomer
productivity. In the meantime, the Editing
<https://www.mediawiki.org/wiki/Editing> team has made substantial
improvements to performance and functionality. In this presentation, I'll
report on the results of a new experiment designed to test the effects of
enabling this improved VisualEditor software for newly registered users by
default. I'll show what we learned from the experiment and discuss how some
results have opened larger questions about what, exactly, is difficult
about being a newcomer to English Wikipedia.
*Wikipedia knowledge graph with DeepDive*
By *Juhana Kangaspunta* and
*Thomas Palomares (10-week student project)*
Despite the tremendous amount of information present on Wikipedia, only a
very small amount is structured. Most of the information is embedded in
text, and extracting it is a non-trivial challenge. In this project, we try
to populate Wikidata, a structured component of Wikipedia, using the DeepDive
tool to extract relations embedded in the text. In total, we extracted more
than 140,000 relations with more than 90% average precision. We will
present DeepDive and the data that we use for this project, explain the
relations we have focused on so far, and describe the implementation and
pipeline, including our model, features and extractors. Finally, we detail our
results with a thorough precision and recall analysis.
I want to read the text stored in the text tables, but the old_text
field stores it as what seems to be the path to the blob. How can I get the
content of the blob?
Alternately, is there any other way to access all text content (including
deleted content) without requiring global rights to the API?
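For revisions that are still public, one possible route is the MediaWiki web API rather than the text table. The sketch below is only an illustration under stated assumptions: the endpoint (English Wikipedia), the helper names build_query, extract_texts and fetch_texts, the User-Agent string, and the sample response are all made up for this example, and it does not cover deleted revisions, which need additional rights.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; other wikis expose the same api.php endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(rev_ids):
    """Build action=query parameters asking for the content of known revision IDs."""
    return {
        "action": "query",
        "prop": "revisions",
        "revids": "|".join(str(r) for r in rev_ids),
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }


def extract_texts(response):
    """Map revision ID to wikitext from a parsed formatversion=2 response."""
    texts = {}
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            # Content is absent when the text of a revision is hidden.
            texts[rev["revid"]] = rev["slots"]["main"].get("content")
    return texts


def fetch_texts(rev_ids):
    """Perform the HTTP request (needs network access; not called on import)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_query(rev_ids))
    request = urllib.request.Request(
        url, headers={"User-Agent": "revision-text-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as resp:
        return extract_texts(json.load(resp))


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "query": {
        "pages": [
            {
                "pageid": 1,
                "title": "Example",
                "revisions": [
                    {"revid": 123, "slots": {"main": {"content": "Hello"}}}
                ],
            }
        ]
    }
}
```

Calling fetch_texts([123, 456]) would return a dictionary keyed by revision ID; the parsing step can be checked offline with extract_texts(sample_response).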
Thanks, Oliver and Aaron. I want to look at the deleted revisions, as
described on the project meta page, which are not in the XML dump.
I know the revisions that I want to get the content for. What would you
Happy to take this off the list if it gets too specific.
> ---------- Forwarded message ----------
> From: Aaron Halfaker <ahalfaker(a)wikimedia.org>
> Date: Wed, Jul 29, 2015 at 4:21 PM
> Subject: Re: [Wiki-research-l] How to read blobs in text table?
> To: Research into Wikimedia content and communities
> That's right. I use the API and the XML dumps if I need text content.
> If you let me know about the type of analysis you are performing, I
> can advise about the best strategies.
> On Wed, Jul 29, 2015 at 6:14 PM, Oliver Keyes <okeyes(a)wikimedia.org>
> > If we're talking Wikimedia Mediawiki instances, yes, the API is your
> > only way forward - for performance reasons the text content is stored
> > in a totally different set of servers that (to my knowledge) even paid
> > researchers don't get to mess around with. Alternately you could take
> > a look at https://dumps.wikimedia.org if slightly outdated information
> > is okay to you.
> > On 29 July 2015 at 18:58, Srijan Kumar <srijankedia(a)gmail.com> wrote:
> > > Hi!
> > >
> > > I want to read the text stored in the text tables, but the old_text
> > > field stores it as what seems to be the path to the blob. How can I get the
> > > content of the blob?
> > >
> > > Alternately, is there any other way to access all text content (including
> > > deleted content) without requiring global rights to the API?
> > >
> > > Thanks!
> > > Srijan
> > >
> > >  https://www.mediawiki.org/wiki/Manual:Text_table
> > >
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > Wiki-research-l(a)lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> > --
> > Oliver Keyes
> > Research Analyst
> > Wikimedia Foundation
iConference 2016 | Partnership with Society
Philadelphia, PA, USA
March 20-23, 2016
Conference website: http://ischools.org/the-iconference/
Conference submission site: https://www.conftool.com/iConference2016/
We are now accepting submissions for iConference 2016, our eleventh annual
gathering of scholars, researchers and professionals who share an interest
in the critical information issues of contemporary information society.
iConference 2016 takes place March 20-23, 2016, in historic Philadelphia,
Pennsylvania, USA. This year’s theme of “Partnership with Society” examines
the dynamic, evolving role of information science and today’s iSchool
movement, and the benefits to society. The conference includes
peer-reviewed papers, posters, workshops and sessions for interaction and
engagement, interspersed with multiple opportunities for networking. Early
career and next generation researchers can engage in the Doctoral Student
Colloquium, Early Career Colloquium and Undergraduate Student Showcase.
Authors and organizers can now submit materials using our secure
submissions website: https://www.conftool.com/iConference2016/. The
official proceedings will be published in the open access Illinois Digital
Environment for Access to Learning and Scholarship (IDEALS). The deadline
for papers is Sept. 9, with other deadlines thereafter.
The iConference brings together scholars and researchers addressing
critical information issues in contemporary society. The iConference pushes
the boundaries of information studies, explores core concepts and ideas,
and creates new technological and conceptual configurations—all shaping
interdisciplinary discourses. Visit our website for more information,
including sample topics and links to past proceedings:
iConference 2016 is hosted by Drexel University’s College of Computing &
Informatics. Our conference venue, Loews Philadelphia Hotel, is located in
Philadelphia’s bustling center city. The historic 33-storey building is
hailed as America’s first skyscraper, and is in easy walking distance of
the Liberty Bell and Independence Hall, birthplace of the United States
Constitution. Other nearby attractions include the Rodin Museum and the
Philadelphia Art Museum.
The iConference is presented by the iSchools (www.ischools.org), a
worldwide association of Information Schools dedicated to advancing the
information field, and preparing students to meet the information
challenges of the 21st Century. Affiliation with the iSchools is not
required—all information scholars, researchers, and practitioners are
welcome at the iConference. The event is sponsored by Microsoft Research.
* Conference: http://ischools.org/the-iconference/
* Submissions: https://www.conftool.com/iConference2016/
* Past Proceedings:
* Facebook: IConference: https://www.facebook.com/IConference
* Twitter: @iConf | #iconf16
All submissions must be in English using our official template. All work
should be original and not previously published. Complete guidelines can be
found on our Author Instructions page:
We invite papers falling into two categories: completed research, and early
work/preliminary results. Completed research papers should be a maximum of
10 pages, including references; early work/preliminary results papers
should be a maximum of 6 pages, including references. Each paper will be
refereed in a double-blind process. The author(s) of the completed research
paper judged the best of the conference will receive the Lee Dirks Award
for Best Paper and $5,000. More at
Submission deadline: September 9, 2015
Papers Chairs: Yong Ming Kow, City University of Hong Kong; Bonnie Nardi,
University of California, Irvine; Chirag Shah, Rutgers University
We welcome submission of posters presenting new work, preliminary results
and designs, or educational projects. Submitted posters should be a maximum
of 1,500 words, not including references. These posters will undergo a
double-blind review. Poster abstracts will be published in the proceedings.
More at http://ischools.org/the-iconference/program/posters/
Submission deadline: October 5, 2015
Posters Chairs: Elke Greifeneder, Humboldt University; Kalpana Shankar,
University College Dublin
Workshops can be half- or full-day, and are intended to foster interactive
discussions focusing on a particular topic within the purview of the
iSchools, namely, the relationships among information, people and
technology. Workshops provide a great opportunity for attendees who share
common interests and want to have intensive discussions. Workshop proposals
should be less than 750 words, and follow the guidelines on our website:
Submission deadline: September 28, 2015
Workshops Chairs: Denise Agosto, Drexel University; Sam Oh, Sungkyunkwan
University; Nicole A. Cooke, University of Illinois
* SESSIONS FOR INTERACTION AND ENGAGEMENT (SIE)
These sessions provide an excellent opportunity to present ideas,
facilitate discussions, and foster knowledge-sharing in unconventional
ways. Formats can include panels, fishbowls, installations, performances,
storytelling, roundtable discussions, wildcard sessions, demos/exhibitions,
and more. All should be highly participatory, informal, engaging, and
pluralistic. SIE proposals should be less than 750 words, and follow the
guidelines on our website:
Submission deadline: October 5, 2015
SIE Chairs: Karen E. Fisher, University of Washington; Steve Sawyer,
OTHER EVENTS SCHEDULED
* DOCTORAL COLLOQUIUM
The Doctoral Colloquium provides doctoral students the opportunity to
present their work to senior faculty and engage with one another in a
setting that is relatively informal but that allows for the fullest of
intellectual exchanges. Students receive feedback on their dissertation,
career paths, and other areas from participating faculty and student peers.
More at http://ischools.org/the-iconference/program/doctoral-colloquium/
Application deadline: September 28, 2015
Doctoral Colloquium Chairs: Greg Leazer, UCLA; Iris Xie, University of
* DOCTORAL DISSERTATION AWARD
Recognizing the outstanding dissertation of the preceding year, this
competition is open to all member iSchools. Each school may submit one
dissertation for consideration. The winner will receive a cash prize of
$2,500, the runner-up $1,000; both will be honored at the iConference. More
Submission deadline: October 12, 2015
Dissertation Award Chairs: Michael Seadle, Humboldt University; Shigeo
Sugimoto, University of Tsukuba
* EARLY CAREER COLLOQUIUM
This half-day event is intended for assistant professors, post-docs, or
others in pre-tenure positions and builds on the tradition of highly
successful events at past iConferences. Participants will sign up at
registration. More at
Early Career Colloquium Chairs: Virginia Ortiz-Repiso Jimenez, University
Carlos III-Madrid; Kristin Eschenfelder, University of Wisconsin, Madison;
Eric Myers, University of British Columbia
* UNDERGRADUATE STUDENT SHOWCASE FORUMS
The undergraduate showcase will feature iSchool undergraduate research.
Examples include senior design, senior projects, STAR (Students
Tackling Advanced Research) Scholars Program students, etc. Details will be
posted to our website as they become available.
More at: http://ischools.org/the-iconference/
If anyone has a copy of Rehurek & Kolkus's "Language Identification on
the Web: Extending the Dictionary Method" from 2009, could they send
it to me?
We’re preparing for the July 2015 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201507 and add your name next to any paper you are interested in covering. We encourage coverage of submissions from the just-ended Wikimania 2015 conference in Mexico City. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
Models for Understanding Collective Intelligence on Wikipedia
Wikipedia vs. peer-reviewed medical literature for information about the 10 most costly medical conditions
Wikipedia, democracy and local elections in São Paulo: a study of the developing of articles edited during the election campaign in 2012
Detection of text-based advertising and promotion in Wikipedia by deep learning method
Extracting and Visualizing Biographical Events from Wikipedia
Detecting spatial patterns of natural hazards from the Wikipedia knowledge base
Generating Quizzes for History Learning Based on Wikipedia Articles
The influence of network structures of Wikipedia discussion pages on the efficiency of WikiProjects
The Rise and Fall of an Online Project. Is Bureaucracy Killing Efficiency in Open Knowledge Production?
Theories: Wikipedia and the production of knowledge
VEWS: A Wikipedia Vandal Early Warning System
An agent-based model of edit wars in Wikipedia: How and when is consensus reached
Google Trends and Wikipedia Page Views
Hot news detection using Wikipedia
Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science
#Wikipedia on Twitter: Analyzing Tweets about Wikipedia
Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis
Wikidata World Maps June 2015
If you have any questions about the format or process, feel free to get in touch off-list.
Masssly, Tilman Bayer and Dario Taraborelli