This is just to announce that the final draft of my PhD thesis, "Wikipedia: A quantitative analysis", is finished. Only minor appendices remain, covering general background on some of the statistical methods that I applied.
It will (hopefully) be approved for presentation in just a few days, though bureaucracy will delay the "voce" until the middle of March (more or less).
It includes the first quantitative analysis comparing the top 10 language versions of Wikipedia, as of Dec. 2007 (to allow fair comparison of EN with other languages). Among other interesting insights, it presents a complete study of the activity of logged authors, articles and talk pages, and of the evolution over time of the distributions of key parameters (different authors per article, articles per author, revisions per author/article, etc.).
It also offers a more in-depth study of the inequality of contributions by logged authors, and also across articles. Likewise, it presents a complete survival analysis to examine the average lifetime of Wikipedia contributors, focusing on the transitions first contribution --> joining the core --> core membership --> leaving the core --> abandoning the project.
Finally, we also examine some very basic metrics for quality, analyze the common quantitative patterns of reputable authors and high-quality content, and try to infer the implications of all these findings for the future sustainability of the Wikipedia workflow model in the coming years.
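As a rough illustration of what measuring the inequality of contributions can look like, here is a minimal Python sketch that computes a Gini coefficient over per-author revision counts. The choice of the Gini coefficient, the function name and the sample numbers are assumptions made for illustration only; nothing here is taken from the thesis itself.

```python
def gini(contributions):
    """Gini coefficient of a list of non-negative counts.

    0.0 means every author contributed equally; values close to 1.0
    mean a few authors account for almost all contributions.
    Illustrative helper only (an assumption, not the thesis's method).
    """
    values = sorted(contributions)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values sorted in ascending order.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical revision counts for four authors.
equal_share = gini([5, 5, 5, 5])
one_author_only = gini([0, 0, 0, 10])
```

With this formula, a perfectly even distribution gives 0, and a single active author among n gives (n - 1) / n.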
If any of you is interested in having a look at the (still draft) manuscript, I accept requests for access to the repo :).
I'll wait until after the public defense and the reviewers' comments to publish a summary of our conclusions.
"The People's Web meets NLP:
Collaboratively Constructed Semantic Resources"
Co-located with the Joint Conference of the 47th Annual Meeting of the
Association for Computational Linguistics and the 4th International
Joint Conference on Natural Language Processing of the Asian
Federation of Natural Language Processing
7th August 2009
In recent years, online resources collaboratively constructed by
users on the Web have considerably influenced the NLP community. In many
works, they have been used as a substitute for conventional semantic
resources and as semantically structured corpora with great success.
While conventional resources such as WordNet are developed by trained
linguists, online semantic resources can now be automatically
extracted from the content collaboratively created by the users.
Thereby, the knowledge acquisition bottlenecks and coverage problems
pertinent to conventional lexical semantic resources can be overcome.
The resource that has gained the greatest popularity in this respect
so far is Wikipedia. However, other resources recently discovered in
NLP, such as folksonomies, the multilingual collaboratively
constructed dictionary Wiktionary, or Q&A sites like WikiAnswers or
Yahoo! Answers are also very promising. Moreover, new wiki-based
platforms such as Citizendium or Knol have recently emerged that
offer features distinct from Wikipedia and are of high potential
in terms of their use in NLP.
The benefits of using Web-based resources come along with new
challenges, such as the interoperability with existing resources and
the quality of the knowledge represented. As collaboratively created
resources lack editorial control, they are typically incomplete. For
the interoperability with conventional resources, the mappings have
to be investigated. The quality of collaboratively constructed
resources is questioned in many cases, and the information extraction
remains a complicated task due to the incompleteness and semi-
structuredness of the content. Therefore, the research community has
begun to develop and provide tools for accessing collaboratively
constructed resources [2,5].
The above listed challenges actually present a chance for NLP
techniques to improve the quality of Web-based semantic resources.
Researchers have therefore proposed techniques for link prediction
or information extraction that can be used to guide the "crowds"
to construct resources that are better suited for being used in NLP.
[1] Christiane Fellbaum
WordNet: An Electronic Lexical Database.
MIT Press, 1998.
[2] Torsten Zesch, Christof Mueller and Iryna Gurevych
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary.
Proceedings of the Conference on Language Resources and Evaluation
[3] Rada Mihalcea and Andras Csomai
Wikify!: Linking Documents to Encyclopedic Knowledge.
Proceedings of the Sixteenth ACM Conference on Information and
Knowledge Management, CIKM 2007.
[4] Daniel S. Weld et al.
Intelligence in Wikipedia.
Twenty-Third Conference on Artificial Intelligence (AAAI), 2008.
[5] Kotaro Nakayama et al.
Wikipedia Mining - Wikipedia as a Corpus for Knowledge Extraction.
Proceedings of the Annual Wikipedia Conference (Wikimania), 2008.
The workshop will bring together researchers from both worlds: those
using collaboratively created resources in NLP applications such as
information retrieval, named entity recognition, or keyword extraction,
and those using NLP applications for improving the resources or
extracting different types of semantic information from them. Hopefully,
this will turn into a feedback loop, where NLP techniques improved by
collaboratively constructed resources are used to improve the resources.
Specific topics include but are not limited to:
* Different types of collaboratively constructed resources, such as
wiki-based platforms, Q&A sites or folksonomies;
* Using collaboratively constructed resources in NLP such as
information retrieval, text categorization, information
* Analyzing the properties of collaboratively constructed resources
related to their use in NLP;
* Interoperability of collaboratively constructed resources with
conventional semantic resources and between themselves;
* Converting unstructured information into structured lexical
semantic information; tools for mining social and collaborative
* Quality issues with respect to collaboratively constructed resources.
We also encourage the submission of short papers describing publicly
available tools for accessing or analyzing collaboratively created
resources. During the breaks, tables can be provided for demonstrations.
Rada Mihalcea, University of North Texas
Full paper submissions should follow the two-column format of ACL-IJCNLP
2009 proceedings without exceeding eight (8) pages of content plus one
extra page for references. Short paper submissions should also follow
the two-column format of ACL-IJCNLP 2009 proceedings, and should not
exceed four (4) pages, including references.
We strongly recommend the use of ACL LaTeX style files or Microsoft
Word Style files tailored for this year's conference, which will be
available on the conference website. All submissions must conform to
the official ACL-IJCNLP 2009 style guidelines available at:
As the reviewing will be blind, the paper must not include the authors'
names and affiliations. Furthermore, self-references that reveal the
author's identity, e.g., "We previously showed (Smith, 1991) ...", must
be avoided. Instead, use citations such as "Smith previously showed
(Smith, 1991) ...". Papers that do not conform to these requirements
will be rejected without review.
All accepted papers will be presented orally and published in the
The deadline for all papers is May 1st, 2009 (GMT-12).
Submission is electronic using paper submission software at:
Paper submission deadline (full and short): May 1, 2009
Notification of acceptance of papers: June 1, 2009
Camera-ready copy of papers due: June 7, 2009
ACL-IJCNLP 2009 Workshop: Aug 7, 2009
Ubiquitous Knowledge Processing Lab
Technical University of Darmstadt, Germany
Delphine Bernhard Technische Universitaet Darmstadt
Paul Buitelaar DFKI Saarbruecken
Razvan Bunescu University of Texas at Austin
Pablo Castells Universidad Autonoma de Madrid
Philipp Cimiano Karlsruhe University
Irene Cramer Dortmund University of Technology
Andras Csomai Google Inc.
Ernesto De Luca University of Magdeburg
Roxana Girju University of Illinois at Urbana-Champaign
Andreas Hotho University of Kassel
Graeme Hirst University of Toronto
Ed Hovy University of Southern California
Jussi Karlgren Swedish Institute of Computer Science
Boris Katz Massachusetts Institute of Technology
Adam Kilgarriff Lexical Computing Ltd
Chin-Yew Lin Microsoft Research
James Martin University of Colorado Boulder
Olena Medelyan University of Waikato
David Milne University of Waikato
Saif Mohammad University of Maryland
Dan Moldovan University of Texas at Dallas
Kotaro Nakayama University of Tokyo
Ani Nenkova University of Pennsylvania
Guenter Neumann DFKI Saarbruecken
Maarten de Rijke University of Amsterdam
Magnus Sahlgren Swedish Institute of Computer Science
Manfred Stede Potsdam University
Benno Stein Bauhaus University Weimar
Tonio Wandmacher University of Osnabrueck
Rene Witte Concordia University Montreal
Hans-Peter Zorn European Media Lab, Heidelberg
Apologies in advance for the cross-posting. :-)
Please circulate this call among Wikimedia communities, researchers
and other people that may be interested! This call is also online at
== Call for Participation ==
Wikimania is an annual global event devoted to Wikimedia projects
around the globe (including Wikipedia, Wikibooks, Wikisource,
Wikinews, Wiktionary, Wikiversity, Wikiquote, Wikispecies, and
Wikimedia Commons). The conference is a community gathering, giving
the editors and users of Wikimedia projects an opportunity to meet
each other, exchange ideas, report on research and projects, and
collaborate on the future of the projects. The conference is open to
the public, and is a chance for educators, researchers, programmers
and free culture activists who are interested in the Wikimedia
projects to learn more and share ideas about the Wikimedia projects.
This year's conference will be held from '''August 26-28''' in Buenos
Aires, Argentina at '''San Martín Cultural Center'''.
For more information, please visit the official Wikimania 2009 site at
We are accepting submissions for presentations, workshops, panels,
posters, open space discussions, and artistic works related to the
Wikimedia projects or free content topics in general. Please carefully
follow the submission guidelines below.
=== Important dates ===
* '''Submissions will open on:''' March 1
* '''Deadline for submitting workshop, panel, and presentation
submissions:''' April 15
* '''Deadline for submitting posters, open space discussions, and
artistic works:''' April 30
* '''Notification of acceptance of workshops, panels, presentations:''' May 15
* '''Notification of acceptance of posters, discussions, and artistic
works:''' May 31
* '''Conference dates:''' August 26-28
=== Themes and tracks ===
There are two tracks for submission: the '''Casual Track''', for
members of wiki communities and interested observers to share their
own experiences and thoughts and to present new ideas; and the
'''Academic Track''', for research based on the methods of scientific
studies exploring the social, content or technical aspects of
Wikipedia, the other Wikimedia projects, or other massively
collaborative works, as well as open and free content creation and
community dynamics more generally.
Submissions to either track should address one or more of the following themes:
* '''"Wikimedia Communities,"''' including the topics of conflict
resolution and community dynamics; reputation and identity;
multi-lingualism and languages and cultures.
* '''"Free Knowledge,"''' including open access to information; ways
to gather and distribute free knowledge, use of the Wikimedia projects
in education, journalism, research; ways to improve content quality
* '''"Latin American challenges,"''' centering on efforts and
limitations for expanding the reach of Wikimedia projects in Latin
America; promotion of projects in Native American languages; specific
problems of the Spanish and Portuguese-speaking Wikimedia communities.
* '''"Technical infrastructure,"''' including issues related to
MediaWiki development and extensions; Wikimedia's technical
infrastructure; and new ideas for development.
Papers should be of interest to members of the Wikimedia communities,
and fit within one of the themes above.
=== Types of Submissions ===
We are seeking submissions for:
* '''Presentations''' (10–30 minute talks with discussion afterwards)
:* This type of submission is appropriate for presenting substantial
research or community projects
* '''Workshops''' (60–120 minute session with a discussion leader and
more audience involvement)
:* This type of submission is appropriate for sessions designed to
teach a specific subject or explore it in depth
* '''Panels''' (group of 2-5 speakers to discuss aspects of a topic
with audience questions, 45-90 minute sessions)
:* This type of submission is appropriate for discussions on a topic
of wide interest among community members, with several participants
who may be presenting their work. For less formal discussions of
limited interest, consider an open space discussion instead.
* '''Open space discussions''' (informal discussion on a specific
topic; the discussion leader helps moderate the conversation but the
session is open to anyone interested to join in)
:* This type of submission is good for a topic that several
participants want to discuss or brainstorm about in an informal
* '''Posters''' (printed visual displays that can stand on their own,
with no associated presentation)
:* This type of submission is good for presenting research in
progress, or smaller community projects
* '''Artistic works''' (plays, competitions, comedy, visualizations,
displays or other representations of some aspect of the projects)
:* This type of submission is good for showing creativity or
showcasing beautiful work about the projects.
In addition there will be the chance to give lightning talks, which
are 5-minute short presentations. Lightning talk sessions will be
organized on the Wikimania 2009 wiki shortly before the conference
begins, without any need to submit them via the submission system.
These talks are best for those who want to quickly present an idea or
project without giving a formal presentation. These are informal talks
that are open to everyone to participate in.
=== Submission Guidelines ===
Wikimania is organized by volunteers, so please help us minimize
wasted effort by submitting via the submission system and following
these guidelines. All submissions MUST include the following:
# '''Event title:''' an English or Spanish title.
# '''Abstract:''' a short English or Spanish abstract of your event in
50 to 100 words. The abstract will be used for the public schedule.
# '''Themes and track:''' list the track you wish to submit to (Casual
or Academic) and the single theme you think your submission fits in
best (Wikimedia Communities, Free Knowledge, Latin American
challenges, Technical infrastructure). Note that posters and artistic
works have their own track in the submission system.
# '''Information about the speaker:''' full name, email, and a short biography.
# '''Submission file:''' A plain text, PDF or OpenDocument file, in
English or Spanish, containing:
#* '''A long description of the submission''', in English or Spanish
that can be used for reviewing, not to exceed 1000 words. Please give
an overview of the areas to be covered or taught. State clearly the
relevance to the Wikimedia projects and whether submission concerns a
specific wiki project. You can also include links. Include graphics and
diagrams only if they do not exceed one page.
#* '''Event type:''' please state if the event is a presentation,
workshop, panel, open space discussion, poster, or artistic work; if a
presentation or panel, whether the presentation is expected to be a
#* '''For panel submissions only:''' name of a suggested moderator and
short biographies of each suggested panelist
#* '''Language:''' list the language you plan to present in. The
conference will be bilingual in English and Spanish.
#* '''Special requirements:''' list any special requirements,
including any equipment.
In the "Comments for conference director" field you should tell us
whether you will attend Wikimania (a) definitely, (b) probably, (c)
only if your submission is accepted, or (d) only if we provide travel
and/or accommodation. You can also add yourself to the public list of
attendees at the Wikimania 2009 wiki:
Please note that all submissions must be dual licensed under the GNU
Free Documentation License version 1.2 or later ''and'' the Creative
Commons Attribution-Share Alike 3.0. By submitting to Wikimania
2009 you agree to this condition.
Once you are sure you have included all of the required information,
please send your submission before the respective deadline through our
If you have further questions, email wikimania-program(a)wikimedia.org
(in English or Spanish).
I am writing a report on science in Wikipedia and other projects of
Wikimedia (especially Wikibooks and Wikiversity). A question which
interests me in particular is: who is actually contributing to the
articles? By "who" I mean especially the educational level of
the contributors, but other aspects such as gender and age might also be
of interest, of course.
The context of this question is that I wonder whether there are
more "full-time experts" taking part in Wikipedia than one might guess. So
far I have found examples of both: undergraduates taking care of major parts
of scientific fields at Wikipedia (see e.g. Keim 2007) and active
full-time scientists contributing to the encyclopaedia (see e.g. Huss et
al. 2008; Butler 2008).
I would appreciate any help. Thank you.
Butler, D., 2008, /Publish in Wikipedia or perish/; Website; Nature News
Huss, J. W., Orozco, C., Goodale, J., Wu, C., Batalov, S., Vickers, T.
J., Valafar, F. and Su, A. I., 2008, A Gene Wiki for Community
Annotation of Gene Function, /PLoS Biology/ /6(7)/, e175
Keim, B., 2007, News feature: WikiMedia, /Nat Med/ /13(3)/, 231-233
Sociology student at the University of Bielefeld, Germany
Research assistant at the Institute of Technology Assessment (ITA)
of the Austrian Academy of Sciences (AAS)
Strohgasse 45, 5
Tel. (Office): +43-(0)1-51581-6597
Project "Interactive Science"
(for more information please see www.oeaw.ac.at/ita/interactive)
CALL FOR PAPERS
3rd Workshop on Social Aspects of the Web (SAW 2009)
in conjunction with
12th International Conference on Business Information Systems (BIS 2009)
April 28, 2009
Deadline extended to: February 17, 2009
In recent years, the Web has moved from a simple one-way communication
channel, extending traditional media, to a complex "peer-to-peer"
communication space with a blurred author/audience distinction and new
ways to create, share, and use knowledge in a social way.
This change of paradigm is currently profoundly transforming most areas
of our life: our interactions with other people, our relationships, ways
of gathering information, ways of developing social norms, opinions,
attitudes and even legal aspects, as well as ways of working and doing
The change also raises a strong need for theoretical, empirical and
applied studies related to how people may interact on the Web, how they
actually do so, and what new possibilities and challenges are emerging
in the social, business and technology dimensions.
Following the two previous events, the goal of the 3rd Workshop is to
bring researchers and practitioners together to explore the issues and
challenges related to social aspects of the Web.
TOPICS OF INTEREST
* People on the social Web
* Individuals on the Web (identity, privacy, incentives, activity
models, trust and reputation, ...)
* Communities on the Web (roles, leadership, social norms and
conflicts, types of communities, ...)
* Collaboration on the Web (content and data development and
maintenance, decision taking ...)
* On-line and off-line life (mixed interaction models, on-line vs.
off-line communities, ... )
* Business activities in the social Web (sales, exchanges,
word-of-mouth, recruiting, marketing, ...)
* Data and content on the social Web
* Social content organization (tagging, classification,
recommendations, collaborative filtering, ...)
* Content dynamics (content flow and evolution, mashups, comments,
collaborative creation, ...)
* Semantic social Web (standards, annotation of social content/data,
ontology learning, ...)
* Data and social network portability (standards, policies,
technologies, licenses, ...)
* Social software and services
* Specific types of social software (social networks, blogs, wikis,
resources sharing, ...)
* Development (architectures, technologies, platforms,
* Adoption (critical mass problem, socio-technical gap, data and
social network migration, ...)
* Alternative user interaction models (games, mobile, mixed reality,
* Social software in the enterprise (knowledge management, CRM,
collaborative software, ...)
* Business models of social services (pricing, cost models, customer
relation, content acquisition, ...)
* Mining the social Web
* Mining user-generated content (opinion, comments, rankings, forums,
* Mining the social graph (collaborative filtering, social network
* Mining activity patterns (access, used features, participation,
* Entity-centric content integration (on people, experts, objects,
companies, locations, ...)
* Social Web mining in business (for marketing, products design,
customer support, ...)
* Long papers: max. 12 pages
* Work-in-progress reports: max. 6 pages
* Demo papers: max. 4 pages
Papers must be submitted in PDF format according to the Springer LNBIP
template available from
The submission system is available at
Papers approved for presentation at SAW 2009 will be published in BIS
2009 workshop proceedings, as a volume in Springer's Lecture Notes in
Business Information Processing (LNBIP) series.
All authors of accepted papers as well as other participants will be
asked to read the abstracts of accepted papers before the workshop (papers will
be available on-line in advance) to facilitate discussion.
Workshop participants will also be invited to take part in the BIS
conference and other BIS workshops.
* February 17, 2009 - submission deadline for papers (extended)
* March 9, 2009 - notification of acceptance/rejection (new date)
* March 22, 2009 - submission of final papers (new date)
* April 28, 2009 - the workshop
* Poznan University of Economics, Department of Information Systems
* Dominik Flejter
* Tomasz Kaczmarek
* Marek Kowalkiewicz
* Krisztian Balog, University of Amsterdam, the Netherlands
* Simone Braun, FZI Karlsruhe, Germany
* John Breslin, DERI, NUI Galway, Ireland
* Tanguy Coenen, Vrije Universiteit Brussel, Belgium
* Sebastian Dietzold, University of Leipzig, Germany
* Davide Eynard, Politecnico di Milano, Italy
* Dominik Flejter, Poznan University of Economics, Poland
* Adam Jatowt, Kyoto University, Japan
* Tomasz Kaczmarek, Poznan University of Economics, Poland
* Marek Kowalkiewicz, SAP Research Brisbane, Australia
* Marcin Paprzycki, Polish Academy of Sciences, Poland
* Willy Picard, Poznan University of Economics, Poland
* Katharina Siorpaes, STI, University of Innsbruck, Austria
* Jie Tang, Tsinghua University, China
* Celine van Damme, Vrije Universiteit Brussel, Belgium
* Valentin Zacharias, FZI Karlsruhe, Germany
Dominik Flejter < http://dominik.flejter.net/ >
Poznan University of Economics
Department of Information Systems < http://www.kie.ae.poznan.pl/ >
SAW 2009 Co-chair