---------- Forwarded message ----------
From: Lars Aronsson <lars(a)aronsson.se>
Date: 06-Sep-2007 21:32
Subject: [Wikitech-l] Statistics on templates and references
To: wikitech-l(a)lists.wikimedia.org
A year ago, I wrote a little script for extracting template calls
from the XML database dump. The idea is that many templates are
infoboxes that provide structured information, such as the
population density of a country or bibliographic information in
book citations. The script is now updated to also extract ISBNs
and <ref> tags, as if these had been templates.
http://meta.wikimedia.org/wiki/User:LA2/Extraktor
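The kind of extraction described above can be sketched roughly as follows. This is not the Extraktor script itself, only a minimal regex-based illustration; the pattern names and the `extract` function are hypothetical, and it assumes the page text has already been pulled out of the XML dump and unescaped.

```python
import re

# Hypothetical patterns for illustration; the real Extraktor script may work differently.
# Template name: text after "{{" up to the first "|" or "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")
# Contents of <ref ...>...</ref>; self-closing <ref ... /> and <references/> are skipped.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
# ISBN followed by digits, hyphens or spaces, ending in a digit or X.
ISBN_RE = re.compile(r"ISBN[\s:]*([0-9][0-9Xx -]{8,16}[0-9Xx])")


def extract(wikitext):
    """Return (template names, <ref> contents, normalized ISBNs) found in wikitext."""
    templates = [m.group(1) for m in TEMPLATE_RE.finditer(wikitext)]
    refs = [m.group(1).strip() for m in REF_RE.finditer(wikitext)]
    # Drop hyphens and spaces so the same ISBN written two ways counts once.
    isbns = [re.sub(r"[ -]", "", m.group(1)).upper()
             for m in ISBN_RE.finditer(wikitext)]
    return templates, refs, isbns
```

Feeding the three lists into `collections.Counter` would give "most used" rankings of the sort mentioned in this message. A regex approach like this will miss edge cases (for instance `<ref>` attributes containing a slash), so treat it as a starting point only.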
I downloaded the reasonably small Wikipedia dumps for the
Scandinavian and Baltic languages and compiled some statistics,
such as the 50 most used templates, the 20 most cited ISBNs and
the 15 most common things to find inside <ref> tags.
http://meta.wikimedia.org/wiki/User:LA2/Extraktor_stats_200709
Of these languages, Swedish is the biggest (the uncompressed
database dump is 600 MB) followed by Finnish (481 MB) and
Norwegian (415 MB). But Finnish is far ahead in the use of
references and templates. One way to describe this degree of
structure is the size of my script's output compared to its input:
Language            Dump size   Extraktor output
-----------------   ---------   ----------------
lt = Lithuanian     152 MB      18.4 % or 28 MB
no = Norwegian      415 MB      16.9 %
nn = Nynorsk        85 MB       15.3 %
fi = Finnish        481 MB      14.1 %
is = Icelandic      66 MB       12.7 %
se = Sami           5.1 MB      10.8 %
da = Danish         209 MB      10.5 %
sv = Swedish        600 MB      10.2 %
fo = Faroese        7.8 MB      8.9 %
et = Estonian       116 MB      8.3 %
lv = Latvian        45 MB       8.2 %
fiu-vro = Võro      3.5 MB      6.4 %
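The percentage in the table is simply output size divided by dump size. A small sketch of that arithmetic, using hypothetical helper names and the Lithuanian row as the check:

```python
def structure_ratio(dump_mb, output_mb):
    """Extraktor output as a percentage of the uncompressed dump size."""
    return 100.0 * output_mb / dump_mb


def output_size_mb(dump_mb, percent):
    """Invert the ratio: estimate the output size in MB from the percentage."""
    return dump_mb * percent / 100.0


# Lithuanian row: 152 MB dump, 18.4 % output
print(round(output_size_mb(152, 18.4)))      # about 28 MB
print(round(structure_ratio(152, 28), 1))    # about 18.4 %
```

Applying the same arithmetic to the Swedish row (600 MB at 10.2 %) gives roughly 61 MB of output.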
I can't fully explain why the Lithuanian WP ranks so high.
Perhaps there is an opening <ref> that doesn't close, causing many
bytes to be included? If so, my script could help to find and
hunt down such errors. (I also tried the Yiddish Wikipedia and
got an even higher ranking, but I can't understand anything of
that language, so I'm totally clueless.)
And the ranking doesn't quite capture the fact that the Finnish
Wikipedia contains 59365 <ref> tags and 15108 ISBNs, while Swedish
has 28956 and 10742, respectively, and the Norwegian 19078 and
9060. The main divide seems to be between the "good" examples above
12% and the laggards below 12%. Swedish and Danish should learn
from Norwegian and Finnish.
My conclusions are not final. The message is that the script
exists, and you are all free to help in digging out interesting
information.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/
I'm writing a paper on cyberstalking and harassment, which I hope to
hand to the Foundation with a view to educating people about the
extent of the problem on Wikimedia.
I'd like to include some concrete examples of the cyberstalking or
offline stalking of users as a result of their participation in any of
the Wikimedia projects, and particularly where the target was picked
on because they were an administrator.
If you've been a target of this yourself, or if you know someone who
has, I'd appreciate hearing from you at slimvirgin at gmail dot com.
All replies will be received in strictest confidence. The target's
name and story will not be included in the final document without
consent, and all identifying details will be changed on request.
What I'm most interested in is how the cyberstalking or harassment
made you feel, and what happened when you tried to find support. I'd
like to hear whether it frightened you or made you anxious; whether it
affected your sleep, your appetite, or your health in any other way;
and whether you considered ending your association with the project
you were involved in, or did end it.
Many thanks,
Sarah
http://en.wikipedia.org/wiki/User:SlimVirgin
Hi all,
after quite some work into improving the DBpedia information
extraction framework, we have released a new version of the DBpedia
dataset today.
DBpedia is a community effort to extract structured information from
Wikipedia and to make this information available on the Web. DBpedia
allows you to ask sophisticated queries against Wikipedia and to link
other datasets on the Web to Wikipedia data.
The DBpedia dataset describes 1,950,000 "things", including at least
80,000 persons, 70,000 places, 35,000 music albums, 12,000 films. It
contains 657,000 links to images, 1,600,000 links to relevant external
web pages and 440,000 external links into other RDF datasets.
Altogether, the DBpedia dataset consists of around 103 million RDF
triples.
The Dataset has been extracted from the July 2007 Wikipedia dumps of
English, German, French, Spanish, Italian, Portuguese, Polish,
Swedish, Dutch, Japanese, Chinese, Russian, Finnish and Norwegian
versions of Wikipedia. It contains descriptions in all these
languages.
Compared to the last version, we did the following:
1. Improved the Data Quality
We increased the quality of the data by improving the DBpedia
information extraction algorithms. So if you have decided that the old
version of the dataset was too dirty for your application, please look
again; you will be surprised :-)
2. Third Classification Schema Added
We have added a third classification schema to the dataset. Besides
the Wikipedia categorization and the YAGO classification, concepts are
now also classified by associating them with WordNet synsets.
3. Geo-Coordinates
The dataset contains geo-coordinates for geographic locations.
Geo-coordinates are expressed using the W3C Basic Geo Vocabulary. This
enables location-based SPARQL queries.
4. RDF Links to other Open Datasets
We interlinked DBpedia with further open datasets and ontologies. The
dataset now contains 440,000 external RDF links into the Geonames,
Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP
Bibliography and Project Gutenberg datasets. Altogether, the network
of interlinked datasources around DBpedia currently amounts to around
2 billion RDF triples which are accessible as Linked Data on the Web.
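As an illustration of the location-based SPARQL queries mentioned under item 3, here is a minimal sketch that builds a bounding-box query using the W3C Basic Geo Vocabulary properties `geo:lat` and `geo:long`. The function name and the example coordinates are hypothetical, and the sketch only constructs the query text; whether the stored literals compare numerically without a cast depends on how the dataset types them, which is an assumption here.

```python
# Namespace of the W3C Basic Geo (WGS84 lat/long) Vocabulary.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def bounding_box_query(south, north, west, east, limit=20):
    """Build a SPARQL query for resources whose coordinates fall inside a box.

    Assumes latitude/longitude are stored as numeric literals on
    geo:lat and geo:long; adjust if the endpoint needs explicit casts.
    """
    return (
        f"PREFIX geo: <{GEO_NS}>\n"
        "SELECT ?place ?lat ?long WHERE {\n"
        "  ?place geo:lat ?lat ;\n"
        "         geo:long ?long .\n"
        f"  FILTER (?lat > {south} && ?lat < {north} &&\n"
        f"          ?long > {west} && ?long < {east})\n"
        "}\n"
        f"LIMIT {int(limit)}\n"
    )


# Hypothetical box, roughly around Berlin.
print(bounding_box_query(52.3, 52.7, 13.1, 13.8))
```

The printed query can be pasted into a SPARQL endpoint's query form; the LIMIT keeps the result set small while experimenting.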
The DBpedia dataset is licensed under the terms of the GNU Free
Documentation License. The dataset can be accessed online via a SPARQL endpoint and
as Linked Data. It can also be downloaded in the form of RDF dumps.
Please refer to the DBpedia webpage for more information about the
dataset and its use cases:
http://dbpedia.org/
Many thanks for their excellent work to:
1. Georgi Kobilarov (Freie Universität Berlin) who redesigned and
improved the extraction framework and implemented many of the
interlinking algorithms.
2. Piet Hensel (Freie Universität Berlin) who improved the infobox
extraction code and wrote the unit test suite.
3. Richard Cyganiak (Freie Universität Berlin) for his advice on
redesigning the architecture of the extraction framework and for
helping to solve many annoying Unicode and URI problems.
4. Zdravko Tashev (OpenLink Software) for his patience in trying several
times to import buggy versions of the dataset into Virtuoso.
5. OpenLink Software altogether for providing the server that hosts
the DBpedia SPARQL endpoint.
6. Sören Auer, Jens Lehmann and Jörg Schüppel (Universität Leipzig)
for the original version of the infobox extraction code.
7. Tom Heath and Peter Coetzee (Open University) for the RDFS version
of the YAGO class hierarchy.
8. Fabian M. Suchanek, Gjergji Kasneci (Max-Planck-Institut
Saarbrücken) for allowing us to integrate the YAGO classification.
9. Christian Becker (Freie Universität Berlin) for writing the
geo-coordinates and the homepage extractor.
10. Ivan Herman, Tim Berners-Lee, Rich Knopman and many others for
their bug reports.
Have fun exploring the new dataset :-)
Cheers
Chris
--
Chris Bizer
Freie Universität Berlin
Phone: +49 30 838 54057
Mail: chris(a)bizer.de
Web: www.bizer.de
Hi Alain et al,
not sure if it's helpful/relevant to you, but I'm currently completing my
thesis examining the community structures within Wikipedia, and how they
foster a sense of citizenship. I have done extended interviews with WP users
from around the world as part of this....
I can share some of my work with you if interested, just shoot me an email.
Tamsin
http://www.isea2008.org/index.html
Event: 25th July to August 3rd 2008, Singapore
"We welcome contributions from creative practitioners and researchers
from a variety of disciplines and institutional contexts as media arts
benefits from and exemplifies the interdisciplinary linkages between
contemporary art, science, technology and their related philosophies,
pedagogies and institutional practices."
Call for proposal submissions: 15th July - 30th of September 2007
One of the themes is "Wiki Wiki":
http://www.isea2008.org/themes3.html
cheers,
Brianna
I need a good solid reference to substantiate the following claim:
"Besides leading to high quality content, wikis have been shown to be good tools for fostering the emergence of active communities"
Does anyone know of a good research paper that looks specifically at this kind of impact of wikis?
Thx.
----
Alain Désilets, National Research Council of Canada
Chair, WikiSym 2007
2007 International Symposium on Wikis
Wikis at Work in the World:
Open, Organic, Participatory Media for the 21st Century
http://www.wikisym.org/ws2007/
Yes, I would like a copy of the paper, thx.
> -----Original Message-----
> From: wiki-research-bounces(a)wikisym.org
> [mailto:wiki-research-bounces@wikisym.org] On Behalf Of Derek Hansen
> Sent: August 29, 2007 1:28 PM
> To: Discussion of wiki research and practice
> Cc: Research into Wikimedia content and communities
> Subject: Re: [wiki-research] Wikis as a tool for fostering
> emergence of communities
>
> I will be presenting a paper "Virtual Community Maintenance
> with a Repository" at the ASIST conference in October that
> discusses the ways in which a wiki repository has helped
> strengthen an email-based technical support community. The
> abstract is found at
> http://www.asis.org/Conferences/AM07/papers/72.html
>
> If you would like a copy of the paper let me know.
>
> Derek L. Hansen
> Assistant Professor
> University of Maryland
>
> On 8/29/07, Desilets, Alain <Alain.Desilets(a)nrc-cnrc.gc.ca> wrote:
> > I need a good solid reference to substantiate the following claim:
> >
> > "Besides leading to high quality content, wikis have been
> shown to be good tools for fostering the emergence of active
> communities"
> >
> > Does anyone know of a good research paper that looks
> specifically at this kind of impact of wikis?
> >
> > Thx.
> >
> >
> > ----
> > Alain Désilets, National Research Council of Canada Chair, WikiSym
> > 2007
> >
> > 2007 International Symposium on Wikis
> > Wikis at Work in the World:
> > Open, Organic, Participatory Media for the 21st Century
> >
> > http://www.wikisym.org/ws2007/
> >
> > _______________________________________________
> >
> > wiki-research mailing list, wiki-research(a)wikisym.org
> > http://www.wikisym.org/mailman/listinfo/wiki-research
> >
> > For the wiki-research, wiki-standards, wikisym-announce
> mailing lists, please see:
> > http://www.wikisym.org/cgi-bin/mailman/listinfo
> >
http://wiki-riki.wikispaces.com/Research+Papers+and+Reports ->
"Wikibook AERA paper.pdf"
Sajjapanroj, S., Bonk, C. J., Lee, M., & Lin, M.-F. G. (2007, April).
The challenges and successes of Wikibookian experts and Wikibook
novices: Classroom and community perspectives. Paper presented at the
American Educational Research Association, Chicago, IL.
The Challenges and Successes of Wikibookian Experts and Wikibook
Novices: Classroom and Community Collaborative Experiences
Abstract:
The present study explored the creation of Wikibooks in both classroom
(i.e., Wikibook Novices) and general community (i.e., Wikibookian
Experts) contexts. Observations, surveys, and follow-up email
interviews were the primary means of data collection. This study
analyzed various demographic data of Wikibookians as well as
motivational factors involved in Wikibook creation. Other variables
explored included Wikibook ownership, challenges, frustrations,
perceptions of success and completion, and norms for collaboration in
the Wikibook community. The results indicate that Wikibookians were
young males with varying educational backgrounds; fewer than half
without a four year college degree. Wikibookian Experts were more
likely to perceive that a Wikibook could be completed than Wikibook
Novices in a classroom project. And compared to the novices, the
Wikibookian Experts were also more likely to indicate that no one
owns a Wikibook. Still there were similarities across the populations
in this survey. For instance, they both tended to see a Wikibook
environment as informal, exploratory, collaborative, and somewhat
independent, though in varying degrees. They also recognized that
there are multiple roles involved in the completion of a
Wikibook—contributor, author, reader, etc.—as well as multiple owners
or no owner of the final Wikibook product; assuming that there is a
final product. Importantly, they perceive a Wikibook project as a
way to share knowledge, obtain personal growth, publish their work,
learn new technologies, and make a contribution to society. However,
the Wikibook Novices favored the publishing avenues it provided as
well as the technology experimentation whereas the Wikibook experts
focused on sharing knowledge and looking for personal growth and
enrichment. Many research avenues are noted to follow-up some of
these similarities and differences.
cheers,
Brianna
Hello,
My Master's Thesis, Wikipedia as Collective Action: Personal Incentives and
Enabling Structures, is now available at:
http://www.msu.edu/~john2429/Wikipedia as Collective Action.pdf
Comments and feedback are welcome and appreciated, as I plan to revise the
paper and submit it for publication. Thanks for reading, if it's of
interest to you.
Benjamin Johnson
Telecommunication, Information Studies and Media
College of Communication Arts and Sciences
Michigan State University
john2429(a)msu.edu
benjamin.k.johnson(a)gmail.com