==Reposting with better title to get included in proper thread==
Brian:
> ps: Does anyone know of a script that can strip out wiki syntax? This
> is pertinent. It will also be necessary to leave only paragraphs of
> text in the articles... the below data is noticeably skewed in some (but
> not all) of the measures.
>
Brian, here is an initial response:
Some Perl code from the WikiCounts job that strips a lot of markup, used to
get cleaner text for the word count and the article size in chars.
It is not 100% accurate, and not all markup is removed, but these regexps
already slow down the whole job big time.
The result is at least far closer to a decent word count than wc on the raw
data would give.
$article =~ s/\'\'+//go ;                          # strip bold/italic formatting
$article =~ s/\<[^\>]+\>//go ;                     # strip <...> html

# these are valid UTF-8 chars, but it takes way too long to process, so
# I combine those in one set
# $article =~ s/[\xc0-\xdf][\x80-\xbf]|
#               [\xe0-\xef][\x80-\xbf]{2}|
#               [\xf0-\xf7][\x80-\xbf]{3}/x/gxo ;
# this one set selects UTF-8 faster (with 99.9% accuracy I would say)
$article =~ s/[\xc0-\xf7][\x80-\xbf]+/x/gxo ;      # count unicode chars as one char

$article =~ s/\&\w+\;/x/go ;                       # count html entities as one char
$article =~ s/\&\#\d+\;/x/go ;                     # count numeric html entities as one char

$article =~ s/\[\[ [^\:\]]+ \: [^\]]* \]\]//gxoi ; # strip image/category/interwiki links
                                                   # (a few internal links with a colon
                                                   #  in the title will get lost too)
$article =~ s/http \: [\w\.\/]+//gxoi ;            # strip external links
$article =~ s/\=\=+ [^\=]* \=\=+//gxo ;            # strip headers
$article =~ s/\n\**//go ;                          # strip linebreaks + unordered list tags
                                                   # (other lists are relatively scarce)
$article =~ s/\s+/ /go ;                           # remove extra spaces
Actually the code in WikiCountsInput.pl is a bit more complicated, as it
tries to find a decent solution for ja/zh/ko.
Also, numbers are counted as one word (including embedded points and commas).
if ($language eq "ja")
{ $words = int ($unicodes * 0.37) ; }
etc.
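For convenience, here is the same set of regexps wrapped into one routine,
with a rough word/character count at the end. This is a sketch of my own,
not a verbatim excerpt from WikiCountsInput.pl, and the sub/variable names
are made up:

sub strip_markup {
    my ($article) = @_ ;
    $article =~ s/\'\'+//go ;                          # strip bold/italic formatting
    $article =~ s/\<[^\>]+\>//go ;                     # strip <...> html
    $article =~ s/[\xc0-\xf7][\x80-\xbf]+/x/gxo ;      # count unicode chars as one char
    $article =~ s/\&\w+\;/x/go ;                       # count html entities as one char
    $article =~ s/\&\#\d+\;/x/go ;                     # count numeric entities as one char
    $article =~ s/\[\[ [^\:\]]+ \: [^\]]* \]\]//gxoi ; # strip image/category/interwiki links
    $article =~ s/http \: [\w\.\/]+//gxoi ;            # strip external links
    $article =~ s/\=\=+ [^\=]* \=\=+//gxo ;            # strip headers
    $article =~ s/\n\**//go ;                          # strip linebreaks + list bullets
    $article =~ s/\s+/ /go ;                           # remove extra spaces
    return $article ;
}

my $clean      = strip_markup ($article) ;
my @words      = split (' ', $clean) ;
my $word_count = scalar (@words) ;                     # rough word count after stripping
my $char_count = length ($clean) ;                     # article size in chars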
> pss: I recall from the Wikimania meeting that someone had a script to
> convert a dump to tab-delimited data. That would be useful to me...
> could someone provide a link?
>
http://karma.med.harvard.edu/mailman/private/freelogy-discuss/2006-July/000047.html
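Not the script behind that link, but for what it is worth, here is a rough
sketch of my own that streams a pages-articles XML dump and writes one
tab-delimited line (title, text size) per page. It assumes the usual <title>
and <text> layout of the export format and does no real XML parsing:

my ($title, $text, $intext) ;
while (<STDIN>) {
    $title = $1 if m!<title>(.*?)</title>! ;
    if (!$intext && m!<text[^>]*>!) { $intext = 1 ; $text = '' ; s!.*<text[^>]*>!! }
    if ($intext) {
        if (s!</text>.*!!s) {                          # closing tag: flush one record
            $text .= $_ ; $intext = 0 ;
            $title =~ s/\t/ /g ; $text =~ s/[\t\n]/ /g ;
            print $title, "\t", length ($text), "\n" ;
        }
        else { $text .= $_ }                           # text continues on next line
    }
}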
> Erik: The largest of articles takes approx. 1/10 of a second running
> the binary produced by this C code. Using Inline::C in perl, I could
> fairly easily embed the code (style.c from GNU Diction) into your
> script. It would take and return strings. "Simple!" =) Otherwise I can
> just produce the data in csv etc.. and provide it to you.
>
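For reference, the Inline::C mechanism Brian mentions would look roughly
like this; the C function below is a trivial stand-in of my own, not the
actual style.c code:

use Inline C => <<'END_C' ;
/* stand-in for a style.c routine: counts the words in a string */
int count_words (char* text) {
    int n = 0, inword = 0 ;
    for ( ; *text ; text++) {
        if (*text == ' ' || *text == '\n' || *text == '\t') { inword = 0 ; }
        else if (!inword) { n++ ; inword = 1 ; }
    }
    return n ;
}
END_C

print count_words ("the quick brown fox"), "\n" ;      # prints 4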
Questions and caveats:
1/10 sec x 2 million articles early in 2007 is 55 hours. Plus German is 80
hours. Of course you say 1/10 is for the largest articles only.
Still it adds up big time when all months are processed, and running
WikiCounts incrementally, only adding data for the last month, has its
drawbacks, as explained in our meeting at Wikimania. Is it 1/10 sec for all
tests combined? Could we limit ourselves to the better researched tests, or
the tests which are supported in more languages or deemed more sensible anyway?
I would prefer tests that work in all alphabet-based languages. When wiki
syntax is introduced that is not stripped by the regexps above or some other
tool, it would produce artificial drift in the results over the months.
> This data is very easy to reproduce. I provide a unix command for each
> that assumes you have installed the lynx text browser, which has a
> dump command to strip out html and leave text, and the GNU Diction
> package, which provides style. Style supports English/German.
Stripping html is already done, see above.
I could imagine we run these tests on a yet to be determined sample of all
articles to save processing costs.
Tracking 10,000 or 50,000 articles from month to month, if chosen properly
(randomly?), should give decent results.
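For the record, one way to keep such a sample stable from month to month is
to hash the article id and keep a fixed fraction; a small sketch of my own
(the 2% rate and the sub name are just for illustration):

use Digest::MD5 qw(md5_hex) ;

sub in_sample {
    my ($page_id, $rate) = @_ ;                        # e.g. $rate = 0.02 for ~2%
    my $bucket = hex (substr (md5_hex ($page_id), 0, 8)) / 0xffffffff ;
    return $bucket < $rate ;                           # same articles selected every month
}

print "sampled\n" if in_sample (12345, 0.02) ;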
Cheers, Erik Zachte
Here are a few readability measure examples. Just a side-by-side comparison
of the text of the GWB article from en.wp, simple.wp, and de.wp. I plan on
parsing en, de and simple in full and exploring how these measures might be
correlated with quality.
ps: Does anyone know of a script that can strip out wiki syntax? This is
pertinent. It will also be necessary to leave only paragraphs of text in the
articles... the below data is noticeably skewed in some (but not all) of the
measures.
pss: I recall from the Wikimania meeting that someone had a script to
convert a dump to tab-delimited data. That would be useful to me...
could someone provide a link?
Erik: The largest of articles takes approx. 1/10 of a second running
the binary produced by this C code. Using Inline::C in perl, I could
fairly easily embed the code (style.c from GNU Diction) into your
script. It would take and return strings. "Simple!" =) Otherwise I can
just produce the data in csv etc.. and provide it to you.
See [[Readability]] and Google to get an idea of what these
readability grades mean. Briefly:
All of these explained quite simply: http://www.readability.info/info.shtml
Kincaid: http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test#Flesch-Kincaid…
ARI: http://en.wikipedia.org/wiki/Automated_Readability_Index
Coleman-Liau: http://en.wikipedia.org/wiki/Coleman-Liau_Index
Flesch Index: http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test#Flesch_Reading…
Fog Index: http://en.wikipedia.org/wiki/Gunning-Fog_Index
Lix: http://www.readability.info/info.shtml
SMOG-Grading: http://en.wikipedia.org/wiki/SMOG_Index
This data is very easy to reproduce. I provide a unix command for each
that assumes you have installed the lynx text browser, which has a
dump command to strip out html and leave text, and the GNU Diction
package, which provides style. Style supports English/German.
----------------------------------------------------------------
[[George W. Bush]] on en.wp:
lynx -dump http://en.wikipedia.org/wiki/"George W. Bush" | style
YMMV: I removed all the hyperlinks in this article before running style
----------------------------------------------------------------
readability grades:
Kincaid: 11.7
ARI: 13.5
Coleman-Liau: 12.8
Flesch Index: 54.0
Fog Index: 15.3
Lix: 51.3 = school year 10
SMOG-Grading: 13.1
sentence info:
60081 characters
12376 words, average length 4.85 characters = 1.52 syllables
513 sentences, average length 24.1 words
58% (299) short sentences (at most 19 words)
18% (97) long sentences (at least 34 words)
65 paragraphs, average length 7.9 sentences
0% (3) questions
22% (114) passive sentences
longest sent 294 wds at sent 507; shortest sent 1 wds at sent 5
word usage:
verb types:
to be (155) auxiliary (49)
types as % of total:
conjunctions 4% (544) pronouns 3% (336) prepositions 11% (1311)
nominalizations 3% (311)
sentence beginnings:
pronoun (47) interrogative pronoun (3) article (40)
subordinating conjunction (23) conjunction (5) preposition (40)
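As a sanity check on the grades above (my own snippet, not part of style):
re-applying the published Flesch-Kincaid grade formula to the rounded
averages reported for this article, 24.1 words per sentence and 1.52
syllables per word, lands on the same grade:

my $kincaid = 0.39 * 24.1 + 11.8 * 1.52 - 15.59 ;      # Flesch-Kincaid grade level formula
printf "Kincaid: %.1f\n", $kincaid ;                   # ~11.7, matching the value style reports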
----------------------------------------------------------------
[[George W. Bush]] on simple.wp:
lynx -dump http://simple.wikipedia.org/wiki/"George W. Bush" | style
----------------------------------------------------------------
readability grades:
Kincaid: 3.3
ARI: 0.7
Coleman-Liau: 6.0
Flesch Index: 88.6
Fog Index: 6.5
Lix: 23.6 = below school year 5
SMOG-Grading: 7.4
sentence info:
8659 characters
2344 words, average length 3.69 characters = 1.28 syllables
248 sentences, average length 9.5 words
65% (163) short sentences (at most 4 words)
10% (26) long sentences (at least 19 words)
14 paragraphs, average length 17.7 sentences
0% (0) questions
10% (27) passive sentences
longest sent 253 wds at sent 39; shortest sent 1 wds at sent 4
word usage:
verb types:
to be (40) auxiliary (1)
types as % of total:
conjunctions 1% (24) pronouns 1% (33) prepositions 4% (95)
nominalizations 1% (24)
sentence beginnings:
pronoun (10) interrogative pronoun (0) article (3)
subordinating conjunction (3) conjunction (1) preposition (2)
----------------------------------------------------------------
[[George W. Bush]] on de.wp:
lynx -dump http://de.wikipedia.org/wiki/"George W. Bush" | style -L de
----------------------------------------------------------------
readability grades:
Kincaid: 8.0
ARI: 6.7
Coleman-Liau: 12.3
Flesch Index: 57.7
Fog Index: 10.8
Lix: 34.4 = school year 5
SMOG-Grading: 5.3
sentence info:
37740 characters
7909 words, average length 4.77 characters = 1.63 syllables
694 sentences, average length 11.4 words
63% (441) short sentences (at most 6 words)
16% (116) long sentences (at least 21 words)
56 paragraphs, average length 12.4 sentences
0% (2) questions
6% (44) passive sentences
longest sent 274 wds at sent 256; shortest sent 1 wds at sent 191
sentence beginnings:
pronoun (14) interrogative pronoun (3) article (37)
----------------------------------------------------------------
Cheers,
Brian Mingus
This mail (including pictures) was sent to attendees of Wikimania 2006 and
some others who recently showed active interest in quantitative research.
Crossposting here. I hope you will find at least something in this mail that
is to your liking.
Wikimania 2006 was, like its predecessor in Frankfurt, a source of
inspiration.
Several official and impromptu meetings were held that were related to
research and quantitative analysis.
At a conference with 6 parallel sessions one has to make difficult choices,
and for me it was impossible to attend several highly interesting research
meetings.
----------------------------------------------------------------------
Wikimedia Research
I am very much looking forward to a transcript, or at least speaker notes
and/or personal observations, of several presentations.
Foremost among them is James' "Research about Wikimedia: A workshop" [1].
I also hope that James as Chief Research Officer could give us a sense of
direction and timing: the mission of the Wikimedia Research Network [2] is
lofty, the number of Wikimedians that subscribed is large, but the current
status for most activities seems to be 'idle' [3] [4]? Also, is there any
coordination with external research groups, like those mentioned on [5] and
elsewhere [6]?
Would it be useful to divide Wikimedia Research Network activities into
A Quantitative Analysis
B Social Research Collaborations [7]
C Other Activities
and coordinate these separately?
C would still cover 50%+ of the WRN mission statement, like: identify the
needs of the individual Wikimedia projects, make recommendations for
targeted development, guide and motivate outside developers, assist in the
study of new project proposals.
I expect that at Wikimania most social science sessions [8] presented
relevant material and either used or added to quantitative research. So
there is synergy between A and B.
[1] http://wikimania2006.wikimedia.org/wiki/Proceedings:JF1
[2] http://meta.wikimedia.org/wiki/Wikimedia_Research_Network
[3] http://meta.wikimedia.org/wiki/Category:Research_Team
[4] http://meta.wikimedia.org/wiki/Research/Research_Projects
[5] http://meta.wikimedia.org/wiki/Research
[6] http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Wikidemia
[7] http://meta.wikimedia.org/wiki/Research/Social_Research_Collaborations
[8] http://wikimania2006.wikimedia.org/wiki/Category:Wiki_Social_Science
----------------------------------------------------------------------
Communication
There was no IRC meeting of the Research Team after December 2005. There are
pretty active Wikimedia researchers outside the team though. For me
Wikimania 2006 confirmed that more exchange of ideas would be helpful.
I'm not sure more IRC discussions are a panacea. Personally I prefer
discussion via wiki and mailing list: it is less spontaneous, but one can
more easily formulate a coherent proposal or comment on it in a thoughtful
manner, and, no less important, it is much easier to follow for others who
read the discussion later.
Part of the information flow is now on meta, some of it on the research
mailing list [8] (which is largely dormant [9], though recent posts are very
useful), and some of it on the freelogy list [10] and probably elsewhere.
What about making the Wikimedia research list the central forum for all
broad and conceptual discussions and linking from there to meta for detailed
discussions? I will post this mail there anyway, of course without the
images.
[8] http://mail.wikipedia.org/pipermail/wiki-research-l/
[9] http://www.infodisiac.com/Wikipedia/ScanMail/Wiki-research-l.html
[10] http://karma.med.harvard.edu/mailman/listinfo/freelogy-discuss
----------------------------------------------------------------------
Visualisation
I personally enjoyed very much session Can Visualization Help? [11]
= IBM researcher Fernanda Viégas [12] talked about the famous Wikipedia
History Flow tool [13], which was recently extended, announced a free
edition, and said that Tim Starling had pledged to reinstate the relevant
export function so that we can use the tool on our projects.
= IBM researcher Martin Wattenberg [14] showed his newest toy, where one can
see all contributions of one single Wikimedia editor, presented as an
association cloud (titles grouped per namespace and sorted by number of
edits, font size varied per title to express relative number of edits). It
is somewhat scary though: I feel a quantitative improvement (exposing data
that are already online in a much more efficient manner) can lead to a
qualitative setback (exposing one's character and interests in a way that
was never expected). People may after all regret that they edited under
their real name. Although personally I will happily continue to do so, it is
a matter of responsibility towards the community to at least discuss whether
we should actively promote such a tool. I know I'm partially guilty in this
respect myself with the mailing list stats, but feel that did not cross the
line.
= Visualization guru Ben Shneiderman [15] made a case for more advanced
data visualisation tools to spice up wikistats. I am a long-time admirer of
several of his UI inventions and happy to take up the challenge.
[11] http://wikimania2006.wikimedia.org/wiki/Proceedings:FV1
[12] http://alumni.media.mit.edu/~fviegas/
[13] http://www.research.ibm.com/visual/projects/history_flow/
[14] http://www.bewitched.com/
[15] http://www.cs.umd.edu/~ben/
---------------------------------------------------------
General User Survey
One promising but sleeping WRN project, which I initiated myself, is the
'General User Survey' [21]. A few Wikimania participants interested in
wikistats gathered ad hoc at lunch time on Saturday (others interested in
the project, Cormaggio and Piotrus, were at the conference, but not in the
vicinity at that moment). Kevin Gamble, associate director of 75 Land-Grant
Universities, expressed his continued interest and said he might be able to
offer programming support.
A project definition plus rationale [21] and a mockup questionnaire form
[22] have been created and discussed for more than a year. I started the
transition towards technical design [23], and with Kevin's support and
resources coding might follow later this year. Once we have a proof of
concept in e.g. English and German (at least two languages, to show the
multilingual aspects) I'm sure more people will start to take notice, and
help to discuss and fine-tune the questionnaire. At a later stage, before
going live with a multilingual golden edition, we will probably have to
discuss matters with the board (Anthere already stated her support) in order
to make this an official survey, hopefully with coverage on the project
pages themselves (banner announcement?). Mind you, the implementation is
not exactly trivial: lots of issues are involved that require critical
discussion, code and coordination. I invite everyone to comment on the tech
notes, especially of course Kevin, and hope to learn from him whether coding
this project fits within his budget.
[21] http://meta.wikimedia.org/wiki/General_User_Survey
[22] http://meta.wikimedia.org/wiki/General_User_Survey/Questionnaire
[23] http://meta.wikimedia.org/wiki/General_User_Survey/Implementation_Issues
--------------------------------------------------------
Quantitative Analysis
Saturday I met Jeremy Tobacman. We had a long and very interesting
discussion, mainly on new initiatives centered around the freelogy servers.
Jeremy proposed to hold an impromptu lunch meeting on Sunday and gathered a
room full of people.
[pictures removed]
Several mails have already been written about this, but to a smaller
audience. So here are a few highlights.
Issues that were discussed:
1 Hardware
The two tool servers [32] are very crowded and insufficient for all the
stats jobs we might want to run. The tool servers run a mirror of the live
database, so well-behaved SQL queries are possible. Well behaved meaning
they should not try to emulate the xml dump process, where extracting the
English Wikipedia (all revisions) already takes a full week.
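As an aside, a well-behaved query in this sense is a bounded aggregate that
never touches revision text; a Perl/DBI sketch, where the host, database
name and credentials are placeholders and not real toolserver settings:

use DBI ;

# connect to the replicated database (placeholder DSN/credentials)
my $dbh = DBI->connect ('DBI:mysql:database=enwiki_p;host=sql.toolserver',
                        'username', 'password', { RaiseError => 1 }) ;

# one aggregate over the page table, no revision text involved
my ($articles) = $dbh->selectrow_array (
    'SELECT COUNT(*) FROM page WHERE page_namespace = 0 AND page_is_redirect = 0') ;
print "content pages in main namespace: $articles\n" ;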
Alexander Wait (Sasha) has access to huge hardware resources, enough to
calculate how many parallel universes it takes to find at least one zebra
couple where a black-and-white mother and a white-and-black father have
exactly mirrored patterns and thus produce offspring that is either all
black or all white (mind you, albinos are false positives).
Since in reality Sasha is merely interested in unraveling the secrets of DNA
he has some cpu cycles to spare. Upon request virtual machines can be
catered for. The freelogy-discuss mailing list archives have information
about hardware availability [33].
By the way, Jeremy and Erik Tobacman have a server at The National Bureau of
Economic Research (NBER) for quantitative research on Wikipedia.
Also, I am urged by the Communications Subcommittee to spend more of my time
on publishable stats (in time spent, the TomeRaider offline edition of
Wikipedia easily dominated, but the time for offline browsing is nearly
over), and they want me to have a dedicated server. I would like it to be
well utilised, but of course it should produce timely wikistats in the first
place, as that is what it is offered for. To be discussed.
2 Real time data collection / Performance / Storage
It would be useful to learn when a page is being slashdotted or otherwise in
the news, at the moment of the actual event, so that vandal patrols can be
summoned in time and article improvement can commence right away.
Major performance issues need to be addressed.
Do we gather and keep every page hit? Hardly practicable. Wikimedia visitor
stats were not disabled for no reason. It seems we are getting switches that
can log accesses stochastically (e.g. every 100th access, plus all hits for
a selected subset of IP addresses to monitor navigation patterns).
There might be a need to store data in aggregated (condensed) form, as
volumes will be huge. At least tapping from the switches directly puts no
burden on the squids (= web proxies/caches).
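A toy sketch of what such stochastic logging plus aggregation could look
like on the receiving end; the log line layout, the 1-in-100 rate and the
watch list are assumptions of mine, not the actual switch/squid setup:

my %watched_ip = map { $_ => 1 } ('10.0.0.1') ;        # hypothetical IPs followed in full
my %hits ;
while (my $line = <STDIN>) {
    my ($ip, $url) = (split ' ', $line)[0, 1] ;        # assumed: ip and url are the first two fields
    next unless ($. % 100 == 0) || $watched_ip{$ip} ;  # keep every 100th hit, plus watched IPs
    $hits{$url}++ ;                                    # store aggregated counts, not raw hits
}
print "$hits{$_}\t$_\n" for sort { $hits{$b} <=> $hits{$a} } keys %hits ;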
Brion will be asked to drop bz2 compression for the xml dump job, as it is
so much slower and compresses so much less than 7zip. Brion had to develop a
distributed version of bzip to get it working at all on the 800 GB enwiki
dump file. Format bz2 is however supported on more platforms, so Brion may
not comply.
Specifically about wikistats: I explained why I always process the full
historic dump instead of doing incremental steps: new functionality in
wikistats means processing it all anyway, and data for older months are not
really static due to frequent deletions and moves. Could I speed up the
counts section of wikistats by splitting the job over several servers? I'll
have to look into it.
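Splitting the counts section would roughly mean partitioning the work and
merging the results afterwards; whether the slices run as processes on one
box or via ssh on several servers, the shape is the same. A sketch of mine,
where process_slice() is a hypothetical worker that writes one counts file
per slice:

my @slices = ('2001-2003', '2004-2005', '2006') ;      # illustrative partition by period
my @pids ;
for my $slice (@slices) {
    my $pid = fork () ;
    die "fork failed: $!" unless defined $pid ;
    if ($pid == 0) {                                   # child: process one slice of the dump
        process_slice ($slice) ;                       # hypothetical worker, writes counts_$slice.csv
        exit 0 ;
    }
    push @pids, $pid ;
}
waitpid ($_, 0) for @pids ;                            # wait for all slices, then merge the csv files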
3 Data publishing
We should be careful not to publish very granular data for outside
inspection. It is a well known fact that China wants complete control over
its citizens. Less known is that they have the latest technology (mainly
bought in the US) and lots of it, and about 30,000 IT professionals
(estimate by Reporters without Borders/Reporters sans Frontières) working on
concealment of internet resources, redirection of internet requests and
spying on internet usage patterns in general. They would love to see our raw
access logs. Cathy, will you attend the Chinese Wikimania? [34] If you
happen to hear about these things, I hope you will blog about it. See also
[35].
See also the well-timed scoop [36] about the AOL privacy disaster.
4 Measuring quality quantitatively
It may be impossible to define quality, let alone measure it, but it will be
fun to zoom in on it and see how far we can get. Spurred by Jimbo's
excellent Wikimania kick-off speech, where he stressed that we will need
more attention to quality, I started a project to extend wikistats. Brian
offered lots of ideas and will hopefully prove me wrong in my belief that
adding spelling, grammar and readability assessments is not to be taken too
lightly in a multilingual environment [37] [38].
[31] http://wikimania2006.wikimedia.org/wiki/Proceedings:CM1 (mp3 audio
available)
[32] http://meta.wikimedia.org/wiki/Toolserver
[33] http://karma.med.harvard.edu/mailman/private/freelogy-discuss/2006-May/000002.html
(registration needed: http://karma.med.harvard.edu/mailman/listinfo/freelogy-discuss)
[34] http://en.wikinews.org/wiki/Chinese_Wikimania_2006_to_be_held_in_Hong_Kong
[35] http://wikimania2006.wikimedia.org/wiki/User:Roadrunner (I wonder if he
is the person who gave a smashing full hour speech on this at 20c3 Berlin)
[36] http://www.siliconbeat.com/entries/2006/08/06/aol_research_exposes_data_weve_got_a_little_sick_feeling.html
(data were anonymized, but some users had searched for their own name
several times and were easily recognized; lots of very embarrassing stuff
was uncovered)
[37] http://meta.wikimedia.org/wiki/Wikistats/Measuring_Article_Quality
(conceptual overview)
[38] http://meta.wikimedia.org/wiki/Wikistats/Measuring_Article_Quality/Operationalisation_for_wikistats
---------------------------------------------------------
Ongoing
By the way, Angela Beesley and Jakob Voss will give a workshop on Wikipedia
research at WikiSym 2006 [41] [42].
[41] http://ws2006.wikisym.org/space/Workshop%3E%3EWikipedia+Research
[42] http://meta.wikimedia.org/wiki/Workshop_on_Wikipedia_Research%2C_WikiSym_2006
Regards, Erik Zachte
Hi,
On August 6, 2006, at lunch on the last day of Wikimania, eight doctoral
students in the humanities and social sciences, from around the world
(Italy, Ireland, Germany, Greece, Taiwan, U.S.--Georgia, Boston, Chicago),
discussed various ways to collaborate.
This web page on Wikimedia is one place we may continue discussing
collaborations:
Social Research Collaborations
http://meta.wikimedia.org/wiki/Research/Social_Research_Collaborations
If you know any graduate students who are actively engaged in social
research of aspects of the free culture movement or of wikimedia projects,
please let them know about this webpage.
Thanks,
Doug
WikiSym 2006 is upon us in two weeks. To get an
idea of what the conference will be like, please view the WikiSym program:
http://www.wikisym.org/ws2006/program.html
You may also enjoy the "How and Why Wikipedia
Works" interview with our keynoter:
http://www.riehle.org/computer-science/research/2006/wikisym-2006-interview…
================
CALL FOR PARTICIPATION
WIKISYM 2006: THE 2006 INTERNATIONAL SYMPOSIUM ON WIKIS
August 21-23, 2006, Odense, Denmark
CO-LOCATED WITH ACM HYPERTEXT 2006
See http://www.wikisym.org/ws2006
Archival - Peer Reviewed - ACM Sponsored
EARLY REGISTRATION DEADLINE APPROACHING: June 19, 2006
GENERAL INFORMATION
This year's Wiki Symposium brings together wiki
researchers and practitioners in the historic and
beautiful city of Odense, Denmark, on August
21-23, 2006. Participants will present, discuss,
and move forward the latest advances in wiki
contents, sociology, and technology. The
symposium program offers invited talks by Angela
Beesley ("How and Why Wikipedia Works"), Doug
Engelbart and Eugene E. Kim ("The Augmented
Wiki"), Mark Bernstein ("Intimate Information")
and Ward Cunningham ("Design Principles of
Wikis"). The research paper track presents and
discusses breaking wiki research, the panels let
you listen to and contribute to topics like
"Wikis in Education" and "The Future of Wikis",
and the workshops let you get active and
contribute to on-going research and practitioner
work with your peers. (Many workshops accept
walk-ins, so it is not too late!) What's more,
for the first time, we will have an on-going
openspace track (to replace BOFs) so you can get
active and involved in an organized fashion on
any wiki topic you like. We believe this is how
to get the most out of your experience at WikiSym 2006!
And, of course, if you can't wait, please join
our conversation on wiki research and practice on
the symposium wiki at http://ws2006.wikisym.org
PROGRAM OVERVIEW
See http://www.wikisym.org/ws2006/program.html
Keynotes and invited talks:
* Angela Beesley: How and Why Wikipedia Works
* Doug Engelbart and Eugene E. Kim: The Augmented Wiki
* Mark Bernstein: Intimate Information
* Ward Cunningham: Design Principles of Wiki
Panels on:
* Wikis in Education
* The Future of Wikis
Research papers and practitioner reports on:
* wiki technology
* wiki sociology and philosophy
* wiki uses, for example, in software, education, and politics
and many more, see http://www.wikisym.org/ws2006/program.html#Papers
Workshops on:
* wikis in education
* wikipedia research
* wiki markup standards
* wikis and the semantic web
And, of course: Demos! We have pre-set demos, but
please feel free to bring your own notebook! We
will provide space for you to demo on-the-spot in
our Monday night demo session, a favorite from WikiSym 2005.
SYMPOSIUM LOGISTICS
Handled through the Hypertext 2006 website:
* Conference registration:
http://hypertext.expositus.com/information.asp?Page=76&menu=13
* Conference hotel:
http://hypertext.expositus.com/information.asp?Page=93&menu=13
* Travel information:
http://hypertext.expositus.com/information.asp?Page=91&menu=13
SYMPOSIUM COMMITTEE
Dirk Riehle, Bayave Software GmbH, Germany (Symposium Chair)
Ward Cunningham, Eclipse Foundation, U.S.A.
Kouichirou Eto, AIST, Japan (Publicity Co-Chair)
Richard P. Gabriel, Sun Microsystems, U.S.A.
Beat Doebeli Honegger, UAS Northwestern Switzerland (Workshop Chair)
Matthias L. Jugel, Fraunhofer FIRST, Germany (Panel Chair)
Samuel J. Klein, Harvard University, U.S.A.
Helmut Leitner, HLS Software, Austria (Publicity Co-Chair)
James Noble, Victoria University of Wellington, New Zealand (Program Chair)
Sebastien Paquet, Socialtext, U.S.A. (Demonstrations Chair)
Sunir Shah, University of Toronto, Canada (Publicity Co-Chair)
PROGRAM COMMITTEE
James Noble, Victoria University of Wellington, New Zealand (Program Chair)
Ademar Aguiar, Universidade do Porto, Portugal
Robert Biddle, Carleton University, Canada
Amy Bruckman, Georgia Institute of Technology, U.S.A.
Alain Désilet, NRC, CNRC, Canada
Ann Majchrzak, University of Southern California, U.S.A.
Frank Fuchs-Kittowski, Fraunhofer ISST, Germany
Mark Guzdial, Georgia Institute of Technology, U.S.A.
Samuel J. Klein, Harvard University, U.S.A.
Dirk Riehle, Bayave Software GmbH, Germany
Robert Tolksdorf, Freie Universität Berlin, Germany