Following the suggestion of Jimmy Wales, we would like to announce a new paper
that uses Wikipedia as a very large-scale repository of world knowledge
for information retrieval tasks. To date, Wikipedia has mostly been used by
human readers, and we hope our research opens a promising new direction of
using the knowledge from Wikipedia for tasks that normally require human-level
intelligence. In our ongoing research, we plan to explore using Wikipedia for
additional text processing tasks, such as Web search and word sense
Evgeniy Gabrilovich and Shaul Markovitch (2006).
''Overcoming the Brittleness Bottleneck using Wikipedia:
Enhancing Text Categorization with Encyclopedic Knowledge''.
Proceedings of the 21st National Conference on Artificial Intelligence
(AAAI-06), pp. 1301-1306.
Ph.D. student in Computer Science
Department of Computer Science, Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
Email: gabr(a)cs.technion.ac.il WWW: http://www.cs.technion.ac.il/~gabr
 Thanks to superb work by Erik Garrison, we now have an efficient,
C-based parser that extracts header data from WMF xml dumps into csv files
readable by standard statistical software packages.
* Source for this parser will soon be web-available; stay tuned.
* The csv files will also be available online, either from
download.wikimedia.org (if the parser can be run on the WMF servers) or from
a webserver on karma or at NBER (see below).
* If you just can't wait, let us know and we'll offer express service :)
* The csv files consist of these variables with these types:
 We have begun to use these csv files to produce weekly sets of
See last week's work here:
This week we will finish out that set of stats.
Next week's list needs your creative suggestions: Please edit directly!
 NBER has set us up with a pretty good Linux box, wikiq.nber.org, running
Fedora Core 5. We hope to have Xen instances available for researchers
interested in doing statistical analysis on the csv files within two weeks.
 WMF readership data continues to be irretrievably lost. What can we do
to begin saving at least some of it as soon as possible? If we were to save
only articleid for one of every hundred squid requests, and include some
indicator in the file at the end of each day, privacy concerns and
computational burdens would be minimized, and this would still be a great
How can we make this happen?
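
As a rough illustration of the sampling idea above, here is a minimal Python
sketch. It assumes the articleid has already been extracted from each squid
request; the function names, the end-of-day marker format, and the choice of
keeping every hundredth request (rather than a random 1%) are all hypothetical
and not an existing WMF tool.

```python
def sample_every_nth(article_ids, n=100):
    """Yield one articleid out of every n requests (the n-th, 2n-th, ...)."""
    for count, article_id in enumerate(article_ids, start=1):
        if count % n == 0:
            yield article_id


def write_daily_sample(sampled_ids, day, out):
    """Write one articleid per line, then a marker line closing the day.

    The marker format is an assumption made for this sketch.
    """
    written = 0
    for article_id in sampled_ids:
        out.write(f"{article_id}\n")
        written += 1
    out.write(f"# end-of-day {day} sampled={written}\n")
```

Only the articleid is stored, so no IP address or timestamp finer than the day
reaches the file, and the per-request cost is a counter increment.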
>In preparation for WikiSym 2006, I conducted an interview on "How and
>Why Wikipedia Works".
>This article presents an interview with Angela Beesley, Elisabeth
>Bauer, and Kizu Naoko. All three are leading Wikipedia practitioners
>in the English, German, and Japanese Wikipedias and related
>projects. The interview focuses on how Wikipedia works and why these
>three practitioners believe it will keep working. The interview was
>conducted via email in preparation for WikiSym 2006, the 2006
>International Symposium on Wikis, with the goal of furthering
>Wikipedia research. Interviewer was Dirk Riehle, the chair of
>WikiSym 2006. An online version of the article provides simplified
>access to URLs.
>The full text of the interview can be found here:
>WikiSym 2006, the 2006 International Symposium on Wikis
>General website: http://www.wikisym.org/ws2006
>This year's Wiki Symposium brings together wiki researchers and
>practitioners in the historic and beautiful city of Odense, Denmark,
>on August 21-23, 2006. Participants will present, discuss, and move
>forward the latest advances in wiki contents, sociology, and
>technology. The symposium program offers invited talks by Angela
>Beesley ("How and Why Wikipedia Works"), Doug Engelbart and Eugene
>E. Kim ("The Augmented Wiki"), Mark Bernstein ("Intimate
>Information") and Ward Cunningham ("Design Principles of Wikis").
>The research paper track presents and discusses breaking wiki
>research, the panels let you listen to and contribute to topics like
>"Wikis in Education" and "The Future of Wikis", and the workshops
>let you get active and contribute to on-going research and
>practitioner work with your peers. (Many workshops accept walk-ins,
>so it is not too late!) What's more, for the first time, we will
>have an on-going open space track (to replace BOFs) so you can get
>active and involved in an organized fashion on any wiki topic you
>like. We believe this is how to get the most out of your experience
>at WikiSym 2006!
>And, of course, if you can't wait, please join our conversation on
>wiki research and practice on the symposium wiki! For the program,
>please see the program information. For an overview of time slot
>allocations, please see the time grid.
>General website: http://www.wikisym.org/ws2006
>Symposium wiki: http://ws2006.wikisym.org
>2006 program: http://www.wikisym.org/ws2006/program.html
>Dirk Riehle, ph: +49 172 184 8755, web: http://www.riehle.org
>Interested in wiki research? Please see http://www.wikisym.org !
I have funding to support one Ph.D. student to pursue a doctoral degree
in business administration.
In particular, I would like to work with a student who is interested in
studying the content and the communities of the Wikimedia projects.
I am in the process of developing a research project to examine the
working principles of Wikimedia. This project is an extension of my
earlier work on open source, in which I developed a community-based model of
knowledge creation based on Linux kernel development (Lee, 2003 in
This fellowship is designed to train/develop academic researchers. The
student is expected to develop original research proposal(s) and conduct
empirical tests. Most students become employed as university professors
Please contact me if you are interested in the opportunity, or if you
know someone whom you would recommend.
Information about the fellowship:
The funding is guaranteed for four years with an option of employment as
a lecturer in the fifth year.
The funding covers tuition and stipend.
Information about me:
Dr. Gwendolyn K. Lee
I am an assistant professor at the University of Florida, School of
Business, Department of Management.
PHD - Business Administration, Univ of California at Berkeley, 2003
MS, BS - Massachusetts Institute of Technology
Evolutionary economics, Innovation, Knowledge creation, Industry
evolution, Convergence of industry boundaries, Emerging technologies
Information about the application procedure:
Application Due Date: September, 2006
Notification of Admission Status: January, 2007
Beginning of Course Work: August, 2007
What I am looking for in an applicant:
(1) Intellectual curiosity and creativity
(2) Academic aptitude and interest in learning
(3) Discipline and rigor
(4) P.S. No business experience is required to get a Ph.D. from a