A new version of the Python parser in WikiXRay, along with improved documentation, can be found here:
Basically, I've developed two flavors. The standard one is for people who want an alternative to other tools for processing Wikipedia's dumps (including the text table). The other is for research purposes: it ignores the text itself and instead extracts useful information on the fly.
Both flavors use extended inserts (you can tune the size and number of rows per insert), and the --monitor mode calls a DB access module to avoid timeout errors.
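For readers unfamiliar with extended inserts, here is a minimal sketch of the general idea in Python. It is not the WikiXRay code: the function name, the table and column names, and the two limits (max_rows, max_bytes) are assumptions made for illustration. It groups rows into multi-row INSERT statements and starts a new statement whenever either limit would be exceeded. The %s placeholders follow the parameter style used by MySQL drivers such as MySQLdb.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def extended_inserts(
    table: str,
    columns: Sequence[str],
    rows: Iterable[Sequence[object]],
    max_rows: int = 1000,
    max_bytes: int = 1_000_000,
) -> Iterator[Tuple[str, List[object]]]:
    """Yield (sql, params) pairs, each inserting several rows at once.

    A new statement is started when the current batch already holds
    max_rows rows, or when adding the next row would push the rough
    payload size (UTF-8 length of the values) above max_bytes.
    """
    prefix = f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"

    def build(batch: List[Tuple[object, ...]]) -> Tuple[str, List[object]]:
        # One placeholder group per row, values flattened in row order.
        sql = prefix + ", ".join([one_row] * len(batch))
        params = [value for row in batch for value in row]
        return sql, params

    batch: List[Tuple[object, ...]] = []
    batch_bytes = 0
    for row in rows:
        row_bytes = sum(len(str(value).encode("utf-8")) for value in row)
        if batch and (len(batch) >= max_rows
                      or batch_bytes + row_bytes > max_bytes):
            yield build(batch)
            batch, batch_bytes = [], 0
        batch.append(tuple(row))
        batch_bytes += row_bytes
    if batch:
        yield build(batch)


# Hypothetical usage: three rows, at most two rows per statement.
demo_rows = [(1, "Main_Page"), (2, "Python"), (3, "Wiki")]
for sql, params in extended_inserts("page", ["id", "title"], demo_rows, max_rows=2):
    print(sql, params)
```

Each yielded pair could be passed to a DB-API cursor's execute() method. Larger batches mean fewer round trips to the server, but the statement size has to stay below whatever packet limit the server enforces, which is why both limits are tunable.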
Further improvements (--skipnamespaces and --inject; the latter should be very easy) are on the way.
I added Korrigan and Andrea separately because I am not sure that they
are on the lists. I added the MediaWiki list to ping interested
developers. Please give your comments, suggestions, etc. on the wiki
research list (wiki-research-l(a)lists.wikimedia.org).
It looks like improving the Cite extension is the simplest task on the
list. French Wikipedia already has a working improvement, and
Andrea made another one (including an easy-to-use interface). There
are also two bug reports: one for a simple improvement to Cite and one
for adding BibTeX compatibility. The first MW bug report is very
active, while the last one seems to be dead.
So, my question is: is it possible to join efforts to make one good
extension that all WM projects need?
 - http://meta.wikimedia.org/wiki/Wikidata
 - http://www.scionline.org/index.php/Main_Page
 - http://bugzilla.wikimedia.org/show_bug.cgi?id=6271
 - http://bugzilla.wikimedia.org/show_bug.cgi?id=8167
As the topic of quality is frequently raised on this list, I would
like to point out some of our papers. A colleague and I have
researched information quality in virtual communities, especially
assessing completeness and correctness.
While we have not worked on Wikipedia (we tested, for example, World66
and Wikitravel, wikis for travel information), the methodology we
used can easily be transferred to Wikipedia. We claim that it
captures the user perspective in a rigorous way.
The results of these studies were encouraging: virtual communities,
without monetary compensation, are capable of matching
commercially produced information products. An open question remains
about the minimum activity a virtual community requires to create
and maintain information of high quality.
I hope this helps and stimulates a discussion. Please do not hesitate
to contact me with any questions.
Prestipino, M., Aschoff, F.-R., Schwabe, G. How up-to-date are Online
Tourism Communities? An Empirical Evaluation of Commercial and
Non-Commercial Information Quality.
40th Hawaii International Conference on System Sciences (HICSS-40).
Waikoloa, Big Island, Hawaii, January 3-6, 2007
Prestipino, M., Aschoff, R., Schwabe, G.: "What’s the use of
guidebooks in the age of collaborative media? Empirical Evaluation of
free and commercial travel information". 19th Bled eConference
eValues, June 5-7, 2006, Bled, Slovenia
This paper presents ideas on how to improve technology for
information exchange in virtual communities:
Prestipino, M., "Supporting Collaborative Information Spaces for
Tourists", Conference Proceedings Mensch und Computer 2004, 2004.
Marco Prestipino, M.Sc.
Institut für Informatik
binzmuehlestrasse 14, ch-8050 zuerich, switzerland
I am the lead author on two wiki research papers my group at the
University of Minnesota is publishing this fall.
The first focuses on Wikipedia:
* Reid Priedhorsky, Jilin Chen, Shyong (Tony) K. Lam, Katherine
Panciera, Loren Terveen, John Riedl. "Creating, Destroying, and
Restoring Value in Wikipedia." To appear in Proc. GROUP 2007. 10 pages.
* Link: http://www.cs.umn.edu/~reid/papers/group282-priedhorsky.pdf
* Abstract: Wikipedia’s brilliance and curse is that any user can edit
any of the encyclopedia entries. We introduce the notion of the impact
of an edit, measured by the number of times the edited version is
viewed. Using several datasets, including recent logs of all article
views, we show that an overwhelming majority of the viewed words were
written by frequent editors and that this majority is increasing.
Similarly, using the same impact measure, we show that the probability
of a typical article view being damaged is small but increasing, and we
present empirically grounded classes of damage. Finally, we make policy
recommendations for Wikipedia and other wikis in light of these findings.
The second is not Wikipedia-focused, and as such not really on topic for
this list, but as I'm already sending a mail, I thought I'd include it.
If you are interested only in research directly related to Wikipedia,
you can stop reading now.
* Reid Priedhorsky, Benjamin Jordan, Loren Terveen. "How a Personalized
Geowiki Can Help Bicyclists Share Information More Effectively." Short
paper. To appear in Proc. WikiSym 2007. 6 pages.
* Link: http://www.cs.umn.edu/~reid/papers/wiki09s-priedhorsky.pdf
* Abstract: The bicycling community is focused around a real-world
activity - navigating a bicycle - which requires planning within a
complex and ever-changing space. While all the knowledge needed to find
good routes exists, it is highly distributed. We show, using the results
of surveys and interviews, that cyclists need a comprehensive,
up-to-date, and personalized information resource. We introduce the
personalized geowiki, a new type of wiki which meets these requirements,
and we formalize the notion of geowiki. Finally, we state some general
prerequisites for wiki contribution and show that they are met by cyclists.
Questions and comments welcome.
Graduate Research Assistant
GroupLens Research, http://www.grouplens.org