--- On Thu, 11/11/10, Diederik van Liere <dvanliere(a)gmail.com> wrote:
From: Diederik van Liere <dvanliere(a)gmail.com>
Subject: Re: [Wiki-research-l] Editor Trends Study - Improving the tool
To: wiki-research-l(a)lists.wikimedia.org
Date: Thursday, 11 November 2010, 23:44
Dear Felipe,
We did investigate other tools before deciding to embark on this new project; as you rightly point out, we should minimize code overlap. As far as I know, Pywikipediabot is an editing tool, and your tool, WikiXRay, has definitely proven itself. However, I believe that a NoSQL solution will give better performance than SQL databases, and that has been one of the main reasons to write this tool.
I am not sure a separate mailing list is required; at the moment it's not, but thanks for the suggestion. I have added the SVN link.
Thanks, Diederik. I'm also curious about testing the performance of MongoDB. I admit I haven't tried this kind of database yet.
I will check the SVN.
Best,
F.
Best,
Diederik
To: Research into Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org>
--- On Wed, 10/11/10, Diederik van Liere <dvanliere(a)gmail.com> wrote:
From: Diederik van Liere <dvanliere(a)gmail.com>
Subject: [Wiki-research-l] Editor Trends Study - Improving the tool
To: wiki-research-l(a)lists.wikimedia.org
Date: Wednesday, 10 November 2010, 00:02
Hi, Diederik,
I'm also glad to see progress in this project. Some comments inline.
Dear researchers,
Recently, we started the Editor Trends Study (http://strategy.wikimedia.org/wiki/Editor_Trends_Study). The goal of this study is to get a better understanding of the community dynamics within the different Wikipedia projects.
Part of this project consists of developing a tool (http://strategy.wikimedia.org/wiki/Editor_Trends_Study/Software) that parses a Wikipedia dump file, extracts the required information, stores it in a database and exports it to a CSV file. This CSV file can then be used in a statistical program such as R, Stata or SAS.
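For readers who want to picture the parse, extract, store and export steps described above, here is a rough sketch. It is not the Editor Trends tool's code: it is a minimal illustration that assumes the usual MediaWiki XML export layout (page, title, revision, timestamp, contributor/username), skips anonymous (IP) edits, and uses SQLite from the Python standard library as a stand-in for the NoSQL (MongoDB) backend discussed in this thread. The function names parse_revisions, store and export_csv, the tiny sample dump, and the per-editor CSV columns are all hypothetical.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop the XML namespace: "{http://...}page" -> "page".
    return tag.rsplit("}", 1)[-1]


def parse_revisions(source):
    # Stream the dump and yield (title, timestamp, username) per revision.
    # Assumes the usual <page><title/><revision><timestamp/>
    # <contributor><username/></contributor></revision></page> layout.
    # Anonymous edits (which carry <ip> instead of <username>) are skipped.
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            timestamp = username = None
            for child in elem.iter():
                child_name = local_name(child.tag)
                if child_name == "timestamp":
                    timestamp = child.text
                elif child_name == "username":
                    username = child.text
            if username is not None:
                yield title, timestamp, username
            elem.clear()  # release the revision text; real dumps are huge
        elif name == "page":
            elem.clear()


def store(revisions, conn):
    # Hypothetical: SQLite stands in for the NoSQL (MongoDB) store.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edits (title TEXT, ts TEXT, editor TEXT)"
    )
    conn.executemany("INSERT INTO edits VALUES (?, ?, ?)", revisions)
    conn.commit()


def export_csv(conn, out):
    # One row per editor: edit count and first edit timestamp
    # (ISO 8601 strings sort correctly, so MIN() gives the earliest).
    writer = csv.writer(out)
    writer.writerow(["editor", "edits", "first_edit"])
    writer.writerows(
        conn.execute(
            "SELECT editor, COUNT(*), MIN(ts) FROM edits "
            "GROUP BY editor ORDER BY editor"
        )
    )


# A tiny hand-written dump fragment, only for illustration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2010-01-01T00:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>first</text>
    </revision>
    <revision>
      <timestamp>2010-02-01T00:00:00Z</timestamp>
      <contributor><ip>127.0.0.1</ip></contributor>
      <text>second</text>
    </revision>
    <revision>
      <timestamp>2010-03-01T00:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>third</text>
    </revision>
  </page>
</mediawiki>"""

conn = sqlite3.connect(":memory:")
store(parse_revisions(io.BytesIO(SAMPLE)), conn)
out = io.StringIO()
export_csv(conn, out)
print(out.getvalue())
```

Streaming with iterparse and clearing each revision keeps memory roughly flat regardless of dump size; the resulting CSV is the kind of file that could then be loaded into R, Stata or SAS.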
Well, I would have expected the team to first search for existing open source code that implements at least some (if not all) of the planned functionality. Examples include my own tool, WikiXRay, and Pywikipediabot (which, AFAIK, now also includes a fast parser for Wikipedia dump files).
For my tool, I now use git for version control, and you can use either of the two available repos (the official one at libresoft, or the mirror at Gitorious):
available, but I guess they can help solve some problems, or at least help you speed up development and avoid starting from scratch.
We are looking for some volunteers who would enjoy testing the tool. You don't need to be a software developer (although it helps :)) to help us; some patience, a bit of time and a fairly recent computer are all you need. You should be comfortable installing programs and working with a command-line interface, and have basic Subversion experience. Python experience is a real bonus!
The testing will focus on getting the tool to run without any supervision. For more background information, have a look at:
I don't see the links to your SVN repo (only []).
We are testing the tool with the largest Wikipedia projects, so if you would like to replicate the analysis on your own favorite Wikipedia project or help improve the quality of the tool, please contact me off-list.
I think it would be more effective to have a separate public list to which people specifically interested in this tool can subscribe (like the one we have exclusively for XML dumps). This should noticeably reduce the number of duplicate bug reports and comments, since other people can learn about known issues.
Hope this helps.
Best,
Felipe.
Best,
Diederik
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l