- The example you posted isn't a list of all users, just the ones who have
added a "Babel" template to their user pages [1].
- That data is stored in the database, in the category and categorylinks
tables (possibly elsewhere, I can't remember offhand).
- I don't think the dumps are sorted by anything other than the current row
order in the database (so in order of creation).
- The user pages will be included in the "All pages" dumps (as opposed to
the "Articles, templates, image descriptions, and primary meta-pages" dumps).
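To make the category lookup concrete, here is a rough sketch in Python
using an in-memory SQLite database. The column names (cl_from, cl_to,
page_id, page_namespace, page_title) are how I remember the MediaWiki
schema, so treat them as assumptions and check them against your import.
The join against the page table is also my addition, since you need it to
turn page IDs into titles. The rows are made-up sample data.

```python
import sqlite3

# Toy schema: an assumed, simplified subset of MediaWiki's page and
# categorylinks tables. The real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,  -- 2 is the User namespace
    page_title     TEXT NOT NULL
);
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,         -- page_id of the member page
    cl_to   TEXT NOT NULL             -- category name, no "Category:" prefix
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 2, "ExampleUserA"),
    (2, 2, "ExampleUserB"),
    (3, 0, "Some_article"),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "User_zh-N"),
    (2, "User_en-3"),
    (3, "User_zh-N"),
])


def users_in_category(conn, category):
    """Return titles of User-namespace pages that belong to `category`."""
    rows = conn.execute(
        """
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ? AND p.page_namespace = 2
        ORDER BY p.page_title
        """,
        (category,),
    )
    return [title for (title,) in rows]


print(users_in_category(conn, "User_zh-N"))
```

Filtering on namespace 2 keeps only user pages, since other kinds of pages
can sit in the same category.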
As for your original sets of questions:
- IIRC, no userdata is included in any dumps. This is to protect user
privacy.
- No on all counts; the only related thing is the interface language. If
you click "My Preferences" on any wiki, the options you see there are what
is stored in the users table (more or less).
- All edits are "modifications" technically. You'd have to
programmatically figure out which ones are _just_ adding content.
- Yes, that "tool" would be called MediaWiki, if you want the most
accurate parser of MediaWiki markup [2]. There are some
alternative parsers [3], but their output can be of variable quality.
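On figuring out which edits are _just_ additions: one possible approach
(a sketch, not an existing tool) is to diff consecutive revisions at the
word level with Python's standard difflib and keep only the inserted runs.
The function names here are made up for illustration, and the sample
strings stand in for revision text you would pull from the dump.

```python
import difflib


def added_text(old, new):
    """Return the runs of words that were inserted going from old to new."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":  # words present only in the new revision
            added.append(" ".join(new_words[j1:j2]))
    return added


def is_pure_addition(old, new):
    """True if the edit inserted words and replaced or deleted nothing."""
    matcher = difflib.SequenceMatcher(a=old.split(), b=new.split(), autojunk=False)
    tags = {tag for tag, *_ in matcher.get_opcodes()}
    return "insert" in tags and tags <= {"equal", "insert"}


old_rev = "The cat sat on the mat."
new_rev = "The cat sat quietly on the mat."
print(added_text(old_rev, new_rev))        # ['quietly']
print(is_pure_addition(old_rev, new_rev))  # True
```

Splitting on whitespace is crude. It treats wiki markup as ordinary words,
and an edit that rewrites a word shows up as "replace" rather than
"insert", so it is deliberately left out of the additions.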
-Jon
[1] http://meta.wikimedia.org/wiki/Meta:Babel_templates
[2] http://www.mediawiki.org/wiki/Markup_spec
[3] http://www.mediawiki.org/wiki/Alternative_parsers
On Mon, Oct 24, 2011 at 13:11, Rami Al-Rfou' <rmyeid(a)gmail.com> wrote:
Hi All,
So with more investigation I discovered that I can get a list of users
by their skill in a specific language. For example:
http://en.wikipedia.org/w/index.php?title=Category:User_zh-N
It seems that such a list is populated from a database. Does anyone know
where I can find that database?
My other questions are about the partial dumps of Wikipedia. Are the dumps
sorted by any field? How can I get all the user pages? Are they stored in a
specific dump? Or are the dumps stored by page title or category?
Regards.
On Tue, Oct 18, 2011 at 15:29, Rami Al-Rfou' <rmyeid(a)gmail.com> wrote:
Hi,
I am planning to study differences in users' editing styles and their
spelling errors in English Wikipedia as part of a research project I am
involved in.
So I downloaded some of the Wikipedia partial XML dumps and converted them to
SQL. My understanding is that Wikipedia stores every copy of each page in the
database.
- I cannot see the users table! Is the users table stored in a
special partial dump?
- Does the users table contain any properties related to the user's
country, preferred Wikipedias, or skill in different languages?
- I am interested in user modifications that add to the articles, not
ones that modify or delete content. I am planning to diff between
revisions to get such data. Are you aware of any tool or effort that
can help?
- Are you aware of any tools that extract the text from Wikipedia
markup language?
Regards.
--
Rami Al-Rfou'
PhD student at Stony Brook University
--
Rami Al-Rfou'
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Jon
[[User:ShakataGaNai]] / KJ6FNQ
http://snowulf.com/
http://ipv6wiki.net/