- That example you posted isn't a list of all users, just the ones who have added the "Babel" template to their user pages [1].
- That data is stored in the database, in the category and categorylinks tables (possibly elsewhere, I can't remember offhand).
- I don't think they are sorted by anything other than the current row order in the database (so roughly in order of creation).
- The user pages will be included in the "All pages" dumps (as opposed to the "Articles, templates, image descriptions, and primary meta-pages" dumps).
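The category lookup described above can be sketched as a join between `categorylinks` and `page`. This is a minimal illustration under loud assumptions: it models only the classic `cl_from`/`cl_to` and `page_id`/`page_namespace`/`page_title` columns (newer MediaWiki versions have changed the `categorylinks` schema, so check yours), uses an in-memory SQLite database as a stand-in for a MediaWiki MySQL database or an imported categorylinks SQL dump, and the helper `users_in_category` and the sample rows are hypothetical. Namespace 2 is the User namespace, and titles use underscores instead of spaces.

```python
import sqlite3

# Namespace number MediaWiki uses for User: pages.
USER_NAMESPACE = 2


def create_toy_schema(conn: sqlite3.Connection) -> None:
    """Create a heavily simplified stand-in for MediaWiki's page and categorylinks tables."""
    conn.executescript(
        """
        CREATE TABLE page (
            page_id        INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title     TEXT NOT NULL
        );
        -- cl_from points at page.page_id; cl_to is the category title without "Category:".
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,
            cl_to   TEXT NOT NULL
        );
        """
    )


def users_in_category(conn: sqlite3.Connection, category: str) -> list[str]:
    """Return titles of User: pages in `category` (hypothetical helper)."""
    # Stored titles use underscores instead of spaces.
    cl_to = category.replace(" ", "_")
    rows = conn.execute(
        """
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ? AND p.page_namespace = ?
        ORDER BY p.page_title
        """,
        (cl_to, USER_NAMESPACE),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_toy_schema(conn)
    # Hypothetical sample rows: two user pages and one article (namespace 0).
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 2, "Alice"), (2, 2, "Bob"), (3, 0, "Some_article")],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [(1, "User_zh-N"), (3, "User_zh-N"), (2, "User_fr-3")],
    )
    print(users_in_category(conn, "User zh-N"))  # prints ['Alice']
```

Filtering on `page_namespace` matters because a category can also contain articles or other pages; only namespace 2 rows correspond to user pages.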
As for your original set of questions:
- IIRC, no user data is included in any dumps. This is to protect user privacy.
- No on all counts; the only related thing is the interface language. If you click "My preferences" on any wiki, the options you see there are what is stored in the user table (more or less).
- Technically, all edits are "modifications". You'd have to programmatically figure out which ones _just_ add content.
- Yes, that "tool" is called MediaWiki, if you want the most accurate parser of MediaWiki markup [2]. There are some alternative parsers [3], but the quality of their output varies.
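One way to approach the "which edits just add content" question is to diff consecutive revisions and keep only what was inserted. The following is a rough sketch under stated assumptions, not an existing tool: it compares two revision texts line by line with Python's standard `difflib.SequenceMatcher` and treats an edit as a pure addition only when every diff opcode is `equal` or `insert`. The function names `added_lines` and `is_pure_addition` are hypothetical, and a real study would probably need word-level diffs plus handling for reverts and moved text.

```python
from difflib import SequenceMatcher


def _line_opcodes(old_text: str, new_text: str):
    """Line-level diff opcodes between two revisions of a page."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # autojunk=False so frequent lines (e.g. blank lines) are not ignored.
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return new_lines, matcher.get_opcodes()


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Lines inserted in new_text (replacements of old lines are not counted)."""
    new_lines, ops = _line_opcodes(old_text, new_text)
    added: list[str] = []
    for tag, _i1, _i2, j1, j2 in ops:
        if tag == "insert":
            added.extend(new_lines[j1:j2])
    return added


def is_pure_addition(old_text: str, new_text: str) -> bool:
    """True if the edit inserts at least one line and deletes or replaces nothing."""
    _, ops = _line_opcodes(old_text, new_text)
    tags = [op[0] for op in ops]
    return "insert" in tags and all(t in ("equal", "insert") for t in tags)


if __name__ == "__main__":
    old = "Intro.\nHistory."
    new = "Intro.\nEarly life.\nHistory."
    print(added_lines(old, new))        # prints ['Early life.']
    print(is_pure_addition(old, new))   # prints True
    print(is_pure_addition(old, "Intro.\nHistory, revised."))  # prints False
```

Working at line granularity keeps the sketch simple; a sentence added inside an existing paragraph shows up as a `replace`, so a finer-grained (word-level) diff would classify more edits as additions.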
-Jon

[1] http://meta.wikimedia.org/wiki/Meta:Babel_templates
[2] http://www.mediawiki.org/wiki/Markup_spec
[3] http://www.mediawiki.org/wiki/Alternative_parsers
On Mon, Oct 24, 2011 at 13:11, Rami Al-Rfou' rmyeid@gmail.com wrote:
Hi All,
After more investigation, I discovered that I can get a list of users based on their skill in a specific language. For example: http://en.wikipedia.org/w/index.php?title=Category:User_zh-N
It seems that such a list is populated from a database. Does anyone know where I can find that database?
My other questions are about the partial dumps of Wikipedia. Are the dumps sorted by any field? How can I get all the user pages? Are they stored in a specific dump, or are the dumps split by page title or category?
Regards.
On Tue, Oct 18, 2011 at 15:29, Rami Al-Rfou' rmyeid@gmail.com wrote:
Hi,
I am planning to study differences in users' editing styles and their spelling errors in English Wikipedia as part of a research project I am involved in.
So I downloaded some of the Wikipedia XML partial dumps and converted them to SQL. My understanding is that Wikipedia stores every version of each page in the database.
- I cannot see the users table! Is the users table stored in a special partial dump?
- Does the user table contain any properties related to the user's country, preferred Wikipedias, or skill in different languages?
- I am interested in user edits that add to the articles, not ones that modify or delete content. I am now planning to diff revisions to get such data. Are you aware of any tool or effort that could help?
- Are you aware of any tools that extract the text from Wikipedia markup?
Regards.
-- Rami Al-Rfou' PhD student at Stony Brook University
-- Rami Al-Rfou'
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l