Hi All,

After some more investigation, I discovered that I can get a list of users grouped by their proficiency in a specific language. For example: http://en.wikipedia.org/w/index.php?title=Category:User_zh-N

It seems that such a list is populated from a database. Does anyone know where I can find that database?
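
One way to pull such a list programmatically is the MediaWiki web API's `list=categorymembers` query, which returns the same category membership that the category page displays. This is a minimal sketch, assuming the public endpoint `https://en.wikipedia.org/w/api.php` with JSON output; the helper names (`category_members_url`, `parse_batch`, `all_category_members`) and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public MediaWiki web API endpoint for English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "user-language-study/0.1 (research script)"


def category_members_url(category, cont=None, limit=500):
    """Build the API URL for one batch of members of `category`."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }
    if cont:
        # Continuation parameters from the previous batch are sent back as-is.
        params.update(cont)
    return API_URL + "?" + urlencode(params)


def parse_batch(response_text):
    """Return (titles, continuation dict or None) for one API response."""
    data = json.loads(response_text)
    titles = [member["title"] for member in data["query"]["categorymembers"]]
    return titles, data.get("continue")


def all_category_members(category):
    """Follow continuation until every member title has been collected."""
    titles, cont = [], None
    while True:
        request = Request(category_members_url(category, cont),
                          headers={"User-Agent": USER_AGENT})
        with urlopen(request) as response:
            batch, cont = parse_batch(response.read().decode("utf-8"))
        titles.extend(batch)
        if not cont:
            return titles
```

For example, `all_category_members("Category:User zh-N")` would return page titles such as `User:...` for every member, fetched 500 at a time.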

My other questions are about the partial dumps of Wikipedia. Are the dumps sorted by any field? How can I get all the user pages? Are they stored in a specific dump, or are the dumps split by page title or category?
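
Whatever the dump layout turns out to be, user pages can be picked out while streaming a pages XML dump, since each `<page>` element carries its `<title>` and, in newer export schemas, an `<ns>` namespace number (User: pages are namespace 2). A minimal sketch with the standard library's `xml.etree.ElementTree.iterparse`, assuming that layout; `iter_user_pages` is a hypothetical helper, and the `User:` title-prefix check is only a fallback for dumps without `<ns>`.

```python
import xml.etree.ElementTree as ET

USER_NAMESPACE = "2"  # MediaWiki namespace number for "User:" pages


def _local(tag):
    """Strip the '{namespace-uri}' prefix the export schema puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_user_pages(source):
    """Yield (title, [revision texts]) for every User: page in an XML dump.

    `source` is a filename or a binary file object (e.g. from bz2.open).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, ns, texts = None, None, []
        for child in elem:
            name = _local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "revision":
                for sub in child:
                    if _local(sub.tag) == "text":
                        texts.append(sub.text or "")
        if ns is not None:
            is_user = ns == USER_NAMESPACE
        else:
            # Fallback for older dumps without an <ns> element.
            is_user = (title or "").startswith("User:")
        if is_user:
            yield title, texts
        elem.clear()  # free the finished page to keep memory flat
```

Revision texts are returned in document order, so a full-history dump gives every stored version of each user page.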

Regards.

On Tue, Oct 18, 2011 at 15:29, Rami Al-Rfou' <rmyeid@gmail.com> wrote:
Hi,

As part of a research project I am involved in, I am planning to study the differences in users' editing styles and their spelling errors on the English Wikipedia.

So I downloaded some of the Wikipedia XML partial dumps and converted them to SQL. My understanding is that Wikipedia stores every revision of each page in the database.

  • I cannot see the users table! Is the users table stored in a specific partial dump?
  • Does the users table contain any properties related to a user's country, preferred Wikipedias, or skill in different languages?
  • I am interested in user edits that add content to articles, not edits that modify or delete existing content. I am currently planning to diff consecutive revisions to extract such data. Are you aware of any tool or effort that could help?
  • Are you aware of any tools that extract plain text from Wikipedia markup?
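
For the last two points, a rough sketch of both steps. `added_text` compares two revisions word by word with the standard library's `difflib.SequenceMatcher` and keeps only pure insertions, so replaced or deleted words are ignored, matching the additions-only goal. `strip_wikitext` is a crude regular-expression cleanup, not a real wikitext parser: it drops references, templates, bold/italic quotes and HTML tags and reduces links to their labels, but it will mishandle tables, nested links and other edge cases. Both function names are hypothetical.

```python
import difflib
import re


def added_text(old, new):
    """Return the runs of words present in `new` but not in `old`.

    Only 'insert' opcodes are kept; 'replace' (modification) and
    'delete' opcodes are ignored.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return [" ".join(new_words[j1:j2])
            for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
            if tag == "insert"]


def strip_wikitext(text):
    """Very rough wikitext-to-plain-text cleanup (not a real parser)."""
    # Self-closing references first, so the next pattern cannot overrun them.
    text = re.sub(r"<ref[^>/]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.S)
    # Remove templates from the innermost outwards until none are left.
    previous = None
    while previous != text:
        previous, text = text, re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    # [http://example.org label] -> label
    text = re.sub(r"\[https?://[^\s\]]+\s*([^\]]*)\]", r"\1", text)
    # Bold/italic quote runs and any remaining HTML tags.
    text = re.sub(r"'{2,}", "", text)
    text = re.sub(r"<[^>]+>", "", text)
    return text
```

Running each pair of consecutive revision texts through `strip_wikitext` and then `added_text` would give, under these assumptions, the words each edit added.
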
Regards.

--
Rami Al-Rfou'
PhD student at Stony Brook University


