On Tue, Oct 25, 2011 at 7:11 AM, Rami Al-Rfou' rmyeid@gmail.com wrote:
Hi All,
After more investigation, I discovered that I can get a list of users by their skill level in a specific language. For example: http://en.wikipedia.org/w/index.php?title=Category:User_zh-N
It seems that such a list is populated from a database. Does anyone know where I can find that database?
My other questions concern the partial dumps of Wikipedia. Are the dumps sorted by any field? How can I get all the user pages? Are they stored in a specific dump, or are the dumps organized by page title or category?
http://csv.ozziesport.com/October%209%20-%20Wikipedia%20English%20Data.csv is a file I have related to that. It is about a year old and the result of manual data mining: I looked for user boxes and for the users who had transcluded them into their user space. My file only covers English Wikipedia and doesn't include every user box. It might be a good place to start. I don't think userbox information is stored in a separate user table, so I doubt you would be able to get at it through that route. :/
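As a possible alternative to scraping, the members of a category page such as Category:User_zh-N can usually be listed through the MediaWiki web API with a list=categorymembers query. Below is a minimal Python sketch of that idea, not a tested tool. The function names and the User-Agent string are my own placeholders, and restricting results to namespace 2 (User pages) is an assumption about what is wanted here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# English Wikipedia API endpoint; swap the host for other language editions.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(category, cont=None, limit=500):
    """Build a list=categorymembers URL for one page of results."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,   # e.g. "Category:User_zh-N"
        "cmnamespace": 2,      # assumption: only pages in the User namespace
        "cmlimit": limit,
        "format": "json",
    }
    if cont:
        # Pass back whatever continuation values the previous response gave us.
        params.update(cont)
    return API_URL + "?" + urlencode(params)


def parse_members(payload):
    """Return (page titles, continuation dict or None) from one API response."""
    members = payload.get("query", {}).get("categorymembers", [])
    titles = [m["title"] for m in members]
    return titles, payload.get("continue")


def fetch_category_users(category):
    """Follow continuation until every member of the category is collected."""
    users, cont = [], None
    while True:
        # Placeholder User-Agent; replace with something identifying your script.
        req = Request(build_query(category, cont),
                      headers={"User-Agent": "userbox-survey-sketch/0.1"})
        with urlopen(req) as resp:
            payload = json.load(resp)
        titles, cont = parse_members(payload)
        users.extend(titles)
        if not cont:
            return users
```

Calling fetch_category_users("Category:User_zh-N") should return titles of the form "User:Name". Subpages in user space would show up as well, so some filtering may still be needed.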