for the latest dump. The
user table is private, but it doesn't seem like you need that. You're
looking for what people have publicly posted on their own user pages,
which MediaWiki treats as pages in a specific namespace (User:,
namespace 2), barely connected to a user at all from a database
standpoint.
So if you're looking for category members (like the Babel template you
linked), they can be found in enwiki-DATE-categorylinks.sql.gz. Import
that into MySQL -- it is about 10 GB uncompressed, with the indexes
making up another 25 GB. However, that'll just give you the page IDs
of the user pages containing the template. You also have to download
and import the page table (also public and archived) and join to it in
MySQL if you want to get the usernames of everyone who has put
themselves in those categories. The page table is much more manageable --
uncompressed, it is about 3 GB of data and 2.5 GB of indexes.
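For instance, once both tables are loaded, something like this should
pull out the titles. Just a sketch: the connection details, database
name, and the 'User_en-3' category are placeholders, and it uses the
third-party PyMySQL driver. The column names (cl_from, cl_to, page_id,
page_title, page_namespace) are the standard MediaWiki schema.

    import pymysql

    conn = pymysql.connect(host="localhost", user="root",
                           password="secret", database="enwiki")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT page_title FROM categorylinks"
            " JOIN page ON page_id = cl_from"
            " WHERE cl_to = %s"         # category name, no 'Category:' prefix
            " AND page_namespace = 2",  # 2 = the User: namespace
            ("User_en-3",))
        for (title,) in cur.fetchall():
            # page_title is stored as binary in MediaWiki, so decode it
            print(title.decode("utf-8") if isinstance(title, bytes) else title)
    conn.close()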
R. Stuart Geiger
UC-Berkeley School of Information
User:Staeiou / @staeiou
On Mon, Oct 24, 2011 at 1:38 PM, Jon Davis <wiki(a)konsoletek.com> wrote:
That example you posted isn't a list of all users,
just the ones who have added
a Babel template to their user pages.
That data is stored in the database, in the category and categorylinks
tables (possibly elsewhere too; I can't remember offhand).
I don't think they are sorted by anything more than the current row order
in the database (so, in order of creation).
The user pages will be included in the "All pages" dumps (as opposed to
the "Articles, templates, image descriptions, and primary meta-pages"
dumps).
As for your original set of questions:
IIRC, no user data is included in any dumps. This is to protect user privacy.
No on all counts; the only related thing is the interface language. If you
click "My Preferences" on any wiki, the options you see there are (more or
less) what is stored in the user table.
All edits are "modifications" technically. You'd have to programmatically
figure out what is _just_ adding content.
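Something along these lines (a sketch using difflib from Python's
standard library) would pull out the added lines between two revision
texts:

    import difflib

    def added_lines(old_text, new_text):
        # Lines present in the new revision but absent from the old one.
        diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
        return [line[2:] for line in diff if line.startswith("+ ")]

    print(added_lines("a\nb", "a\nb\nc"))  # -> ['c']

Note that a changed line shows up as a removal plus an addition, so
deciding what counts as "just adding" still takes some judgment.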
Yes, that "tool" would be called MediaWiki, if you want the most accurate
parser of MediaWiki Markup . There are some alternative parser's  but
their output can be of variable quality.
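For what it's worth, here is a sketch with one third-party option, the
mwparserfromhell Python library (pip install mwparserfromhell), which
can strip wikitext down to plain text:

    import mwparserfromhell

    text = "'''Bold''' text, a [[link|label]], and {{some template}}."
    wikicode = mwparserfromhell.parse(text)
    print(wikicode.strip_code())  # roughly: Bold text, a label, and .

Templates are dropped entirely by strip_code(), which may or may not be
what you want for spelling analysis.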
On Mon, Oct 24, 2011 at 13:11, Rami Al-Rfou' <rmyeid(a)gmail.com> wrote:
So with more investigation I discovered that I can get a list of users
depending on their skill in a specific language. For example, the Babel
categories.
It seems that such a list is populated from a database. Does anyone know
where I can find such a database?
Other questions are regarding the partial dumps of Wikipedia. Are the
dumps sorted by any field? How can I get all the user pages? Are they
stored in a specific dump? Or are the dumps stored by page titles or
page IDs?
On Tue, Oct 18, 2011 at 15:29, Rami Al-Rfou' <rmyeid(a)gmail.com> wrote:
I am planning to study the differences in users' editing styles and their
spelling errors in English Wikipedia, as part of a research project I am
working on.
So I downloaded some of the Wikipedia XML partial dumps and converted them
to SQL. My understanding is that Wikipedia stores every copy of the pages in
the database, but I can not see the users table! Is the users table stored
in a special dump?
Does the user table contain any properties related to the user's country,
preferred Wikipedias, or their skill in different languages?
I am interested in the user modifications that contain additions to the
articles, and not modification or deletion. I am planning now to diff
between revisions to get such data. Are you aware of any tool or effort
that can help with this?
Are you aware of any tools that extract the text from Wikipedia markup?
PhD student at Stony Brook University
[[User:ShakataGaNai]] / KJ6FNQ