Thank you for your quick answer. I'd like to clarify my doubts a little further:
20050421_cur_table.sql.gz ----> 864 108 KB and 20050421_cur_table.sql ----> around gigabytes (compression factor of around 3)
BUT
20050421_old_table.sql.gz ----> around 31 gigabytes and 20050421_old_table.sql ----> 34 201 362 bytes (compression factor of around 1.1)
Even though gunzip seemed to have worked well, I am still very suspicious. I was expecting an old_table.sql file of around 80 to 90 gigabytes.
I am sorry to insist, but it is very important for my research project to have all of the records in the "old" table, because I am trying to study collaboration processes among Wikipedians.
Thank you very much.
Kevin Carillo
_____
From: wikitech-l-bounces@wikimedia.org [mailto:wikitech-l-bounces@wikimedia.org] On Behalf Of wikitech-l@wikimedia.org
Sent: April 26, 2005 11:48 AM
To: wikitech-l@wikimedia.org
Subject: Wikitech-l Digest, Vol 21, Issue 48
Importance: Low
Send Wikitech-l mailing list submissions to wikitech-l@wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit http://mail.wikipedia.org/mailman/listinfo/wikitech-l or, via email, send a message with subject or body 'help' to wikitech-l-request@wikimedia.org
You can reach the person managing the list at wikitech-l-owner@wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Wikitech-l digest..."
Today's Topics:
1. Re: MediaWiki 1.5 release schedule (Brion Vibber)
2. Re: (no subject) (Tomer Chachamu)
3. Re: Test wiki (Phil Boswell)
4. Re: cvs head version - change of user preference: fully re-arranging the sub-options (Timwi)
5. Re: [patch] Anonymous edit warning (Timwi)
6. Question about uncompressing and importing Wikipedia "old" table (Kevin Carillo)
7. Re: cvs head version - change of user preference: fully re-arranging the sub-options (Phil Boswell)
8. Re: Question about uncompressing and importing Wikipedia "old" table (John Fader)
9. Re: Question about uncompressing and importing Wikipedia "old" table (Timwi)
10. AW: [Wikitech-l] Upload-Problems in Version 1.4.0 (German) (Karl-Otto Kirst)
11. AW: [Wikitech-l] Upload-Problems in Version 1.4.0 (German) (Karl-Otto Kirst)
----------------------------------------------------------------------
Message: 1
Date: Tue, 26 Apr 2005 05:10:31 -0700
From: Brion Vibber <brion@pobox.com>
Subject: Re: [Wikitech-l] MediaWiki 1.5 release schedule
To: Wikimedia developers <wikitech-l@wikimedia.org>
Message-ID: <426E2FB7.7090901@pobox.com>
Content-Type: text/plain; charset="iso-8859-1"
MaPhi Werner wrote:
> So, out of naivety: how does the process work concerning the incorporation of enhancement patches?
Enhancement patches may or may not get integrated, depending on time, interest, code cleanliness, bugginess, etc.
> I continued my work on read-access controlled pages (see http://mail.wikipedia.org/pipermail/wikitech-l/2005-March/028105.html for my first approach). The second version abandoned the idea of hiding individual pages and deals with namespaces instead.
Read access restrictions are a really, really low priority. I personally have no interest in seeing such a feature in MediaWiki, though others might take a crack at your patch.
-- brion vibber (brion @ pobox.com)
As far as I know, most data in the "old" table is already compressed (and decompressed on the fly when the MediaWiki software retrieves it) to save space on the live server. So it's unlikely that gzipping the resulting file would reduce its size much further.
Alfio
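The effect described here can be reproduced on synthetic data. The sketch below is only an illustration, not a measurement of the real dumps: the word list, the row sizes, and the helper names (`make_revision`, `raw_deflate`, `gzip_factor`) are all made up for the example. It builds one "dump" of plain-text rows and one whose rows were each deflated beforehand, then gzips both and compares the compression factors.

```python
import gzip
import random
import zlib

# Hypothetical vocabulary; real revision text is far more varied.
WORDS = ["wiki", "article", "revision", "editor", "history", "table",
         "collaboration", "page", "link", "category", "template", "user"]


def make_revision(rng: random.Random, n_words: int = 2000) -> bytes:
    """Build one fake revision out of randomly chosen words."""
    return " ".join(rng.choice(WORDS) for _ in range(n_words)).encode("utf-8")


def raw_deflate(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream (negative wbits: no header, no checksum)."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def gzip_factor(payload: bytes) -> float:
    """Size before gzip divided by size after gzip."""
    return len(payload) / len(gzip.compress(payload))


rng = random.Random(0)  # fixed seed so repeated runs use the same data
revisions = [make_revision(rng) for _ in range(50)]

# Rows stored as plain text, as an uncompressed table would be.
plain_dump = b"\n".join(revisions)
# Rows that were each compressed individually before being dumped.
packed_dump = b"\n".join(raw_deflate(r) for r in revisions)

plain_factor = gzip_factor(plain_dump)
packed_factor = gzip_factor(packed_dump)

print(f"gzip factor, plain rows:        {plain_factor:.2f}")
print(f"gzip factor, pre-deflated rows: {packed_factor:.2f}")
```

On data like this, the factor for the plain rows should come out well above the factor for the pre-deflated rows, which should stay close to 1. The exact numbers depend on the synthetic text and say nothing about the sizes of the actual dump files.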
Kevin Carillo wrote:
> 20050421_old_table.sql.gz ----> around 31 gigabytes and 20050421_old_table.sql ----> 34 201 362 bytes (compression factor of around 1.1)
That's entirely normal, as stored text in the old table is usually compressed.
In the current tables, there are three possible states for a row in the old table.
(default): uncompressed single item. You probably won't find many of these in the Wikipedia dumps.
gzip: An individual text revision compressed with PHP's gzdeflate() function, to be uncompressed with PHP's gzinflate() function. These wrap zlib functions with some specific settings. If, for some reason, you don't want to use MediaWiki or PHP to retrieve data from the dump, see Erik Zachte's stats script for example Perl code.
object: A serialized PHP object which either contains multiple revisions of a page blobbed and compressed together, or references a particular row in which this revision can be found blobbed and compressed with others. This provides a better overall compression ratio in the database than individual compression. See includes/HistoryBlob.php.
gzip and object rows are indicated by the presence of those flags in the old_flags field.
-- brion vibber (brion @ pobox.com)
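For readers who want to handle the rows outside PHP, here is a minimal Python sketch of the dispatch on old_flags described in the message. It relies on gzdeflate() writing a raw DEFLATE stream, which Python's zlib.decompress() can read when given a negative wbits value. The function name `decode_old_text` is hypothetical. Treating old_flags as a comma-separated string is an assumption, as is inflating before checking for the object flag. Rows flagged object are only detected, not unpacked, since that needs the classes in includes/HistoryBlob.php.

```python
import zlib


def decode_old_text(old_text: bytes, old_flags: str) -> bytes:
    """Return the revision text for one row of the old table (sketch only).

    old_text  -- raw bytes of the old_text column
    old_flags -- contents of the old_flags column, assumed comma-separated
    """
    flags = {flag.strip() for flag in old_flags.split(",") if flag.strip()}

    text = old_text
    if "gzip" in flags:
        # Raw DEFLATE data has no zlib or gzip header, hence wbits=-15.
        text = zlib.decompress(text, -15)

    if "object" in flags:
        # Serialized PHP object holding or referencing several revisions.
        # Unpacking it is outside the scope of this sketch.
        raise NotImplementedError("object rows need PHP unserialization")

    # No flags: the row is a plain, uncompressed single item.
    return text


if __name__ == "__main__":
    # Simulate a gzip-flagged row by producing a raw DEFLATE stream locally.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    blob = compressor.compress(b"Some revision text") + compressor.flush()
    print(decode_old_text(blob, "gzip"))
    print(decode_old_text(b"Plain revision text", ""))
```

The same check order would need revisiting if the actual MediaWiki code treats combined flags differently; the sketch is meant as a starting point for inspecting a dump, not as a replacement for MediaWiki's own retrieval code.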