Hello everybody. I've successfully imported both the cur and old tables. However, I've realized that the text in the "old" table seems to be stored in a compressed format. I do not know how MediaWiki uncompresses it and displays it properly, but I'd like to access the "old_text" column through SQL queries. How could I do that? I need to know the real length of each edit or comment.
Thank you.
Kevin Carillo
Kevin Carillo wrote:
I've successfully imported both the cur and old tables. However, I've realized that the text in the "old" table seems to be stored in a compressed format. I do not know how MediaWiki uncompresses it and displays it properly, but I'd like to access the "old_text" column through SQL queries. How could I do that?
The old_flags field indicates the format of the old_text field: whether it's compressed, whether it's a serialized object, etc.
The simplest thing to do is to work within the MediaWiki framework and call Article::getRevisionText() on the retrieved database row. This runs the appropriate decompression, and for HistoryBlob and HistoryBlobStub object rows it unserializes the object and extracts the text.
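If you would rather stay outside the MediaWiki code, the simple (non-object) rows can be decoded by hand. What follows is only a rough sketch in Python, not MediaWiki's own code. It assumes that old_flags is a comma-separated list, that the "gzip" flag means raw deflate data (what PHP's gzinflate() reads), and that the "utf-8" flag marks UTF-8 text. The function name decode_old_text and the legacy_encoding default of Latin-1 are made up for illustration; check them against your own dump. Rows flagged "object" hold serialized PHP objects and are deliberately not handled here.

```python
import zlib


def decode_old_text(old_text: bytes, old_flags: str,
                    legacy_encoding: str = "latin-1") -> str:
    """Return the plain text of one row of the old table (sketch only).

    Assumptions, not confirmed API: old_flags is comma-separated,
    'gzip' means raw deflate, 'utf-8' means the bytes are UTF-8.
    legacy_encoding is a hypothetical fallback for unflagged rows.
    """
    flags = {f.strip() for f in old_flags.split(",") if f.strip()}

    if "object" in flags:
        # Serialized PHP objects (HistoryBlob etc.) need MediaWiki itself.
        raise NotImplementedError("serialized object rows are not handled")

    data = old_text
    if "gzip" in flags:
        # Negative wbits tells zlib to expect a raw deflate stream
        # without the zlib or gzip header.
        data = zlib.decompress(data, -zlib.MAX_WBITS)

    encoding = "utf-8" if "utf-8" in flags else legacy_encoding
    return data.decode(encoding)
```

With something like this, the real length of an edit is just len(decode_old_text(row_text, row_flags)) for each row you fetch with an ordinary SELECT of old_text and old_flags.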
You might also check Erik Zachte's statistics scripts, a Perl-based package, to see how he reads the data.
Once we have MediaWiki 1.5 in place over the next couple of weeks, we'll be providing a simple XML wrapper format for the backup dumps, which won't be nearly as messy to deal with. (This is the same format used by Special:Export.) It will come with an importer tool that can load all the data into a local database without compression.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org