Scorpunit wrote:
I'm having trouble getting the real text out of the old_text column. In the cur table, for example, the real text is stored in the database, but in the old table the text is hashed.
For example: O:15:"historyblobstub":2:{s:6:"mOldId";s:5:"48086";s:5:"mHash";s:32:"1ea4d983a8fd0a10c46022035552d597";}
How can I get the real text out of such a hash?
If you're working within the MediaWiki framework (e.g. in a maintenance script), use Article::getRevisionText( $row ), where $row is a row in object format, as retrieved with Database::fetchObject or mysql_fetch_object. Your query must include both the old_text and old_flags fields.
The value in that field is a serialized PHP object that refers to another row, which contains multiple revisions compressed together; see HistoryBlob.php for the classes used.
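If you are reading the table from outside the framework, you can at least recognize these stub rows and pull out their two properties. The following is only a minimal Python sketch, not MediaWiki code: parse_history_blob_stub is a hypothetical helper name, and the regular expressions assume the stub has exactly the shape shown in the example above (string-valued properties only). It does not decompress anything; locating the referenced row and unpacking the revision is still the job of the classes in HistoryBlob.php.

```python
import re

# Hypothetical helper, not part of MediaWiki. It assumes the stub looks like
# the example in the question: an object named "historyblobstub" whose
# properties are all serialized as strings.
_STUB_RE = re.compile(
    r'^O:\d+:"historyblobstub":\d+:\s*\{(.*)\}$',
    re.IGNORECASE | re.DOTALL,
)
# One serialized "name";"value" pair, e.g. s:6:"mOldId";s:5:"48086";
_PAIR_RE = re.compile(r's:\d+:"([^"]*)";s:\d+:"([^"]*)";')


def parse_history_blob_stub(old_text):
    """Return the stub's properties as a dict, or None if old_text is not a stub."""
    match = _STUB_RE.match(old_text.strip())
    if match is None:
        return None
    return dict(_PAIR_RE.findall(match.group(1)))


stub = (
    'O:15:"historyblobstub":2:{s:6:"mOldId";s:5:"48086";'
    's:5:"mHash";s:32:"1ea4d983a8fd0a10c46022035552d597";}'
)
fields = parse_history_blob_stub(stub)
print(fields["mOldId"], fields["mHash"])
```

Rows for which the helper returns None are not stubs of this form and need to be interpreted according to their old_flags instead.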
In the future we'll be making public dumps in a cleaner XML-wrapper format, the same format provided by Special:Export. You can take the text out of that directly, or import it into a database (potentially without all the fancy compression options we use) and take it from there.
-- brion vibber (brion @ pobox.com)