Hello,
I'm having trouble getting the real text out of the old_text column. In the cur table, for example, the real text is stored in the database, but in the old table the text is hashed.
For example: O:15:"historyblobstub":2:{s:6:"mOldId";s:5:"48086";s:5:"mHash";s:32:"1ea4d983a8fd0a10c46022035552d597";}
How can I get the real text out of such a hash?
Thanks..
Scorpunit wrote:
How can I get the real text out of such a hash?
Easy.
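<?php
// Brute-force search for a byte string whose MD5 equals the given hash.
$hash = $argv[1];
if ( $hash === md5( '' ) ) {
    exit( 0 );
}
for ( $len = 1; $len < 1e6; $len++ ) {
    // $a holds the candidate as byte values, lowest-order byte first.
    $a = array_fill( 0, $len, 0 );
    do {
        $s = '';
        for ( $i = 0; $i < $len; $i++ ) {
            $s .= chr( $a[$i] );
        }
        if ( md5( $s ) === $hash ) {
            print $s;
            exit( 0 );
        }
        // Increment the candidate, carrying into the next byte on overflow.
        $i = 0;
        do {
            $a[$i] = ( $a[$i] + 1 ) % 256;
        } while ( !$a[$i] && ++$i < $len );
    } while ( $i != $len );
}
exit( 1 );
?>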
Might need some optimising before it's fast enough for server use though.
-- Tim Starling
Scorpunit wrote:
How can I get the real text out of such a hash?
If you're working within the MediaWiki framework (e.g. in a maintenance script), use Article::getRevisionText( $row ), where $row is a row in object format as retrieved with Database::fetchObject or mysql_fetch_object. Your query must include both the old_text and old_flags fields.
The field there is a serialized PHP object, not the text itself: it refers to another row that contains multiple revisions compressed together; see HistoryBlob.php for the classes used.
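If you only want to see which row a stub points at, without loading MediaWiki, something like the following rough sketch may help. It reads the two fields straight out of the serialized string with a regular expression. The function name parseHistoryBlobStub is made up for illustration, and the reading of the fields (mOldId as the old_id of the row holding the compressed blob, mHash as the key of this revision inside it) is an assumption based on the field names; check HistoryBlob.php for what your version actually does.

```php
<?php
// Hypothetical helper, not part of MediaWiki: extract mOldId and mHash
// from a serialized HistoryBlobStub string without unserializing it.
// Assumes the layout shown in the example above, with mOldId stored as
// a string; other versions may serialize it differently.
function parseHistoryBlobStub( $serialized ) {
    $pattern = '/^O:15:"historyblobstub":2:\s*\{'
        . 's:6:"mOldId";s:\d+:"(\d+)";'
        . 's:5:"mHash";s:32:"([0-9a-f]{32})";\}$/i';
    if ( !preg_match( $pattern, trim( $serialized ), $m ) ) {
        return null; // not a stub in the expected layout
    }
    return array( 'oldId' => (int)$m[1], 'hash' => $m[2] );
}

$stub = 'O:15:"historyblobstub":2:{s:6:"mOldId";s:5:"48086";'
      . 's:5:"mHash";s:32:"1ea4d983a8fd0a10c46022035552d597";}';
$ref = parseHistoryBlobStub( $stub );
printf( "row %d, key %s\n", $ref['oldId'], $ref['hash'] );
```

This only tells you where to look. Getting the text itself still means fetching that row and decompressing it, which is what Article::getRevisionText does for you.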
In the future we'll be making public dumps in a cleaner XML-wrapper format, the same format provided by Special:Export. You can take text out of that directly, or let it import to a database (potentially without all the fancy compression options we use) and take it from there.
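As a rough illustration of taking text out of such a file directly, here is a sketch using PHP's SimpleXML extension. The sample document is a cut-down stand-in modelled on the page / revision / text nesting that Special:Export produces; the function name extractRevisionTexts and the sample content are invented for the example, and real export files carry more elements and attributes than shown here.

```php
<?php
// Simplified stand-in for an export file (assumed structure, sample data).
$xml = <<<'XML'
<mediawiki>
  <page>
    <title>Example page</title>
    <revision>
      <timestamp>2005-01-01T00:00:00Z</timestamp>
      <text>Older wikitext</text>
    </revision>
    <revision>
      <timestamp>2005-02-01T00:00:00Z</timestamp>
      <text>Newer wikitext</text>
    </revision>
  </page>
</mediawiki>
XML;

// Hypothetical helper: return one entry per revision found in the XML.
function extractRevisionTexts( $xmlString ) {
    $doc = simplexml_load_string( $xmlString );
    if ( $doc === false ) {
        return array(); // not well-formed XML
    }
    $out = array();
    foreach ( $doc->page as $page ) {
        foreach ( $page->revision as $rev ) {
            $out[] = array(
                'title'     => (string)$page->title,
                'timestamp' => (string)$rev->timestamp,
                'text'      => (string)$rev->text,
            );
        }
    }
    return $out;
}

foreach ( extractRevisionTexts( $xml ) as $r ) {
    print $r['title'] . ' @ ' . $r['timestamp'] . ': ' . $r['text'] . "\n";
}
```

For a large dump you would probably want a streaming parser rather than loading the whole file into memory, but the element names to look for are the same.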
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org