I don't know whether any texts stored directly in the text table on the
production cluster contain multiple compressed revisions. There are
certainly some revision texts not stored in external store that consist
of serialized objects (perhaps broken) or gzipped data. I just verified
this by looking at some older entries in the text table for
eo.wikipedia.
As I look at this:
http://wikitech.wikimedia.org/view/Text_storage_data
it appears that we may indeed have a few problematic entries lying
around:
216694 219570 2876 object/concatenatedgziphistoryblob
If it will help you for testing, I'll try to track down a few such
revisions. What the test suite should do right now is whatever the
current code does when asked to fetch the revision. At some point we
need to go through all the old revision texts and patch up anything
broken to the extent possible. It will be a lot of work.
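
As a rough illustration of "whatever the current code does", a test
harness might first work out which decoding steps a row's old_flags
imply. The sketch below is Python rather than MediaWiki's PHP. The
function names are hypothetical, the step order (external fetch, then
inflate, then unserialize) is an assumption about the current code, and
it assumes the 'gzip' flag marks headerless deflate data.

```python
import zlib

# Hypothetical helper: maps a text.old_flags value to decoding steps.
# The step order is an assumption, not taken from MediaWiki's source.
def decode_steps(old_flags: str) -> list:
    flags = {f.strip() for f in old_flags.split(",") if f.strip()}
    steps = []
    if "external" in flags:
        steps.append("fetch_external")
    if "gzip" in flags:
        steps.append("inflate")
    if "object" in flags:
        steps.append("unserialize")
    return steps

def inflate_raw(data: bytes) -> bytes:
    # Assumes headerless (raw) deflate data; wbits=-15 selects that format.
    return zlib.decompress(data, -15)
```

A row flagged 'gzip,object' would then yield ["inflate", "unserialize"];
actually unserializing the PHP object is left to PHP itself.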
Ariel
On Wed, 08-02-2012 at 09:43 +0100, Christian Aistleitner wrote:
Hello,
I am currently developing a test suite for the XML dumps, and I am curious about the
specification of text.old_flags in MediaWiki's maintenance/tables.sql.
The file describes the 'object' flag as
text field contained a serialized PHP object.
object either contains multiple versions compressed
to achieve a better compression ratio, or it refers
to another row where the text can be found.
Is the "multiple versions" part still used in any project?
If so, how should this be set up [1]?
Kind regards,
Christian
P.S.: In #wikimedia-dev I was told to bring up the question on this list. If there are
further lists where I should ask, please let me know.
[1] Before r6138 (back then still in Article.php, not Revision.php), it seems the text was
obtained by
    $object = unserialize( $text );
    $text = $object->getItem( $hash );
There it is somewhat obvious how a single object may return different texts. However,
beginning with r6138 it seems the text is simply fetched by
    $obj = unserialize( $text );
    [...]
    $text = $obj->getText();
If a single object should return different texts, how does it determine which text to
return?
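
To make the footnote's question concrete: one way a getText()-style
interface could still select among several texts is for the unserialized
object to be a small stub that remembers which item it wants from a
shared container. The sketch below is Python, not MediaWiki's PHP. The
class names ConcatenatedBlob and BlobStub, the dict used as a row store,
and the use of MD5 as the item key are all assumptions for illustration;
nothing in this thread confirms that MediaWiki works this way.

```python
import hashlib

class ConcatenatedBlob:
    """Hypothetical container holding several revision texts in one row."""

    def __init__(self):
        self._items = {}

    def add_item(self, text: str) -> str:
        # Key each text by a content hash (MD5 is an assumption here).
        key = hashlib.md5(text.encode("utf-8")).hexdigest()
        self._items[key] = text
        return key

    def get_item(self, key: str) -> str:
        return self._items[key]

class BlobStub:
    """Hypothetical per-revision object: points at a container and a key."""

    def __init__(self, blob_id: int, key: str):
        self.blob_id = blob_id
        self.key = key

    def get_text(self, store: dict) -> str:
        # The stub itself decides which text to return, so the caller
        # no longer needs to pass a hash.
        return store[self.blob_id].get_item(self.key)

blob = ConcatenatedBlob()
first = blob.add_item("first revision")
second = blob.add_item("second revision")
store = {7: blob}  # stands in for rows of the text table
```

With this layout, two stubs that point at the same container row but
carry different keys return different texts from the same call.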
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l