I don't know whether any texts stored directly in the text table on the production cluster contain multiple compressed revisions. There are certainly some revision texts not stored in external store, which consist of serialized objects (perhaps broken) or gzipped data. I just verified this by looking at some older entries in the text table for eo.wikipedia.
Looking at http://wikitech.wikimedia.org/view/Text_storage_data, it appears that we may indeed have a few problematic entries lying around: 216694 219570 2876 object/concatenatedgziphistoryblob
If it will help you for testing, I'll try to track down a few such revisions. What the test suite should do right now is whatever the current code does when asked to fetch the revision. At some point we need to go through all the old revision texts and patch up anything broken to the extent possible. It will be a lot of work.
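For tracking down such rows, a minimal sketch follows. It assumes the (old_id, old_flags) pairs have already been read from the text table into Python tuples. The function names and the sample rows are hypothetical. The flag names (gzip, utf-8, object, external) are the ones listed in tables.sql. It only picks out rows flagged 'object' that are not in external store; it does not try to unserialize anything.

```python
def classify_flags(old_flags):
    """Return a coarse storage label for a comma-separated old_flags value."""
    flags = {f.strip() for f in old_flags.split(",") if f.strip()}
    if "external" in flags:
        # Text lives in external store; the row only holds a pointer.
        return "external"
    if "object" in flags:
        # Row holds a serialized PHP object stored directly in the text table.
        return "object"
    if "gzip" in flags:
        return "gzip"
    return "plain"


def find_local_objects(rows):
    """Return the old_id of every row stored locally as a serialized object."""
    return [old_id for old_id, old_flags in rows
            if classify_flags(old_flags) == "object"]


# Hypothetical sample data, not taken from any real wiki.
sample_rows = [
    (1, "utf-8"),
    (2, "utf-8,gzip"),
    (3, "object"),
    (4, "external,utf-8"),
    (5, ""),
]
candidates = find_local_objects(sample_rows)
```

With the sample rows above, only the row with old_id 3 is selected as a candidate for closer inspection.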
Ariel
On Wed, 08-02-2012, at 09:43 +0100, Christian Aistleitner wrote:
Hello,
I am currently developing a test suite for the XML dumps, and I am curious about the specification of text.old_flags in MediaWiki's maintenance/tables.sql. The file describes the 'object' flag as
text field contained a serialized PHP object. object either contains multiple versions compressed to achieve a better compression ratio, or it refers to another row where the text can be found.
Is the „multiple versions” part still used in some project? If so, how should this be set up [1]?
Kind regards, Christian
P.S.: In #wikimedia-dev I was told to bring up the question on this list. If there are further lists where I should ask, please let me know.
[1] Before r6138 (back then still in Article.php, not Revision.php), it seems the text was obtained by $object = unserialize( $text ); $text = $object->getItem( $hash ); . There it is somewhat obvious how a single object may return different texts. However, beginning with r6138 it seems the text is simply fetched by $obj = unserialize( $text ); [...] $text = $obj->getText(); . If a single object should return different texts, how does it determine which text to return?
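To make the question in [1] concrete, here is a small Python model of the two access styles. It is only an illustration: the class, its method names, the JSON-plus-zlib packing and the "first item added is the default" rule are all assumptions made for this sketch. It is not the PHP class MediaWiki uses, and it does not claim to describe what Revision.php actually does.

```python
import hashlib
import json
import zlib


class MultiTextBlobSketch:
    """Hypothetical container holding several revision texts in one blob."""

    def __init__(self):
        self.items = {}           # hash -> text
        self.default_hash = None  # assumption: the text get_text() returns

    def add_item(self, text):
        """Store a text under its MD5 hash and return that hash."""
        key = hashlib.md5(text.encode("utf-8")).hexdigest()
        self.items[key] = text
        if self.default_hash is None:
            self.default_hash = key
        return key

    def get_item(self, key):
        """Old style: the caller supplies the hash, so any text is reachable."""
        return self.items.get(key)

    def get_text(self):
        """New style: no argument, so only one designated text is reachable."""
        return self.get_item(self.default_hash)

    def to_bytes(self):
        """Compress all texts together, which is the point of concatenation."""
        payload = {"items": self.items, "default": self.default_hash}
        return zlib.compress(json.dumps(payload).encode("utf-8"))

    @classmethod
    def from_bytes(cls, data):
        payload = json.loads(zlib.decompress(data).decode("utf-8"))
        blob = cls()
        blob.items = payload["items"]
        blob.default_hash = payload["default"]
        return blob


blob = MultiTextBlobSketch()
h1 = blob.add_item("first revision")
h2 = blob.add_item("second revision")
restored = MultiTextBlobSketch.from_bytes(blob.to_bytes())
```

In this model, restored.get_item(h2) can reach the second text, while restored.get_text() can only ever return the one default text. So unless something outside the object records which hash belongs to which revision, the second text is unreachable through get_text() alone.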
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l