OK, thank you guys. Now the reasons are clear :-). In any case, this forced the parser improvement, so it's welcome anyway ;).
Best,
F.
--- On Mon, 15/6/09, Platonides Platonides@gmail.com wrote:
From: Platonides Platonides@gmail.com
Subject: Re: [Wikitech-l] Fixing problem with complete dumps in WikiXRay
To: wikitech-l@lists.wikimedia.org
Date: Monday, 15 June, 2009 10:44
Felipe Ortega wrote:
Hello, all.
For (as yet) unknown reasons, the latest complete dump files (pages-meta-history.xml) for some languages are flawed. Certain revision items are missing the rev_user info. Even though there are only 3 or 4 of them, that is enough to break either the parsing process or the later SQL load into the DB.
So far, the last 3 dumps of DE Wikipedia and the 20090603 dump of FR Wikipedia have shown this error.
I have updated both WikiXRay parsers: http://meta.wikimedia.org/wiki/WikiXRay_parser http://meta.wikimedia.org/wiki/WikiXRay_parser_research
They now check whether each parsed revision item is complete before creating the SQL. If it's flawed, it's omitted and logged to an error file for later inspection.
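A minimal sketch of that kind of completeness check, assuming revisions arrive as xml.etree.ElementTree elements with the XML namespace already stripped. The required-field list and the function names (is_complete, filter_revisions) are hypothetical illustrations, not WikiXRay's actual code.

```python
import logging
import xml.etree.ElementTree as ET

# Hypothetical logger for skipped revisions; attach a logging.FileHandler
# to it to get a real error file for later inspection.
error_log = logging.getLogger("revision_errors")

def is_complete(rev):
    """Return True if the revision carries every field the SQL insert needs."""
    # Assumed required fields (direct children of <revision>).
    if rev.findtext("id") is None or rev.findtext("timestamp") is None:
        return False
    contributor = rev.find("contributor")
    if contributor is None:
        return False
    # rev_user info comes from <username>/<id> (registered) or <ip> (anonymous).
    return any(contributor.findtext(tag) is not None for tag in ("username", "id", "ip"))

def filter_revisions(revisions):
    """Yield complete revisions; log and skip flawed ones."""
    for rev in revisions:
        if is_complete(rev):
            yield rev
        else:
            error_log.warning("Skipping incomplete revision: %s",
                              ET.tostring(rev, encoding="unicode"))

# Usage: one normal revision and one whose contributor info is missing.
good = ET.fromstring(
    "<revision><id>1</id><timestamp>2009-06-03T00:00:00Z</timestamp>"
    "<contributor><username>A</username><id>7</id></contributor></revision>")
bad = ET.fromstring(
    '<revision><id>2</id><timestamp>2009-06-03T00:00:00Z</timestamp>'
    '<contributor deleted="deleted" /></revision>')
kept = list(filter_revisions([good, bad]))
```

Only the first revision survives; the second is written to the log instead of reaching the SQL generation step.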
Regards,
Felipe.
They're an effect of revdelete. You can see that they have a "deleted" attribute. An example is available in the bug report for pywikipediabot: http://sourceforge.net/tracker/index.php?func=detail&aid=2790339&gro...
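Revdeleted entries can be told apart from genuinely malformed ones. A minimal sketch, assuming the dump marks a hidden contributor as <contributor deleted="deleted" /> and that the XML namespace has been stripped; the helper name contributor_is_revdeleted is hypothetical.

```python
import xml.etree.ElementTree as ET

def contributor_is_revdeleted(rev):
    """Return True if the revision's contributor was hidden with revdelete."""
    contributor = rev.find("contributor")
    # Assumed shape: the element is kept, but its children are dropped
    # and it carries deleted="deleted" instead.
    return contributor is not None and contributor.get("deleted") == "deleted"

hidden = ET.fromstring('<revision><id>2</id><contributor deleted="deleted" /></revision>')
normal = ET.fromstring("<revision><id>1</id><contributor><username>A</username></contributor></revision>")
print(contributor_is_revdeleted(hidden))  # True
print(contributor_is_revdeleted(normal))  # False
```

A parser could use a check like this to log revdeleted revisions under a separate reason instead of treating them as corrupt.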
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l