Hi Ariel.
I've been noticing since last year (when I introduced an error logging service in
WikiXRay) that there are several malformed revision items in the dump files.
These cause exceptions when trying to insert tuples into the DB without coherent
values.
I still receive these errors in the new dumps, so I think there is indeed some
issue that you should check.
Most of the time, the missing value is for <rev_user>. I'm not sure about the
cause, but perhaps the dump process is facing a high load on the target server
and this causes blanks to be inserted instead of the actual value.
You can take a look at these malformed items in the following error log file
(taken from chunk 10 produced in March 2011):
http://gsyc.es/~jfelipe/tmp/error10_wx_enwiki_032011
All chunks from March 2011 (and previous dumps) contained these errors.
The fraction of these erroneous entries is still very low, compared to the size
of the whole dump, so it doesn't affect the accuracy of global studies. All the
same, it might cause some trouble in case one is looking for a particular
revision in the complete collection (I haven't checked explicitly, but it looks
like there is no pattern in these errors and they are produced randomly).
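In case it helps to reproduce this on your side, here is a minimal sketch of
how one could scan a chunk for revisions with no usable contributor value. It is
not the WikiXRay code. It assumes that <rev_user> is filled from the
<contributor> element of each <revision> in the XML dump, and that a revision
is "malformed" when that element has no non-empty <username>, <id> or <ip>.
The names find_revisions_without_user and SAMPLE are made up for the example,
and the sample XML is invented, not taken from the error log.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def find_revisions_without_user(source):
    """Return ids of revisions whose contributor has no username, id or ip.

    `source` is a binary file-like object holding dump XML (hypothetical helper).
    """
    missing = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "revision":
            continue
        rev_id = None
        has_user = False
        # Only direct children: the revision <id> and the <contributor> block.
        for child in elem:
            name = local(child.tag)
            if name == "id":
                rev_id = (child.text or "").strip()
            elif name == "contributor":
                has_user = any(
                    local(field.tag) in ("username", "id", "ip")
                    and (field.text or "").strip()
                    for field in child
                )
        if not has_user:
            missing.append(rev_id)
        elem.clear()  # drop this revision's children to limit memory use
    return missing


# Invented sample: revision 102 has blank contributor fields.
SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <id>101</id>
      <contributor><username>Alice</username><id>7</id></contributor>
    </revision>
    <revision>
      <id>102</id>
      <contributor><username></username><id></id></contributor>
    </revision>
    <revision>
      <id>103</id>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(find_revisions_without_user(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Running something like this over a chunk and comparing the reported revision
ids against the entries in the error log could show whether the blanks are
already present in the dump file itself or appear later, during parsing.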
Let me know if you need more info that can be of help to solve this issue.
Best,
Felipe.