Hi all,
I have a script that collects data from multiple Wikipedias, and I have started to notice that the revision table in lbwiki_p contains some incorrect data.
Here is an example:
mysql> select rev_id, rev_user, rev_page, rev_deleted, rev_len, rev_timestamp from revision where rev_id = 185751;
+--------+----------+----------+-------------+---------+----------------+
| rev_id | rev_user | rev_page | rev_deleted | rev_len | rev_timestamp  |
+--------+----------+----------+-------------+---------+----------------+
| 185751 |      580 |    83446 |           0 |    NULL | 20061203231418 |
+--------+----------+----------+-------------+---------+----------------+
mysql> select rev_id, rev_page, rev_len from revision where rev_page = 83446 and rev_timestamp < 20061203231418;
+--------+----------+---------+
| rev_id | rev_page | rev_len |
+--------+----------+---------+
| 115478 |    83446 |    NULL |
| 118003 |    83446 |    NULL |
| 118009 |    83446 |    NULL |
| 138010 |    83446 |    NULL |
+--------+----------+---------+
As I understand it, if a record exists, rev_len shouldn't be NULL; if the revision is deleted, rev_deleted should be flagged, but rev_len should remain as it is.
Hope someone can look into this, because people who are doing analysis might end up getting wrong results.
Best; -- Anuradha Uduwage (Anu)
Hello, At Saturday 16 March 2013 01:41:47 DaB. wrote:
Hi all,
I have a script that collects data from multiple Wikipedias, and I have started to notice that the revision table in lbwiki_p contains some incorrect data.
Here is an example:
mysql> select rev_id, rev_user, rev_page, rev_deleted, rev_len, rev_timestamp from revision where rev_id = 185751;
+--------+----------+----------+-------------+---------+----------------+
| rev_id | rev_user | rev_page | rev_deleted | rev_len | rev_timestamp  |
+--------+----------+----------+-------------+---------+----------------+
| 185751 |      580 |    83446 |           0 |    NULL | 20061203231418 |
+--------+----------+----------+-------------+---------+----------------+
The result is correct.
As I understand it, if a record exists, rev_len shouldn't be NULL; if the revision is deleted, rev_deleted should be flagged, but rev_len should remain as it is.
Hope someone can look into this, because people who are doing analysis might end up getting wrong results.
rev_len will remain as it is. The problem is that rev_len was not there from the very beginning and was never (AFAIK) back-populated, so very old rows have no length and are NULL.
Best;
Anuradha Uduwage (Anu)
Sincerely, DaB.
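For anyone running analyses on the revision table, the NULL values described above can be handled explicitly instead of being treated as zero-length revisions. What follows is only a rough sketch, not a tested recipe for the Toolserver replicas: it uses Python's sqlite3 with an in-memory table as a stand-in for the MySQL database, the rows are invented sample data, and the helper names length_coverage and mean_known_length are hypothetical. The point it illustrates is standard SQL behaviour: COUNT(rev_len) and AVG(rev_len) skip NULL rows, whereas replacing NULL with 0 pulls the average down.

```python
import sqlite3


def length_coverage(conn, page_id):
    """Return (total_revisions, revisions_missing_rev_len) for one page."""
    total, with_len = conn.execute(
        # COUNT(*) counts every row; COUNT(rev_len) counts only non-NULL values.
        "SELECT COUNT(*), COUNT(rev_len) FROM revision WHERE rev_page = ?",
        (page_id,),
    ).fetchone()
    return total, total - with_len


def mean_known_length(conn, page_id):
    """Average rev_len over the rows that have one; AVG() ignores NULLs."""
    (mean,) = conn.execute(
        "SELECT AVG(rev_len) FROM revision WHERE rev_page = ?",
        (page_id,),
    ).fetchone()
    return mean


def mean_length_null_as_zero(conn, page_id):
    """The pitfall: counting NULL lengths as 0 skews the average."""
    (mean,) = conn.execute(
        "SELECT AVG(COALESCE(rev_len, 0)) FROM revision WHERE rev_page = ?",
        (page_id,),
    ).fetchone()
    return mean


# In-memory stand-in for the replica; only the columns used here are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, "
    "rev_len INTEGER, rev_timestamp TEXT)"
)

# Invented sample rows: two old revisions without rev_len, two newer ones with it.
rows = [
    (1, 100, None, "20060101000000"),
    (2, 100, None, "20060601000000"),
    (3, 100, 1200, "20080101000000"),
    (4, 100, 1800, "20090101000000"),
]
conn.executemany("INSERT INTO revision VALUES (?, ?, ?, ?)", rows)

total, missing = length_coverage(conn, 100)
print("revisions:", total, "without rev_len:", missing)
print("mean of known lengths:", mean_known_length(conn, 100))
print("mean with NULL treated as 0:", mean_length_null_as_zero(conn, 100))
```

The ? placeholders are sqlite3's; a MySQL driver would typically use a different placeholder style, but the queries themselves should carry over unchanged.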
DaB. wrote:
rev_len will remain as it is. The problem is that rev_len was not there from the very beginning and was never (AFAIK) back-populated, so very old rows have no length and are NULL.
Related: https://bugzilla.wikimedia.org/show_bug.cgi?id=12188.
MZMcBride
toolserver-l@lists.wikimedia.org