I'm seeing some confusing revision data in the database. Maybe it's just me, but the toolserver data doesn't seem to match what's happening on wikipedia. I have reports from three different people that my tools don't act normally for recent data.
Here are the details for one specific revision that I'm confused about...
Compare the en.wikipedia.org information for rev_id=48644364:
http://en.wikipedia.org/w/index.php?oldid=prev&diff=48644364
page title = Cranford, New Jersey, Main Page
user name = Alansohn
rev comment = copyedit and wikify to put term dates and party on line for each committee member
rev timestamp = 2006-04-16T01:26:50
To the toolserver's, and they don't match:
mysql> SELECT * FROM enwiki_p.revision LEFT JOIN enwiki_p.page ON rev_page=page_id LEFT JOIN enwiki_p.user_ids ON user_id=rev_user WHERE rev_id=48644364\G
*************************** 1. row ***************************
rev_id: 48644364
rev_page: 4708760
rev_text_id: 48429816
rev_comment: [[User:Northmeister|Northmeister]] again (sorry, again, Guy)
rev_user: 453528
rev_user_text: FloNight
rev_timestamp: 20060416012812
rev_minor_edit: 0
rev_deleted: 0
page_id: 4708760
page_namespace: 3
page_title: JzG
page_restrictions:
page_counter: 0
page_is_redirect: 0
page_is_new: 0
page_random: 0.714814436852
page_touched: 20060411181926
page_latest: 47989221
page_len: 1843
user_name: FloNight
user_id: 453528
1 row in set (0.00 sec)
Just as a quick sanity check, the information from the following two sources (for a revision made before the db cluster split) does match:
http://en.wikipedia.org/w/index.php?oldid=prev&diff=40000000
SELECT * FROM enwiki_p.revision LEFT JOIN enwiki_p.page ON rev_page=page_id LEFT JOIN enwiki_p.user_ids ON user_id=rev_user WHERE rev_id=40000000\G
-Dave
Hm. Auto incrementing fields out of sync?
Interiot, can you find the revision number on toolserver matching that edit (match using the correct page_id, rev_user, and timestamp) and determine if there is a constant offset in revision numbers?
Yes, for the rev_ids right around the number, they seem to be offset by 138 or 139:
toolserver   en.wikipedia.org
48644125     48644264   -100   (offset 139)
48644225     48644363   -1     (offset 138)
48644226     48644364    0     (offset 138)
48644227     48644365   +1     (offset 138)
48644326     48644464   +100   (offset 138)
-Dave (Interiot)
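For anyone who wants to repeat the offset check on more rev_ids, here is a minimal Python sketch. The pairs are copied from the table above; the names PAIRS, offsets and offset_changes are made up for illustration and are not part of any toolserver tool.

```python
from typing import List, Tuple

# (toolserver rev_id, en.wikipedia.org rev_id) for the same edit,
# copied from the table in this message.
PAIRS: List[Tuple[int, int]] = [
    (48644125, 48644264),
    (48644225, 48644363),
    (48644226, 48644364),
    (48644227, 48644365),
    (48644326, 48644464),
]


def offsets(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (toolserver_rev_id, offset) sorted by toolserver rev_id."""
    return [(ts_id, wp_id - ts_id) for ts_id, wp_id in sorted(pairs)]


def offset_changes(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int, int, int]]:
    """Return (prev_id, next_id, prev_offset, next_offset) wherever the
    offset differs between two neighbouring samples."""
    offs = offsets(pairs)
    return [
        (a_id, b_id, a_off, b_off)
        for (a_id, a_off), (b_id, b_off) in zip(offs, offs[1:])
        if a_off != b_off
    ]


if __name__ == "__main__":
    for ts_id, off in offsets(PAIRS):
        print(ts_id, off)
    # Each entry brackets a range of toolserver rev_ids in which the offset changed.
    print(offset_changes(PAIRS))
```

With only these five samples the change is bracketed between toolserver rev_ids 48644125 and 48644225; sampling more ids inside that range would narrow it down.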
On Wed, Apr 26, 2006 at 01:46:25PM -0400, Gregory Maxwell wrote:
> Hm. Auto incrementing fields out of sync?
> Interiot, can you find the revision number on toolserver matching that edit (match using the correct page_id, rev_user, and timestamp) and determine if there is a constant offset in revision numbers?
Toolserver-l mailing list Toolserver-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/toolserver-l
And the boundary is here:
toolserver   wikipedia   offset
48644125     48644264    139
48644126     ?           (toolserver says it's [[User talk:219.24.158.67]], edited by Elite1trek, but I can't find the edit on Wikipedia)
48644127     48644265    138
See http://tools.wikimedia.de/~interiot/cgi-bin/queries/tmp/en_revid_offset?diff...
-Dave
Another problem, this time associated with rev_page or page_id..
SELECT page_namespace, page_title, rev_user_text, rev_timestamp, rev_comment FROM revision LEFT JOIN page ON rev_page=page_id WHERE rev_id=48589976;
+----------------+---------------+---------------+----------------+-------------+
| page_namespace | page_title    | rev_user_text | rev_timestamp  | rev_comment |
+----------------+---------------+---------------+----------------+-------------+
|              0 | Kaliphoraceae | Jossi         | 20060415174633 | warning     |
+----------------+---------------+---------------+----------------+-------------+
But Jossi never edited [[Kaliphoraceae]]. The edit seems to have actually been made to [[User talk:69.117.28.162]], since the username, timestamp and comment match an edit there.
So, maybe the page_id is incorrect also, caused by the same problem?
-Dave
Is it possible that this is caused by different lags between dewiki and enwiki before the split?
Test this on de.wikipedia too. Then we will see whether this is a general problem or specific to the split.
greets, marco
My working theory... (haven't tested to prove it yet)
I think that what happened is that when enwiki was moved to the new cluster we lost some of the replay logs. I think this happened because we were lagged at the time of the move, and the new cluster didn't have the missing log entries. ... or something along those lines.. Since then we've been replicating but since we missed some records, the auto-increment columns are all off... relational integrity destroyed, which is everything I remember about running mysql :(
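The theory above can be illustrated with a toy model. This is only a sketch under loud assumptions: it uses two in-memory SQLite databases as stand-ins for the master and the toolserver copy, it assumes the copy assigns its own auto-increment ids when replaying inserts, and it does not claim to model how MySQL replication actually carries ids. The table layout and the simulate function are hypothetical.

```python
import sqlite3
from typing import Dict, Set, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for a revision table (hypothetical layout)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE revision ("
        "rev_id INTEGER PRIMARY KEY AUTOINCREMENT, rev_comment TEXT)"
    )
    return db


def simulate(n_edits: int, lost: Set[int]) -> Dict[str, Tuple[int, int]]:
    """Insert n_edits rows on a 'master' and replay them on a 'replica'.

    Edits whose 1-based position is in `lost` never reach the replica.
    Returns {comment: (master_rev_id, replica_rev_id)} for replicated edits.
    """
    master, replica = make_db(), make_db()
    for i in range(1, n_edits + 1):
        comment = f"edit {i}"
        master.execute("INSERT INTO revision (rev_comment) VALUES (?)", (comment,))
        if i not in lost:
            # Assumption: the replica picks its own id instead of copying the master's.
            replica.execute("INSERT INTO revision (rev_comment) VALUES (?)", (comment,))
    master_ids = dict(master.execute("SELECT rev_comment, rev_id FROM revision"))
    replica_ids = dict(replica.execute("SELECT rev_comment, rev_id FROM revision"))
    return {c: (master_ids[c], replica_ids[c]) for c in replica_ids}


if __name__ == "__main__":
    for comment, (m_id, r_id) in simulate(6, {2, 3}).items():
        print(comment, m_id, r_id, "offset", m_id - r_id)
```

In this model every row replayed after the lost ones gets a replica id that is lower than the master id by the number of lost rows, which is the same direction as the offsets reported earlier in the thread. Any other table that stores such an id as a foreign key would then point at the wrong row.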
Gregory Maxwell wrote:
> I think that what happened is that when enwiki was moved to the new cluster we lost some of the replay logs. I think this happened because we were lagged at the time of the move, and the new cluster didn't have the missing log entries. ... or something along those lines.. Since then we've been replicating but since we missed some records, the auto-increment columns are all off... relational integrity destroyed, which is everything I remember about running mysql :(
Sounds realistic. I'm not a mysql-freak, but is there any way to fix that? Maybe finding the missing records and re-replicating everything from the split, or something strange? :-)
Leon
... thus far it's looking like we'll have to completely rereplicate (or otherwise goof around with a lot of manual fixes)..
I'm trying to reproduce it here.
Should probably move the DB onto the new disks at the same time.
As far as I know, the only way to do this would be to completely drop the toolserver DBs that have the bug. Then we rebuild the replication config so that the auto_increment values are no longer set by MySQL but cloned *exactly* from the db3 server for enwiki. Once this process has started, we clone all oldids and the remaining tables back in the background.
But we have to think about how to prevent lag in the future (lag for the wikis is rising again :-(). One possible solution would be to rearrange the servers and the structure: the MySQL servers are split up, so that we have one for enwp, another one for dewp, and so on for the 5 biggest wikis. The remaining wikis are stored on the remaining server. All servers replicate to the toolserver.
Existing MySQL servers in the knams and paris clusters should be converted to apache/squid servers. When an edit happens, its data is transported to the corresponding MySQL server in Florida. That server then sends the data back to all Apaches (not via HTTP; a persistent connection would be better here!), and the Apaches generate new versions of the edited page and the corresponding logs.
Sure, this sounds quite mad...but it is a simple idea.
greets, Marco
Marco Schuster wrote:
> But we have to think about how to prevent lag in the future (lag for the wikis is rising again :-(). One possible solution would be to rearrange the servers and the structure: the MySQL servers are split up, so that we have one for enwp, another one for dewp, and so on for the 5 biggest wikis. The remaining wikis are stored on the remaining server. All servers replicate to the toolserver.
It's not that easy to prevent the lag, unless you want a tool-*server* and not a tool-*cluster*. It's a bit crazy, we want to replicate the data of a whole cluster on one single server -- and we want it not to lag (Tim's words).
> Existing MySQL servers in the knams and paris clusters should be converted to apache/squid servers. When an edit happens, its data is transported to the corresponding MySQL server in Florida. That server then sends the data back to all Apaches (not via HTTP; a persistent connection would be better here!), and the Apaches generate new versions of the edited page and the corresponding logs.
That's how we have it right now.
Leon
On 4/27/06, Leon Weber leon.weber@leonweber.de wrote:
> It's not that easy to prevent the lag, unless you want a tool-*server* and not a tool-*cluster*. It's a bit crazy, we want to replicate the data of a whole cluster on one single server -- and we want it not to lag (Tim's words).
The transaction rates on the wikimedia database servers are not that high. This is not that unreasonable.
Expensive queries are a problem which is why we probably should run two databases, one for live things and one for research queries. It also makes sense because we'd carry different tables for each..
(for example, it would be useful to carry *links tables which are keyed on (page_id,revision) to cover all revisions rather than just the most current)
Yup, the user_id is off too, for recently created users...
SELECT * FROM user_ids WHERE user_id=1273887;
+----------------+---------+
| user_name      | user_id |
+----------------+---------+
| Atleastamirror | 1273887 |
+----------------+---------+
SELECT rev_user_text, rev_timestamp FROM revision WHERE rev_user=1273887;
+---------------+----------------+
| rev_user_text | rev_timestamp  |
+---------------+----------------+
| Widdim        | 20060418030524 |
| Widdim        | 20060418030631 |
| Widdim        | 20060418030756 |
| Widdim        | 20060418030905 |
+---------------+----------------+
The same problem was not observed on recently-created dewiki accounts.
-Dave
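The two queries above can be folded into one consistency check that lists every rev_user whose rev_user_text disagrees with user_ids. The following is a rough sketch, not a tested toolserver query: it runs against an in-memory SQLite stand-in holding only the columns and sample rows quoted in this thread, it assumes anonymous edits carry rev_user = 0, and build_sample_db and find_mismatches are hypothetical names.

```python
import sqlite3
from typing import List, Tuple

# Revisions whose stored user name disagrees with the name that user_ids
# gives for the same id. Assumption: anonymous edits have rev_user = 0.
MISMATCH_SQL = """
SELECT rev_user, rev_user_text, user_name, COUNT(*) AS n_revs
FROM revision
LEFT JOIN user_ids ON user_id = rev_user
WHERE rev_user <> 0
  AND (user_name IS NULL OR user_name <> rev_user_text)
GROUP BY rev_user, rev_user_text, user_name
ORDER BY rev_user
"""


def build_sample_db() -> sqlite3.Connection:
    """Load the rows quoted in this thread into a minimal stand-in schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE user_ids (user_name TEXT, user_id INTEGER)")
    db.execute(
        "CREATE TABLE revision "
        "(rev_user INTEGER, rev_user_text TEXT, rev_timestamp TEXT)"
    )
    db.executemany(
        "INSERT INTO user_ids VALUES (?, ?)",
        [("Atleastamirror", 1273887), ("FloNight", 453528)],
    )
    db.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [
            (1273887, "Widdim", "20060418030524"),
            (1273887, "Widdim", "20060418030631"),
            (1273887, "Widdim", "20060418030756"),
            (1273887, "Widdim", "20060418030905"),
            (453528, "FloNight", "20060416012812"),  # consistent row
        ],
    )
    return db


def find_mismatches(db: sqlite3.Connection) -> List[Tuple[int, str, str, int]]:
    """Return (rev_user, rev_user_text, user_name, n_revs) for inconsistent ids."""
    return db.execute(MISMATCH_SQL).fetchall()


if __name__ == "__main__":
    for row in find_mismatches(build_sample_db()):
        print(row)
```

On the real tables a query like this would scan all of revision, so it would probably need a rev_id or rev_timestamp range to stay cheap.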