This is a somewhat complex problem, so bear with me; I'll try to describe it as concisely as possible.
Background: I'm in the process of updating a large (20k pages), old (started ca. 2004) MediaWiki site. It was running MediaWiki 1.18.3 and I thought a good first step would be to update the site to the 1.19.x LTS version, which I did (with the plan of moving to 1.23.x in March). Extensions were updated as well. Everything seemed stable, and pre-upgrade backups were not kept after a month of no problem reports (big mistake, I know!). There are nightly backups, but only a week's worth are kept for storage reasons. However, I have located some very old (3+ years) database backups.
The site admins use a MW extension called "merge-and-delete" to deal with spammers. There is a permanent, blocked user called "spammer", and any time a spammer creates a new user account, the editors merge-and-delete that account into the "spammer" account. There are fewer than 100 real editors on the site, and this process kept them from being overwhelmed by the thousands of spammers creating accounts on the site in recent years.
The Problem: About a month after the 1.19.x update, a problem was discovered. Somehow, all edits dated 2011 or earlier were altered such that they are now credited to the "spammer" user rather than the actual user who made the edits. The cause of the corruption appears to be a bug/problem related to the merge-and-delete extension and MW 1.19. I'm not seeking help for that here - I have turned off and removed the extension.
My problem is how to restore the edit history so that edits are credited to the correct users again. The timestamps and page names of all edits are still correct; only the user name was corrupted. But the only backup old enough to have uncorrupted edit history data is years old and from a much older version of MW (maybe 1.16). And I need to fix the problem without losing years of edits.
The Solution? I can't see any easy fix for this. But I have thought of an approach that might work. If I write a bot that reads the very old database, extracts only <=2011 edit history, and then compares that data to the corrupted live site, perhaps it could work its way through the edit history making corrections. Does this seem plausible? And, if so, any advice on what to look out for? If anyone has alternate suggestions, I'm up for entertaining just about any idea on how to fix this.
-Steve
Are both the usernames and user ids in the revision history corrupt?
On Mon, Feb 29, 2016 at 3:24 PM, Steve Rainwater srainwater@ncc.com wrote:
MediaWiki-l mailing list To unsubscribe, go to: https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Good suggestion! I looked at the 'revision' table and found that only the 'rev_user' ID value is corrupt, it is set to the wrong user ID. However, the 'rev_user_text' field looks intact and still has the correct user name (or IP address). That could potentially allow a simple fix. Maybe I can just write a quick script to reset 'rev_user' to the ID of the user that matches the 'rev_user_text' field?
Are there other tables besides 'revision' that I should check?
-Steve
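Before changing anything, the mismatch described in this message can be counted with a read-only query. The following is only a minimal sketch, not the script used in this thread. It assumes the standard 'user' table (user_id, user_name) and that anonymous edits carry a user ID of 0. It uses Python's sqlite3 with a small in-memory stand-in database purely so the example runs on its own; a live wiki would need its own database driver and table prefix. The function name count_mismatches, the helper make_demo_db, and the USER_COLUMNS list are illustrative. The other tables listed are candidates recalled from the MediaWiki schema of that era and should be verified against the installed version.

```python
import sqlite3

# Candidate (table, id column, name column) triples that pair a user ID with
# a user name. ASSUMPTION: taken from the MediaWiki schema of the 1.19 era;
# verify each one against the installed schema before relying on it.
USER_COLUMNS = [
    ("revision", "rev_user", "rev_user_text"),
    ("archive", "ar_user", "ar_user_text"),
    ("recentchanges", "rc_user", "rc_user_text"),
    ("logging", "log_user", "log_user_text"),
    ("image", "img_user", "img_user_text"),
    ("oldimage", "oi_user", "oi_user_text"),
    ("filearchive", "fa_user", "fa_user_text"),
]


def count_mismatches(conn, table, id_col, text_col):
    """Count rows whose stored user ID disagrees with the stored user name.

    A name with no matching account (for example an IP address) is expected
    to have ID 0. Identifiers are interpolated into the SQL, so they must
    come from a trusted list such as USER_COLUMNS, never from user input.
    """
    sql = f"""
        SELECT COUNT(*)
        FROM {table} AS t
        LEFT JOIN user AS u ON u.user_name = t.{text_col}
        WHERE t.{id_col} <> COALESCE(u.user_id, 0)
    """
    return conn.execute(sql).fetchone()[0]


def make_demo_db():
    """Build a tiny in-memory stand-in for the wiki database (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT UNIQUE);
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_user_text TEXT);
        INSERT INTO user VALUES (1, 'Spammer'), (2, 'Alice'), (3, 'Bob');
        INSERT INTO revision VALUES
            (1, 1, 'Alice'),      -- corrupted: should be 2
            (2, 1, 'Bob'),        -- corrupted: should be 3
            (3, 3, 'Bob'),        -- intact
            (4, 1, '192.0.2.7'),  -- corrupted: anonymous edit, should be 0
            (5, 1, 'Spammer'),    -- intact
            (6, 0, '192.0.2.8');  -- intact anonymous edit
    """)
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    table, id_col, text_col = USER_COLUMNS[0]
    print(table, count_mismatches(demo, table, id_col, text_col))
```

Running the same count over each entry in USER_COLUMNS that exists on the wiki would show whether the damage is confined to 'revision' or extends to other tables.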
On Mon, 2016-02-29 at 15:33 -0500, John wrote:
Are both the usernames and user ids in the revision history corrupt?
Followup. Thanks to John's suggestion, I discovered that only the user IDs (rev_user) in the revision table were corrupt. The user name (rev_user_text) was still intact. So I was able to write a script that walked through the revision table from the beginning, looked up the correct ID for each rev_user_text name and reset the rev_user field. Everything seems ok now and page revisions show the correct user names for edits again.
-Steve
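The script itself was not posted to the list. A minimal sketch of the approach described (walk the revision table in order, look up the ID for each rev_user_text, and reset rev_user where it differs) might look like the following. It is written against an in-memory sqlite3 stand-in so it runs on its own; the function name repair_rev_user and the demo data are hypothetical. It also assumes that any name without a matching account, such as an IP address, should get ID 0, which may not suit names belonging to accounts that were later deleted.

```python
import sqlite3


def repair_rev_user(conn):
    """Reset revision.rev_user from rev_user_text; return rows changed."""
    # Map every account name to its ID once, instead of one lookup per row.
    ids = dict(conn.execute("SELECT user_name, user_id FROM user"))

    fixes = []
    rows = conn.execute(
        "SELECT rev_id, rev_user, rev_user_text FROM revision ORDER BY rev_id"
    )
    for rev_id, rev_user, rev_user_text in rows:
        # ASSUMPTION: names with no account (e.g. IP addresses) map to 0.
        correct = ids.get(rev_user_text, 0)
        if rev_user != correct:
            fixes.append((correct, rev_id))

    # Apply all corrections in one transaction so a failure changes nothing.
    with conn:
        conn.executemany(
            "UPDATE revision SET rev_user = ? WHERE rev_id = ?", fixes
        )
    return len(fixes)


def make_demo_db():
    """Build a tiny in-memory stand-in for the wiki database (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT UNIQUE);
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_user_text TEXT);
        INSERT INTO user VALUES (1, 'Spammer'), (2, 'Alice'), (3, 'Bob');
        INSERT INTO revision VALUES
            (1, 1, 'Alice'), (2, 1, 'Bob'), (3, 3, 'Bob'),
            (4, 1, '192.0.2.7'), (5, 1, 'Spammer'), (6, 0, '192.0.2.8');
    """)
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print("rows corrected:", repair_rev_user(demo))
```

Collecting the corrections first and applying them in a single transaction keeps the read pass separate from the write pass, which makes it easy to inspect the pending changes or take a fresh backup before anything is modified.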
mediawiki-l@lists.wikimedia.org