[MediaWiki-l] Need advice fixing corrupted revision history

Steve Rainwater srainwater at ncc.com
Mon Feb 29 21:27:21 UTC 2016


Good suggestion! I looked at the 'revision' table and found that only
the 'rev_user' ID value is corrupt; it is set to the wrong user ID.
However, the 'rev_user_text' field looks intact and still has the
correct user name (or IP address). That could potentially allow a
simple fix. Maybe I can just write a quick script to reset 'rev_user'
to the ID of the user that matches the 'rev_user_text' field?
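
A rough sketch of what such a script might look like, run here against
an in-memory SQLite stand-in rather than the real database. The column
names ('rev_id', 'rev_user', 'rev_user_text', 'user_id', 'user_name')
follow the MediaWiki schema; the function names, the sample rows, and
the use of SQLite are assumptions for illustration only. A real wiki
may also use a table prefix, and would need a full backup first.

```python
import sqlite3


def make_demo_db():
    """Build a tiny stand-in for the 'user' and 'revision' tables (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT UNIQUE);
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_user INTEGER,
            rev_user_text TEXT
        );
        INSERT INTO user VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Spammer');
        INSERT INTO revision VALUES
            (10, 99, 'Alice'),      -- corrupted: should be user 1
            (11, 99, 'Bob'),        -- corrupted: should be user 2
            (12, 2,  'Bob'),        -- already correct
            (13, 99, '192.0.2.7'),  -- IP address, no matching account
            (14, 99, 'Spammer');    -- genuinely belongs to user 99
        """
    )
    return conn


def plan_rev_user_fixes(conn):
    """Dry run: list (rev_id, current rev_user, matching user_id) for mismatches."""
    return conn.execute(
        """
        SELECT r.rev_id, r.rev_user, u.user_id
        FROM revision AS r
        JOIN user AS u ON u.user_name = r.rev_user_text
        WHERE r.rev_user <> u.user_id
        ORDER BY r.rev_id
        """
    ).fetchall()


def find_unmatched(conn):
    """List revisions whose rev_user_text matches no account (e.g. IP addresses)."""
    return conn.execute(
        """
        SELECT r.rev_id, r.rev_user_text
        FROM revision AS r
        LEFT JOIN user AS u ON u.user_name = r.rev_user_text
        WHERE u.user_id IS NULL
        ORDER BY r.rev_id
        """
    ).fetchall()


def apply_rev_user_fixes(conn, fixes):
    """Apply the planned changes in one transaction; return the number of rows."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "UPDATE revision SET rev_user = ? WHERE rev_id = ?",
            [(correct_id, rev_id) for rev_id, _old_id, correct_id in fixes],
        )
    return len(fixes)


if __name__ == "__main__":
    db = make_demo_db()
    planned = plan_rev_user_fixes(db)
    print("planned fixes:", planned)
    print("needs manual review:", find_unmatched(db))
    print("rows updated:", apply_rev_user_fixes(db, planned))
```

The dry-run step is kept separate so the planned changes can be
reviewed before anything is written. Rows with no matching account are
deliberately left untouched here; MediaWiki stores 0 in 'rev_user' for
anonymous edits, so those would need their own handling.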

Are there other tables besides 'revision' that I should check?

-Steve


On Mon, 2016-02-29 at 15:33 -0500, John wrote:
> Are both the usernames and user ids in the revision history corrupt?
> 
> On Mon, Feb 29, 2016 at 3:24 PM, Steve Rainwater <srainwater at ncc.com>
> wrote:
> 
> > 
> > This is a somewhat complex problem so bear with me, I'll try to
> > describe it as concisely as possible.
> > 
> > Background:
> > I'm in the process of updating a large (20k pages), old (started ca
> > 2004) MediaWiki site. It was running MediaWiki 1.18.3 and I thought a
> > good first step would be to update the site to the 1.19.x LTS version,
> > which I did (with the plan of moving to 1.23.x in March). Extensions
> > were updated as well. Everything seemed stable and pre-upgrade backups
> > were not kept after a month of no problem reports (big mistake, I
> > know!) There are nightly backups but only a week's worth are kept for
> > storage reasons. However, I have located some very old (3 year+)
> > database backups.
> > 
> > The site admins use a MW extension called "merge-and-delete" to deal
> > with spammers. There is a permanent, blocked user called "spammer" and
> > any time a new user account is created by a spammer, the editors
> > merge-and-delete that account into the "spammer" account. There are
> > < 100 real editors on the site and this process kept them from being
> > overwhelmed by the thousands of spammers creating accounts on the site
> > in recent years.
> > 
> > The Problem
> > About a month after the 1.19.x update, a problem was discovered.
> > Somehow, all edits dated 2011 or earlier were altered such that they
> > are now credited to the "spammer" user rather than the actual user who
> > made the edits. The cause of the corruption appears to be a bug/problem
> > related to the merge-and-delete extension and MW 1.19. I'm not seeking
> > help for that here - I have turned off and removed the extension.
> > 
> > My problem is: how to restore the edit history so that edits are
> > credited to the correct users again. The timestamp and page names of
> > all edits are still correct, only the user name was corrupted. But the
> > only backup old enough to have uncorrupted edit history data is years
> > old and from a much older version of MW (maybe 1.16). And I need to fix
> > the problem without losing years of edits.
> > 
> > The Solution?
> > I can't see any easy fix for this. But I have thought of an approach
> > that might work. If I write a bot that reads the very old database,
> > extracts only <=2011 edit history, and then compares that data to the
> > corrupted live site, perhaps it could work its way through the edit
> > history making corrections. Does this seem plausible? And, if so, any
> > advice on what to look out for? If anyone has alternate suggestions,
> > I'm up for entertaining just about any idea on how to fix this.
> > 
> > -Steve
> > 
> > 
> > _______________________________________________
> > MediaWiki-l mailing list
> > To unsubscribe, go to:
> > https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
> > 


