On Commons there are a number of broken, corrupt, or missing files (mostly old versions of the same file).
2012/11/11 MZMcBride <z@mzmcbride.com>:
Hi.
Is there a policy or guideline about the level to which Wikimedia wikis care about data integrity? There are a few specific cases I'm talking about:
- edits or other logged actions with a wrong timestamp;
- incomplete user renames (contributions split between two accounts);
- weird entries in various *links tables (categorylinks, pagelinks, etc.);
- weird entries in various non-links tables (page, user, etc.); and
- revisions with weird user_id (page import bug, I think?).
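For illustration, two of the cases above (orphaned *links rows and edits with a suspicious timestamp) could be detected roughly as follows. This is only a sketch, not an existing maintenance script: it runs against a tiny in-memory SQLite database with made-up rows, it assumes the usual MediaWiki column names (page_id, cl_from, cl_to, rev_id, rev_page, rev_timestamp), and the two function names are hypothetical. The timestamp check is a heuristic, since imported revisions can legitimately be out of order.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a few deliberately bad rows.

    The schema is a heavily simplified stand-in for the real MediaWiki
    tables; the sample data is invented for this example.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER,
            rev_timestamp TEXT  -- 14-character YYYYMMDDHHMMSS string
        );
        INSERT INTO page VALUES (1, 'Main_Page'), (2, 'Sandbox');
        -- cl_from = 99 points at a page that does not exist (orphaned row).
        INSERT INTO categorylinks VALUES
            (1, 'Portals'), (2, 'Tests'), (99, 'Ghosts');
        -- rev_id 11 is newer than rev_id 10 but carries an older timestamp.
        INSERT INTO revision VALUES
            (10, 1, '20040101000000'),
            (11, 1, '20030101000000'),
            (12, 2, '20050101000000');
    """)
    return conn


def orphaned_categorylinks(conn):
    """Return (cl_from, cl_to) rows whose source page no longer exists."""
    return conn.execute("""
        SELECT cl.cl_from, cl.cl_to
        FROM categorylinks AS cl
        LEFT JOIN page AS p ON p.page_id = cl.cl_from
        WHERE p.page_id IS NULL
        ORDER BY cl.cl_from
    """).fetchall()


def out_of_order_revisions(conn):
    """Return rev_ids dated earlier than a lower-numbered revision of the same page."""
    rows = conn.execute("""
        SELECT DISTINCT later.rev_id
        FROM revision AS later
        JOIN revision AS earlier
          ON earlier.rev_page = later.rev_page
         AND earlier.rev_id < later.rev_id
         AND earlier.rev_timestamp > later.rev_timestamp
        ORDER BY later.rev_id
    """).fetchall()
    return [rev_id for (rev_id,) in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(orphaned_categorylinks(conn))
    print(out_of_order_revisions(conn))
```

The same LEFT JOIN pattern would apply to the other *links tables by swapping in their "from" column, for example pl_from for pagelinks.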
This has come up in the context of some old Bugzilla bugs about edits with the wrong timestamp. There's been some recent activity to mark at least some of these old bugs as "wontfix". Perhaps this makes sense: some of them are very old, and it is quite likely that nobody will ever go back and fix the old revision timestamps.
But I'm left wondering whether there's anything concrete in this area to guide the other Bugzilla bugs, such as the bugs about botched user renames or *links tables having orphaned or invalid entries. To some degree there's a human component too, I suppose. An edit with a bad timestamp isn't a big deal; leaving a user's account broken simply because they have a lot of edits is a much bigger deal. So there's some level of triage to be done.
I see a lot of these issues as falling under general "database integrity". Is there a page on MediaWiki.org or Meta-Wiki that discusses this particular issue?
I believe https://bugzilla.wikimedia.org/show_bug.cgi?id=16660 ("Database table cleanup (tracking)") is the relevant tracking bug for most of what I'm describing.
MZMcBride
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l