https://bugzilla.wikimedia.org/show_bug.cgi?id=46014
The common thread here seems to be that sometimes when a vandal edit is reverted within seconds, sometimes the page content from the vandal edit is being used while oldids in the skin data (JS, print footer, etc) indicate the revert.
This doesn't seem to have anything to do with missed cache invalidation via the API: * A UI edit calls EditPage::attemptSave, which calls EditPage::internalAttemptSave and then processes the status without doing any additional cache checks. EditPage::internalAttemptSave itself calls WikiPage::doEditContent, then just checks the return value and doesn't do any additional cache checks. * An API edit calls EditPage::internalAttemptSave and then processes the status without doing any additional cache checks. * An API rollback calls WikiPage::doRollback, which calls WikiPage::commitRollback, which calls WikiPage::doEditContent and doesn't do any other cache checks.
The fact that the page content is from an old revision but the skin data is new makes me suspect a parser cache issue. I suspect Gerrit change 85917 probably improved the situation but didn't completely eliminate it, and with the number of reversions ClueBot does it probably manages to hit some race occasionally.
Looking at the parser cache handling: * WikiPage::doEditContent updates page_touched, then calls WikiPage::doEditUpdates which saves the just-parsed revision to the parser cache. * PoolWorkArticleView also saves its parsed data into the parser cache. * ApiPurge with forcelinksupdate saves its parsed data into the parser cache. * RefreshLinksJob saves its parsed data into the parser cache, if it took over 1 second to parse.
I'm suspecting there's a race somehow where the vandal saves their edit, then gets redirected back to action=view on the article, and makes it into the branch of Article::view that uses PoolWorkArticleView to reparse the text. Article pre-fetches the content to be parsed and passes it into PoolWorkArticleView. Then before the PoolWorkArticleView gets a chance to run, ClueBot comes along and reverts, updating the parser cache in the process. Then the PoolWorkArticleView gets to run, re-parses the vandal's content, and then saves that into the parser cache replacing the newer version.
I'm not sure how to test this theory, though.
If this really is what's happening, it might be enough to just have PoolWorkArticleView store a timestamp in __construct (maybe only when $content is passed) instead of calculating the timestamp in doWork. Or we could look at storing the revid in the parser cache (fetch it from the $page passed to ParserCache::save if it's not explicitly provided, for BC) and consider the parser cache entry stale if the stored revid doesn't match the current revid, just like we do with the page_touched timestamp now.