Hi all.
We (the Wikidata team) ran into an issue recently with the value that gets passed as $baseRevId to Content::prepareSave(), see Bug 67831 [1]. This comes from WikiPage::doEditContent(), and, for core, is nearly always set to "false" (e.g. by EditPage).
We interpreted this rev ID to be the revision that is the nominal base revision of the edit, and implemented an edit conflict check based on it. Which works with the way we use doEditContent() for wikibase on wikidata, and with most stuff in core (which generally has $baseRevId = false). But as it turns out, it does not work with rollbacks: WikiPage::commitRollback sets $baseRevId to the ID of the revision we revert *to*.
Now, is that correct, or is it a bug? What does "base revision" mean?
The documentation of WikiPage::doEditContent() is unclear about this (yes, I wrote this method when introducing the Content class - but I copied the interface WikiPage::doEdit(), and mostly kept the code as it was). And in the code, $baseRevId is not used at all except for passing it to hooks and to Content::prepareSave - which doesn't do anything with it for any of the Content implementations in core - only in Wikibase we tried to implement a conflict check here, which should really be in WikiPage, I think.
So, what *does* $baseRevId mean? If you happen to know when and why $baseRevId was introduced, please enlighten me. I can think of three possibilities:
1) It's the edit's reference revision, used to detect edit conflicts (this is how we use this in Wikibase). That is, an edit is done with respect to a specific revision, and that revision is passed back to WikiPage when saving, so a check for edit conflicts can be done as close to the actual edit as possible (ideally, in the same DB transaction). Compare bug 56849 [2].
2) The edit's "physical parent": that would be the same as (1), unless there is a conflict that was detected early and automatic resolved by rebasing the edit. E.g. if an edit is performed based on revision 11, but revision 12 was added since, and the edit was successfully rebased, the "parent" would be 12, not 11. This is what WikiPage::doEditContent() calls $oldid, and what gets saved in rev_parent_id. Since WikiPage::doEditContent() makes the distinction between $oldid and $baseRevId, this is probably not what $baseRevId was intended to be.
3) It could be the "logical parent": this would be identical to (2), except for a rollback: if I revert revision 15 and 14 back to revision 13, the new revision's logical parent would be rev 13's parent. The idea is that you are restoring rev 13 as it was, with the same parent rev 13 had. Something like this seems to be the intention of what commitRollback() currently does, but the way it is now, the new revision would have rev 13 as its logical parent (which, for a rollback, would have identical content).
So at present, what commitRollback currently does is none of the above, and I can't see how what it does makes sense.
I suggest we fix it, define $baseRevId to mean what I explained under (1), and implement a "late" conflict check right in the DB transaction that updates the revision (or page) table. This might confuse some extensions though, we should double check AbuseFilter, if nothing else.
Is that a good approach? Please let me know.
-- daniel
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=65831 [2] https://bugzilla.wikimedia.org/show_bug.cgi?id=56849
On Wed, May 28, 2014 at 7:25 AM, Daniel Kinzler daniel@brightbyte.dewrote:
If you happen to know when and why $baseRevId was introduced, please enlighten me.
"When" is easy enough: http://mediawiki.org/wiki/Special:Code/MediaWiki/34987
If I had to guess at the "why", I'd guess it was probably for FlaggedRevs and auto-reviewing.
Yes it was for auto-reviewing new revisions. New revisions are seen as a combination of (base revision, changes). If the base revision was reviewed and the user is trusted, then so is the new revision. MW core had the obvious cases of rollback and null edits, which are (base revision, no changes). Their is a lot more "base revision" detection in FlaggedRevs for the remaining cases, some less obvious (user supplied baseRevId, X-top edit undo, fall back to prior edit).
If baseRevId is always set to the revision the user started from it would cause problems for that extension for the cases where it was previously false.
It would indeed be useful to have a casRevId value that was the current revision at the time of editing just for CAS style conflict detection.
-- View this message in context: http://wikimedia.7.x6.nabble.com/Unclear-Meaning-of-baseRevId-in-WikiPage-do... Sent from the Wikipedia Developers mailing list archive at Nabble.com.
Am 29.05.2014 21:07, schrieb Aaron Schulz:
Yes it was for auto-reviewing new revisions. New revisions are seen as a combination of (base revision, changes).
But EditPage in core sets $baseRevId to false. The info isn't there for the "standard" case. In fact, the ONLY thing in core that sets it to anything but false is commitRollback() , and that sets it to a value that to me doesn't make much sense to me - the revision we revert to, instead of either the revision we revert *from* (base/physical parent), or at least the *parent* of the revision we revert to (logical parent).
Also, if you want (base revision, changes), you would use $oldid in doEditContent, not $baseRevId. Perhaps it's just WRONG to pass $baseRevId to the hooks called by doEditCOntent, and it should have been $oldid all along? $oldid is what you need if you want to diff against the previous revision - so presumably, that's NOT what $baseRevId is.
If baseRevId is always set to the revision the user started from it would cause problems for that extension for the cases where it was previously false.
"false" means "don't check", I suppose - or "there is no base", but that could be identified by the EDIT_NEW flag.
I'm not proposing to change the cases where baseRevId is false. They can stay as they are. I'm proposing to set baseRevId to the revision the user started with, OR false, so we can detect conflicts safely & sanely.
It would indeed be useful to have a casRevId value that was the current revision at the time of editing just for CAS style conflict detection.
Indeed - but changing the method signature would be painful, and the existing $baseRevId parameter does not seem to be used at all - or at least, it's used in such an inconsistent way as to be useless, of not misleading and harmful.
For now, I propose to just have commitRollback call doEditContent with $baseRevId = false, like the rest of core does. Since core itself doesn't use this value anywhere, and sets it to false everywhere, that seems consistent. We could then just clarify the documentation. This way, Wikibase could use the $baseRevId value for conflict detection - actually, core could, and should, do just that in doEditContent; this wouldn't do anything in core until the $baseRevId is supplied at least by EditPage.
Of course, we need to check FlaggedRevs and other extensions, but seeing how this argument is essentially unused, I can't imagine how this change could break anything for extensions.
-- daniel
On Fri, May 30, 2014 at 4:06 AM, Daniel Kinzler daniel@brightbyte.de wrote:
Am 29.05.2014 21:07, schrieb Aaron Schulz:
Yes it was for auto-reviewing new revisions. New revisions are seen as a combination of (base revision, changes).
But EditPage in core sets $baseRevId to false. The info isn't there for the "standard" case. In fact, the ONLY thing in core that sets it to anything but false is commitRollback() , and that sets it to a value that to me doesn't make much sense to me - the revision we revert to, instead of either the revision we revert *from* (base/physical parent), or at least the *parent* of the revision we revert to (logical parent).
I think you need to look again into how FlaggedRevs uses it, without the preconceptions you're bringing in from the way you first interpreted the name of the variable. The current behavior makes perfect sense for that specific use case. Neither of your proposals would work for FlaggedRevs.
As for the EditPage code path, note that it has already done edit conflict resolution so "base revision = current revision of the page". Which is probably the intended meaning of false.
Of course, we need to check FlaggedRevs and other extensions, but seeing how this argument is essentially unused, I can't imagine how this change could break anything for extensions.
Except FlaggedRevs.
Am 30.05.2014 15:38, schrieb Brad Jorsch (Anomie):
I think you need to look again into how FlaggedRevs uses it, without the preconceptions you're bringing in from the way you first interpreted the name of the variable. The current behavior makes perfect sense for that specific use case. Neither of your proposals would work for FlaggedRevs.
As far as I understand the rather complex FlaggedRevs.hooks.php code, it assumes that
a) if $newRev === $baseRevId, it's a null edit. As far as I can see, this does not work, since $baseRevId will be null for a null edit (and all other regular edits).
b) if $newRev !== $baseRevId but the new rev's hash is the same as the base rev's hash, it's a rollback. This works with the current implementation of commitRollback(), but does not for manual reverts or trivial undos.
So, FlaggedRevs assumes that EditPage resp WikiPage set $baseRevId to the edits logical parent (basically, the revision the user loaded when starting to edit). That's what I described as option (3) in my earlier mail, except for the rollback case; It would be fined with me to use the target rev as the base for rollbacks, as is currently done.
FlaggedRevs.hooks.php also injects a baseRevId form field and uses it in some cases, adding to the confusion.
In order to handle manual reverts and null edits consistently, EditPage should probably have a base revision as a form field, and pass it on to doEditContent. As far as I can tell, this would work with the current code in FlaggedRevs.
As for the EditPage code path, note that it has already done edit conflict resolution so "base revision = current revision of the page". Which is probably the intended meaning of false.
Right. If that's the case though, WikiPage::doEditContent should probably set $baseRevId = $oldid, before passing it to the hooks.
Without changing core, it seems that there is no way to implement a late/strict conflict check based on the base rev id. That would need an additional "anchor revision" for checking.
The easiest solution for the current situation is to simply drop the strict conflict check in Wikibase and accept a race condition that may cause a revision to be silently overwritten, as is currently the case in core.
-- daniel
It looks like we should leave the existing hook parameters values alone for the moment, but it would improve the situation if we renamed variables which seem to be overloaded or unclear, in MediaWiki core and in FlaggedRevs. What do you think of the following conventions,
'oldid' (index.php parameter) -- keep this name only to preserve interface compatibility. This refers to a historical revision when used in the action=view case, and to the latest revision ID of the page at the time an edit session begins.
$oldid -- keep as-is in the action=view codepath, rename to $parentRevId in action=edit
$parentRevId -- latest available revision ID at the time an edit session begins. Used to detect conflicts, and identify the parent revision record upon save. This is updated during successful automatic rebase. I don't see a good use case for preserving what Daniel calls the "reference revision," the parentRevId before rebase.
$baseRevId and $baseId -- rename everywhere to $contentsRevId, but examining the code contexts for the smell of confounding with $parentRevId.
$contentsRevId -- revision ID of the source text to copy when performing undo or rollback. We will probably want to supplement hooks that only passed $contentsRevId, such as NewRevisionFromEditComplete, with $parentRevId as an additional parameter.
A refactor along these lines would keep me from losing already scant marbles as I attempt to fix related issues in core: https://gerrit.wikimedia.org/r/#/c/94584/ , I see now that I've already begun to introduce mistakes caused by the difficult common-sense interpretation of current variable naming.
-Adam
On Mon, Jun 2, 2014 at 1:22 AM, Daniel Kinzler daniel@brightbyte.de wrote:
Am 30.05.2014 15:38, schrieb Brad Jorsch (Anomie):
I think you need to look again into how FlaggedRevs uses it, without the preconceptions you're bringing in from the way you first interpreted the name of the variable. The current behavior makes perfect sense for that specific use case. Neither of your proposals would work for FlaggedRevs.
As far as I understand the rather complex FlaggedRevs.hooks.php code, it assumes that
a) if $newRev === $baseRevId, it's a null edit. As far as I can see, this does not work, since $baseRevId will be null for a null edit (and all other regular edits).
b) if $newRev !== $baseRevId but the new rev's hash is the same as the base rev's hash, it's a rollback. This works with the current implementation of commitRollback(), but does not for manual reverts or trivial undos.
So, FlaggedRevs assumes that EditPage resp WikiPage set $baseRevId to the edits logical parent (basically, the revision the user loaded when starting to edit). That's what I described as option (3) in my earlier mail, except for the rollback case; It would be fined with me to use the target rev as the base for rollbacks, as is currently done.
FlaggedRevs.hooks.php also injects a baseRevId form field and uses it in some cases, adding to the confusion.
In order to handle manual reverts and null edits consistently, EditPage should probably have a base revision as a form field, and pass it on to doEditContent. As far as I can tell, this would work with the current code in FlaggedRevs.
As for the EditPage code path, note that it has already done edit
conflict
resolution so "base revision = current revision of the page". Which is probably the intended meaning of false.
Right. If that's the case though, WikiPage::doEditContent should probably set $baseRevId = $oldid, before passing it to the hooks.
Without changing core, it seems that there is no way to implement a late/strict conflict check based on the base rev id. That would need an additional "anchor revision" for checking.
The easiest solution for the current situation is to simply drop the strict conflict check in Wikibase and accept a race condition that may cause a revision to be silently overwritten, as is currently the case in core.
-- daniel
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I suppose that naming scheme is reasonable.
$contentsRevId sounds awkward, maybe $sourceRevId or $originRevId is better.
-- View this message in context: http://wikimedia.7.x6.nabble.com/Unclear-Meaning-of-baseRevId-in-WikiPage-do... Sent from the Wikipedia Developers mailing list archive at Nabble.com.
On Fri, Jun 6, 2014 at 4:08 PM, Aaron Schulz aschulz4587@gmail.com wrote:
I suppose that naming scheme is reasonable.
$contentsRevId sounds awkward, maybe $sourceRevId or $originRevId is better.
What about "rollbackRevId"? I want the variable name to make its purpose very clear.
-Adam
-- View this message in context: http://wikimedia.7.x6.nabble.com/Unclear-Meaning-of-baseRevId-in-WikiPage-do... Sent from the Wikipedia Developers mailing list archive at Nabble.com.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
FlaggedRevs uses the NewRevisionFromEditComplete hook. Grepping for that, I see reasonable values set in the callers at a quick glance. This cover various null edit scenarios too. The $baseRevId in WikiPage is just one of the cases of that value passed to the hook, and is fine there (being mostly false). "false" indeed means "not determined" and that behavior is needed for the hook values. The values given in that hook variable make sense and are more or less consistent.
As I said before, if the NewRevisionFromEditComplete hook is given the same base revision ID values for all cases, then I don't care to much what happens to the $baseRevId value semantics in doEditContent(). As long as everything is changed to keep that part consistent then it won't effect anything. However, just naively change the $baseRevId values for the non-false cases will break the extension using it.
As as side note, FlaggedRevs doesn't just end up using $oldid. It only uses that as the last resort after picking other values in difference scenarios it detects.
-- View this message in context: http://wikimedia.7.x6.nabble.com/Unclear-Meaning-of-baseRevId-in-WikiPage-do... Sent from the Wikipedia Developers mailing list archive at Nabble.com.
wikitech-l@lists.wikimedia.org