On 20 November 2014 09:30, Gergo Tisza <gtisza@wikimedia.org> wrote:
On Mon, Nov 17, 2014 at 11:03 AM, James Forrester <jforrester@wikimedia.org> wrote:
Moving to character-level rather than paragraph-level diffing might help here, potentially. I vaguely remember that we attempted that and abandoned it because it caused more issues than it solved back in ?2004, though.
A paragraph-level diff means that you only get an edit conflict if two people change the same paragraph. A character-level diff would mean, then, that you only get a conflict if they change the same character? That sounds a bit excessive. (Stupid example: if I change "sixty-three" to "sixty-five" and someone else changes it to "seventy-three", that should probably be a conflict, but a character-level diff would happily merge them into "seventy-five".)
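To make the "seventy-five" case concrete, here is a minimal sketch of a naive three-way merge built on Python's difflib. It is not how MediaWiki merges edits; `merge3`, `edits` and `MergeConflict` are hypothetical names invented for this illustration, and treating edits that merely touch as a conflict is a conservative assumption of the sketch.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Raised when the two sides edit overlapping or adjacent tokens."""


def edits(base, other):
    """Return the non-equal opcodes from base to other as (start, end, replacement)."""
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, tuple(other[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def merge3(base, mine, theirs):
    """Naive three-way merge over token sequences (hypothetical helper)."""
    ours, other = edits(base, mine), edits(base, theirs)
    for a in ours:
        for b in other:
            # Identical edits are fine; overlapping or touching ones are not.
            if a != b and a[0] <= b[1] and b[0] <= a[1]:
                raise MergeConflict(f"{a} vs {b}")
    merged, pos = [], 0
    for start, end, replacement in sorted(set(ours + other)):
        merged.extend(base[pos:start])  # unchanged tokens before this edit
        merged.extend(replacement)      # the edited tokens
        pos = end
    merged.extend(base[pos:])
    return merged


base, mine, theirs = "sixty-three", "sixty-five", "seventy-three"

# Character-level: the two edits touch different characters, so they merge.
print("".join(merge3(base, mine, theirs)))  # seventy-five

# Word-level: both sides rewrite the same single token, so this conflicts.
try:
    merge3(base.split(), mine.split(), theirs.split())
except MergeConflict:
    print("conflict")
```

At character granularity nothing overlaps, so the tool has no way to notice that the two edits change the same number.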
Sure, but wikitext "paragraphs" are significantly more extensive and diverse than the NLP concept; to give an example:
Original wikitext:
There are six [[alpaca]] shearers on [[Sunningdale Acers|the farm]].
My changes:
There are six [[*Alpaca fiber|*alpaca]] shearers on [[Sunningdale Acr*e*s|the farm]].
Their changes:
There are six [[alpaca]] shearers on [[Sunningdale Acers|the farm*stead* ]].
Merging these two changes requires character-level merging (or something that natively understands wikitext at a subtle level). The first change would go through with a word-level diff (but not a sentence-level one); the second wouldn't go through even then. Of course, we could prompt people to review the diff after saving if we're auto-merging, but that might be something we should be doing even now?
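As a rough illustration of how granularity changes the outcome for the alpaca example, the sketch below runs the same naive three-way merge over paragraph, word and character tokens. Assumptions: the asterisks in the example are read as emphasis markers and dropped, "words" are whitespace-separated, edits that touch count as a conflict, and `merge3`, `try_merge` and `GRANULARITIES` are hypothetical names rather than anything in MediaWiki.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Raised when the two sides edit overlapping or adjacent tokens."""


def edits(base, other):
    """Return the non-equal opcodes from base to other as (start, end, replacement)."""
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, tuple(other[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def merge3(base, mine, theirs):
    """Naive three-way merge over token sequences (hypothetical helper)."""
    ours, other = edits(base, mine), edits(base, theirs)
    for a in ours:
        for b in other:
            # Identical edits are fine; overlapping or touching ones are not.
            if a != b and a[0] <= b[1] and b[0] <= a[1]:
                raise MergeConflict(f"{a} vs {b}")
    merged, pos = [], 0
    for start, end, replacement in sorted(set(ours + other)):
        merged.extend(base[pos:start])
        merged.extend(replacement)
        pos = end
    merged.extend(base[pos:])
    return merged


# Each granularity is a (tokenize, join) pair.
GRANULARITIES = {
    "paragraph": (lambda text: text.split("\n"), "\n".join),
    "word": (str.split, " ".join),
    "character": (list, "".join),
}


def try_merge(level, base, mine, theirs):
    """Merge at the given granularity; return None on conflict."""
    tokenize, join = GRANULARITIES[level]
    try:
        return join(merge3(tokenize(base), tokenize(mine), tokenize(theirs)))
    except MergeConflict:
        return None


base = "There are six [[alpaca]] shearers on [[Sunningdale Acers|the farm]]."
first_only = "There are six [[Alpaca fiber|alpaca]] shearers on [[Sunningdale Acers|the farm]]."
mine = "There are six [[Alpaca fiber|alpaca]] shearers on [[Sunningdale Acres|the farm]]."
theirs = "There are six [[alpaca]] shearers on [[Sunningdale Acers|the farmstead]]."

for level in GRANULARITIES:
    print(level, "->", try_merge(level, base, mine, theirs) or "conflict")
print("word, first change only ->", try_merge("word", base, first_only, theirs))
```

Under these assumptions only the character-level run merges both of my changes with theirs; the word-level run succeeds when just the alpaca link is changed, because the "Acres" fix sits in the token right next to "farmstead".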
Another low-hanging fruit would be to special-case the situation when editor A adds text to the end of a section but does not start a new section, while editor B adds a new section in the same place. This is currently a conflict as they both try to insert into the same "slot" between paragraphs, so a generic merge tool cannot figure out whether those additions conflict and what would be the right order if they don't; however, knowing the semantics of wikitext, inserting the text from A first and the one from B after that seems a pretty safe bet. This kind of conflict is very typical on talk pages where people almost always edit the end of a section, and the few "hot topic" sections get the majority of the edits.
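A rough sketch of that special case, assuming a merge tool has already established that both editors made pure insertions into the same slot. `order_same_slot_inserts`, `starts_new_section` and the `HEADING` pattern are hypothetical names for this illustration, not MediaWiki code, and the heading pattern only approximates wikitext's `== Title ==` syntax.

```python
import re

# Approximation of a wikitext heading line such as "== Title ==".
HEADING = re.compile(r"^(={1,6}).+\1\s*$")


def starts_new_section(lines):
    """True if the first non-blank inserted line is a section heading."""
    first = next((line for line in lines if line.strip()), "")
    return bool(HEADING.match(first))


def order_same_slot_inserts(insert_a, insert_b):
    """Order two insertions that target the same slot.

    If exactly one of them opens a new section, the other one belongs to the
    end of the preceding section and goes first. Otherwise return None so the
    caller can fall back to its normal conflict handling.
    """
    a_new, b_new = starts_new_section(insert_a), starts_new_section(insert_b)
    if a_new == b_new:
        return None  # still ambiguous: two replies, or two new sections
    if b_new:
        return insert_a + insert_b
    return insert_b + insert_a


reply = [":I agree with the above. ~~~~"]
new_topic = ["", "== Another topic ==", "What about this? ~~~~"]

# The reply stays in the old section, whichever side it arrived on.
print(order_same_slot_inserts(reply, new_topic))
print(order_same_slot_inserts(new_topic, reply))
```

Returning None for the ambiguous cases keeps the sketch conservative: it only resolves the one situation described above and leaves everything else as a conflict.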
That seems like a sensible idea. Filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=73667
(Of course, using unstructured wikitext for talk pages is a bad thing in general, but that's a long-term problem, and this kind of edit conflict could be prevented quickly.)
Indeed!
J.