On Mon, Nov 17, 2014 at 11:03 AM, James Forrester jforrester@wikimedia.org wrote:
Moving to character-level rather than paragraph-level diffing might help
here, potentially. I vaguely remember that we attempted that and abandoned
it because it caused more issues than it solved back in ?2004, though.
A paragraph-level diff means that you only get an edit conflict if two people change the same paragraph. A character-level diff would then mean that you only get a conflict if they change the same character? That sounds a bit excessive. (Stupid example: if I change "sixty-three" to "sixty-five" and someone else changes it to "seventy-three", that should probably be a conflict, but a character-level diff would happily merge them into "seventy-five".) A sentence-level diff would make much more sense, except that breaking text into sentences is a less trivial task than breaking it into paragraphs (lines). It is a very fundamental step in natural language processing, though, so I am sure mature algorithms and tools exist for it; we would just have to research them.
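To make the "seventy-five" example concrete, here is a rough sketch of a naive three-way merge built on Python's difflib. It is not how MediaWiki merges edits; the names merge3, edits and MergeConflict are made up for illustration, and the conflict rule (edits from different sides that overlap or start at the same position in the base) is a simplifying assumption. The same function is run once on characters and once on whole tokens to show how granularity decides whether there is a conflict.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Raised when both sides touch the same region of the base (hypothetical name)."""


def edits(base, other, side):
    """Return the non-equal opcodes from base to other as (start, end, replacement, side)."""
    ops = SequenceMatcher(None, base, other, autojunk=False).get_opcodes()
    return [(i1, i2, other[j1:j2], side) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def merge3(base, a, b):
    """Naive three-way merge of token lists; raises MergeConflict on overlapping edits."""
    all_edits = sorted(edits(base, a, "a") + edits(base, b, "b"),
                       key=lambda e: (e[0], e[1]))
    merged, pos, prev = [], 0, None
    for start, end, repl, side in all_edits:
        # Assumed conflict rule: edits from different sides overlap, or begin at the same spot.
        if prev and prev[3] != side and (start < prev[1] or start == prev[0]):
            raise MergeConflict(f"both sides changed base[{start}:{end}]")
        merged += base[pos:start] + repl
        pos, prev = end, (start, end, repl, side)
    return merged + base[pos:]


base, mine, theirs = "sixty-three", "sixty-five", "seventy-three"

# Character granularity: the two edits touch different characters, so they merge silently.
print("".join(merge3(list(base), list(mine), list(theirs))))  # seventy-five

# Whole-token granularity (standing in for a line or paragraph): same region, so a conflict.
try:
    merge3([base], [mine], [theirs])
except MergeConflict as err:
    print("conflict:", err)
```

A sentence-level merge would use the same machinery with sentences as tokens; the hard part is the sentence splitter, which this sketch does not attempt.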
Another piece of low-hanging fruit would be to special-case the situation where editor A adds text to the end of a section without starting a new section, while editor B adds a new section at the same place. This is currently a conflict because they both try to insert into the same "slot" between paragraphs, so a generic merge tool cannot figure out whether those additions conflict, or what the right order would be if they don't. However, knowing the semantics of wikitext, inserting the text from A first and the text from B after it seems a pretty safe bet. This kind of conflict is very typical on talk pages, where people almost always edit the end of a section and the few "hot topic" sections get the majority of the edits. (Of course, using unstructured wikitext for talk pages is a bad thing in general, but that is a long-term problem, and this kind of edit conflict could be prevented quickly.)
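A minimal sketch of that special case, assuming the merge tool has already worked out that both editors made pure insertions into the same slot and hands us the two inserted blocks as lists of lines. The function names and the heading regex are my own assumptions for illustration, not existing MediaWiki code; the only wikitext fact relied on is that a section heading is a line of the form "== Title ==" with one to six equals signs on each side.

```python
import re

# Assumed heading pattern: 1-6 equals signs, a title, then the same number of equals signs.
HEADING = re.compile(r"^(={1,6}).+\1\s*$")


def is_heading(line):
    return bool(HEADING.match(line))


def starts_new_section(lines):
    """True if the first non-blank inserted line is a section heading."""
    for line in lines:
        if line.strip():
            return is_heading(line)
    return False


def order_same_slot_inserts(ins_a, ins_b):
    """Order two insertions into the same slot, or return None to report a conflict.

    Text that continues the current section goes first; a block that opens a new
    section goes after it. Any other combination is left as a conflict.
    """
    a_has_heading = any(is_heading(line) for line in ins_a)
    b_has_heading = any(is_heading(line) for line in ins_b)
    if not a_has_heading and starts_new_section(ins_b):
        return ins_a + ins_b
    if not b_has_heading and starts_new_section(ins_a):
        return ins_b + ins_a
    return None


reply = [":I agree with the above. ~~~~"]
new_section = ["", "== Another topic ==", "Something unrelated. ~~~~"]
print(order_same_slot_inserts(new_section, reply))
```

Two plain replies, or two new sections, still come back as None here, since the right order in those cases is much less obvious.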