Right now, MediaWiki has 2 pure-PHP engines to produce diffs (there's also a native PHP extension wikidiff2, but we're not discussing it right now):
* DairikiDiff is what everybody uses, and
* Wikidiff3, an alternative implementation by Guy Van den Broeck that has been around for 8 years but requires a configuration change

While less battle-tested, Wikidiff3 offers vastly improved performance on heavy diffs compared to DairikiDiff. The price, however, is that it takes certain shortcuts if the diff is too complex. I ran through 100K diffs from English Wikipedia, and 6% of the diffs differed between the two engines. Lots of the changes were seemingly insignificant, but I need your help with determining if that's really so.
I've built this tool [1] to facilitate the comparison. It displays two diffs from the different algorithms side by side (yeah, it can get too wide, I know :P). Which of them is which is random. Parts that differ between the implementations are highlighted in yellow. Below them is a diff of the differences, for reference. You can vote with the buttons above the diffs; no registration is required. If you see a catastrophically bad diff, please send me the link.
Unless the results are significantly worse, I'd like to go ahead and make wikidiff3 the only implementation.
[1] https://diff-forge.wmflabs.org/wiki/Special:DiffCompare
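
The comparison described above (run two diff strategies over the same revision pairs and count how often their output differs) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's `difflib` as a stand-in, not DairikiDiff or Wikidiff3, and the helper names `opcodes` and `fraction_differing` are hypothetical. The `autojunk` flag plays the role of "shortcuts on heavy diffs": when enabled, `SequenceMatcher` applies a heuristic to sequences of 200 or more items.

```python
import difflib


def opcodes(old_lines, new_lines, autojunk):
    """Edit operations turning old_lines into new_lines.

    autojunk=True lets SequenceMatcher apply its speed heuristic on long
    inputs; autojunk=False disables that heuristic.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=autojunk)
    return matcher.get_opcodes()


def fraction_differing(pairs):
    """Share of (old_lines, new_lines) pairs where the two strategies disagree."""
    if not pairs:
        return 0.0
    differing = sum(
        1
        for old, new in pairs
        if opcodes(old, new, autojunk=False) != opcodes(old, new, autojunk=True)
    )
    return differing / len(pairs)
```

A differing result only says that the two outputs are not identical. Whether one is worse for a human reader still needs the kind of side-by-side review the tool above is for.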
Is there a related Phabricator Maniphest task about this? I'm not sure I understand the motivation for making a switch. I would think that heavy diffs are a very small portion of traffic.
MZMcBride
On Fri, 2016-04-15 at 21:00 -0500, MZMcBride wrote:
Is there a related Phabricator Maniphest task about this? I'm not sure I understand the motivation for making a switch. I would think that heavy diffs are a very small portion of traffic.
https://phabricator.wikimedia.org/T128896 looks related.
andre
On Friday, April 15, 2016, MZMcBride z@mzmcbride.com wrote:
Is there a related Phabricator Maniphest task about this? I'm not sure I understand the motivation for making a switch. I would think that heavy diffs are a very small portion of traffic.
MZMcBride
I think optimizing the worst-case performance makes sense, especially if we don't really lose anything in doing so.
To clarify, this is just for third parties, right? WMF uses wikidiff2.
-- -bawolff
On 16/04/2016 04:00, MZMcBride wrote:
Is there a related Phabricator Maniphest task about this? I'm not sure I understand the motivation for making a switch. I would think that heavy diffs are a very small portion of traffic.
An incentive would be for MediaWiki core to only have a single diff system instead of two.
For the historic part, wikidiff3 got introduced in August 2008:
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/38653 commit e45cf2b8
All right, votes indicate that wikidiff3 is even better in quality, so here we go: https://gerrit.wikimedia.org/r/#/c/284003/ removes DairikiDiff. After it's merged, I plan to refactor this area further and work on improving diff quality now that we'll have 2 places to make changes instead of 3.
On Mon, Apr 18, 2016 at 1:56 PM, Antoine Musso hashar+wmf@free.fr wrote:
An incentive would be for MediaWiki core to only have a single diff system instead of two.
For the historic part, wikidiff3 got introduced in August 2008:
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/38653 commit e45cf2b8
-- Antoine "hashar" Musso
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Cool!
It's entirely possible that the shortcuts it takes correspond to what makes a more cohesive thing to present to the user - the same shortcuts in the diff implementation are what we want in the front-end when looking for meaningful changes anyway.
I mean, this is just random speculation, but it would be interesting if this is indeed the case.
On 21/04/16 20:53, Max Semenik wrote:
All right, votes indicate that wikidiff3 is even better in quality, so here we go: https://gerrit.wikimedia.org/r/#/c/284003/ removes DairikiDiff. After it's merged, I plan to refactor this area further and work on improving diff quality now that we'll have 2 places to make changes instead of 3.