Hi,
I think the HTML diff page I've been developing for the Google Summer of Code is ready to be tested as an experimental feature. You enable it by setting $wgEnableHtmlDiff to true in r39564. What you'll see is a rendered version of the diff page with indications where words were added or removed. Image edits are supported too. Words that got a different style are underlined and you get an English (only, for now) explanation of what happened.
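For reference, a minimal sketch of that switch. It assumes a checkout at r39564 or later, and that the variable is set in LocalSettings.php like other $wg configuration variables; the message itself only names the variable, not the file.

```php
<?php
// LocalSettings.php (assumed location; the message only names the variable).
// Turn on the experimental HTML diff view.
$wgEnableHtmlDiff = true;
```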
The interface is pretty basic and needs work. I'm not very good with cross-browser stuff, though. I can provide metadata in the HTML such as descriptions, IDs, pointers to the previous and next change, etc. Usability can be enhanced by adding links that take you to the first or last change on the page, tooltips that open when clicking a change, or keyboard shortcuts that scroll through the changes. Help is appreciated in this department.
I spent a lot of time optimizing the code (include/HTMLDiff.php) for speed, which makes the code less readable, but performance is an issue. PHP is not my native tongue and the code would probably run faster if an expert took a look at it. I think the performance is pretty decent as it is (what do you expect from code that needs to parse 2 pages, diff every single word and keep everything in memory). The algorithm will probably choke on big pages (set your available memory high!).
Huge changes make the page look messy but that can't be avoided. In my biased opinion the results look very good for reasonably sized pages and versions that are not too distant.
So here is where your feedback and bug reports kick in.
Cheers,
Guy
Guy Van den Broeck wrote:
> I spent a lot of time optimizing the code (include/HTMLDiff.php) for speed which makes the code less readable but performance is an issue. PHP is not my native tongue and the code would probably run faster if an expert took a look at it. I think the performance is pretty decent as it is (what do you expect from code that needs to parse 2 pages, diff every single word and keep everything in memory). The algorithm will probably choke on big pages (set your available memory high!).
I cleaned up the code a bit in r39585. I rewrote two loops, so that may influence performance (haven't done any tests or benchmarks). In the optimization department I can't really help you with more than these generic tips:
- Put wfProfileIn() and wfProfileOut() calls all over the place and do some profiling to see which functions are bottlenecks.
- If you're foreach()ing large arrays somewhere, try to use references: foreach($arr as $key => &$value) instead of foreach($arr as $key => $value). The latter makes a copy of $arr whereas the former doesn't. The former also allows you to change $value.
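The first tip could look roughly like the sketch below. It assumes a MediaWiki version that still provides wfProfileIn() and wfProfileOut(); the stand-in definitions only exist so the snippet runs on its own, and sketchAddedWords() is a hypothetical placeholder, not a function from HTMLDiff.php.

```php
<?php
// Stand-ins so this sketch runs outside MediaWiki. Inside MediaWiki the real
// wfProfileIn()/wfProfileOut() already exist and these definitions are skipped.
if ( !function_exists( 'wfProfileIn' ) ) {
	$GLOBALS['sketchProfileStart'] = array();
	$GLOBALS['sketchProfileTotal'] = array();

	function wfProfileIn( $name ) {
		$GLOBALS['sketchProfileStart'][$name] = microtime( true );
	}

	function wfProfileOut( $name ) {
		$elapsed = microtime( true ) - $GLOBALS['sketchProfileStart'][$name];
		if ( !isset( $GLOBALS['sketchProfileTotal'][$name] ) ) {
			$GLOBALS['sketchProfileTotal'][$name] = 0.0;
		}
		$GLOBALS['sketchProfileTotal'][$name] += $elapsed;
	}
}

// Hypothetical hot function; the body is placeholder work, not the real diff.
function sketchAddedWords( array $oldWords, array $newWords ) {
	wfProfileIn( __METHOD__ );
	// Words that appear in the new revision but not in the old one.
	$added = array_values( array_diff( $newWords, $oldWords ) );
	// Every return path needs a matching wfProfileOut() call.
	wfProfileOut( __METHOD__ );
	return $added;
}

$added = sketchAddedWords( array( 'a', 'b' ), array( 'a', 'c', 'b', 'd' ) );
echo implode( ' ', $added ), "\n";
```

Wrapping whole functions like this keeps the number of profiling calls small; putting the pair inside an inner loop multiplies the call overhead.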
I'll start experimenting with HTMLDiff on my wiki now, input will follow.
Roan Kattouw (Catrope)
2008/8/18 Roan Kattouw roan.kattouw@home.nl:
> I cleaned up the code a bit in r39585. I rewrote two loops, so that may influence performance (haven't done any tests or benchmarks). In the optimization department I can't really help you with more than these generic tips:
> - Put wfProfileIn() and wfProfileOut() calls all over the place and do some profiling to see which functions are bottlenecks
My experience is that wfProfile adds too much overhead for the diff code. There are just too many nested loops, and the function call is pretty expensive. I use the Xdebug profiler instead; I assume it is at least as accurate as wfProfile.
> - If you're foreach()ing large arrays somewhere, try to use references: foreach($arr as $key => &$value) instead of foreach($arr as $key => $value). The latter makes a copy of $arr whereas the former doesn't. The former also allows you to change $value.
It doesn't make a significant difference here, but I added it anyway.
> I'll start experimenting with HTMLDiff on my wiki now, input will follow.
Great! Is your wiki publicly available? I don't have a public test server of my own.
> Roan Kattouw (Catrope)
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Mon, Aug 18, 2008 at 9:56 AM, Roan Kattouw roan.kattouw@home.nl wrote:
> - If you're foreach()ing large arrays somewhere, try to use references: foreach($arr as $key => &$value) instead of foreach($arr as $key => $value). The latter makes a copy of $arr whereas the former doesn't. The former also allows you to change $value.
I looked this syntax up, and found this interestingly braindead gotcha that can occur when you do this. It's PHP bug 29992, of course marked BOGUS: http://bugs.php.net/bug.php?id=29992. Test case:

    <?php
    $array = array( 1, 2, 3 );
    foreach( $array as &$item );
    foreach( $array as $item );
    print_r( $array );

Outputs

    Array
    (
        [0] => 1
        [1] => 2
        [2] => 2
    )

Clever, huh? Naturally it can't be changed because "people might use this for some weird reason". It's probably a good idea to either not use this syntax, or make sure you unset the variable after the loop:

    foreach( $array as &$item ) {
        ...
    }
    unset( $item );

Aryeh Gregor wrote:
> Clever, huh? Naturally it can't be changed because "people might use this for some weird reason". It's probably a good idea to either not use this syntax, or make sure you unset the variable after the loop:
>     foreach( $array as &$item ) { ... }
>     unset( $item );
I know about this. But in my understanding, this bug can only occur when you mix non-referenced foreach() loops with referenced ones. If you use a reference at *every* loop (which is what Guy did), you should be fine.
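The cases discussed here can be checked with a small sketch. The function names are illustrative only and nothing below comes from HTMLDiff.php. It also includes one case that a reference-in-every-loop rule does not cover: the reference stays live after the loop, so a plain assignment to the loop variable still writes into the last element.

```php
<?php
// Reference loop followed by a by-value loop reusing $item (the bug 29992 case).
function mixedLoops( array $array ) {
	foreach ( $array as &$item ) {
	}
	foreach ( $array as $item ) {
	}
	return $array;
}

// Both loops by reference: the second loop rebinds $item instead of
// assigning through the stale reference.
function referenceLoops( array $array ) {
	foreach ( $array as &$item ) {
	}
	foreach ( $array as &$item ) {
	}
	return $array;
}

// The reference outlives the loop, so a plain assignment writes through it.
function strayAssignment( array $array ) {
	foreach ( $array as &$item ) {
	}
	$item = 99;
	return $array;
}

// unset() breaks the reference, so the later by-value loop is harmless.
function unsetAfterLoop( array $array ) {
	foreach ( $array as &$item ) {
	}
	unset( $item );
	foreach ( $array as $item ) {
	}
	return $array;
}

$input = array( 1, 2, 3 );
echo implode( ',', mixedLoops( $input ) ), "\n";      // 1,2,2
echo implode( ',', referenceLoops( $input ) ), "\n";  // 1,2,3
echo implode( ',', strayAssignment( $input ) ), "\n"; // 1,2,99
echo implode( ',', unsetAfterLoop( $input ) ), "\n";  // 1,2,3
```

Calling unset() right after each reference loop covers all of these cases without having to audit every later use of the variable.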
Roan Kattouw (Catrope)
Is anyone actually interested in this feature? My GSoC is officially over now and I need to decide how much work I want to put into the HTML differ voluntarily.
Guy Van den Broeck wrote:
> Is anyone actually interested in this feature? My GSoC is officially over now and I need to decide how much work I want to put in to the HTML differ voluntarily.
Well, I am! :)
It could make for much more legible RSS feeds, for example.
-- brion
Brion Vibber wrote:
> Guy Van den Broeck wrote:
>> Is anyone actually interested in this feature? My GSoC is officially over now and I need to decide how much work I want to put in to the HTML differ voluntarily.
> Well, I am! :)
> It could make for much more legible RSS feeds, for example.
Maybe someone with a public test wiki could enable this? Maybe test.wikipedia.org could?
Roan Kattouw (Catrope)
I enabled it on sandbox.wiki-tools.com; however, the whole thing looks utterly broken to me. I've disabled it again since it basically breaks my ability to diff things in the sandbox.
~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--It's Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
Can you elaborate on what you think is "utterly broken"?
Everything in a technical form. If you look at my sandbox, you'll notice I mainly use it for looking at technical bits of the parser. No real articles there. Basically I'm saying that you can't get any useful output out of it for anything but a text article.
~Daniel Friesen(Dantman, Nadir-Seen-Fire)