I'm building an application that uses DifferenceEngine.php to generate word level unified diffs. I've figured out how to do this but now need to generate patches given the diff.
This is what I have written to generate the diff:
$orig = array('One Two Three', 'One Two Three');
$closing = array('One Two Three', 'One Two Four');

$diff = new WordLevelDiff($orig, $closing);
$formatter = new UnifiedDiffFormatter();
echo $formatter->format($diff);
Which returns:
@@ -5,3 +5,3 @@ One Two
- Three
+ Four
My application will store this, and when it comes time to apply that diff, I need a function that will do so, i.e. given the diff string above and $orig, it should generate $closing.
Does such a patch functionality exist in MediaWiki, or anywhere else?
I'm using PHP and am aware of the xdiff extension, but it only supports line-level diffs, not word-level ones. And I can't install it anyway.
2009/4/22 Kent Wang kwang@kwang.org:
> I'm building an application that uses DifferenceEngine.php to generate word level unified diffs. I've figured out how to do this but now need to generate patches given the diff.
> This is what I have written to generate the diff:
> $orig = array('One Two Three', 'One Two Three');
> $closing = array('One Two Three', 'One Two Four');
> $diff = new WordLevelDiff($orig, $closing);
> $formatter = new UnifiedDiffFormatter();
> echo $formatter->format($diff);
> Which returns:
> @@ -5,3 +5,3 @@ One Two
> - Three
> + Four
> My application will store this, and when it comes time to apply that diff, I need a function that will do so, i.e. given the diff string above and $orig, it should generate $closing.
> Does such a patch functionality exist in MediaWiki, or anywhere else?
It's not in MediaWiki, and I don't know if it's in PHP, but there's a very widespread command line program installed on virtually every UNIX/Linux system that can do this. Unsurprisingly, it's called "patch". You should be able to create two temporary files, say tmp1 and tmp2, put $orig in tmp1 and the unified diff in tmp2, then run "patch tmp1 tmp2", and read the contents of tmp1 into $closing (don't forget to delete the temp files, of course).
Roan Kattouw (Catrope)
On Wed, Apr 22, 2009 at 10:36 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
> It's not in MediaWiki, and I don't know if it's in PHP, but there's a very widespread command line program installed on virtually every UNIX/Linux system that can do this. Unsurprisingly, it's called "patch". You should be able to create two temporary files, say tmp1 and tmp2, put $orig in tmp1 and the unified diff in tmp2, then run "patch tmp1 tmp2", and read the contents of tmp1 into $closing (don't forget to delete the temp files, of course).
He searched for word-based, not line-based diff. I dunno if GNU patch supports 'em, at least I didn't spot it in the manpage.
Marco
Roan Kattouw wrote:
> 2009/4/22 Kent Wang kwang@kwang.org:
>> I'm building an application that uses DifferenceEngine.php to generate word level unified diffs. I've figured out how to do this but now need to generate patches given the diff.
> It's not in MediaWiki, and I don't know if it's in PHP, but there's a very widespread command line program installed on virtually every UNIX/Linux system that can do this. Unsurprisingly, it's called "patch".
The problem is that diff and patch do line-level diffs, and he wants to do it on the word level.
Of course, a possible workaround would be to reversibly transform the files such that every word (or other token) ends up on a separate line. Since the transformed version doesn't really have to be readable, you could, say, URL-encode every token. Then you'd just have to figure out how to correspondingly transform your diff so that it can be applied to the transformed files by patch.
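To make that transformation concrete, here is a rough sketch of one way it might look. The function names are made up for illustration, and splitting on runs of whitespace is only an assumption; it may not match how WordLevelDiff tokenizes. Each token (whitespace included) is URL-encoded and written on its own line, so the mapping can be reversed exactly.

```php
<?php
// Hypothetical helper: split text into word and whitespace tokens.
// Whitespace runs are kept as tokens so that nothing is lost.
function tokenize(string $text): array
{
    return preg_split('/(\s+)/', $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: one URL-encoded token per line.
// rawurlencode() turns spaces and newlines into %20 and %0A,
// so a literal "\n" only ever appears as a token separator.
function encodeForLineDiff(string $text): string
{
    $lines = array_map('rawurlencode', tokenize($text));
    return $lines ? implode("\n", $lines) . "\n" : '';
}

// Inverse of encodeForLineDiff(): decode each line and concatenate.
function decodeFromLineDiff(string $encoded): string
{
    $lines = $encoded === '' ? [] : explode("\n", rtrim($encoded, "\n"));
    return implode('', array_map('rawurldecode', $lines));
}
```

With files in this form, a line-level tool sees one token per line, so a diff produced between two encoded files is effectively a token-level diff that patch can apply; decoding the patched file gives back ordinary text.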
Of course, it's not that hard to apply a patch by hand either: a diff is essentially just a list of straightforward instructions of the form "delete these lines/tokens, insert these in their place". In general, you just first tokenize the file you're patching, and then loop over the diff applying the changes to the list of tokens.
This works just fine as long as the patch applies exactly. Much of the complexity in the patch utility is involved in "fuzzy matching", which allows it to apply patches even if the target file isn't quite identical to the one the diff was generated against, by using the context information in the diff to adjust the offsets. For some purposes, this feature isn't particularly important or useful; for others, it's vital.
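For what it's worth, the exact-match, by-hand approach could be sketched roughly as follows. This is not code from MediaWiki: the function name and the hunk structure (a 1-based 'start' index plus a list of [op, token] pairs) are invented for illustration, and parsing the formatter's text output into that structure is left out. There is no fuzzy matching; any mismatch simply throws.

```php
<?php
// Hypothetical sketch: apply token-level hunks to a token list, strictly.
// Each hunk: ['start' => 1-based index into the original token list,
//             'ops'   => list of [op, token], op being ' ', '-' or '+'].
function applyTokenPatch(array $tokens, array $hunks): array
{
    $out = [];
    $pos = 0; // index of the next unread original token (0-based)

    foreach ($hunks as $hunk) {
        $start = $hunk['start'] - 1;
        if ($start < $pos || $start > count($tokens)) {
            throw new RuntimeException("Hunk at {$hunk['start']} is out of order or out of range");
        }
        // Copy the untouched tokens that precede this hunk.
        while ($pos < $start) {
            $out[] = $tokens[$pos++];
        }
        foreach ($hunk['ops'] as [$op, $token]) {
            if ($op === '+') {
                $out[] = $token; // insertion: emit the new token
                continue;
            }
            if ($op !== ' ' && $op !== '-') {
                throw new InvalidArgumentException("Unknown op '$op'");
            }
            // Context and deleted tokens must match the original exactly.
            if ($pos >= count($tokens) || $tokens[$pos] !== $token) {
                throw new RuntimeException('Patch does not apply at token ' . ($pos + 1));
            }
            if ($op === ' ') {
                $out[] = $token; // context: keep the token
            }
            $pos++; // context and deletions both consume an original token
        }
    }
    // Copy whatever follows the last hunk.
    while ($pos < count($tokens)) {
        $out[] = $tokens[$pos++];
    }
    return $out;
}

$orig  = ['One', ' ', 'Two', ' ', 'Three'];
$hunks = [[
    'start' => 1,
    'ops'   => [[' ', 'One'], [' ', ' '], [' ', 'Two'], [' ', ' '],
                ['-', 'Three'], ['+', 'Four']],
]];
echo implode('', applyTokenPatch($orig, $hunks)), "\n"; // prints "One Two Four"
```

Checking every context and deleted token against the original is what makes this strict: it will refuse a patch that the real patch utility might still apply by shifting offsets.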
wikitech-l@lists.wikimedia.org