Roan Kattouw wrote:
2009/4/22 Kent Wang kwang@kwang.org:
I'm building an application that uses DifferenceEngine.php to generate word-level unified diffs. I've figured out how to do this, but now I need to generate patches given the diff.
It's not in MediaWiki, and I don't know if it's in PHP, but there's a very widespread command line program installed on virtually every UNIX/Linux system that can do this. Unsurprisingly, it's called "patch".
The problem is that diff and patch operate on lines, and he wants to work at the word level.
Of course, a possible workaround would be to reversibly transform the files such that every word (or other token) ends up on a separate line. Since the transformed version doesn't really have to be readable, you could, say, URL-encode every token. Then you'd just have to figure out how to correspondingly transform your diff so that it can be applied to the transformed files by patch.
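A minimal sketch of that reversible transformation, assuming Python and a simple tokenizer that splits text into runs of word characters, runs of whitespace, and single punctuation characters (the tokenizer and function names are illustrative, not anything from DifferenceEngine.php). `urllib.parse.quote` with `safe=""` escapes spaces, newlines and `%`, so every token fits on one line and `unquote` recovers it exactly:

```python
import difflib
import re
from urllib.parse import quote, unquote

# Hypothetical tokenizer: words, whitespace runs, and single other characters.
# Every character falls into one of the three classes, so "".join(tokens) == text.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def to_token_lines(text: str) -> str:
    """Put each URL-encoded token on its own line."""
    return "".join(quote(tok, safe="") + "\n" for tok in tokenize(text))


def from_token_lines(encoded: str) -> str:
    """Inverse of to_token_lines: decode each line and concatenate."""
    return "".join(unquote(line) for line in encoded.splitlines())


old = "The quick brown fox"
new = "The quick red fox"

# An ordinary line-level unified diff of the transformed text is now a
# token-level diff of the original text.
diff_text = "".join(difflib.unified_diff(
    to_token_lines(old).splitlines(keepends=True),
    to_token_lines(new).splitlines(keepends=True),
    fromfile="old.tok", tofile="new.tok"))
print(diff_text)
```

Here `difflib` stands in for the diff generator; the same transformed files could just as well be written to disk and handed to `diff -u` and `patch`, then decoded with `from_token_lines`. The remaining work, as noted above, is mapping an existing word-level diff onto this one-token-per-line form.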
That said, it's not that hard to apply a patch by hand either: a diff is essentially just a list of straightforward instructions of the form "delete these lines/tokens, insert these in their place". In general, you first tokenize the file you're patching, then loop over the diff, applying each change to the resulting list of tokens.
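The loop described above might look like the following Python sketch. The `Hunk` type (start position in the original token list, tokens to delete, tokens to insert) is a hypothetical representation, not DifferenceEngine.php's actual output format, and `make_hunks` uses `difflib.SequenceMatcher` only as a stand-in for a diff you already have:

```python
import difflib
import re
from dataclasses import dataclass

tokenize = re.compile(r"\w+|\s+|[^\w\s]").findall  # hypothetical tokenizer


@dataclass
class Hunk:
    start: int       # index into the ORIGINAL token list
    old: list[str]   # tokens expected at that position (deleted)
    new: list[str]   # tokens inserted in their place


def apply_exact(tokens: list[str], hunks: list[Hunk]) -> list[str]:
    """Apply hunks in order, requiring the old tokens to match exactly."""
    result = list(tokens)
    offset = 0  # net length change from hunks already applied
    for h in sorted(hunks, key=lambda h: h.start):
        pos = h.start + offset
        end = pos + len(h.old)
        if pos > len(result) or result[pos:end] != h.old:
            raise ValueError(f"hunk at token {h.start} does not apply")
        result[pos:end] = h.new
        offset += len(h.new) - len(h.old)
    return result


def make_hunks(a: list[str], b: list[str]) -> list[Hunk]:
    """Turn difflib opcodes into hunks (stand-in for an existing diff)."""
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [Hunk(i1, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


old = tokenize("The quick brown fox jumps")
new = tokenize("The quick red fox leaps")
patched = apply_exact(old, make_hunks(old, new))
print("".join(patched))  # prints "The quick red fox leaps"
```

The running `offset` is what lets hunk positions stay relative to the original file even after earlier hunks have grown or shrunk the token list.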
This works just fine as long as the patch applies exactly. Much of the complexity in the patch utility goes into "fuzzy matching", which allows it to apply patches even if the target file isn't quite identical to the one the diff was generated against, by using the context information in the diff to adjust the offsets. For some purposes, this feature isn't particularly important or useful; for others, it's vital.
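As a rough illustration of the offset-adjustment part only, here is a Python sketch that searches outward from a hunk's expected position for its old tokens (which should include some unchanged context) and carries the discovered shift forward to later hunks. The `Hunk` type and `max_shift` limit are hypothetical, and this is a simplification: GNU patch additionally supports a fuzz factor that can drop context lines, which is not attempted here.

```python
import re
from dataclasses import dataclass

tokenize = re.compile(r"\w+|\s+|[^\w\s]").findall  # hypothetical tokenizer


@dataclass
class Hunk:
    start: int       # index into the token list the diff was made against
    old: list[str]   # changed tokens plus surrounding context tokens
    new: list[str]   # replacement (with the same context)


def find_near(tokens: list[str], needle: list[str], expected: int, max_shift: int):
    """Return the match position closest to `expected`, or None."""
    n = len(needle)
    for d in range(max_shift + 1):
        for pos in (expected - d, expected + d):
            if 0 <= pos <= len(tokens) - n and tokens[pos:pos + n] == needle:
                return pos
    return None


def apply_with_offset(tokens: list[str], hunks: list[Hunk], max_shift: int = 50) -> list[str]:
    result = list(tokens)
    offset = 0  # displacement of original positions in the current result
    for h in sorted(hunks, key=lambda h: h.start):
        if not h.old:
            raise ValueError("hunk needs context tokens to be located")
        pos = find_near(result, h.old, h.start + offset, max_shift)
        if pos is None:
            raise ValueError(f"hunk at token {h.start} not found within {max_shift} tokens")
        result[pos:pos + len(h.old)] = h.new
        # Later hunks inherit both the shift we found and the length change.
        offset = pos - h.start + len(h.new) - len(h.old)
    return result


# Hunk made against "The quick brown fox jumps", applied to an edited target.
hunk = Hunk(start=2, old=tokenize("quick brown fox"), new=tokenize("quick red fox"))
target = tokenize("Hello, world. The quick brown fox jumps")
print("".join(apply_with_offset(target, [hunk])))  # prints "Hello, world. The quick red fox jumps"
```

One known weakness of this naive search is that a later hunk can match inside text an earlier hunk just inserted; a more careful implementation would forbid matches that overlap already-patched regions.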