On 01/17/2011 03:49 PM, Anthony wrote:
How would you define a particular sentence, paragraph or section of an article? The difficulty of the solution lies in answering that question.
I think the definition could vary, and the functionality could still be useful. The API parameters could be the offset and length in the given article version, just like substr().
A user interface (depending on skin) could supply the offset and length by point-and-click (region select) or by pointing at a word and finding the preceding and following blank line. Some user interfaces might also care about sentence separators.
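To make the "pointing at a word" case concrete, here is a rough sketch in Python of how a click position could be turned into the offset and length of the surrounding paragraph. The function name paragraph_span is made up for illustration, and it assumes that paragraphs are separated by a blank line ("\n\n") in the article text; it is not an existing MediaWiki function.

```python
from typing import Tuple


def paragraph_span(text: str, position: int) -> Tuple[int, int]:
    """Return (offset, length) of the paragraph containing `position`.

    Hypothetical helper: assumes paragraphs are delimited by a blank
    line, i.e. two consecutive newline characters.
    """
    # Look backwards for the blank line that precedes the position.
    start = text.rfind("\n\n", 0, position)
    start = 0 if start == -1 else start + 2

    # Look forwards for the blank line that follows the position.
    end = text.find("\n\n", position)
    if end == -1:
        end = len(text)

    return start, end - start


article = "First para.\n\nSecond para here.\n\nThird."
clicked = article.index("para here")      # where the user pointed
offset, length = paragraph_span(article, clicked)
print(article[offset:offset + length])    # Second para here.
```

The resulting pair is exactly what a substr()-style API call would take as parameters.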
The search could be simplified if each edit preserved some parameters of its diff as an "edit index", e.g. "inserted 7 characters at offset 4711". Then we know that this edit is irrelevant if the sought offset is nowhere near 4711, and as we go back in history, our offset needs to be reduced by 7 if it lies beyond the inserted text. Doing such offset arithmetic for a thousand article edits should be a lot faster than calling diff over and over again. Then again, the diffs are still necessary to build such an edit index. That could be done in a one-time conversion or on demand, using the edit index as a cache of such parameters.
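The offset arithmetic could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the EditIndexEntry record, its field names and the trace_back function are made up, and it pretends that each edit changes a single contiguous region, whereas a real diff usually consists of several hunks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EditIndexEntry:
    """Hypothetical cached diff parameters for one edit."""
    revision: int      # revision that this edit produced
    offset: int        # where the change starts
    inserted: int      # number of characters inserted at offset
    deleted: int = 0   # number of characters removed at offset


def trace_back(position: int,
               history: List[EditIndexEntry]) -> Tuple[Optional[int], int]:
    """Walk the edit index from newest to oldest.

    Returns (revision, position) for the edit that inserted the
    character at `position`, or (None, position) if the character
    predates every edit in `history`.
    """
    for entry in history:  # newest edit first
        if position < entry.offset:
            # The edit happened after our position: nothing to adjust.
            continue
        if position < entry.offset + entry.inserted:
            # Our position lies inside the text this edit inserted.
            return entry.revision, position
        # The edit happened before our position: undo its shift.
        position = position - entry.inserted + entry.deleted
    return None, position


history = [
    EditIndexEntry(revision=3, offset=4711, inserted=7),
    EditIndexEntry(revision=2, offset=100, inserted=20),
    EditIndexEntry(revision=1, offset=0, inserted=50),
]
print(trace_back(4713, history))  # (3, 4713)
print(trace_back(5000, history))  # (None, 4923)
```

Each step is a couple of integer comparisons, so the diff itself only has to be computed once per edit, when the index entry is built.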