On Tue, Jan 25, 2011 at 10:27 AM, Platonides <Platonides@gmail.com> wrote:
> Had LST used <section name=foo> </section> to mark sections, instead of <section begin=foo />content<section end=foo />, it would be as easy as traversing the preprocessor output, which would already have the sections split.
It was done this way in order to allow overlapping sections: LST was created so arbitrary parts of a document on Wikisource can be quoted while retaining a direct link to the original document as it continues to be edited.
Basically, the section markers are permanent markers for the source of a copy-and-paste operation. One person might copy paragraphs 1 through 4 while another copies paragraphs 3 through 5, so your page structure looks like this:
[page]
    [section-open 1/]
    [para 1/]            <!-- in section 1 only -->
    [para 2/]            <!-- in section 1 only -->
    [section-open 2/]
    [para 3/]            <!-- in both section 1 and 2 -->
    [para 4/]            <!-- in both section 1 and 2 -->
    [section-close 1/]
    [para 5/]            <!-- in section 2 only -->
    [section-close 2/]
[/page]
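To make the overlap concrete, here is a minimal Python sketch that treats the page above as a flat stream of standalone start/end markers. The token tuples and the function names `section_contents` and `properly_nested` are hypothetical illustrations, not LST or preprocessor code. The second function is an ordinary stack check for whether the markers could be rewritten as paired, nested tags; for this page it fails, because section 1 closes while section 2 is still open.

```python
# Hypothetical flat token stream for the example page: section markers are
# standalone start/end tokens, not containers.
PAGE = [
    ("open", "1"), ("para", "1"), ("para", "2"),
    ("open", "2"), ("para", "3"), ("para", "4"),
    ("close", "1"), ("para", "5"), ("close", "2"),
]

def section_contents(tokens, name):
    """Return the paragraphs between the start and end markers for `name`."""
    inside = False
    found = []
    for kind, value in tokens:
        if kind == "open" and value == name:
            inside = True
        elif kind == "close" and value == name:
            return found
        elif kind == "para" and inside:
            found.append(value)
    raise ValueError(f"section {name!r} is never closed")

def properly_nested(tokens):
    """True only if the markers could be rewritten as paired, nested tags."""
    stack = []
    for kind, value in tokens:
        if kind == "open":
            stack.append(value)
        elif kind == "close":
            # A close must match the most recently opened section.
            if not stack or stack[-1] != value:
                return False
            stack.pop()
    return not stack

print(section_contents(PAGE, "1"))  # ['1', '2', '3', '4']
print(section_contents(PAGE, "2"))  # ['3', '4', '5']
print(properly_nested(PAGE))        # False
```

Paragraphs 3 and 4 belong to both sections, which is exactly what a single tree of paired tags cannot express.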
Since the LST sections overlap, they don't really fit well in the hierarchical structures that the preprocessor deals in except as standalone start/end markers.
*BUT* ... it's probably possible to actually redo things to use that above structure in a sensible way, instead of doing text regexes:
iterate through the node tree:
    if found desired section start node:
        start saving our spot
    if found desired section end node:
        if start node was at same level:
            grab everything in between
            RETURN that to upstream parser
        else:
            find the closest common parent node of start and end
            build a node tree that has the parts of the start's parent
                before the start trimmed, and the parts of the end's
                parent after the end trimmed
            RETURN that to upstream parser
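A rough Python sketch of that walk follows, assuming a toy tree of hypothetical `Node` objects with kinds such as "para", "text", "begin" and "end". These are illustrations only, not the real preprocessor node classes (which live in MediaWiki's PHP code). Each marker is located as a path of child indices; the shared prefix of the two paths gives the closest common parent; the branch holding the start is trimmed to what follows it, the branch holding the end is trimmed to what precedes it, and whole siblings in between are kept as-is. It does not handle an end marker that precedes its start, or repeated section names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    kind: str                 # hypothetical kinds: "root", "para", "text", "begin", "end"
    value: str = ""           # text content, or the section name for markers
    children: List["Node"] = field(default_factory=list)

Path = Tuple[int, ...]        # child indices from the root down to a node

def find_marker(node: Node, kind: str, name: str, path: Path = ()) -> Optional[Path]:
    """Depth-first search for the first `kind` marker named `name`."""
    for i, child in enumerate(node.children):
        if child.kind == kind and child.value == name:
            return path + (i,)
        found = find_marker(child, kind, name, path + (i,))
        if found is not None:
            return found
    return None

def keep_after(node: Node, path: Path) -> Node:
    """Copy `node`, trimming everything up to and including the marker at `path`."""
    i, rest = path[0], path[1:]
    head = [keep_after(node.children[i], rest)] if rest else []
    return Node(node.kind, node.value, head + node.children[i + 1:])

def keep_before(node: Node, path: Path) -> Node:
    """Copy `node`, trimming the marker at `path` and everything after it."""
    i, rest = path[0], path[1:]
    tail = [keep_before(node.children[i], rest)] if rest else []
    return Node(node.kind, node.value, node.children[:i] + tail)

def extract_section(root: Node, name: str) -> Optional[List[Node]]:
    start = find_marker(root, "begin", name)
    end = find_marker(root, "end", name)
    if start is None or end is None:
        return None
    # Walk down the shared prefix of both paths to the closest common parent.
    depth = 0
    while depth < min(len(start), len(end)) - 1 and start[depth] == end[depth]:
        depth += 1
    parent = root
    for i in start[:depth]:
        parent = parent.children[i]
    s, e = start[depth:], end[depth:]
    pieces: List[Node] = []
    if len(s) > 1:  # start is nested deeper: keep only what follows it
        pieces.append(keep_after(parent.children[s[0]], s[1:]))
    # Siblings strictly between the two branches (or between the two markers
    # themselves, when both sit at the same level) are kept untouched.
    pieces.extend(parent.children[s[0] + 1:e[0]])
    if len(e) > 1:  # end is nested deeper: keep only what precedes it
        pieces.append(keep_before(parent.children[e[0]], e[1:]))
    return pieces

def flatten(nodes: List[Node]) -> List[str]:
    """Collect text leaves in document order, for display."""
    out: List[str] = []
    for n in nodes:
        if n.kind == "text":
            out.append(n.value)
        out.extend(flatten(n.children))
    return out

# Same level: the markers are siblings, so we just slice between them.
flat = Node("root", children=[
    Node("text", "a"), Node("begin", "t"), Node("text", "b"),
    Node("text", "c"), Node("end", "t"), Node("text", "d"),
])
print(flatten(extract_section(flat, "t")))  # ['b', 'c']

# Different levels: the markers sit inside different paragraphs.
page = Node("root", children=[
    Node("para", children=[Node("text", "p1"), Node("begin", "s"), Node("text", "p1-tail")]),
    Node("para", children=[Node("text", "p2")]),
    Node("para", children=[Node("text", "p3-head"), Node("end", "s"), Node("text", "p3-tail")]),
])
print(flatten(extract_section(page, "s")))  # ['p1-tail', 'p2', 'p3-head']
```

The original tree is never mutated: trimmed branches are fresh copies, while untouched middle siblings are shared by reference, which is usually what an upstream consumer of the fragment wants.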
One could also pull the markers out of the original text and store them as separate metadata in some way, which seems to be part of the suggestions earlier in the thread. The main problem here is that we could easily end up losing track of the markers during editing: we have no persistent identity for pieces of text, so if there's no visible node in there for editors to move and copy along with their alterations, the markers won't persist automatically.
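A small Python sketch of that failure mode, assuming (hypothetically) that the out-of-band metadata takes the form of character offsets stored next to the text. An ordinary edit above the section silently invalidates the offsets, while in-band markers move with the words. The regex here only illustrates in-band lookup; it is not LST's actual pattern.

```python
import re

# Hypothetical out-of-band storage: section boundaries kept as character
# offsets alongside the text instead of as markers inside it.
text = "Para one. Para two. Para three."
sections = {"quote": (10, 19)}  # intended to cover "Para two."

def section_text(text: str, sections: dict, name: str) -> str:
    start, end = sections[name]
    return text[start:end]

print(repr(section_text(text, sections, "quote")))    # 'Para two.'

# An editor adds a sentence at the top. Nothing ties the stored offsets
# to the words they were meant to track, so they now point at other text.
edited = "New intro. " + text
print(repr(section_text(edited, sections, "quote")))  # ' Para one'

# In-band markers travel with the text through the same edit.
marked = "Para one. <section begin=quote />Para two.<section end=quote /> Para three."
edited_marked = "New intro. " + marked
match = re.search(r"<section begin=quote />(.*?)<section end=quote />", edited_marked)
print(repr(match.group(1)))                           # 'Para two.'
```

Re-anchoring the offsets after each edit would need a diff against the previous revision, which is exactly the "persistent identity for pieces of text" the wiki doesn't have.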
-- brion