[Wikitext-l] Refactoring in progress with parserTests

Wed Dec 28 04:45:53 UTC 2011

I pulled out most of the parser-y parts from the parserTests, leaving 
behind just tests.

However, the parser is still a bit of a monster object, hence the 
deliberately silly name, ParserThingy.

I'm trying to decompose it into a chain, roughly like:

1. wikiText -> tokenize -> tokens

2. tokens -> treeBuilder -> dom-tree

3. dom-tree -> serialize -> wikiDom or HTML

There's a "postprocess" step as well, but I think that makes sense as 
part of step 2.
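
To make the three steps concrete, here is a toy sketch in which each stage is a plain function that can be tested on its own. The function names (tokenize, buildTree, serializeToHtml) and the grammar (only '''bold''' and plain text) are made up for illustration; this is not the real tokenizer or tree builder, and it does no HTML escaping.

```javascript
// Step 1: wikiText -> tokens. Toy grammar: ''' toggles bold, the rest is text.
function tokenize(wikiText) {
    var tokens = [];
    var parts = wikiText.split("'''");
    for (var i = 0; i < parts.length; i++) {
        if (i > 0) {
            // Every split boundary corresponds to one ''' marker.
            tokens.push({ type: 'boldToggle' });
        }
        if (parts[i] !== '') {
            tokens.push({ type: 'text', value: parts[i] });
        }
    }
    return tokens;
}

// Step 2: tokens -> tree. Bold nodes are never nested in this toy version.
function buildTree(tokens) {
    var root = { type: 'root', children: [] };
    var current = root;
    tokens.forEach(function (token) {
        if (token.type === 'text') {
            current.children.push({ type: 'text', value: token.value });
        } else if (current.type === 'bold') {
            current = root; // closing marker: go back to the root
        } else {
            var bold = { type: 'bold', children: [] };
            root.children.push(bold);
            current = bold; // opening marker: collect children in the bold node
        }
    });
    return root;
}

// Step 3: tree -> HTML string (no escaping, illustration only).
function serializeToHtml(node) {
    if (node.type === 'text') {
        return node.value;
    }
    var inner = node.children.map(serializeToHtml).join('');
    return node.type === 'bold' ? '<b>' + inner + '</b>' : inner;
}
```

Because each stage only depends on the output of the previous one, a test can feed hand-written tokens to buildTree or a hand-written tree to serializeToHtml without running the stages before it.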

Each step should be individually testable. And the whole enchilada is 
like a big composition of initialized objects, e.g.

     var wikiTextToWikiDom = new wikiTextToSerialization(
         new wikiTextTokenizer(tokenConfig),
         new domTreeBuilder(treeConfig),
         new domTreeToWikiDom(serializationConfig)
     );

     var wikiDom = wikiTextToWikiDom( wikitext );
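
For the call wikiTextToWikiDom( wikitext ) to work, the constructor has to hand back something callable. In JavaScript a constructor that explicitly returns a function makes "new" yield that function, so one possible shape is the sketch below. The stage method names (tokenize, build, serialize) and the stub stages are assumptions for illustration, not the actual interfaces.

```javascript
// Composes three already-configured stage objects into a single function.
// Returning a function from the constructor means `new` yields that function.
function wikiTextToSerialization(tokenizer, treeBuilder, serializer) {
    return function (wikiText) {
        var tokens = tokenizer.tokenize(wikiText);
        var tree = treeBuilder.build(tokens);
        return serializer.serialize(tree);
    };
}

// Stub stages, standing in for the real tokenizer, tree builder and serializer.
var stubTokenizer = {
    tokenize: function (text) { return text.split(' '); }
};
var stubTreeBuilder = {
    build: function (tokens) { return { children: tokens }; }
};
var stubSerializer = {
    serialize: function (tree) { return tree.children.join('|'); }
};

var stubPipeline = new wikiTextToSerialization(
    stubTokenizer,
    stubTreeBuilder,
    stubSerializer
);

var stubResult = stubPipeline('a b c'); // 'a|b|c'
```

Swapping in a different serializer object is then enough to get HTML instead of wikiDom, and any stage can be replaced by a stub in tests.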

Just a query on what interfaces people would like:

I'm assuming exceptions are not a good idea, given Node's async nature. 
There are certain constructs where we are explicitly async: tokenizing 
can be streamed, and I assume that when we start doing lookups to figure 
out what to do with templates, we'll need to go async for seconds at a 
time.
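
One alternative to exceptions is the usual Node convention of passing the error as the first argument to a callback. The sketch below assumes a hypothetical fetchTemplate function and an in-memory template table; a real lookup would go to a database or the API and take much longer.

```javascript
// Hypothetical in-memory template store, standing in for a real lookup.
var templateStore = { 'echo': '{{{1}}}' };

// Error-first callback: callback(err) on failure, callback(null, source) on success.
function fetchTemplate(name, callback) {
    // process.nextTick defers the callback, so callers always see async behaviour.
    process.nextTick(function () {
        if (!Object.prototype.hasOwnProperty.call(templateStore, name)) {
            callback(new Error('No such template: ' + name));
            return;
        }
        callback(null, templateStore[name]);
    });
}

fetchTemplate('echo', function (err, source) {
    if (err) {
        console.error('lookup failed: ' + err.message);
        return;
    }
    console.log('template source: ' + source);
});
```

A throw inside the deferred function would not be caught by a try/catch around the fetchTemplate call, because that call has already returned by the time the deferred function runs. Passing the error to the callback keeps the failure attached to the request that caused it.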

I'm also assuming that 99.99% of the time we want a simple in-out 
interface as described above. But for testing and debugging, we want to 
instrument what's really going on. And we may want to pass control off 
for a while when we bring template parsing into the mix. So that means 
either there are magic values, or there's some way to attach event 
listeners to the serializer. Is it okay to attach event listeners to the 
serializer without tying them to a specific piece of wikitext that's 
making its way through the pipeline?
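
One possible arrangement, offered only as a sketch: give every parse its own EventEmitter, so listeners are always scoped to one piece of wikitext, and keep the simple in-out call as a thin wrapper around it. The names createParseRun and parse, the event names, and the whitespace "tokenizer" are all invented for illustration; only EventEmitter itself is Node's.

```javascript
var EventEmitter = require('events').EventEmitter;

// Each run is its own emitter, so listeners never see events from other inputs.
function createParseRun(wikiText) {
    var run = new EventEmitter();
    run.start = function () {
        // Stand-in for the real tokenizer: split on whitespace.
        var tokens = wikiText.split(/\s+/).filter(Boolean);
        run.emit('tokens', tokens);
        // Stand-in for tree building plus serialization.
        var html = '<p>' + tokens.join(' ') + '</p>';
        run.emit('end', html);
        return html;
    };
    return run;
}

// The simple in-out interface: no listeners, just a result.
function parse(wikiText) {
    return createParseRun(wikiText).start();
}

// Instrumented use for testing and debugging.
var debugRun = createParseRun('hello  world');
debugRun.on('tokens', function (tokens) {
    console.log('token count: ' + tokens.length);
});
debugRun.start();
```

The trade-off is that listeners have to be attached once per run rather than once on a shared serializer, but they cannot then be confused by events from another run that happens to be in flight.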

-- 
Neil Kandalgaonkar   ) <neilk at wikimedia.org>