Does anyone have a requirements document for the Wikipedia parser?
If not, will those programmers who have already begun work on such a
parser, like Magnus and Frithjof, please send me any scraps of
documentation you have?
I would like to assemble these into a wiki grammar or something like
that, so we can help each other with parser development.
I guess the list of "stupid parser tricks" would start with bracket
notation for links:
[http://www.edpoor.com] is a link to my outdated, static website
[http://www.edpoor.com/images/Ae-inAndDog.jpg girl with dog] is an
annotated link
[[Iraq]] links to the Wikipedia article on Iraq
[[Iraq|Rummyland]] links to Iraq but is shown as "Rummyland" (a
Doonesbury reference, okay? ;-)
Etc.
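To make the four bracket forms above concrete, here is a rough sketch of how a parser might recognize them. This is only my guess at the rules, not how any existing parser does it: the name find_links, the regular expression, and the (kind, target, label) tuple shape are all made up for illustration, and I am assuming that external links start with http:// or https:// and that a single pipe separates an article title from its display text.

```python
import re

# Hypothetical pattern covering the four forms listed above (an assumption,
# not the grammar of any existing Wikipedia parser):
#   [[Page]]   [[Page|shown text]]   [http://url]   [http://url link text]
LINK = re.compile(
    r"\[\[(?P<page>[^\[\]|]+)(?:\|(?P<shown>[^\[\]]+))?\]\]"
    r"|\[(?P<url>https?://[^\s\[\]]+)(?:\s+(?P<text>[^\[\]]+))?\]"
)

def find_links(wikitext):
    """Return (kind, target, label) tuples in order of appearance."""
    links = []
    for m in LINK.finditer(wikitext):
        if m.group("page") is not None:
            # Internal link: the label defaults to the article title.
            page = m.group("page")
            links.append(("internal", page, m.group("shown") or page))
        else:
            # External link: the label defaults to the URL itself.
            url = m.group("url")
            links.append(("external", url, m.group("text") or url))
    return links

if __name__ == "__main__":
    sample = "See [[Iraq|Rummyland]] and [http://www.edpoor.com my site]."
    for link in find_links(sample):
        print(link)
```

A single combined pattern keeps the links in document order, which a renderer would presumably need. Nested brackets and other markup are ignored here.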
Alongside the parsing rules for rendering text, there is the problem of
fetching and posting files, that is, coordinating each user's off-line
stash (cache?) with the database. Note that some users might not want
the entire encyclopedia, but perhaps only those articles they're working
on. Or articles one click away?
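
The "one click away" idea could be sketched roughly like this. Everything here is hypothetical: articles_to_cache, the links_from dictionary (article title mapped to the titles it links to), and the clicks parameter are names I made up, and I am assuming the link structure is already available locally rather than fetched from the database.

```python
def articles_to_cache(working_on, links_from, clicks=1):
    """Return the working set plus everything within `clicks` links of it.

    working_on -- titles the user is editing (hypothetical input)
    links_from -- dict mapping a title to the titles it links to (assumed)
    clicks     -- how many link hops to follow; 0 means the working set only
    """
    wanted = set(working_on)
    frontier = set(working_on)
    for _ in range(clicks):
        # Follow every link out of the current frontier, skipping articles
        # that are already selected.
        frontier = {
            target
            for title in frontier
            for target in links_from.get(title, ())
        } - wanted
        wanted |= frontier
    return wanted

if __name__ == "__main__":
    links = {
        "Iraq": ["Baghdad", "Tigris"],
        "Baghdad": ["Tigris", "Abbasid Caliphate"],
    }
    print(sorted(articles_to_cache(["Iraq"], links)))
```

This only picks which articles to keep off-line; it says nothing yet about how fetching, posting, or conflict handling against the database would work.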
Ed Poor