Platonides platonides@gmail.com wrote:
Any pointers to things that I overlooked? Thoughts on interfaces & Co.? Volunteers? :-)
It's a bit hard for me to understand what your tool does, since it gives a blank page when English is selected, and it takes the html source instead of the wiki source.
Ah! Didn't notice that. It works (solely) on the wiki source, though.
I get that you look for two kinds of bugs: "wiki text errors" (like an unclosed tag) and "wikipedia errors" (the date doesn't conform to the manual of style). [...]
It does mostly the latter, but I'm not looking for some grammar to define an article complying with a manual of style, but for a parser to parse wikitext.
[...] I have dealt with the parser a bit (see bug 18765) and I don't think we could make some things remotely sane as they are handled at completely different steps. But linting completely insane ones shouldn't be too hard. :)
On the other hand, going into the Parser is probably quite far from what you expected when wanting to leave your ugly mess of regexes. Also, I may have misunderstood your position and it may not be appropriate for your lint expectations.
I think so :-). My use case with wikilint and some other tools is:
- Are there more than one and fewer than x images per article?
- Is there more than one link to another article?
- Are there links in a "See also" section that have already appeared in the article?
- If there are "Main article:" links, do they appear directly following a section heading, indented and italic?
- Does the {{Personendaten}} data have a fuzzy relationship to the introductory line of the article?
To address these, I'd like to parse the wiki source from a flat sequence of characters into a logical structure. The MediaWiki parser does not seem to care for that, so I have not looked further into it (and don't plan to do so).
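For illustration, a few of the checks above could be sketched directly over raw wikitext with a handful of regexes, which is exactly the kind of matching a proper parser would replace with a walk over a logical structure. Everything here (the function name, the threshold, the link patterns) is made up for the example and not taken from wikilint or MediaWiki:

```python
import re

def lint_article(wikitext, max_images=10):
    """Run a few of the checks above on raw wikitext.

    A regex-based sketch for illustration only; a real parser
    should replace exactly this kind of pattern matching.
    """
    problems = []

    # Check 1: between one and max_images images per article.
    images = re.findall(r'\[\[(?:File|Image|Datei|Bild):', wikitext)
    if not (1 <= len(images) <= max_images):
        problems.append(f"expected 1..{max_images} images, found {len(images)}")

    # Check 2: at least one plain wikilink to another article.
    # (Links containing ':' -- files, categories -- are skipped for simplicity.)
    links = re.findall(r'\[\[([^\]|:]+)(?:\|[^\]]*)?\]\]', wikitext)
    if not links:
        problems.append("no links to other articles")

    # Check 3: "See also" entries that already appeared earlier in the text.
    m = re.search(r'==\s*See also\s*==(.*?)(?:\n==[^=]|\Z)', wikitext, re.S)
    if m:
        before = wikitext[:m.start()]
        seen = {l.lower() for l in re.findall(r'\[\[([^\]|]+)', before)}
        for link in re.findall(r'\[\[([^\]|]+)', m.group(1)):
            if link.lower() in seen:
                problems.append(f'"See also" repeats link: {link}')

    return problems
```

Note how the third check already has to guess where the "See also" section ends; with a real parse tree, sections and links would simply be nodes to iterate over.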
So, to emphasize: I'm looking for *a* parser, that's a lowercase "p".
Tim