What you can do is to run some experiments on the existing dump. How many cases are there where ''''' is hard to resolve? Did anybody count?
It is of course possible to write articles with unbalanced apostrophes. If I write '''hey'' it will render as '<i>hey</i>, and that's also how a conversion program should leave it. How many such user mistakes are there in the current dump?
I can't answer that question for a current dump, but I can answer it for a dump of EN that's about 15 months old (this was done as part of http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Wiki_Syntax ).
The formats and figures are shown below, and I've added examples to show a single line that would cause it to be logged (assuming the rest of the wikitext in that article is well-formed). Basically they're all about _balance_ - if you open a bit of paired syntax, you should close it. Some syntaxes must be closed on the same line (e.g. ''' ), and some must be closed in the same article (e.g. {| ).
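To make the "balance" idea concrete, here is a minimal sketch in Python of how such a check might look. It is not the script that produced the figures below. The pair tables, the function names `unbalanced` and `check_article`, and the left-to-right depth scan are all assumptions made for illustration, and only a handful of the syntaxes are covered.

```python
import re

# Hypothetical pair tables (assumptions for illustration only).
LINE_PAIRS = [("[[", "]]")]                       # must balance within one line
ARTICLE_PAIRS = [("{|", "|}"), ("<!--", "-->")]   # must balance within the article


def unbalanced(text, open_tok, close_tok):
    """Scan text left to right and return the surplus tokens, in order found."""
    pattern = re.compile("|".join(re.escape(t) for t in (open_tok, close_tok)))
    problems = []
    depth = 0
    for match in pattern.finditer(text):
        if match.group() == open_tok:
            depth += 1
        elif depth > 0:
            depth -= 1
        else:
            # A close with nothing open before it.
            problems.append(close_tok)
    if depth > 0:
        # Something was opened and never closed.
        problems.append(open_tok)
    return problems


def check_article(wikitext):
    """Return a label such as "]] and [[", or "" if nothing was flagged."""
    found = []
    for line in wikitext.split("\n"):
        for open_tok, close_tok in LINE_PAIRS:
            for tok in unbalanced(line, open_tok, close_tok):
                if tok not in found:
                    found.append(tok)
    for open_tok, close_tok in ARTICLE_PAIRS:
        for tok in unbalanced(wikitext, open_tok, close_tok):
            if tok not in found:
                found.append(tok)
    return " and ".join(found)
```

For instance, `check_article("this is a [[test")` gives `"[["`, and a table that is opened on one line and closed on a later one is not flagged, because `{|` is only checked across the whole article.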
Note, however, that these figures are from only a few months after a previous run (I think - it *was* a while ago). My guesstimate is that the figures now would be between 2 and 4 times higher, because it's been so long since it was last done, and because there are probably more contributions now, and "wikitext format errors introduced" is probably directly proportional to the number of contributions.
----------------------------------------------
mysql> select format, count(*) as count from malformed_page group by format order by count desc;
+-------------------------+-------+
| format                  | count |
+-------------------------+-------+
| ''                      |  7161 | example: this is a ''test
| '''                     |  1248 | example: this is a '''test
| '' and '''              |  1155 | example: this is a '''test''
| ''' and ''              |  1091 | example: this is a ''test'''
| ]                       |   587 | example: this is a] test
| [ and ]]                |   507 | example: this is a [test]]
| ]]                      |   417 | example: this is a test]]
| [[                      |   413 | example: this is a [[test
| [                       |   372 | example: this is [a test
| [[ and ]                |   347 | example: this is a [[test]
| {|                      |   261 | example: {| (and never close it)
| |}                      |   238 | example: |} (and never open it)
| -->                     |    67 | example: <!-- blah --> -->
| <div>                   |    60 | example: <div> <div> blah </div>
| {{                      |    46 | example: {{ {{delete}}
| <!--                    |    43 | example: <!-- <!-- blah -->
| </div>                  |    39 | example: <div> blah </div> </div>
| }}                      |    34 | example: {{delete}} }}
| ]] and [[               |    24 | example: this is a ]]test[[
| == and ===              |    20 | example: ==heading===
| ] and [                 |    14 | example: this is a ]test[
| [[image:                |    11 | example: [[image: [[image:test.gif]]
| === and ==              |     8 | example: ===heading==
| '' and [[               |     5 | example: this ''is a [[test
| [ and ''                |     5 | example: this [is a ''test
| '' and ]]               |     5 | example: this ''is a]] test
| <code>                  |     5 | example: <code> <code> for i=1 </code>
| </pre>                  |     4 | example: <pre> for i=1 </pre> </pre>
| </nowiki>               |     4 | example: <nowiki> for i=1 </nowiki> </nowiki>
| '' and ]                |     3 | etc ....
| ]] and '''              |     3 |
| ]] and ''               |     3 |
| ] and ''                |     2 |
| </math>                 |     2 |
| [[ and ''               |     2 |
| '' and [[ and ]         |     2 |
| </code>                 |     2 |
| === and ====            |     2 |
| '' and ''' and ]        |     2 |
| [ and '''               |     1 |
| ]] and '' and '''       |     1 |
| [[ and ] and [          |     1 |
| [ and ]] and ''         |     1 |
| ''' and [[              |     1 |
| <math>                  |     1 |
| ==== and ===            |     1 |
| ]] and [[ and ''        |     1 |
| ] and [ and ''          |     1 |
| ''' and '' and ]]       |     1 |
| ''' and '' and [[ and [ |     1 |
| ''' and ]]              |     1 |
| ]] and ] and ''' and '' |     1 |
| [ and '' and '''        |     1 |
+-------------------------+-------+
53 rows in set (0.57 sec)

mysql>
----------------------------------------------
Note: ''''' was treated as ''' + '' (rather than as a separate category), so it will be mixed in with the above figures for ''' and ''.
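As a rough illustration of that note, the sketch below (Python) counts a run of five apostrophes as one ''' plus one '' when tallying a single line. The function name `unbalanced_quotes` is hypothetical, the parity-based test is a simplification rather than the logic actually used for the figures, and the handling of runs of four or more than five apostrophes is an assumption.

```python
import re


def unbalanced_quotes(line):
    """Return which of '' and ''' occur an odd number of times on one line."""
    italic_toggles = 0
    bold_toggles = 0
    for run in re.findall(r"'{2,}", line):
        n = len(run)
        if n == 2:
            italic_toggles += 1
        elif n in (3, 4):
            # Assumption: a run of four is one literal apostrophe plus '''.
            bold_toggles += 1
        else:
            # Five (or, by assumption, more): counted as ''' plus ''.
            bold_toggles += 1
            italic_toggles += 1
    result = []
    if italic_toggles % 2:
        result.append("''")
    if bold_toggles % 2:
        result.append("'''")
    return result
```

With this tally, `'''''both'''''` is balanced, while `'''''both'''` is reported as an unbalanced `''`, which is how a stray ''''' ends up mixed into the ''' and '' rows.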
Perhaps somebody is already running a robot to find and fix such errors?
Not that I'm aware of - humans were better at it anyway, because some of the above were false positives (e.g. some math formulas), and the "'' and '''" and "''' and ''" tests had lots of false positives. If a robot went around blindly fixing these automatically, it'd be banned for vandalism. However, some automated approach could be good, as it's an ongoing problem with no real closure (like sorting the mail), so people eventually have enough of doing it (as I did), and move on to other stuff.
All the best, Nick.