[Mediawiki-l] Wikitext grammar

Magnus Manske magnusmanske at googlemail.com
Sat Aug 7 18:19:09 UTC 2010


So why not use the "real" parser?

* Get rendered HTML page
* Extract <div id="bodyContent">
* Take the first <p> element in there

Profit!
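
As a rough sketch of the three steps above (not a tested recipe): the
Python below uses only the standard library's html.parser. The names
FirstParagraph and first_paragraph are made up for illustration, and it
assumes the rendered page really wraps the article in
<div id="bodyContent">, which depends on the skin.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text of the first <p> inside <div id="bodyContent">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.div_depth = 0   # > 0 while we are inside the bodyContent div
        self.in_p = False    # True while we are inside the target <p>
        self.done = False    # True once the first <p> has been closed
        self.parts = []      # text fragments of the target paragraph

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "div":
            if self.div_depth > 0:
                # Nested div: track depth so we know when bodyContent ends.
                self.div_depth += 1
            elif dict(attrs).get("id") == "bodyContent":
                self.div_depth = 1
        elif tag == "p" and self.div_depth > 0 and not self.in_p:
            self.in_p = True

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True
        elif tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        # Inline markup such as <b> or <a> is dropped; only text is kept.
        if self.in_p:
            self.parts.append(data)


def first_paragraph(html):
    """Return the plain text of the first paragraph of the page body."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

To run it against a live wiki you would first fetch the rendered page
(for instance with urllib.request.urlopen) and pass the decoded HTML to
first_paragraph(). Since the wiki itself did the rendering, templates
such as {{lang}} or {{formatnum:1401}} should already be expanded.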


Magnus

On Sat, Aug 7, 2010 at 6:19 PM, Brian J Mingus
<brian.mingus at colorado.edu> wrote:
> On Sat, Aug 7, 2010 at 10:54 AM, lmhelp <lmbox at wanadoo.fr> wrote:
>
>>
>> Hi,
>>
>> Thank you for your answer.
>>
>> > mwlib is the best parser available for folks who want to do a quick
>> > job such as yours.
>>
>> Maybe it is, I don't know...
>> I have recently learned that constructing a parser for "Wikitext" is not
>> an easy task. But, in fairness, it is not really satisfactory to have
>> {{lang}} and {{formatnum:1401}} left in the generated "HTML" code, is it?
>> (I mean, given that it never happens with "Wikipedia".)
>>
>>
> mwlib was written in conjunction with the WMF, and IIRC had at least some
> input from Brion Vibber. It's high quality and works well. Expect a 2-3
> hour learning curve while you explore the Python modules and methods with
> dir() and help().
>
>
>
>> > You can use the dumpHTML maintenance script to convert wikitext to HTML
>>
>> Would "dumpHTML" work when only a single "Wikitext" sentence
>> has to be converted to "HTML"?
>>
>> Actually, on: http://www.mediawiki.org/wiki/Extension:DumpHTML
>> one can read:
>>    "dumpHTML is an extension for generating a simple HTML
>>     dump, including images and media files, of a MediaWiki
>>     installation".
>> It looks a bit oversized in my case... doesn't it?
>>
>
> IIRC dumpHTML is a maintenance script that is included with MediaWiki. I
> don't believe that it requires you to have images. I have used both of the
> approaches I described to you in the past, and found both to be
> straightforward.
>
>
>>
>> All the best,
>> --
>> Lmhelp
>> --
>> View this message in context:
>> http://old.nabble.com/Wikitext-grammar-tp29350471p29375714.html
>> Sent from the WikiMedia General mailing list archive at Nabble.com.
>>
>>
>> _______________________________________________
>> MediaWiki-l mailing list
>> MediaWiki-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
>>


