Thank you all for your contributions :).
Hi,
I was over-optimistic about how easily I could extract the first
paragraph of a "Wikipedia" article from its "Wikitext".
Still, for the "Wikipedia" article "Čokot", for instance, I managed (1)
to get the following "Wikitext" sentence:
-------------------------------------------------------------------------
'''Cokot''', en [[serbe]] [[Alphabet cyrillique serbe|cyrillique]]
{{lang|sr|?????}}, est une localité de [[Serbie]] située dans la
municipalité de [[Palilula (Niš)]], district de
[[Nišava (district)|Nišava]]. En [[2002]], elle comptait
{{formatnum:1401}} habitants<ref
name="stats1">{{Historique de la population (Serbie)}}</ref>, dont une
majorité de [[Serbes]].
-------------------------------------------------------------------------
I then used the "Bliki" (2) engine to convert this
"Wikitext" sentence to "HTML". Here is what I got:
-------------------------------------------------------------------------
<p>Cokot, en
http://fr.wikipedia.org/wiki/Serbe serbe
http://fr.wikipedia.org/wiki/Alphabet_cyrillique_serbe cyrillique {{lang}},
est une localité de http://fr.wikipedia.org/wiki/Serbie Serbie située dans la
municipalité de
http://fr.wikipedia.org/wiki/Palilula_(Ni%C2%9A) Palilula (Niš) , district de
http://fr.wikipedia.org/wiki/Ni%C2%9Aava_(district) Nišava .
En http://fr.wikipedia.org/wiki/2002 2002 , elle comptait
{{formatnum:1401}} habitants<sup id="_ref-stats1_a"
class="reference"> #_note-stats1 [1] </sup>, dont
une majorité de
http://fr.wikipedia.org/wiki/Serbes Serbes .</p>
-------------------------------------------------------------------------
This "HTML" sentence still contains two unconverted "Wikitext" chunks:
- {{lang}}
and
- {{formatnum:1401}}.
=> "{{lang}}" should have been removed.
=> "{{formatnum:1401}}" should have been replaced by "1401".
So, I posted on the "Bliki" forum (3), and someone told me that they had
not yet implemented what is needed to handle the two chunks of
"Wikitext" that remain in the example above, and that I would have to do
it myself.
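For what it is worth, "doing it myself" could start with a small
post-processing step like the sketch below. It is only an illustration
under assumptions: the class and method names ("TemplateCleanup", "clean")
are hypothetical, it is not part of "Bliki", it only covers the two
templates from the example, and it does not handle nested templates. It
drops "{{lang...}}" entirely, as I expected above, and replaces
"{{formatnum:N}}" by N without any locale-specific number formatting.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Bliki: cleans up the two template
// calls that were left unconverted in the example above.
final class TemplateCleanup {

    // Matches {{formatnum:1401}} and captures the argument ("1401").
    private static final Pattern FORMATNUM =
            Pattern.compile("\\{\\{\\s*formatnum\\s*:\\s*([^{}|]*?)\\s*\\}\\}");

    // Matches {{lang}} or {{lang|sr|...}}, as long as nothing is nested inside.
    private static final Pattern LANG =
            Pattern.compile("\\{\\{\\s*lang\\s*(\\|[^{}]*)?\\}\\}");

    private TemplateCleanup() {
    }

    static String clean(String text) {
        // Replace {{formatnum:N}} by N (assumption: no number formatting is wanted).
        String out = FORMATNUM.matcher(text).replaceAll("$1");
        // Remove {{lang...}} completely (assumption: its content is not needed).
        return LANG.matcher(out).replaceAll("");
    }
}
```

The method can be applied either to the "Wikitext" before conversion or to
the "HTML" afterwards, since it only looks for the literal "{{...}}" chunks.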
I chose "Bliki" because it is available as a Java ".jar" archive that is
ready to be embedded in my Eclipse project, which is quite convenient
for me.
MY FIRST QUESTION IS:
=====================
Do you know of a better tool than this one, one that would not "miss"
chunks of "Wikitext" as in the example above (or that would at least
handle common templates like "lang" and "formatnum")?
MY SECOND QUESTION IS:
======================
The parser used by "Wikipedia" itself works pretty well: as far as I
know, leftovers like the ones above never appear in its pages.
So my question is: is this parser available? Where?
Can I use it from my Java code?
Please forgive me if this question is naïve.
Thank you for your help and indulgence.
All the best,
--
Lmhelp
(1) It probably would not work in all cases: it relies on the fact that
a paragraph ends with "\n\n", as "Platonides" said in his first post.
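To make footnote (1) concrete, here is a minimal sketch of that heuristic
in Java. The class and method names ("FirstParagraph", "of") are made up
for illustration, and it assumes the article's "Wikitext" is already in a
String. It returns whatever block comes first, so an article that opens
with a template or an image would need extra handling.

```java
// Hypothetical helper illustrating the "\n\n" heuristic from footnote (1).
final class FirstParagraph {

    private FirstParagraph() {
    }

    // Returns the text before the first blank line,
    // or the whole text if there is no blank line.
    static String of(String wikitext) {
        // Normalize Windows line endings and drop leading/trailing whitespace,
        // so that "\n\n" is the only paragraph separator to look for.
        String text = wikitext.replace("\r\n", "\n").trim();
        int end = text.indexOf("\n\n");
        return end < 0 ? text : text.substring(0, end);
    }
}
```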
(2)
http://code.google.com/p/gwtwiki/
(3)
http://groups.google.com/group/bliki/browse_thread/thread/7ed33272b206826f
--
View this message in context:
http://old.nabble.com/Wikitext-grammar-tp29350471p29375222.html
Sent from the WikiMedia General mailing list archive at
Nabble.com.