Re: [Mediawiki-l] Wikitext grammar

7 Aug 2010

Thank you all for your contributions :).

Hi,

So... I was over-optimistic about managing to extract the first
paragraph of a "Wikipedia" article out of its "Wikitext" easily...

Still, I managed (1) to get, for instance, the following "Wikitext"
sentence (from the "Wikipedia" article "Čokot"):
-------------------------------------------------------------------------
'''Cokot''', en [[serbe]] [[Alphabet cyrillique serbe|cyrillique]]
{{lang|sr|?????}}, est une localité de [[Serbie]] située dans la
municipalité de [[Palilula (Niš)]], district de
[[Nišava (district)|Nišava]]. En [[2002]], elle comptait
{{formatnum:1401}} habitants<ref
name="stats1">{{Historique de la population (Serbie)}}</ref>, dont une
majorité de [[Serbes]].
-------------------------------------------------------------------------

I then used the "Bliki" (2) engine to convert this
"Wikitext" sentence to "HTML". Here is what I got:
-------------------------------------------------------------------------
<p>Cokot, en  http://fr.wikipedia.org/wiki/Serbe serbe  
http://fr.wikipedia.org/wiki/Alphabet_cyrillique_serbe cyrillique  {{lang}},
est une localité de  http://fr.wikipedia.org/wiki/Serbie Serbie  située dans la
municipalité de  http://fr.wikipedia.org/wiki/Palilula_(Ni%C2%9A) Palilula (Niš) ,
district de  http://fr.wikipedia.org/wiki/Ni%C2%9Aava_(district) Nišava .
En  http://fr.wikipedia.org/wiki/2002 2002 , elle comptait
{{formatnum:1401}} habitants<sup id="_ref-stats1_a"
class="reference"> #_note-stats1 [1] </sup>, dont
une majorité de  http://fr.wikipedia.org/wiki/Serbes Serbes .</p>
-------------------------------------------------------------------------
This "HTML" sentence still contains two "Wikitext" chunks:
- {{lang}}
  and
- {{formatnum:1401}}.

=> "{{lang}}" should have been suppressed.
=> "{{formatnum:1401}}" should have been replaced by "1401".
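
A minimal sketch of the post-processing described by the two rules above,
in Java, could look like this. It is not part of "Bliki": the class name
"TemplateCleanup" and both regular expressions are assumptions made for
illustration, and "formatnum" is reduced to returning its argument
unchanged (no locale-specific digit grouping is applied).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Bliki class: cleans up the two template
// leftovers seen in the HTML output above.
class TemplateCleanup {
    // Matches {{formatnum:1401}} and captures the argument ("1401").
    private static final Pattern FORMATNUM =
            Pattern.compile("\\{\\{formatnum:([^{}|]*)\\}\\}");

    // Matches {{lang}} or {{lang|sr|...}}, plus one preceding space if present.
    private static final Pattern LANG =
            Pattern.compile(" ?\\{\\{lang(\\|[^{}]*)?\\}\\}");

    static String clean(String html) {
        // Replace each formatnum call by its raw argument.
        String out = FORMATNUM.matcher(html).replaceAll("$1");
        // Drop every lang call entirely.
        return LANG.matcher(out).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "cyrillique {{lang}}, elle comptait {{formatnum:1401}} habitants";
        // prints: cyrillique, elle comptait 1401 habitants
        System.out.println(clean(sample));
    }
}
```

This only hides the symptom for these two templates; nested templates or
other parser functions would still pass through untouched.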

So I posted on the "Bliki" forum (3), and someone told me they had not
yet implemented what is needed to handle the two chunks of "Wikitext"
that remain in the example above, and that I would have to do it
myself...

I chose "Bliki" because a Java ".jar" archive was available, ready to
be embedded in my Eclipse project, which is quite convenient for me.

MY FIRST QUESTION IS:
=====================
I was wondering if you knew of a better tool than this one: one which
wouldn't "miss" chunks of "Wikitext" as in the example above (or which
would at least handle common templates like "lang" and "formatnum")?

MY SECOND QUESTION IS:
======================
I was also wondering: the parser which is used in "Wikipedia" works
pretty well... I mean: such things as above never happen... as far as
I know...
So my question is: is this parser available? Where?
Can I use it with my Java code?
And please, forgive me if this question is naïve...

Thank you for your help and indulgence.
All the best,
--
Lmhelp

(1) Really, it is something which probably wouldn't work in all cases;
     it is based on the fact that a paragraph ends with "\n\n", as
     "Platonides" said in his first post.
(2) http://code.google.com/p/gwtwiki/
(3) http://groups.google.com/group/bliki/browse_thread/thread/7ed33272b206826f
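
The approach in note (1) can be sketched in Java as follows. This is only
an illustration under the assumption stated there (a paragraph ends with
"\n\n"); the class and method names are hypothetical, and the sketch will
return something other than prose whenever the "Wikitext" begins with a
different block, such as a template.

```java
// Hypothetical helper illustrating note (1): take everything before the
// first blank line as the "first paragraph".
class FirstParagraph {
    static String firstParagraph(String wikitext) {
        // Normalize Windows line endings and drop leading/trailing whitespace.
        String normalized = wikitext.replace("\r\n", "\n").trim();
        int end = normalized.indexOf("\n\n");
        // No blank line found: the whole text is a single paragraph.
        return end < 0 ? normalized : normalized.substring(0, end);
    }

    public static void main(String[] args) {
        String sample = "'''Cokot''' est une localité de [[Serbie]].\n\n"
                + "== Géographie ==\nTexte.";
        // prints: '''Cokot''' est une localité de [[Serbie]].
        System.out.println(firstParagraph(sample));
    }
}
```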
-- 
View this message in context:
http://old.nabble.com/Wikitext-grammar-tp29350471p29375222.html
Sent from the WikiMedia General mailing list archive at Nabble.com.
