[Mediawiki-l] Wikitext grammar

Edward Swing deswing at vsticorp.com
Thu Aug 5 11:50:48 UTC 2010


We have written a Java utility program that extracts the first few
paragraphs of a wiki article (or the article as a whole). Because it has
to cope with templates and the rest of the wiki syntax, it isn't perfect.

If you are trying to get the non-template text, you'll have to do the
following:
	Determine whether you're parsing to the end of the first paragraph
	(\n\n) or to the first section heading (a line beginning with ==)
	Remove all templates (there's no easy way to expand templates into
	their presentation formats), or at least the infoboxes
	Transform any links into their visible representation (look for
	square brackets)
	Remove citations and other markup
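
The steps above could be sketched roughly as follows. This is not the
utility mentioned in this message; the class name WikitextLead and all
of its method names are made up for illustration, and the regular
expressions only cover the common cases (plain [[links]], <ref> tags,
bold/italic quotes). Templates are removed by counting {{ and }} so that
nested templates disappear with their parent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: a rough approximation of
// the steps listed above, not a real wikitext parser.
final class WikitextLead {

    // Remove {{...}} templates; depth counting handles nested templates.
    static String stripTemplates(String text) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                if (depth == 0) {
                    out.append(text.charAt(i));
                }
                i++;
            }
        }
        return out.toString();
    }

    // Remove citations: self-closing <ref ... /> first, then <ref>...</ref>.
    static String stripRefs(String text) {
        String noSelfClosing = text.replaceAll("<ref[^>]*/>", "");
        return noSelfClosing.replaceAll("(?s)<ref[^>]*>.*?</ref>", "");
    }

    // Replace links with their visible text:
    // [[Target|label]] -> label, [[Target]] -> Target,
    // [http://example.org label] -> label.
    static String stripLinks(String text) {
        String internal = text.replaceAll("\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]", "$1");
        return internal.replaceAll("\\[https?://\\S+\\s+([^\\]]+)\\]", "$1");
    }

    // Drop bold/italic quote markup ('' and ''').
    static String stripQuotes(String text) {
        return text.replaceAll("'{2,}", "");
    }

    // Return the first non-empty paragraph before the first section heading.
    static String firstParagraph(String wikitext) {
        Matcher heading = Pattern.compile("(?m)^==").matcher(wikitext);
        String lead = heading.find() ? wikitext.substring(0, heading.start()) : wikitext;
        String plain = stripQuotes(stripLinks(stripRefs(stripTemplates(lead))));
        for (String para : plain.split("\\n\\s*\\n")) {
            if (!para.trim().isEmpty()) {
                return para.trim();
            }
        }
        return "";
    }
}
```

Calling WikitextLead.firstParagraph(text) on the raw wikitext of a page
returns the cleaned-up lead paragraph. Image links, tables, HTML comments
and parser functions are not handled here and would need extra rules.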

We are trying to work through some IP issues, but I might be able to
post source code.


-----Original Message-----
From: mediawiki-l-bounces at lists.wikimedia.org
[mailto:mediawiki-l-bounces at lists.wikimedia.org] On Behalf Of lmhelp
Sent: Wednesday, August 04, 2010 3:45 PM
To: mediawiki-l at lists.wikimedia.org
Subject: [Mediawiki-l] Wikitext grammar


Hi,

Thank you for reading my post.

I am wondering whether a "grammar" exists for the "Wikicode"/"Wikitext"
language (or an *exhaustive* and formal set of rules describing how
"Wikitext" is constructed).
I've looked for such a grammar/set of rules on the Web but I couldn't
find one...

I need to automatically extract the first paragraph of a Wiki article...

I did it from the HTML version of a Wiki article (because
I noticed the first paragraph was the first <p> element
child of the <div> element whose id is "bodyContent"...)
but I need to work with the "Wikitext" itself...

- Is a grammar available somewhere?
- Do you have any idea how to extract the first paragraph of a Wiki
article?
- Any advice?
- Does a Java "Wikitext" parser exist that would do it?

Thank you for your help.
All the best,

--
Lmhelp
-- 
View this message in context:
http://old.nabble.com/Wikitext-grammar-tp29350471p29350471.html
Sent from the WikiMedia General mailing list archive at Nabble.com.


_______________________________________________
MediaWiki-l mailing list
MediaWiki-l at lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l


