lmhelp wrote:
Hi,
Thank you for reading my post.
I am wondering whether a "grammar" exists for the "Wikicode"/"Wikitext" language (or an *exhaustive* (and formal) set of rules about how "Wikitext" is constructed). I've looked for such a grammar/set of rules on the Web but I couldn't find one...
No. But see http://www.mediawiki.org/wiki/Markup_spec for grammars which "kind of work".
I need to extract automatically the first paragraph of a Wiki article...
I did it from the HTML version of a Wiki article (because I noticed the first paragraph was the first <p> element child of the <div> element whose id is "bodyContent"...) but I need to work with the "Wikitext" itself...
- Is a grammar available somewhere?
- Do you have any idea how to extract the first paragraph of a Wiki article?
- Any advice?
- Does a Java "Wikitext" "parser" exist which would do it?
Take the text up to the first double newline (\n\n), which is what separates paragraphs in wikitext.
However, pages commonly begin with templates, so if the page begins with {{, first remove everything up to and including the matching }} (templates can be nested, so the braces have to be counted), then strip the leading whitespace and check again.
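
A rough sketch of that approach, in Python rather than Java just to keep it short. The function names are made up for the illustration, and it only deals with leading templates and nested braces; it makes no attempt to handle comments, tables, image links or <nowiki> sections, so treat it as a starting point rather than a real parser.

```python
def strip_leading_templates(text):
    """Drop any {{...}} templates at the start of text, counting nested braces."""
    text = text.lstrip()
    while text.startswith("{{"):
        depth = 0
        i = 0
        end = None
        while i < len(text) - 1:
            pair = text[i:i + 2]
            if pair == "{{":
                depth += 1
                i += 2
            elif pair == "}}":
                depth -= 1
                i += 2
                if depth == 0:
                    end = i  # index just past the matching }}
                    break
            else:
                i += 1
        if end is None:
            # Unbalanced braces: give up and leave the text untouched.
            break
        text = text[end:].lstrip()
    return text


def first_paragraph(wikitext):
    """Return the text before the first blank line, after leading templates."""
    body = strip_leading_templates(wikitext.replace("\r\n", "\n"))
    return body.split("\n\n", 1)[0].strip()


if __name__ == "__main__":
    sample = (
        "{{Infobox\n| name = {{lang|en|Foo}}\n}}\n\n"
        "'''Foo''' is a thing.\nSecond line.\n\nNext paragraph."
    )
    print(first_paragraph(sample))
```

The result is still raw wikitext (bold quotes, links and so on are left in), so any markup inside the paragraph would need to be stripped separately.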