If you only need to extract the first paragraph of Wikipedia articles, that's no problem.
2010/8/6 Katharina Wolkwitz wolkwitz@fh-swf.de
Hi,
On 05.08.2010 16:47, lmhelp2 wrote:
Thank you!
So here is the list I have for the moment: I need to ignore lines:
- containing: {{...}}, which may span several lines and may be nested: {{... {{ ... }} ... }}
- containing: [[...]], which may be nested: [[... [[ ... ]] ... ]]
- equal to: __TOC__
- equal to: __NOTOC__
- beginning with the '=' character
- beginning with the '*' character
I don't think you should ignore lines beginning with the '*' character - those may include the wanted first paragraph of the text as the '*' is just a way of formatting the page...
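For illustration, the rules above could be sketched roughly as follows in Python. This is only a minimal sketch, not how MediaWiki itself parses wikitext: the names scan_line and filter_lines are made up for this example, the nesting is tracked with a simple depth counter, and lines beginning with '*' are kept, as suggested above.

```python
def scan_line(line, depth, open_tok, close_tok):
    """Update the nesting depth for one delimiter pair across a line.

    Returns (new_depth, touched), where touched is True if the line
    starts inside an open block or opens one itself.
    """
    touched = depth > 0
    i = 0
    while i < len(line):
        if line.startswith(open_tok, i):
            depth += 1
            touched = True
            i += len(open_tok)
        elif depth > 0 and line.startswith(close_tok, i):
            depth -= 1
            i += len(close_tok)
        else:
            i += 1
    return depth, touched


def filter_lines(wikitext):
    """Return the lines that survive the ignore rules from the list.

    Hypothetical helper: drops lines touched by {{...}} or [[...]]
    (nested, possibly multi-line), __TOC__ / __NOTOC__, headings
    starting with '=', and blank lines. Lines starting with '*' are kept.
    """
    pairs = [("{{", "}}"), ("[[", "]]")]
    depths = {pair: 0 for pair in pairs}
    kept = []
    for line in wikitext.splitlines():
        skip = False
        for pair in pairs:
            depths[pair], touched = scan_line(line, depths[pair], *pair)
            skip = skip or touched
        stripped = line.strip()
        if stripped in ("__TOC__", "__NOTOC__") or stripped.startswith("="):
            skip = True
        if not skip and stripped:
            kept.append(line)
    return kept


sample = (
    "{{Infobox\n"
    "| name = {{nested}}\n"
    "}}\n"
    "__NOTOC__\n"
    "== Heading ==\n"
    "* A bullet line\n"
    "Plain first paragraph.\n"
    "[[File:x.png|thumb|[[nested]] caption]]\n"
)
print(filter_lines(sample))
```

Note that ordinary wiki links are also written as [[...]], so dropping every line that contains one will discard many real paragraphs; stripping only the link markup may work better in practice.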
Greetings Katharina
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l