If you only need to extract the first paragraph of Wikipedia articles, no problem.
2010/8/6 Katharina Wolkwitz <wolkwitz(a)fh-swf.de>
Hi,
On 05.08.2010 16:47, lmhelp2 wrote:
Thank you!
So here is the list I have for the moment:
I need to ignore lines:
- containing: {{...}}
=> possibly spanning several lines,
=> possibly nested: {{... {{ ... }} ... }}.
- containing: [[...]]
=> possibly nested: [[... [[ ... ]] ... ]].
- equal to: __TOC__
- equal to: __NOTOC__
- beginning with the '=' character
- beginning with the '*' character
I don't think you should ignore lines beginning with the '*' character.
Those may include the wanted first paragraph of the text, as the '*' is
just a way of formatting the page...
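
For what it's worth, here is a rough sketch of how the list above could be
applied line by line, with the '*' rule left out. It is written in Python
purely for illustration; the function name filter_wikitext_lines is made up,
and dropping blank lines is my own assumption, not something from the list.
It is not a real wikitext parser, just a depth counter for {{...}} plus a
few simple string checks.

```python
def filter_wikitext_lines(text):
    """Return the lines of `text` that survive the rules in the list above.

    Hypothetical helper for illustration only, not a full wikitext parser.
    """
    kept = []
    depth = 0  # current nesting level of {{ ... }} across lines
    for line in text.splitlines():
        # A line is "touched" by a template if it starts inside one
        # or opens one itself.
        in_template = depth > 0
        i = 0
        while i < len(line) - 1:
            pair = line[i:i + 2]
            if pair == "{{":
                depth += 1
                in_template = True
                i += 2
            elif pair == "}}" and depth > 0:
                depth -= 1
                i += 2
            else:
                i += 1

        stripped = line.strip()
        if in_template:
            continue  # part of a (possibly nested, multi-line) {{...}}
        if "[[" in line:
            continue  # contains [[...]]
        if stripped in ("__TOC__", "__NOTOC__"):
            continue
        if stripped.startswith("="):
            continue  # heading line
        if not stripped:
            continue  # assumption: blank lines are not wanted either
        # Lines beginning with '*' are deliberately kept.
        kept.append(line)
    return kept


sample = (
    "{{Infobox\n"
    "| name = {{lang|de|Foo}}\n"
    "}}\n"
    "__NOTOC__\n"
    "'''Foo''' is a thing.\n"
    "* A bullet line\n"
    "== History ==\n"
    "See [[Bar]] for more."
)
print(filter_wikitext_lines(sample))
```

The first line that comes out of this would then be a candidate for the
start of the first paragraph.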
Greetings
Katharina
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l