Thank you very much. That is exactly what I wanted to know.
2011/11/27 Bjoern Hoehrmann <derhoermi(a)gmx.net>
* Khalida BEN SIDI AHMED wrote:
In the HTML code of a Wikipedia article, how can I recognise the
*first* sentence of the article?
It's not marked up and probably differs among language versions. On the
English version the first `p` child of a `mw-content-ltr` element is a
good bet, as I pointed out earlier, to identify the first paragraph. It
would then be necessary to find the full stop at the end of a sentence;
criteria for that include that a space or the end of a paragraph follows
and that it is not included in some nesting construct like parentheses;
http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation discusses
some of the problems and includes pointers to some solutions.
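
The heuristic above can be sketched in Python with only the standard
library. This is a rough illustration, not a robust solution: it assumes
reasonably well-formed HTML, takes the `mw-content-ltr` selector at face
value (the markup may differ by language version or change over time), and
the names `FirstParagraph`, `first_paragraph_text` and `first_sentence` are
made up for this example. It treats a full stop as a sentence end only when
it is outside parentheses or brackets and is followed by whitespace or the
end of the paragraph.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FirstParagraph(HTMLParser):
    """Collect the text of the first <p> that is a direct child of an
    element whose class list contains 'mw-content-ltr'."""

    def __init__(self):
        super().__init__()
        self.stack = []      # one bool per open element: is it the container?
        self.p_depth = None  # stack depth at which the target <p> was opened
        self.chunks = []     # text pieces found inside the target <p>
        self.done = False    # set once the target <p> has been closed

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent_is_container = bool(self.stack) and self.stack[-1]
        if (tag == "p" and parent_is_container
                and self.p_depth is None and not self.done):
            self.p_depth = len(self.stack)
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append("mw-content-ltr" in classes)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        self.stack.pop()
        if self.p_depth is not None and len(self.stack) == self.p_depth:
            self.p_depth = None
            self.done = True

    def handle_data(self, data):
        if self.p_depth is not None:
            self.chunks.append(data)


def first_paragraph_text(html):
    """Return the plain text of the first matching paragraph ('' if none)."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def first_sentence(text):
    """Cut at the first full stop that is outside (...) or [...] and is
    followed by whitespace or the end of the text."""
    depth = 0
    for i, ch in enumerate(text):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        elif ch == "." and depth == 0:
            if i + 1 == len(text) or text[i + 1].isspace():
                return text[:i + 1].strip()
    return text.strip()


sample = ('<div class="mw-content-ltr"><p><b>Foo</b> (born c. 1900 in Bar) '
          'is a thing. It has parts.</p><p>Second.</p></div>')
print(first_sentence(first_paragraph_text(sample)))
# prints: Foo (born c. 1900 in Bar) is a thing.
```

Abbreviations outside parentheses (for example "Dr. Smith") still defeat
this rule, which is the kind of problem the sentence boundary
disambiguation article linked above deals with.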
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l