Platonides wrote:
> Farkas, Illes wrote:
>> Dear All,
>> Is the dump file containing the page abstracts for Yahoo produced by
>> humans or by machines?
>> Thanks
>
> It's produced by a machine, extracting the beginning of all articles
> (which are human-created).

It's a machine attempting to pull the first two sentences of the article
as plaintext, sometimes more successfully than others. :)
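
As a rough illustration of the general idea only (this is NOT the ActiveAbstract code, and the function name `first_sentences` is hypothetical), a minimal Python sketch that strips a few common wikitext constructs and keeps the first two sentences might look like this. It handles only non-nested-in-braces templates, <ref> tags, [[links]] and bold/italic quotes, and its sentence splitter is naive (an abbreviation like "U.S." will split wrongly), which gives a feel for how hit-and-miss this kind of extraction can be:

```python
import re

def first_sentences(wikitext: str, count: int = 2) -> str:
    """Hypothetical sketch: crude wikitext-to-plaintext, then the first `count` sentences."""
    text = wikitext
    # Drop {{templates}}, innermost first, repeating until nothing changes
    # so that simple nesting like {{a|{{b}}}} is also removed.
    prev = None
    while prev != text:
        prev = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Drop self-closing <ref ... /> tags, then paired <ref>...</ref> footnotes.
    text = re.sub(r"<ref[^>/]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    # Strip ''italic'' and '''bold''' quote runs.
    text = re.sub(r"'{2,}", "", text)
    # Collapse whitespace, then split after ., ! or ? followed by whitespace.
    text = re.sub(r"\s+", " ", text).strip()
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:count])

sample = ("{{Infobox city|name=Paris}}'''Paris''' is the capital of [[France]]."
          "<ref>Source</ref> It lies on the [[Seine (river)|Seine]]. It is large.")
print(first_sentences(sample))
```

The real extension works against the MediaWiki parser rather than regexes; the sketch just shows the "strip markup, cut after N sentences" shape of the problem.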
I'm not sure these files are actually still being used, though.
You can find the code in:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/ActiveAbstract/
But I think the newer code here, which pulls just the first sentence, is
more reliable (it requires a current MediaWiki with the new parser):
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/OpenSearchXml/
-- brion