Dear All,
Is the dump file containing the page abstracts for Yahoo produced by humans or by machines?
Thanks
Farkas, Illes wrote:
> Dear All,
> Is the dump file containing the page abstracts for Yahoo produced by humans or by machines?
> Thanks
It's produced by a machine, extracting the beginning of all articles (which are human-created).
Platonides wrote:
> It's produced by a machine, extracting the beginning of all articles (which are human-created).
It's a machine attempting to pull the first two sentences of the article as plaintext, sometimes more successfully than others. :)
I'm not sure these files are actually still being used, though.
You can find the code in: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/ActiveAbstract/
But I think the newer code here to pull the first sentence is more reliable (requires current MediaWiki with new parser): http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/OpenSearchXml/
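To give a rough idea of what "pull the first two sentences as plaintext" involves, here is a minimal Python sketch. It is not the ActiveAbstract or OpenSearchXml code; the function name rough_abstract, the regexes, and the sample text are all made up for illustration, and it assumes simple, non-nested markup.

```python
import re

def rough_abstract(wikitext: str, max_sentences: int = 2) -> str:
    """Crude plaintext abstract (hypothetical helper, not the extension's logic)."""
    # Drop simple, non-nested templates such as {{Infobox}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)
    # Turn [[target|label]] into label and [[target]] into target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove bold/italic quote markers ('' and ''').
    text = re.sub(r"'{2,}", "", text)
    # Remove leftover HTML-like tags.
    text = re.sub(r"<[^>]+>", "", text)
    # Collapse whitespace.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])

sample = "{{Infobox}}'''Foo''' is a [[bar|kind of bar]]. It was made in [[Baz]]. It is old."
print(rough_abstract(sample))
```

A split this naive also breaks on abbreviations ("Dr. Who is a show." counts as two sentences), which is the kind of thing that makes the results hit-and-miss.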
-- brion
mediawiki-l@lists.wikimedia.org