On Wed, Jan 22, 2014 at 10:31 AM, Matthew Flaschen <mflaschen@wikimedia.org> wrote:
On 01/21/2014 09:47 PM, Amir Ladsgroup wrote:
One of the things I can't understand is why we are extracting summaries of pages for Yahoo. Is it our job to do it? The dumps are really huge, e.g. for wikidata: http://dumps.wikimedia.org/wikidatawiki/20140106/
wikidatawiki-20140106-abstract.xml <http://dumps.wikimedia.org/wikidatawiki/20140106/wikidatawiki-20140106-abstract.xml> 14.1 GB
Compare it to the full history:
wikidatawiki-20140106-pages-meta-history.xml.bz2 <http://dumps.wikimedia.org/wikidatawiki/20140106/wikidatawiki-20140106-pages-meta-history.xml.bz2> 8.8 GB
That's because the Yahoo one isn't compressed.
Why? Can we make it compressed? It's really annoying to see that huge file there for (almost) no reason.
I'm not sure if Yahoo still uses those abstracts, but I wouldn't be surprised at all if other people are.
Matt Flaschen
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l