Hi folks,
We at the Hungarian Wikipedia have been watching for new huwiki dumps with a bot since 2011, so the history of this page: https://hu.wikipedia.org/w/index.php?title=Sablon:A_dump_d%C3%A1tuma&off... clearly shows the frequency. Back in 2012 it took 8-10 days to produce a new dump. Now it takes a month. Are we less developed, or do we have less hardware than 4 years ago?
Hi Binaris,
We actually have better hardware than 4 years ago [0]. However, we also have more projects with more content than 4 years ago. Wikidata did not exist in 2011; today it has almost half as many revisions as the English-language Wikipedia. The English-language Wikipedia itself has grown 51% in size since early 2012, and the Hungarian-language wiki has grown by over 50% as well.
We should be running two dumps a month going forward [1], where the second run each month does not include full-history revisions. That isn't as good as in 2011, but it's not as bad as once a month either.
However, the main work to improve the dumps situation will be a complete rearchitecting of the dumps. One big change will be a move to a format and structure that is truly incremental.
Folks interested in these issues are welcome to subscribe to or watch the Phabricator projects for the current dumps [2] and/or the future dumps [3]. There is also a dedicated (low-traffic) list for users of and contributors to the XML dumps [4].
Lastly, your email reminds me that I should update the dumps information at Meta; the documentation there has fallen a bit behind.
Thanks!
Ariel
[0] For current hardware, see https://wikitech.wikimedia.org/wiki/Dumps/Snapshot_hosts
[1] https://phabricator.wikimedia.org/T126339
[2] https://phabricator.wikimedia.org/tag/dumps-generation/
[3] https://phabricator.wikimedia.org/tag/dumps-rewrite/
[4] https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
On Wed, Aug 3, 2016 at 8:31 AM, Bináris wikiposta@gmail.com wrote:
-- 
Bináris
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l