Bryan Tong Minh wrote:
On Sun, Sep 9, 2012 at 8:34 PM, Roberto Flores <f.roberto.isc@gmail.com> wrote:
Could you please provide HTML dumps (I mean, with the templates pre-processed into HTML, everything else the same as now) every 3 or 4 months?
How a template is rendered into HTML depends very much on the context (i.e. page title, last modification date, etc.) and the arguments that it is called with. So an HTML render of all template pages is unlikely to be very useful for you.
This reply and others in this thread don't make any sense to me. It seems like the opening poster is looking at http://dumps.wikimedia.org/other/static_html_dumps/ and noticing that the HTML dumps haven't been updated in years. This is a problem and it's tracked by https://bugzilla.wikimedia.org/show_bug.cgi?id=15017.
For reference, the English Wikipedia has over 4 million content pages. At a rate of one page per second (which is completely unrealistic, but let's assume it for a moment), you could get the HTML of every content page on the English Wikipedia in about 46.3 days(!). The suggestion in any discussion about HTML dumps that people should just use the API themselves (presumably in combination with an XML dump containing the wikitext of each page) is just absurd. There's enormous value in the HTML dumps. This subject came up in December 2011, and from the comments in that thread, it seemed as though the only reason the HTML dumps haven't been updated is that nobody has run the relevant script.
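To make the back-of-the-envelope figure explicit, here is a minimal Python sketch of the same arithmetic. The constant names and the single fixed fetch rate are assumptions for illustration only. The page count is the rounded "4 million" from above, so the real number of pages would take somewhat longer.

```python
# Back-of-the-envelope estimate: fetch every content page one at a time
# at a fixed (optimistic) rate. Names are hypothetical, chosen for illustration.
CONTENT_PAGES = 4_000_000      # "over 4 million content pages", rounded down
PAGES_PER_SECOND = 1.0         # assumed fetch rate of one page per second
SECONDS_PER_DAY = 24 * 60 * 60 # 86,400 seconds


def days_to_fetch(pages: int, rate: float) -> float:
    """Return wall-clock days needed to fetch `pages` at `rate` pages/second."""
    return pages / rate / SECONDS_PER_DAY


if __name__ == "__main__":
    days = days_to_fetch(CONTENT_PAGES, PAGES_PER_SECOND)
    print(f"{days:.1f} days")  # prints "46.3 days"
```

Doubling the rate only halves the time, so even a much faster crawler would still need weeks for a single pass.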
MZMcBride