Dear Ariel,

Happy New Year.  I am gearing up for wp-mirror-0.7.  To that end, I would like to list some issues that I see and to offer my help in solving them.

0) Problem Statements

0.1) Page Rendering.  Wp-mirror-0.6 works well in the sense that it builds a faithful mirror of any of your wikis.  However, during 2013 the rendering of pages eroded materially.  For example,

     o interlanguage links have vanished both from rendered pages and from dump files;
     o infoboxes are no longer rendered;
     o most transclusions now render as redlinks even though the templates are easily found in the underlying database; etc.

I understand that this erosion occurred because wp-mirror-0.6 still uses mediawiki-1.19, whereas WMF has moved on to mediawiki-1.23.  For example, I understand that:

     o interlanguage links have been moved to the wikidata project, the rendering of which requires mediawiki-1.21+;
     o infoboxes now require the scribunto extension, which requires mediawiki-1.20+.

0.2) Database Schema.  Some differences in database schema have appeared.

     o category - dump files now have 5 fields, whereas the database schema has 6 fields;
     o externallinks - dump files now have 4 fields, whereas the database schema has 3 fields.

Loading these two tables generates the error message:  ``Column count doesn't match value count at row 1.''
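To see which side is off before a load fails, one can count the fields in the first row of a dump's INSERT statement and compare that with the table's column count.  A minimal Python sketch follows.  The function names are hypothetical, it assumes mysqldump-style quoting (single quotes, backslash escapes), and the sample row is made up rather than taken from a real dump.

```python
from typing import Optional


def first_tuple_field_count(insert_stmt: str) -> int:
    """Count the fields in the first (...) row of a mysqldump-style INSERT.

    Commas inside single-quoted strings are ignored, and a backslash
    escapes the next character.  A quick diagnostic, not a SQL parser.
    """
    # Start scanning just after the first "(" that follows VALUES.
    pos = insert_stmt.index("(", insert_stmt.index("VALUES"))
    in_string = False
    fields = 1
    i = pos + 1
    while i < len(insert_stmt):
        c = insert_stmt[i]
        if in_string:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == "'":
                in_string = False
        elif c == "'":
            in_string = True
        elif c == ",":
            fields += 1
        elif c == ")":
            return fields
        i += 1
    raise ValueError("first row is not terminated by ')'")


def column_mismatch(insert_stmt: str, schema_columns: int) -> Optional[str]:
    """Return a message if the dump row and the table disagree, else None."""
    n = first_tuple_field_count(insert_stmt)
    if n != schema_columns:
        return f"dump rows have {n} fields, table has {schema_columns} columns"
    return None


# Made-up example row shaped like a 5-field category dump.
sample = "INSERT INTO `category` VALUES (1,'Foo,_bar',3,0,0),(2,'Baz',1,0,0);"
print(column_mismatch(sample, 6))  # prints: dump rows have 5 fields, table has 6 columns
```

Running such a check against the first INSERT line of each table's dump would flag the category and externallinks mismatches up front; it does not say which column was added or dropped.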

0.3) Version Lifecycle.  According to <http://www.mediawiki.org/wiki/Version_lifecycle>, mediawiki 1.23 LTS is slated for May 2014.  However, the Debian packaging team has been silent about its plans for a transition from mediawiki-1.19 LTS to mediawiki-1.23 LTS.

0.4) Image Dumps.  The large image dump tarballs are now a year old.  This means that, while wp-mirror still downloads the bulk of its images from these tarballs, a growing number must be downloaded individually from WMF.

0.5) Thumbs.  One person has asked me whether dump files of thumbs could be made available.  We are beginning to see thumb dumps from the xowa project.

0.6) IPv6.  I am glad to see that <gerrit.wikimedia.org> has an IPv6 address.  However, <bastion.wmflabs.org> still does not.  My internal network is IPv6 only.

1) mwxml2sql

This utility from Ariel Glenn has proved invaluable to the wp-mirror project.  Together with MySQL 5.5 fast index creation, it allows wp-mirror to build mirrors much faster than before (80% less time).

1.1) Need for update.  According to its version information, mwxml2sql may only be valid through mediawiki-1.21.

(shell)$ mwxml2sql --version
mwxml2sql 0.0.2
Supported input schema versions: 0.4 through 0.8.
Supported output MediaWiki versions: 1.5 through 1.21.

Since I am looking forward to mediawiki-1.23 LTS (see below), I would like to know whether mwxml2sql should be updated.
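The check itself can be automated by parsing the version text quoted above.  This is a small Python sketch with hypothetical function names; it works on the text rather than invoking the binary, and it compares versions as integer tuples so that 1.5 sorts below 1.21 (a plain string comparison would get that wrong).

```python
import re

# Output of `mwxml2sql --version`, as quoted above.
VERSION_TEXT = """\
mwxml2sql 0.0.2
Supported input schema versions: 0.4 through 0.8.
Supported output MediaWiki versions: 1.5 through 1.21.
"""

VERSION = r"(\d+(?:\.\d+)*)"


def as_tuple(version: str) -> tuple:
    """'1.21' -> (1, 21), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supported_output_range(text: str) -> tuple:
    """Extract the (lowest, highest) supported output MediaWiki versions."""
    pattern = r"Supported output MediaWiki versions: " + VERSION + " through " + VERSION
    m = re.search(pattern, text)
    if m is None:
        raise ValueError("no output-version line found")
    return as_tuple(m.group(1)), as_tuple(m.group(2))


def supports(text: str, mediawiki_version: str) -> bool:
    low, high = supported_output_range(text)
    return low <= as_tuple(mediawiki_version) <= high


for target in ("1.19", "1.21", "1.23"):
    print(target, supports(VERSION_TEXT, target))
```

With the 0.0.2 output this reports 1.19 and 1.21 as supported and 1.23 as not, which is exactly the gap in question.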

1.2) Help Offer.  If mwxml2sql does need updating, I would be happy to help with this, and to package it for Debian as I have done before.  Perhaps we could call it mwxml2sql-0.0.3.

2) mediawiki-1.23 LTS

2.1) Vision.  I would like wp-mirror-0.7 to be able to build a mirror that serves pages that look no different from those served by WMF.

2.2) DEB package.  To that end, I am thinking of packaging mediawiki-1.23 together with the extensions needed for rendering WMF wikis with wikidata content, infoboxes, math, transclusions, etc.   Given WMF's ``continuous integration'' development model, I would like to be able to automatically generate a tarball and DEB package each time WMF pushes an update to its servers.
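One detail in automating this is version numbering: a package built from a pre-release WMF branch should sort below the eventual 1.23 LTS release.  The Python sketch below assumes (not confirmed here) that WMF's deployment branches are named like `wmf/1.23wmf10`; the branch list is a made-up example and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# ASSUMPTION: deployment branches look like "wmf/1.23wmf10".
BRANCH_RE = re.compile(r"^wmf/(\d+)\.(\d+)wmf(\d+)$")


def branch_key(name: str) -> Optional[tuple]:
    """(1, 23, 10) for 'wmf/1.23wmf10'; None for other branch names."""
    m = BRANCH_RE.match(name)
    return tuple(int(g) for g in m.groups()) if m else None


def newest_branch(names: Iterable[str]) -> Optional[str]:
    """Pick the highest-numbered deployment branch, comparing numerically."""
    candidates = [(branch_key(n), n) for n in names if branch_key(n) is not None]
    return max(candidates)[1] if candidates else None


def debian_version(branch: str, revision: int = 1) -> str:
    """'wmf/1.23wmf10' -> '1.23~wmf10-1'.

    Debian sorts '~' before everything, even the end of the string,
    so 1.23~wmf10-1 < 1.23-1 and a final LTS upload supersedes it.
    """
    key = branch_key(branch)
    if key is None:
        raise ValueError(f"not a deployment branch: {branch!r}")
    major, minor, n = key
    return f"{major}.{minor}~wmf{n}-{revision}"


branches = ["master", "wmf/1.23wmf9", "wmf/1.23wmf10", "wmf/1.22wmf22"]
latest = newest_branch(branches)
print(latest, debian_version(latest))
```

Comparing the numeric parts as integers matters here: as strings, `wmf9` would sort above `wmf10`.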

2.3) Debian package repository.  Such a DEB package would be distributed with wp-mirror.  In preparation for this, I have set up a Debian package repository at <http://download.savannah.gnu.org/releases/wp-mirror/>.  It is currently used to distribute wp-mirror-0.6 and an unstable version of wp-mirror-0.7.  The project home page is <http://www.nongnu.org/wp-mirror/>.

2.4) Help Offer.  I am happy to do most of this work myself.  However, I will need some guidance on interacting with the appropriate Git repositories.  I hope that you can put me in touch with someone involved in the ``continuous integration'' process.

3) Media dumps

I am thinking that updating the image dumps annually would be adequate.  Including thumbs in those dumps would materially assist the off-line community.  I could easily update wp-mirror-0.7 to give the user a choice (no media files, thumbs only, full size media files).
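Such a choice might be modelled as a single setting with three values.  The Python sketch below is illustrative only: the setting, its values, and the `thumb`/`original` labels are hypothetical, and it assumes that full-size mode fetches originals only, leaving thumbnails to be generated locally by MediaWiki.

```python
from enum import Enum


class MediaMode(Enum):
    """Hypothetical user setting for how much media to mirror."""
    NONE = "none"      # no media files
    THUMBS = "thumbs"  # thumbs only
    FULL = "full"      # full size media files


# ASSUMPTION: FULL fetches originals only; MediaWiki can generate
# thumbnails from them locally.
FETCH = {
    MediaMode.NONE: frozenset(),
    MediaMode.THUMBS: frozenset({"thumb"}),
    MediaMode.FULL: frozenset({"original"}),
}


def parse_mode(value: str) -> MediaMode:
    """Turn a config string into a MediaMode, case-insensitively."""
    try:
        return MediaMode(value.strip().lower())
    except ValueError:
        choices = ", ".join(m.value for m in MediaMode)
        raise ValueError(f"unknown media mode {value!r}; choose one of: {choices}") from None


def should_fetch(mode: MediaMode, kind: str) -> bool:
    """kind is 'thumb' or 'original' (hypothetical labels)."""
    return kind in FETCH[mode]


mode = parse_mode("Thumbs")
print(mode, should_fetch(mode, "thumb"), should_fetch(mode, "original"))
```

Keeping the mapping in one table means adding a fourth option later (say, thumbs plus originals) is a one-line change.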

Sincerely Yours,
Kent