I know it's been mentioned on this list before, but it would be incredibly useful to have incremental dumps of Wikidata, as downloading the current dumps can now take several hours over a low-bandwidth Internet connection.

Here's my proposal:

* the incremental dumps should have exactly the same format as the current JSON dumps, with two exceptions:

** entries which are unchanged since the previous dump (as determined by their "modified" timestamp) should be omitted

** entries which have been deleted since the previous dump should have stub entries of the form {"id": "Q123", "deleted": true}

I would imagine that these dumps would be vastly smaller than the standard dumps. For the many re-users who only want to know about changed data, they would be just as useful, with a fraction of the download time and, in many cases, without significant modification of their tools. Generating them should need only a small amount of processing time and an insignificant amount of extra disk storage on the servers, yet it could save considerable Internet bandwidth.

This difference-file format should be easy to generate with slight tweaks to the existing dump code, but, if needed, I can easily write a simple Python script that takes two existing dump files and generates the differences between them in the format above. Please drop me an email, or reply here, if you would like me to write this.
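To make the idea concrete, here is a minimal sketch of such a script. It assumes each dump is a single JSON array of entity objects with "id" and "modified" fields, small enough to load into memory; a production version would stream the files instead. All file names here are just placeholders.

```python
import json

def incremental_dump(old_path, new_path, out_path):
    """Write the entries that changed between two full JSON dumps.

    Emits entries that are new or whose "modified" timestamp differs,
    plus {"id": ..., "deleted": true} stubs for removed entries.
    """
    with open(old_path) as f:
        old = {e["id"]: e for e in json.load(f)}
    with open(new_path) as f:
        new = {e["id"]: e for e in json.load(f)}

    diff = []
    for eid, entity in new.items():
        prev = old.get(eid)
        # keep entries that are new or whose "modified" timestamp changed
        if prev is None or prev.get("modified") != entity.get("modified"):
            diff.append(entity)
    # entries deleted since the previous dump become stubs, as proposed above
    for eid in old:
        if eid not in new:
            diff.append({"id": eid, "deleted": True})

    with open(out_path, "w") as f:
        json.dump(diff, f)
```

Because the output uses exactly the same per-entry format as the full dumps (deletion stubs aside), existing tools should be able to consume it with little or no change.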

Kind regards,

Neil