I was recently doing a similar thing in Rust, but I had a slightly different idea in mind.
If you could implement it in PHP, that would be awesome. Here is the idea:

I have very little free space on my laptop, so I cannot really download the dump.
What I wanted was to read the dump directly from the server, decompress it as a stream and parse each line separately (more or less what your tool does, minus downloading the dump first).
The problem with this approach is that the download takes a while, and in that time the network may have hiccups that cause a failure somewhere in the middle. Then I need to restart the process and download everything again, which wastes bandwidth and CPU time.
What if the tool stored the current position and, in case of failure, resumed the download from that offset?
That would make this package just awesome!
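
To make the idea more concrete, here is a rough Python sketch of what I mean. It is not how JSON Dump Reader works and not a proposal for its API: the names `http_opener` and `stream_lines` are made up for illustration. It assumes the dump is a single-member gzip file, that the server honours HTTP Range requests, and that the process itself stays alive, because the decompressor state is kept in memory and only the byte offset is used to resume.

```python
import http.client
import json
import urllib.request
import zlib


def http_opener(url, chunk_size=1 << 16):
    """Hypothetical helper: returns open_at(offset), which yields the raw
    bytes of `url` starting at `offset` using an HTTP Range request."""
    def open_at(offset):
        req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
        with urllib.request.urlopen(req) as resp:
            # 206 Partial Content means the server honoured the Range header.
            if offset > 0 and resp.status != 206:
                raise RuntimeError("server ignored the Range header")
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    return
                yield chunk
    return open_at


def stream_lines(open_at, max_retries=5):
    """Yield decompressed lines; after a failure, reconnect at the last
    compressed byte offset instead of starting from zero."""
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)  # 16 = gzip container
    offset = 0       # compressed bytes already fed to the decompressor
    pending = b""    # decompressed tail that has no newline yet
    failures = 0     # consecutive attempts that delivered no data
    while not decomp.eof:
        progressed = False
        try:
            for chunk in open_at(offset):
                offset += len(chunk)
                progressed = True
                pending += decomp.decompress(chunk)
                *lines, pending = pending.split(b"\n")
                yield from lines
        except (OSError, http.client.HTTPException):
            pass  # network hiccup: fall through and reconnect at `offset`
        failures = 0 if progressed else failures + 1
        if failures > max_retries:
            raise RuntimeError(f"giving up at compressed offset {offset}")
    if pending:
        yield pending  # last line without a trailing newline
```

Usage would look roughly like `for line in stream_lines(http_opener(dump_url)): ...`, where `dump_url` is a placeholder. For the Wikidata dump each line would still need its trailing comma stripped before it goes to a JSON parser, and the opening `[` and closing `]` lines skipped. Surviving a restart of the whole process is harder, since the decompressor state would have to be checkpointed as well, so this sketch only covers failures within one run.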

On 14 August 2018 at 22:26, Jeroen De Dauw wrote:
Hey all,

I've released version 2.0 of JSON Dump Reader, the PHP library for reading from Wikidata JSON dumps. This version has been tested with the latest dumps and is optimized for PHP 7.x users.

You can find usage and installation instructions at

Software Crafter | Speaker | Student | Strategist | Contributor to Wikimedia and Open Source

Wikidata-tech mailing list