I assume you have all seen https://phabricator.wikimedia.org/T116907
"Explore the possibility of splitting dewiki and frwiki into smaller chunks"
If not, and you ever use frwiki or dewiki page content dumps, go read it now. Or if you know of anyone who uses them, please nag them to go read it.
The upshot is that, most likely starting on January 1st, 2016, we will do all further dump runs of frwiki and dewiki with so-called 'checkpointing'. This change is being made so that if one of these jobs is interrupted for whatever reason, it can be rerun with only the missing page ranges dumped on the second run, saving quite a lot of time. A second reason is to ease the burden on downloaders, who generally prefer downloading several smaller files rather than one large 90 GB file (example size taken from the dewiki history dumps).
What does this mean in practice for you, users of the dumps? It means that filenames for the page content (articles, meta-current and meta-history) dumps will have pXXXXpYYYY in the names, where XXXX is the first page id in the file and YYYY is the last page id in the file. For examples of this you can look at the enwiki page content dumps, which have been running that way for a few years now.
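To make the pattern concrete, here is a hedged illustration; the date, the page-id boundaries, and the exact number formatting below are invented placeholders, not real chunk boundaries:

# Hypothetical filenames under the new scheme (page ranges are made up):
#   frwiki-20160101-pages-meta-history1.xml-p1p4242.bz2
#   frwiki-20160101-pages-meta-history2.xml-p4243p25137.bz2
# A shell glob that matches every full-history chunk of one run:
ls frwiki-20160101-pages-meta-history*.xml-p*p*.bz2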
This notice should give you plenty of time to convert your tools to use the new naming scheme. I encourage you to forward this message to other appropriate people or groups.
Thanks,
Ariel
Dear Ariel,
2015-12-01 15:54 GMT+01:00 Ariel T. Glenn aglenn@wikimedia.org:
What does this mean in practice for you, users of the dumps? It means that filenames for the page content (articles, meta-current and meta-history) dumps will have pXXXXpYYYY in the names, where XXXX is the first page id in the file and YYYY is the last page id in the file. For examples of this you can look at the enwiki page content dumps, which have been running that way for a few years now.
If I look at https://dumps.wikimedia.org/enwiki/latest/ right now, I see both the enwiki-latest-pages-articles.xml.bz2 I'm used to and the enwiki-latest-pages-articlesX.xml-pYpZ.bz2 you're talking about. Does it mean that the two will coexist for some time?
This notice should give you plenty of time to convert your tools to use the new naming scheme. I encourage you to forward this message to other appropriate people or groups.
I'm currently doing something very simple yet also quite efficient that looks like:
wget -q http://dumps.wikimedia.org/frwiki/$date/frwiki-$date-pages-articles.xml.bz2 -O - | bunzip2 | customProcess
What will be the canonical way to perform the same thing? Could we have an additional file with a *fixed* name which contains the list of the *variable* names of the small chunks, so that something like the following is possible?
( wget -q http://dumps.wikimedia.org/frwiki/$date/frwiki-$date-pages-articles.xml.bz2.list -O - | while read chunk; do wget -q http://dumps.wikimedia.org/frwiki/$date/$chunk -O -; done ) | bunzip2 | customProcess
Thanks!
On Tue, 01-12-2015 at 17:02 +0100, Jérémie Roquet wrote:
Dear Ariel,
2015-12-01 15:54 GMT+01:00 Ariel T. Glenn aglenn@wikimedia.org:
What does this mean in practice for you, users of the dumps? It means that filenames for the page content (articles, meta-current and meta-history) dumps will have pXXXXpYYYY in the names, where XXXX is the first page id in the file and YYYY is the last page id in the file. For examples of this you can look at the enwiki page content dumps, which have been running that way for a few years now.
If I look at https://dumps.wikimedia.org/enwiki/latest/ right now, I see both the enwiki-latest-pages-articles.xml.bz2 I'm used to and the enwiki-latest-pages-articlesX.xml-pYpZ.bz2 you're talking about. Does it mean that the two will coexist for some time?
For articles and meta-current, we always recombine the pieces. So you'll have one file for those. For full history, no.
This notice should give you plenty of time to convert your tools to use the new naming scheme. I encourage you to forward this message to other appropriate people or groups.
I'm currently doing something very simple yet also quite efficient that looks like:
wget -q http://dumps.wikimedia.org/frwiki/$date/frwiki-$date-pages-articles.xml.bz2 -O - | bunzip2 | customProcess
What will be the canonical way to perform the same thing? Could we have an additional file with a *fixed* name which contains the list of the *variable* names of the small chunks, so that something like the following is possible?
( wget -q http://dumps.wikimedia.org/frwiki/$date/frwiki-$date-pages-articles.xml.bz2.list -O - | while read chunk; do wget -q http://dumps.wikimedia.org/frwiki/$date/$chunk -O -; done ) | bunzip2 | customProcess
Thanks!
You can get the names of the files from the md5sums or sha1sums file, looking for all filenames with 'pages-articles' in them, or whatever page content dump you like. I would suggest using that as the canonical list of files.
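For what it's worth, here is a minimal sketch of that approach, assuming the checksum file is named like frwiki-$date-md5sums.txt with the usual 'hash  filename' layout (both of those are assumptions about the current dump layout; customProcess is the placeholder from the pipeline quoted above):

date=20160101                                  # placeholder run date
base="https://dumps.wikimedia.org/frwiki/$date"
# Fetch the checksum file, keep only the per-range pages-articles chunks,
# then stream each chunk; bunzip2 copes with the concatenated bz2 streams.
( wget -q "$base/frwiki-$date-md5sums.txt" -O - \
    | awk '{print $2}' \
    | grep 'pages-articles[0-9].*\.xml-p.*\.bz2$' \
    | while read -r chunk; do
        wget -q "$base/$chunk" -O -
      done
) | bunzip2 | customProcess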
Ariel
2015-12-02 15:54 GMT+01:00 Ariel T. Glenn aglenn@wikimedia.org:
On Tue, 01-12-2015 at 17:02 +0100, Jérémie Roquet wrote:
If I look at https://dumps.wikimedia.org/enwiki/latest/ right now, I see both the enwiki-latest-pages-articles.xml.bz2 I'm used to and the enwiki-latest-pages-articlesX.xml-pYpZ.bz2 you're talking about. Does it mean that the two will coexist for some time?
For articles and meta-current, we always recombine the pieces. So you'll have one file for those. For full history, no.
That means no change for most use cases. Great!
What will be the canonical way to perform the same thing? Could we have an additional file with a *fixed* name which contains the list of the *variable* names of the small chunks, so that something like the following is possible?
( wget -q http://dumps.wikimedia.org/frwiki/$date/frwiki-$date-pages-articles.xml.bz2.list -O - | while read chunk; do wget -q http://dumps.wikimedia.org/frwiki/$date/$chunk -O -; done ) | bunzip2 | customProcess
You can get the names of the files from the md5sums or sha1sums file, looking for all filenames with 'pages-articles' in them, or whatever page content dump you like. I would suggest using that as the canonical list of files.
Makes perfect sense to me. Thank you!