If it's gone, that's a coincidence. I'm flagging this to look into; thanks
for the report. Please follow this ticket for more info:
https://phabricator.wikimedia.org/T184258
On Tue, Apr 10, 2018 at 5:35 PM, Derk-Jan Hartman wrote:
> It seems that the pagecounts-ez sets disappeared from
> dumps.wikimedia.org starting this date. Is that a coincidence?
> Is it https://phabricator.wikimedia.org/T189283, perhaps?
> On Thu, Mar 29, 2018 at 2:42 PM, Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
> > Here it comes:
> > For the April 1st run and all following runs, the Wikidata dumps of
> > pages-meta-current.bz2 will be produced only as separate downloadable
> > files, no recombined single file will be produced.
> > No other dump jobs will be impacted.
> > A reminder that each of the single downloadable pieces has the siteinfo
> > header and the mediawiki footer, so they may all be processed separately
> > by whatever tools you use to grab data out of the combined file. If your
> > workflow supports it, they may even be processed in parallel.
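[A minimal sketch, not an official tool, of what the above makes possible: because each numbered piece is a complete, well-formed XML document with its own siteinfo header and mediawiki footer, a per-file streaming parser works on each piece independently, and the pieces can be handled concurrently. The file paths and the page-counting task here are illustrative assumptions.]

```python
import bz2
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def count_pages(path):
    """Stream one .bz2 dump piece and count its <page> elements."""
    n = 0
    with bz2.open(path, "rb") as f:
        for _, elem in ET.iterparse(f):
            # dump XML carries a namespace; compare only the local tag name
            if elem.tag.rsplit("}", 1)[-1] == "page":
                n += 1
            elem.clear()  # keep memory flat while streaming
    return n

def count_pages_parallel(paths, workers=4):
    """Process several pieces concurrently, one worker per file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_pages, paths))
```

[The same pattern works with a process pool if the per-file work is CPU-bound rather than decompression- and I/O-bound.]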
> > I am still looking into what the best approach is for the pages-articles
> > dumps.
> > Please forward wherever you deem appropriate. For further updates, don't
> > forget to check the Phab ticket! https://phabricator.wikimedia.
> > On Mon, Mar 19, 2018 at 2:00 PM, Ariel Glenn WMF <ariel(a)wikimedia.org>
> > wrote:
> >> A reprieve! Code's not ready and I need to do some timing tests, so the
> >> March 20th run will do the standard recombining.
> >> For updates, don't forget to check the Phab ticket!
> >> https://phabricator.wikimedia.org/T179059
> >> On Mon, Mar 5, 2018 at 1:10 PM, Ariel Glenn WMF <ariel(a)wikimedia.org>
> >> wrote:
> >>> Please forward wherever you think appropriate.
> >> For some time we have provided multiple numbered pages-articles bz2
> >> files for large wikis, as well as a single file recombining all of the
> >> contents into one. This recombining is consuming enough time for
> >> Wikidata that it is no longer sustainable. For wikis where the files to
> >> recombine are large, we will skip this recombine step. This means that
> >> downloader scripts relying on the single file will need to check for its
> >> existence and, if it is not there, fall back to downloading the multiple
> >> numbered files.
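[A hypothetical sketch of the fallback logic a downloader script could implement: try the single recombined file first and, if it is absent, fetch the numbered pieces instead. The file name patterns and the `url_exists` helper are assumptions for illustration, not the official dump layout.]

```python
import urllib.error
import urllib.request

def url_exists(url):
    """HEAD request; True only on a 200 response."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        return False

def urls_to_fetch(base, wiki, date, num_pieces, exists=url_exists):
    """Prefer the recombined file; fall back to the numbered pieces."""
    combined = f"{base}/{wiki}/{date}/{wiki}-{date}-pages-meta-current.xml.bz2"
    if exists(combined):
        return [combined]
    return [
        f"{base}/{wiki}/{date}/{wiki}-{date}-pages-meta-current{i}.xml.bz2"
        for i in range(1, num_pieces + 1)
    ]
```

[Passing the existence check in as a parameter keeps the decision logic testable without touching the network.]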
> >>> I expect to get this done and deployed by the March 20th dumps run.
> >> You can follow along here: https://phabricator.wikimedia.org/T179059
> >>> Thanks!
> >>> Ariel
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
First, I am new to the list, so apologies if I am asking a question that has been answered before.
I am trying to identify which en.wikipedia dump contains the links between the English language version of wikipedia and other language versions for individual articles. I have downloaded several of the dumps for en.wikipedia.org, but I cannot seem to find these links in the files.
What I mean by language links between different versions of Wikipedia can be seen on the Barack Obama page:
On the left-hand side of the screen, under "Languages", there are more than
100 links to Barack Obama articles in other language versions of Wikipedia.
The first link listed is for Acehnese.
The last one listed is the following:
If anyone could identify for me any dump that contains the language links between en.wikipedia and the other language versions of wikipedia, I would be grateful.
As you'll have seen from a previous email, we are now using a new, beefier
webserver for your dataset downloading needs. The old server is going away
on TUESDAY, April 10th.
This means that if you are using 'dataset1001.wikimedia.org' or the IP
address itself in your scripts, you MUST change it before Tuesday, or it
will stop working.
There will be no further reminders.
Hello dumps.wikimedia.org users,
The servers that host the dumps.wikimedia.org site are being replaced with
shiny new hardware! The web service migration is set to happen at 14:30 UTC
on Wednesday, April 4 2018. If you are trying to connect to
dumps.wikimedia.org around the migration window, you might experience a
short downtime. The switchover should ideally only take a few minutes, and
we'll keep you posted once it's all done, or if anything changes!
As always, please feel free to reach out to us with any questions or
concerns!

Madhumitha Viswanathan & Ariel Glenn
Those of you that rely on the abstracts dumps will have noticed that the
content for wikidata is pretty much useless. It doesn't look like a
summary of the page because main namespace articles on wikidata aren't
paragraphs of text. And there's really no useful summary to be generated,
even if we were clever.
We have instead decided to produce abstracts output only for pages in the
main namespace that consist of text. For pages that are of type
wikidata-item, json and so on, the <abstract> tag will contain the
attribute 'not-applicable' set to the empty string. This impacts a very few
pages on other wikis; for the full list and for more information on this
change, see https://phabricator.wikimedia.org/T178047
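[A sketch of how a consumer might skip the new placeholder entries, assuming the abstracts feed keeps its usual <feed>/<doc> layout and that <abstract> now carries a 'not-applicable' attribute set to the empty string, as described above; the sample document structure is an assumption for illustration.]

```python
import xml.etree.ElementTree as ET

def usable_abstracts(xml_text):
    """Yield (title, abstract) pairs, skipping not-applicable entries."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("doc"):
        abstract = doc.find("abstract")
        # skip entries flagged as having no meaningful summary
        if abstract is None or "not-applicable" in abstract.attrib:
            continue
        yield doc.findtext("title"), abstract.text
```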
We hope this change will be merged in a week or so; it won't take effect
for wikidata until the next dumps run on April 20th, since the wikidata
abstracts are already in progress.
If you have any questions, don't hesitate to ask.