Dear Wikimedia community,
My name is Fidel Gil. I am a master's student at the
Technical University of Kaiserslautern in Germany, and I
am currently running experiments with the enwiki XML
data dumps in which I recreate the linking structure between
articles. While doing so for the article
https://en.wikipedia.org/wiki/Animation
I found that it has a link named 'walt disney
studios' in the subsection 'Animated Features CGI' that
resolves to 'walt disney animation studios'.
When going through the XML file, however, the entry for
Animation references 'walt disney studios', which is a
disambiguation page, rather than 'walt disney animation studios'.
Here is a small excerpt from the line in question: 'In 1937, [[Walt
Disney Studios]] premiered their first-ever animated
feature'.
Do the XML dump files use the raw link names, rather than some
other form of URL resolution, to create these [[<name of
article>]] tags?
Looking forward to your reply
Fidel Gil
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20200501 full revision history content run.
We are currently dumping 913 projects in total.
---------------------
Stats for udmwiki on date 20200501
Total size of page content dump files for articles, current content only:
22584497
Total size of page content dump files for all pages, current content only:
26960582
Total size of page content dump files for all pages, all revisions:
493829286
---------------------
Stats for enwiki on date 20200501
Total size of page content dump files for articles, current content only:
77391323458
Total size of page content dump files for all pages, current content only:
171945757548
Total size of page content dump files for all pages, all revisions:
20707694819796
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
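The size figures in the automated update above are raw byte counts. A quick sketch (assuming IEC binary units are wanted) for converting them to something human-readable:

```python
def human_size(nbytes):
    """Convert a raw byte count to a human-readable IEC size string."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(nbytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

# enwiki 20200501, all pages, all revisions:
print(human_size(20707694819796))  # 18.83 TiB
```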
Hi.
I apologize if this is the wrong place to ask, but I have sent multiple
messages to the WikiTaxi Facebook page and have received no reply.
To clarify my problem, please see the attached images and compare them to "
https://en.wikipedia.org/wiki/Cat" (the online version of the wiki). I have
highlighted the respective parts of the page that are causing the
problem. Besides being visually inconvenient, a lot of these
unidentified strings replace actual information in the wiki, which makes
that information inaccessible. I have encountered this problem on every
WikiTaxi page I have used.
My WikiTaxi version is 1.3.0, and the dump file is called "Offline
Wiki.taxi" and has a size of 25.59 GB.
Any help is appreciated.
Cat
Hello,
If this was announced somewhere, I apologize; where should I find this kind
of information? It seems that the dumps stopped at 1500 today, the
first of May. Is that expected? Shouldn't it be 1800, since it's 22:41?
[image: image.png]
T: @mguiraud | m. 06 95 92 51 33
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20200401 full revision history content run.
We are currently dumping 911 projects in total.
---------------------
Stats for scwiktionary on date 20200401
Total size of page content dump files for articles, current content only:
46507
Total size of page content dump files for all pages, current content only:
150040
Total size of page content dump files for all pages, all revisions:
761268
---------------------
Stats for enwiki on date 20200401
Total size of page content dump files for articles, current content only:
76927677600
Total size of page content dump files for all pages, current content only:
170982392783
Total size of page content dump files for all pages, all revisions:
20544506982337
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
Hi all,
I’m doing research into the existence of gender bias in Wikipedia texts over time. To do this, I need old pages-articles.xml dumps. I am still looking for dumps from 2009 and 2011-2013; does anyone know how I can get one of these, or does someone have one stored themselves?
Thanks in advance,
Katja Schmahl
Hi,
First of all, excuse me, as I guess this is not the appropriate channel to
ask this.
The dumps are not accessible anymore from some Kubernetes pods in the
ToolLabs server: https://phabricator.wikimedia.org/T247455
Could anyone please help me improve this ticket so that it gets taken into
account?
Kind regards,
For the past few years we have not dumped private tables at all; they would
not be accessible to the public in any case, and they do not suffice as a
backup in case of catastrophic failure.
We are therefore removing the feature that dumps private tables along with
public tables in a dump run. Anyone who wishes to use the dump scripts in
our Python repo to dump private tables on their wiki will need to create a
separate dumps configuration file and a tables YAML file describing which
tables to dump and where to put them, run as a separate dump run.
This change will be committed by April 20, 2020, in time for the second
dump run of the month.
Note that this does not impact the actual output of the Wikimedia SQL/XML
dumps at all, since we have not been dumping private tables since late 2016.
See T249508 to follow along.
Ariel