Hello.
I am converting dumps for Aard Dictionary (https://aarddict.org/), an
offline Wikipedia and Wiktionary app. I use mw2slob and the NS0 files
found on https://dumps.wikimedia.org/other/enterprise_html/runs/ for
these conversions.
However, in the Spanish Wikipedia, for example, the article
https://es.wikipedia.org/wiki/Anexo:Aves_de_Canarias does not seem to be
part of the tar.gz file.
Likewise, in the French Wiktionary the article
https://fr.wiktionary.org/wiki/Conjugaison:espagnol/aumentar is
missing from the respective tar.gz file.
Can they be found somewhere else, perhaps in NS6 or NS14? It seems to me
that pages whose titles contain a colon, such as Anexo: or Conjugaison:,
are not included. Why is that, and where can I find them? Are they too
big, or what is the reason for not including them?
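For reference, a minimal sketch of how one might check whether given titles are present in such a tar.gz file without unpacking it. It assumes that each archive member is newline-delimited JSON and that the page title is stored in a "name" field; both are assumptions about the dump layout, and find_titles is a hypothetical helper name. Whether titles use spaces or underscores would also need checking against a real record.

```python
import io
import json
import tarfile


def find_titles(fileobj, wanted):
    """Return the subset of `wanted` titles present in a .tar.gz dump stream."""
    remaining = set(wanted)
    found = set()
    # "r|gz" reads the archive as a forward-only stream; nothing is written to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            reader = archive.extractfile(member)
            for raw_line in reader:
                if not raw_line.strip():
                    continue
                record = json.loads(raw_line)
                title = record.get("name")  # assumed field name for the page title
                if title in remaining:
                    found.add(title)
                    remaining.discard(title)
            if not remaining:
                break  # every wanted title was seen, stop reading early
    return found
```

Calling find_titles(open(path, "rb"), ["Anexo:Aves de Canarias"]) with the path of a downloaded archive would return an empty set if the page is absent.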
Regards,
Erik
Hello
I'm looking through the dump files and am not sure 'what contains what'. Maybe there's a descriptive page that I've missed somewhere?
I'd like XML or HTML, no images, so that I can collect pages about UK local elections, either by keyword or by simulating a web crawler (or a mixture of both: some pruning, then crawling).
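As a rough illustration of the keyword approach, the sketch below streams an XML dump and yields the titles of pages whose text mentions any keyword. It assumes the usual export layout of page elements containing title and revision/text elements; pages_matching and local_name are hypothetical helper names, and the matching is a plain case-insensitive substring test, not a tuned search.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def pages_matching(xml_stream, keywords):
    """Yield titles of pages whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            haystack = text.lower()
            if any(k in haystack for k in lowered):
                yield title
            title, text = None, ""
            elem.clear()  # drop the finished page's content to limit memory use
```

For a compressed dump, the stream could be opened with bz2.open(path, "rb") from the standard library and passed in directly.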
Sorry about the question. Best regards, Hugh Barnard
---------
https://www.hughbarnard.org
Twitter: @hughbarnard
Hi!
I am trying to find a dump of all imageinfo data [1] for all files on
Commons. I thought that the "Articles, templates, media/file descriptions,
and primary meta-pages" XML dump would contain that, given the
"media/file descriptions" part, but it seems this is not the case. Is
there a dump which contains that information? And what is "media/file
descriptions" then? Wiki pages of files?
[1] https://www.mediawiki.org/wiki/API:Imageinfo
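For context, this is roughly the per-file request that a dump would replace. The sketch only builds a query URL for the imageinfo module described in [1]; imageinfo_url is a hypothetical helper name, the chosen iiprop values are just one possible selection, and fetching and paging through results is left out.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def imageinfo_url(titles, props=("timestamp", "user", "url", "size", "mime", "sha1")):
    """Build an action=query URL asking for imageinfo on the given file titles."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "titles": "|".join(titles),   # several titles are separated by "|"
        "iiprop": "|".join(props),    # which imageinfo fields to return
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(imageinfo_url(["File:Example.jpg"]))
```

Requesting this for every file on Commons means a very large number of calls, which is why a dump of the same data would be preferable.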
Mitar
--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20220101 full revision history content run.
We are currently dumping 945 projects in total.
---------------------
Stats for nnwiki on date 20220101
Total size of page content dump files for articles, current content only:
683,266,535
Total size of page content dump files for all pages, current content only:
743,779,759
Total size of page content dump files for all pages, all revisions:
15,451,351,380
---------------------
Stats for enwiki on date 20220101
Total size of page content dump files for articles, current content only:
87,092,585,288
Total size of page content dump files for all pages, current content only:
191,642,115,143
Total size of page content dump files for all pages, all revisions:
23,994,550,626,483
---------------------
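The figures above are easier to compare in binary units. The sketch below assumes they are byte counts, which the message does not state explicitly; parse_count and human_size are hypothetical helper names.

```python
def parse_count(text):
    """Turn a comma-grouped figure such as '683,266,535' into an int."""
    return int(text.replace(",", ""))


def human_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Figures copied from the 20220101 stats above.
    for label, figure in [
        ("nnwiki, all pages, all revisions", "15,451,351,380"),
        ("enwiki, all pages, all revisions", "23,994,550,626,483"),
    ]:
        print(label, human_size(parse_count(figure)))
```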
Sincerely,
Your friendly Wikimedia Dump Info Collector