Hi,
If you don't mind, please, starting next time, insert commas into those huge counts. Without commas they are VERY difficult to read.
Thanks!
Sincerely,
Todd Shandelman
Austin, TX
On Sun, Aug 2, 2020, 07:01 xmldatadumps-l-request@lists.wikimedia.org wrote:
Today's Topics:
- XML Dumps FAQ monthly update (noreply.xmldatadumps@wikimedia.org)
- List of dumped wikis, discrepancy with Wikidata (Count Count)
Message: 1
Date: Sat, 01 Aug 2020 16:07:36 +0000
From: noreply.xmldatadumps@wikimedia.org
To: xmldatadumps-l@lists.wikimedia.org
Subject: [Xmldatadumps-l] XML Dumps FAQ monthly update
Message-ID: 20200801160736.AneN_%noreply.xmldatadumps@wikimedia.org
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update contains figures for the 20200701 full revision history content run.
We are currently dumping 916 projects in total.
Stats for lmowiki on date 20200701
Total size of page content dump files for articles, current content only: 151410097
Total size of page content dump files for all pages, current content only: 179774126
Total size of page content dump files for all pages, all revisions: 3555369968
Stats for enwiki on date 20200701
Total size of page content dump files for articles, current content only: 78326324425
Total size of page content dump files for all pages, current content only: 173926604054
Total size of page content dump files for all pages, all revisions: 21045320844828
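The size figures above are hard to scan at a glance. As a minimal sketch of the readability fix requested in the reply, Python's format specification can insert thousands separators. The dictionary labels are shortened for illustration and the helper name `with_commas` is hypothetical; the numbers are copied from the enwiki stats.

```python
# Figures copied from the enwiki stats above; labels shortened for illustration.
sizes = {
    "articles, current content only": 78326324425,
    "all pages, current content only": 173926604054,
    "all pages, all revisions": 21045320844828,
}


def with_commas(n: int) -> str:
    """Format an integer with a comma between each group of three digits."""
    return f"{n:,}"


for label, n in sizes.items():
    print(f"{label}: {with_commas(n)}")
```

With this formatting the all-revisions figure reads as 21,045,320,844,828.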
Sincerely,
Your friendly Wikimedia Dump Info Collector
Message: 2
Date: Sun, 2 Aug 2020 00:04:22 +0200
From: Count Count countvoncount123456@gmail.com
To: xmldatadumps-l@lists.wikimedia.org
Subject: [Xmldatadumps-l] List of dumped wikis, discrepancy with Wikidata
Message-ID: <CAOHwkzAk6R+W4Xj673h= p44zxwX+22Pt+Zd3UBg_NbSUUTg+1w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi!
I am currently working on a dump search and download tool for all Wikimedia wikis. In order to find out which Wikimedia wikis exist I used Wikidata. While comparing the list of wikis from Wikidata with the list of dumped projects I found out that the following wikis are currently not being dumped:
- alswikibooks (last dump 20180101)
- alswikiquote (last dump 20180101)
- alswiktionary (last dump 20180101)
- ecwikimedia (never dumped, private but not marked private in Wikidata?)
- fixcopyrightwiki (last dump 20200220)
- labswiki (never dumped?)
- labtestwiki (never dumped?)
- mowiki (last dump 20180101)
- mowiktionary (last dump 20180101)
- ru_sibwiki (last dump 20071011)
- ukwikiversity (never dumped?)
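The comparison that produced the list above amounts to a set difference. The following is only a sketch under stated assumptions: both input sets are tiny hypothetical samples, and the steps that would fetch the real lists (from Wikidata and from the dumps index) are not shown. The function name `not_dumped` is also hypothetical.

```python
# Hypothetical sample inputs. In practice one set would come from Wikidata
# and the other from the list of dumped projects; fetching is not shown here.
wikidata_wikis = {"alswikibooks", "dewiki", "enwiki", "mowiki"}
dumped_wikis = {"dewiki", "enwiki", "lmowiki"}


def not_dumped(known, dumped):
    """Return the wikis that are known but missing from the dumped set, sorted."""
    return sorted(set(known) - set(dumped))


print(not_dumped(wikidata_wikis, dumped_wikis))
```

Note that the difference is one-directional: a wiki that is dumped but absent from the Wikidata list (lmowiki in this sample) does not show up in the result.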
Is there an up-to-date machine-readable list of currently dumped wikis besides https://dumps.wikimedia.org/backup-index.html?
(Off-topic) Spoiler for dump searching tool on my laptop:
$ target/release/wdgrep "asdfdefased" /c/Users/xyz/wpdumps/dewiki-20200701-pages-articles-multistream.xml -v --ns 0
Searched 21437.064 MiB in 8.467969 seconds (2531.5474 MiB/s).
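The reported throughput is simply the searched size divided by the elapsed time. A small check of that arithmetic, using the two figures from the output above (the function name `throughput_mib_per_s` is hypothetical and not part of the tool):

```python
def throughput_mib_per_s(mib_searched: float, seconds: float) -> float:
    """Return the search throughput in MiB/s."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return mib_searched / seconds


# Figures taken from the tool output quoted above.
rate = throughput_mib_per_s(21437.064, 8.467969)
print(f"{rate:.1f} MiB/s")
```

The result agrees with the roughly 2531.5 MiB/s printed by the tool.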
Best regards,
Count Count