Send Xmldatadumps-l mailing list submissions to
xmldatadumps-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
or, via email, send a message with subject or body 'help' to
xmldatadumps-l-request@lists.wikimedia.org
You can reach the person managing the list at
xmldatadumps-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Xmldatadumps-l digest..."
Today's Topics:
1. XML Dumps FAQ monthly update (noreply.xmldatadumps@wikimedia.org)
2. List of dumped wikis, discrepancy with Wikidata (Count Count)
----------------------------------------------------------------------
Message: 1
Date: Sat, 01 Aug 2020 16:07:36 +0000
From: noreply.xmldatadumps@wikimedia.org
To: xmldatadumps-l@lists.wikimedia.org
Subject: [Xmldatadumps-l] XML Dumps FAQ monthly update
Message-ID: <20200801160736.AneN_%noreply.xmldatadumps@wikimedia.org>
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20200701 full revision history content run.
We are currently dumping 916 projects in total.
---------------------
Stats for lmowiki on date 20200701
Total size of page content dump files for articles, current content only:
151410097
Total size of page content dump files for all pages, current content only:
179774126
Total size of page content dump files for all pages, all revisions:
3555369968
---------------------
Stats for enwiki on date 20200701
Total size of page content dump files for articles, current content only:
78326324425
Total size of page content dump files for all pages, current content only:
173926604054
Total size of page content dump files for all pages, all revisions:
21045320844828
---------------------
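The raw figures above are easier to compare in binary units. A minimal sketch, assuming the totals are plain byte counts (the update does not state the unit); `human_size` is a hypothetical helper, not part of any dump tooling:

```python
def human_size(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop once the value fits the unit, or when we run out of units.
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


# enwiki, all pages, all revisions (figure taken from the stats above;
# treating it as bytes is an assumption)
print(human_size(21045320844828))
```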
Sincerely,
Your friendly Wikimedia Dump Info Collector
------------------------------
Message: 2
Date: Sun, 2 Aug 2020 00:04:22 +0200
From: Count Count <countvoncount123456@gmail.com>
To: xmldatadumps-l@lists.wikimedia.org
Subject: [Xmldatadumps-l] List of dumped wikis, discrepancy with Wikidata
Message-ID: <CAOHwkzAk6R+W4Xj673h=p44zxwX+22Pt+Zd3UBg_NbSUUTg+1w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi!
I am currently working on a dump search and download tool for all Wikimedia
wikis. To find out which Wikimedia wikis exist, I used Wikidata.
While comparing the list of wikis from Wikidata with the list of dumped
projects, I found that the following wikis are currently not being
dumped:
- alswikibooks (last dump 20180101)
- alswikiquote (last dump 20180101)
- alswiktionary (last dump 20180101)
- ecwikimedia (never dumped, private but not marked private in Wikidata?)
- fixcopyrightwiki (last dump 20200220)
- labswiki (never dumped?)
- labtestwiki (never dumped?)
- mowiki (last dump 20180101)
- mowiktionary (last dump 20180101)
- ru_sibwiki (last dump 20071011)
- ukwikiversity (never dumped?)
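The comparison described above amounts to a set difference between two lists of wiki database names. A minimal sketch, assuming both lists are already available as plain strings; the function name and the sample inputs are illustrative only, and how the lists are fetched from Wikidata or the dump index is left out:

```python
def missing_from_dumps(wikidata_wikis, dumped_wikis):
    """Return wiki names listed in Wikidata but absent from the dump list."""
    return sorted(set(wikidata_wikis) - set(dumped_wikis))


# Illustrative inputs only; the real lists would come from Wikidata
# and from the dump index.
wikidata_wikis = ["enwiki", "lmowiki", "mowiki", "ukwikiversity"]
dumped_wikis = ["enwiki", "lmowiki"]

print(missing_from_dumps(wikidata_wikis, dumped_wikis))
```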
Is there an up-to-date machine-readable list of currently dumped wikis
besides https://dumps.wikimedia.org/backup-index.html?
(Off-topic) Spoiler for the dump search tool, running on my laptop:
$ target/release/wdgrep "asdfdefased" /c/Users/xyz/wpdumps/dewiki-20200701-pages-articles-multistream.xml -v --ns 0
Searched 21437.064 MiB in 8.467969 seconds (2531.5474 MiB/s).
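The reported rate is simply the amount searched divided by the elapsed time. A quick check of that arithmetic, using only the two figures from the output line above (the function name is made up for illustration):

```python
def throughput_mib_per_s(mib_searched, seconds):
    """Throughput in MiB/s from total MiB searched and elapsed seconds."""
    return mib_searched / seconds


rate = throughput_mib_per_s(21437.064, 8.467969)
print(f"{rate:.1f} MiB/s")  # prints "2531.5 MiB/s"
```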
Best regards,
Count Count
------------------------------
End of Xmldatadumps-l Digest, Vol 120, Issue 2
**********************************************