Hi!
I am currently working on a dump search and download tool for all Wikimedia wikis. In order to find out which Wikimedia wikis exist I used Wikidata. While comparing the list of wikis from Wikidata with the list of dumped projects I found out that the following wikis are currently not being dumped:
- alswikibooks (last dump 20180101)
- alswikiquote (last dump 20180101)
- alswiktionary (last dump 20180101)
- ecwikimedia (never dumped, private but not marked private in Wikidata?)
- fixcopyrightwiki (last dump 20200220)
- labswiki (never dumped?)
- labtestwiki (never dumped?)
- mowiki (last dump 20180101)
- mowiktionary (last dump 20180101)
- ru_sibwiki (last dump 20071011)
- ukwikiversity (never dumped?)
Is there an up-to-date machine-readable list of currently dumped wikis besides https://dumps.wikimedia.org/backup-index.html?
(Off-topic) Spoiler for dump searching tool on my laptop:
$ target/release/wdgrep "asdfdefased" /c/Users/xyz/wpdumps/dewiki-20200701-pages-articles-multistream.xml -v --ns 0
Searched 21437.064 MiB in 8.467969 seconds (2531.5474 MiB/s).
Best regards,
Count Count
labswiki and labtestwiki are copies of Wikitech, which is maintained and dumped in a special fashion. You can find those dumps here: https://dumps.wikimedia.org/other/wikitech/dumps/
uk.wikiversity.org does not exist.
ecwikimedia, as you rightly note, is private.
The remaining wikis have all been deleted. We dump closed wikis but we do not dump deleted ones.
I hope this addresses your concerns.
Ariel
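For tool authors who want to encode the explanation above, here is a minimal sketch in Python. The set contents come straight from this reply; the names SPECIAL, NONEXISTENT, PRIVATE, DELETED and explain_missing are hypothetical and only illustrate the rule "closed wikis are dumped, deleted and private ones are not".

```python
# Status of each wiki as described in the reply above.
# All identifiers here are illustrative, not part of any Wikimedia tooling.
SPECIAL = {"labswiki", "labtestwiki"}      # copies of Wikitech, dumped separately
NONEXISTENT = {"ukwikiversity"}            # uk.wikiversity.org does not exist
PRIVATE = {"ecwikimedia"}                  # private wiki, no public dump
DELETED = {
    "alswikibooks", "alswikiquote", "alswiktionary",
    "fixcopyrightwiki", "mowiki", "mowiktionary", "ru_sibwiki",
}


def explain_missing(dbname: str) -> str:
    """Return the reason a wiki is absent from the regular dump index."""
    if dbname in SPECIAL:
        return "dumped separately (Wikitech)"
    if dbname in NONEXISTENT:
        return "does not exist"
    if dbname in PRIVATE:
        return "private"
    if dbname in DELETED:
        return "deleted"
    return "unknown"


if __name__ == "__main__":
    for name in ("labswiki", "mowiki", "ecwikimedia", "ukwikiversity"):
        print(f"{name}: {explain_missing(name)}")
```

A wiki that falls through to "unknown" would be the interesting case: it is neither deleted, private, nor special, so its absence from the dump index would be worth reporting.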
On Sun, Aug 2, 2020 at 1:04 AM Count Count countvoncount123456@gmail.com wrote:
Hi!
I am currently working on a dump search and download tool for all Wikimedia wikis. In order to find out which Wikimedia wikis exist I used Wikidata. While comparing the list of wikis from Wikidata with the list of dumped projects I found out that the following wikis are currently not being dumped:
- alswikibooks (last dump 20180101)
- alswikiquote (last dump 20180101)
- alswiktionary (last dump 20180101)
- ecwikimedia (never dumped, private but not marked private in Wikidata?)
- fixcopyrightwiki (last dump 20200220)
- labswiki (never dumped?)
- labtestwiki (never dumped?)
- mowiki (last dump 20180101)
- mowiktionary (last dump 20180101)
- ru_sibwiki (last dump 20071011)
- ukwikiversity (never dumped?)
Is there an up-to-date machine-readable list of currently dumped wikis besides https://dumps.wikimedia.org/backup-index.html?
(Off-topic) Spoiler for dump searching tool on my laptop:
$ target/release/wdgrep "asdfdefased" /c/Users/xyz/wpdumps/dewiki-20200701-pages-articles-multistream.xml -v --ns 0
Searched 21437.064 MiB in 8.467969 seconds (2531.5474 MiB/s).
Best regards,
Count Count
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
This does not seem to comply with the Foundation's data protection retention policy for article removal, since you are keeping data outside of the stated policy.
On 3 Aug 2020, at 06:56, Ariel Glenn WMF ariel@wikimedia.org wrote:
labswiki and labtestwiki are copies of Wikitech, which is maintained and dumped in a special fashion. You can find those dumps here: https://dumps.wikimedia.org/other/wikitech/dumps/
uk.wikiversity.org does not exist.
ecwikimedia, as you rightly note, is private.
The remaining wikis have all been deleted. We dump closed wikis but we do not dump deleted ones.
I hope this addresses your concerns.
Ariel
In short, you need to fix Wikidata. I think that's maintained manually. Authoritative sources can be found in the usual places: https://noc.wikimedia.org/
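As a rough illustration of comparing a Wikidata-derived list against an authoritative one, here is a minimal sketch in Python. It assumes the authoritative list is available as plain text with one database name per line and optional "#" comments; that format, and the function names parse_dblist and not_dumped, are assumptions made for this example and should be checked against whatever file you actually download.

```python
from __future__ import annotations


def parse_dblist(text: str) -> set[str]:
    """Parse a list of database names, one per line.

    Assumed format: blank lines are ignored and anything after '#' is a comment.
    """
    names: set[str] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.add(line)
    return names


def not_dumped(known: set[str], dumped: set[str]) -> list[str]:
    """Return the wikis that are known but missing from the dumped set, sorted."""
    return sorted(known - dumped)


if __name__ == "__main__":
    # Tiny made-up inputs; real lists would be read from downloaded files.
    from_wikidata = parse_dblist("dewiki\nmowiki\nukwikiversity\n")
    from_dumps = parse_dblist("# currently dumped\ndewiki\n")
    print(not_dumped(from_wikidata, from_dumps))
```

Using sets keeps the comparison independent of ordering and duplicates in either source, which matters when one side is maintained by hand.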
colin johnston, 03/08/20 09:15:
This does not seem to comply with foundation data protection retention policy for article removal
No such policy exists. If you mean the data retention guidelines, these are meant for private data. Publicly contributed information must be assumed to be publicly available forever, as made clear by the privacy policy.
Federico
xmldatadumps-l@lists.wikimedia.org