Hello,
I'm currently looking for the latest Wikipedia data dumps that include the complete history of Wikipedia edits, for research purposes. I'm aware that a similar data dump exists for Wikidata edits, but I haven't been able to locate the same for Wikipedia. Despite checking https://dumps.wikimedia.org/, I couldn't find a recent dump that includes the full edit history. I would greatly appreciate any help in this matter.
Cheers,
Hrishi
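The complete edit history is published as the "pages-meta-history" files within each dump run. Below is a minimal sketch of one way to list those files programmatically, assuming the per-run dumpstatus.json layout on dumps.wikimedia.org and using the 20231101 enwiki run purely as an example:

    import json
    import urllib.request

    # Each dump run publishes a dumpstatus.json describing its jobs and output
    # files; the 20231101 enwiki run is used here purely as an example.
    RUN_URL = "https://dumps.wikimedia.org/enwiki/20231101/dumpstatus.json"

    with urllib.request.urlopen(RUN_URL) as resp:
        status = json.load(resp)

    # Walk every job and keep the files whose names mark them as full revision history.
    for job_name, job in status.get("jobs", {}).items():
        for file_name, meta in (job.get("files") or {}).items():
            if "pages-meta-history" in file_name:
                print(job_name, file_name, meta.get("size"))

Pointing RUN_URL at another wiki or run date should work the same way, keeping in mind that older runs are eventually rotated off the download server.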
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20231101 full revision history content run.
We are currently dumping 982 projects in total.
---------------------
Stats for snwiktionary on date 20231101
Total size of page content dump files for articles, current content only:
506,341
Total size of page content dump files for all pages, current content only:
607,858
Total size of page content dump files for all pages, all revisions:
741,521
---------------------
Stats for enwiki on date 20231101
Total size of page content dump files for articles, current content only:
96,959,256,118
Total size of page content dump files for all pages, current content only:
200,144,987,436
Total size of page content dump files for all pages, all revisions:
27,663,890,580,870
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
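For scale, the figures in these automated reports are plain numbers; assuming they are raw byte counts, the enwiki totals above convert roughly as follows:

    # Assuming the reported sizes are raw byte counts, express the enwiki figures
    # above in more familiar units.
    enwiki_articles_current = 96_959_256_118
    enwiki_all_revisions = 27_663_890_580_870

    print(f"articles, current only:   {enwiki_articles_current / 1e9:.1f} GB")   # ~97.0 GB
    print(f"all pages, all revisions: {enwiki_all_revisions / 1e12:.1f} TB")     # ~27.7 TB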
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20231001 full revision history content run.
We are currently dumping 978 projects in total.
---------------------
Stats for siwikibooks on date 20231001
Total size of page content dump files for articles, current content only:
132,780,514
Total size of page content dump files for all pages, current content only:
137,973,182
Total size of page content dump files for all pages, all revisions:
400,297,875
---------------------
Stats for enwiki on date 20231001
Total size of page content dump files for articles, current content only:
96,514,232,871
Total size of page content dump files for all pages, current content only:
199,261,948,528
Total size of page content dump files for all pages, all revisions:
27,494,135,074,915
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
Hello folks!
For some years now, I've been the main or only point of contact for the
semimonthly wiki project sql/xml dumps, as well as for a number of
miscellaneous weekly datasets.
This work is now passing to Data Platform Engineering (DPE), and your new
points of contact, starting right away, will be Will Doran (email:wdoran)
and Virginia Poundstone (email:vpoundstone). I'll still be lending a hand
in the background for a little while but by the end of the month I'll have
transitioned into a new role at the Wikimedia Foundation, working more
directly on MediaWiki itself.
The Data Products team, a subteam of DPE, will be managing the current
dumps day-to-day, as well as working on a new dumps system intended to
replace and greatly improve the current one. What formats will it produce,
and what content, and in what bundles? These are all great questions, and
you have a chance to help decide on the answers. The team is gathering
feedback right now; follow this link [
https://docs.google.com/forms/d/e/1FAIpQLScp2KzkcTF7kE8gilCeSogzpeoVN-8yp_S…]
to give your input!
If you want to follow along on work being done on the new dumps system, you
can check the phabricator workboard at
https://phabricator.wikimedia.org/project/board/6630/ and look for items
with the "Dumps 2.0" tag.
Members of the Data Products team are already stepping up to manage the
xmldatadumps-l mailing list, so you should not notice any changes as far as
that goes.
And as always, for dumps-related questions that people on this list cannot
answer, and that are not covered in the docs at
https://meta.wikimedia.org/wiki/Data_dumps or
https://wikitech.wikimedia.org/wiki/Dumps, you can always email ops-dumps
(at) wikimedia.org.
See you on the wikis!
Ariel Glenn
ariel(a)wikimedia.org
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20230901 full revision history content run.
We are currently dumping 977 projects in total.
---------------------
Stats for zh_min_nanwikisource on date 20230901
Total size of page content dump files for articles, current content only:
24,387,985
Total size of page content dump files for all pages, current content only:
25,074,834
Total size of page content dump files for all pages, all revisions:
107,678,639
---------------------
Stats for enwiki on date 20230901
Total size of page content dump files for articles, current content only:
96,081,888,863
Total size of page content dump files for all pages, current content only:
198,396,512,576
Total size of page content dump files for all pages, all revisions:
27,333,035,543,183
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
Hi,
I'm interested in obtaining and reusing Wikipedia browsing and search logs from users. I need them for teaching purposes, so I just need a sample.
I located a 2012 article announcing the release of search logs for Wikipedia: "Wikimedia releases anonymous search log files for Wikipedia, pulls the logs [Updated]", https://thenextweb.com/news/wikimedia-releases-anonymous-search-log-files-w… I've checked the source referred to in that article, dumps.wikimedia.org/other/search, but I was unable to locate the logs there.
Thank you in advance.
Miquel Centelles
Hello everyone,
I'd like to share my automated deployments of the Wikipedia category trees for multiple languages. The automated deployments are stored on GitHub at https://github.com/jon-edward/wiki_categories_datastore
The utility for collecting and trimming the category trees can be found here https://github.com/jon-edward/wiki_categories
My next step will probably be to create a dashboard so people can download this data through a comfortable UI instead of raw GitHub files, but I wanted to share my progress now because I feel this might already be useful in its current state to someone who needs the category tree data.
Thank you for reading,
jon-edward
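Independent of how wiki_categories builds its trees, here is a generic sketch of pulling one level of the category graph straight from the live MediaWiki API (list=categorymembers with cmtype=subcat), which can be handy for spot-checking the datastore; the starting category below is just an arbitrary example:

    import json
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"

    def subcategories(category: str) -> list[str]:
        """Return the direct subcategories of a category via list=categorymembers."""
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": category,
            "cmtype": "subcat",
            "cmlimit": "max",
            "format": "json",
        }
        req = urllib.request.Request(
            API + "?" + urllib.parse.urlencode(params),
            headers={"User-Agent": "category-tree-example/0.1"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        return [m["title"] for m in data["query"]["categorymembers"]]

    # Example: one level below Category:Linguistics (an arbitrary starting point).
    print(subcategories("Category:Linguistics"))

Recursing on the returned titles (and handling the API's continuation parameters) would rebuild a full tree, which is roughly the collection step that a utility like the one above automates.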
------- Original Message -------
On Thursday, September 7th, 2023 at 3:51 PM, johnvikterschmiytz <johnvikterschmiytz(a)proton.me> wrote:
> Hi, I would like to request a copy of the full history dump of English Wiktionary taken on May 1, 2020. Is this possible? Thanks!