From time to time I use dumps to parse data that I cannot get via SQL or the API. For example, in the summer I fetched the Wikimedia Commons page history to get the list of old categories of images, so that my bot would not re-insert categories that had been removed from a photo at least once.
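As a rough illustration of that kind of check (a minimal sketch, not the actual script used): assuming the wikitext of each revision of a file page has already been extracted from a page-history dump and ordered oldest first, the categories that were removed at some point could be collected like this. The names `categories_in` and `removed_categories` are hypothetical, and the regular expression only sees literal `[[Category:...]]` links, not categories added by templates.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sortkey]].
# Assumption: only literal category links matter; categories added by
# templates and localized namespace aliases are not detected.
CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def categories_in(wikitext):
    """Return the set of category names linked in one revision's wikitext."""
    return {
        match.group(1).strip().replace("_", " ")
        for match in CATEGORY_RE.finditer(wikitext)
    }


def removed_categories(revision_texts):
    """Return categories dropped between any two consecutive revisions.

    revision_texts must be ordered oldest first. A category that was
    removed and later re-added is still reported, because it was removed
    at least once.
    """
    removed = set()
    previous = set()
    for text in revision_texts:
        current = categories_in(text)
        removed |= previous - current  # present before, missing now
        previous = current
    return removed
```

A bot could then skip any category in `removed_categories(history)` before adding categories to the page.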
Br, -- Kimmo Virtanen, Zache
On Tue, Oct 8, 2024 at 6:59 PM Bryan Davis bd808@wikimedia.org wrote:
I was asked recently what I knew about the types of tools that use data from the https://dumps.wikimedia.org/ project. I had to admit that I really didn't know of many tools off the top of my head that relied on dumps. Most of the use cases I have heard about are for research topics like looking at word frequencies and sentence complexity, or machine learning things that consume some or all of the wiki corpus.
Do you run a tool that needs data from Dumps to do its job? I would love to hear some stories about how this data helps folks advance the work of the movement.
Bryan
Bryan Davis                              Wikimedia Foundation
Principal Software Engineer                     Boise, ID USA
[[m:User:BDavis_(WMF)]]                            irc: bd808
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/