Hey,
I'm maintaining torrents for the past couple of months of Wikipedia's monthly XML dumps on Academic Torrents, in this collection: https://academictorrents.com/collection/english-wikipedia
I'm currently doing time-series research on Wikipedia, so I'll be maintaining the collection for at least the next couple of years as I work on my PhD dissertation (and do related work).
Great to hear, thanks for the torrents, Will!
On Fri, Aug 30, 2024 at 4:37 PM Will Beason beason@utexas.edu wrote:
--
Will Beason (he/him)
PhD Student in Information Studies
University of Texas at Austin
beason@utexas.edu
willbeason.com
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-leave@lists.wikimedia.org
You're welcome!
Update: I just added torrents for 17 more months from the past 5 years to the list https://academictorrents.com/collection/english-wikipedia. If anyone is aware of other pages-articles-multistream dumps or torrents I can add, please let me know! (If a torrent bundles the title index, I'll split it into two torrents, since quite a few of my workflows use only the title index, e.g. tracking article renamings.)
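For anyone curious what a title-index-only workflow can look like, here is a minimal sketch in Python of spotting renamed articles between two monthly indexes. It assumes the multistream index stores one `offset:page_id:title` entry per line in a bz2-compressed file, and that a moved page keeps its page ID. The function names and the file path passed to `load_index` are hypothetical, and this is an illustration rather than the exact tooling I use.

```python
import bz2


def parse_index(lines):
    """Map page ID -> title from index lines of the form "offset:page_id:title"."""
    titles = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Titles may themselves contain colons, so split at most twice.
        _offset, page_id, title = line.split(":", 2)
        titles[int(page_id)] = title
    return titles


def load_index(path):
    """Read a bz2-compressed index file (path is a hypothetical local file)."""
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        return parse_index(f)


def find_renames(old_titles, new_titles):
    """Return {page_id: (old_title, new_title)} for pages whose title changed."""
    return {
        page_id: (old_titles[page_id], new_title)
        for page_id, new_title in new_titles.items()
        if page_id in old_titles and old_titles[page_id] != new_title
    }


# Tiny in-memory example with made-up entries.
old = parse_index(["600:10:Foo", "600:12:Bar: A Story"])
new = parse_index(["616:10:Foo (band)", "616:12:Bar: A Story"])
print(find_renames(old, new))  # {10: ('Foo', 'Foo (band)')}
```

Comparing by page ID rather than by title is the design choice here: two index files are enough to detect a rename, with no need to download or decompress the much larger article dump.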
On Wed, Sep 4, 2024 at 12:09 PM Xabriel Collazo Mojica <xcollazo@wikimedia.org> wrote:
--
Xabriel J. Collazo Mojica (he/him, pronunciation https://commons.wikimedia.org/wiki/File:Xabriel_Collazo_Mojica_-_pronunciation.ogg )
Sr Software Engineer
Wikimedia Foundation
Oh, and one more thing I forgot to mention (sorry to send two messages in quick succession).
If you're reading this and it is 2026 or later: I plan to seed these torrents more or less continuously until January 2026. After that I will be uploading most of them to a long-term file storage service like Amazon S3 Glacier. If you want/need a month on the list that isn't seeded any more, please email me or leave a comment! I'll find a way to get the file to you.
On Sun, Sep 8, 2024 at 3:25 PM Will Beason beason@utexas.edu wrote:
--
Will Beason (he/him)
PhD Student in Information Studies
University of Texas at Austin
beason@utexas.edu
willbeason.com