Hi everyone,
We're pleased to announce the availability of the MediaWiki Content File Exports https://wikitech.wikimedia.org/wiki/MediaWiki_Content_File_Exports, a new way to access the unparsed content from Wikimedia’s public wikis in XML format.
*What’s Available:* The exports are provided in two datasets, updated monthly, with generation starting on the 1st of each month:
- mediawiki_content_history https://dumps.wikimedia.org/other/mediawiki_content_history/ - Full revision history for all pages
- mediawiki_content_current https://dumps.wikimedia.org/other/mediawiki_content_current/ - Latest revision only for each page
Both are available per wiki in compressed XML format compatible with MediaWiki’s Special:Export https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export and the legacy XML dumps.
*Why the Change:* The legacy dump infrastructure at https://dumps.wikimedia.org/backup-index.html has struggled to reliably produce XML exports for larger wikis. The Data Engineering team has reimplemented this process to ensure this data is accessible long term.
*How to Access:* Files are available at https://dumps.wikimedia.org/other/mediawiki_content_history/ and https://dumps.wikimedia.org/other/mediawiki_content_current/.
For any specific monthly export, for example https://dumps.wikimedia.org/other/mediawiki_content_current/simplewiki/2026-..., first check that the SHA256SUMS file is present before downloading; its presence confirms that the export is complete. Full instructions are available at: https://wikitech.wikimedia.org/wiki/MediaWiki_Content_File_Exports
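As a rough illustration of the check described above, here is a minimal Python sketch using only the standard library. It rests on assumptions that are not confirmed in this announcement: that SHA256SUMS sits directly inside the monthly export directory, and that it uses the usual `sha256sum` layout of one "<hex digest>  <file name>" pair per line. The function names are hypothetical; please treat the documentation page as the authoritative reference.

```python
import hashlib
import urllib.error
import urllib.request
from pathlib import Path


def export_is_complete(export_dir_url: str, timeout: float = 30.0) -> bool:
    """Return True if SHA256SUMS can be found under the export directory URL.

    Assumption: the file lives directly inside the monthly export directory.
    """
    url = export_dir_url.rstrip("/") + "/SHA256SUMS"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A 404 (or any other HTTP error) is treated as "not complete yet".
        return False


def parse_sha256sums(text: str) -> dict[str, str]:
    """Map file name -> hex digest, assuming sha256sum-style lines."""
    digests: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # sha256sum marks binary mode with a leading "*" on the file name.
        digests[name.strip().lstrip("*")] = digest.lower()
    return digests


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, sums_text: str) -> bool:
    """Compare a downloaded file against its entry in the SHA256SUMS text."""
    expected = parse_sha256sums(sums_text).get(Path(path).name)
    return expected is not None and expected == file_sha256(Path(path))
```

To use it, pass the URL of the monthly export directory you are interested in to export_is_complete, download the files only if it returns True, and then run verify_download on each file with the contents of SHA256SUMS.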
*Other Related Changes:* While we’ll continue attempting legacy XML generation for the time being, that path is now deprecated. Note that this deprecation affects only the XML content artifacts. All other SQL dumps of various database tables will continue.
Additionally, the publication of artifacts on the legacy infrastructure will be reduced from the current twice-per-month cadence to once per month. The incremental dumps at https://dumps.wikimedia.org/other/incr/, which were experimental, will be sunset.
*Questions?* Please review the FAQ https://wikitech.wikimedia.org/wiki/MediaWiki_Content_File_Exports#FAQ on the documentation page. You can also reply here if you have any questions, or follow up at https://phabricator.wikimedia.org/T414389.
Best regards,
The Data Engineering Team
xmldatadumps-l@lists.wikimedia.org