I cannot actually answer this question (it is not easy), but I
sometimes get this kind of question about our main relational
database storage (MariaDB). I am preparing some slides for a
presentation, gathered some numbers, and wanted to share them with you
(as of June 2019):
* There are approximately 550 TB of used data on the MariaDB-related
servers across the Wikimedia infrastructure (mostly compressed in some
way: InnoDB, gzip, etc.)
* If we do not account for redundancy, 60 TB of data is unique (an average
of 9x redundancy, which seems about right)
** Of that, 24 TB is for insert-only highly-compressed content (External Storage)
** The rest is metadata, local content, misc services, disk cache,
analytics, cloud dbs, and backups.
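As a quick sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The variable names are only illustrative, and the inputs are the rounded June 2019 numbers from the list, so the results are approximate as well.

```python
# Rounded figures taken from the list above (June 2019), in terabytes.
TOTAL_USED_TB = 550        # all MariaDB-related servers, including redundancy
UNIQUE_TB = 60             # unique data, ignoring redundancy
EXTERNAL_STORAGE_TB = 24   # insert-only, highly compressed content

# Average number of copies kept of each unique byte.
redundancy = TOTAL_USED_TB / UNIQUE_TB

# Unique data that is not External Storage (metadata, misc services, backups, ...).
other_unique_tb = UNIQUE_TB - EXTERNAL_STORAGE_TB

# Share of the unique data that lives in External Storage.
es_share = EXTERNAL_STORAGE_TB / UNIQUE_TB

print(f"redundancy factor: {redundancy:.1f}x")      # 9.2x, i.e. roughly 9x
print(f"unique data outside ES: {other_unique_tb} TB")  # 36 TB
print(f"ES share of unique data: {es_share:.0%}")   # 40%
```

So the "9x" quoted above is simply 550 / 60 rounded down, and about 36 TB of the unique data falls under "the rest".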
Please note this doesn't take into account storage in other media or
technologies (search, maps, analytics, REST, file storage, etc.). Also,
content compression can be very efficient, so the uncompressed data can be
much larger. We are in fact aiming to reduce the storage footprint even
further over the next few months.
If someone is interested in seeing how the size evolves, you can get the
latest metrics on Grafana:
The idea of separating PHPUnit unit and integration/system tests in MediaWiki core has been around for some time. Currently, the tests assume the presence of valid MediaWiki settings and a database connection, meaning one must install and configure MediaWiki and an RDBMS in their local development environment to be able to run the tests. The fact that we use a non-standard entry point (phpunit.php) also makes these tests incompatible with existing tooling such as IDE integrations.
At the 2019 hackathon I worked with Amir Sarabadani, Michael Große and Kosta Harlan on a preliminary investigation into moving unit tests (those that can be run without a database and MediaWiki configuration) into a separate PHPUnit configuration that can be run via the official phpunit binary. After some additional work, this has evolved into a patch that separates 5301 unit tests into a dedicated suite that can be executed via vendor/bin/phpunit in 15 seconds (on my machine!). By contrast, running the same 5301 tests via phpunit.php takes around 30 seconds. Not using phpunit.php here also allows for integration with e.g. IntelliJ/PhpStorm’s test execution and code coverage functionality; for example, IntelliJ can show the execution and coverage information of PathRouterTest. I feel that these benefits would be felt by both developers and CI maintainers: developers could iterate more rapidly by running the unit tests, and Jenkins would have to spend less time executing the test suite.
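To give a rough feel for what "can be run without a database and MediaWiki configuration" might mean in practice, here is a minimal Python sketch of a text-based heuristic for sorting test files. It is not the script used for the patch. The marker list and the function name `looks_like_unit_test` are assumptions made for illustration, and a real classification would need to actually run the tests in isolation.

```python
import re

# Hypothetical markers hinting that a PHP test file needs a database or
# MediaWiki settings. This list is an assumption for illustration only.
INTEGRATION_MARKERS = (
    re.compile(r"@group\s+Database"),                      # PHPUnit group annotation
    re.compile(r"extends\s+MediaWiki(Lang)?TestCase\b"),   # integration base classes
    re.compile(r"\bwfGetDB\s*\("),                         # direct database access
)


def looks_like_unit_test(php_source: str) -> bool:
    """Return True if none of the integration markers appear in the source."""
    return not any(marker.search(php_source) for marker in INTEGRATION_MARKERS)


if __name__ == "__main__":
    plain = "class FooTest extends PHPUnit\\Framework\\TestCase {}"
    with_db = "/** @group Database */ class BarTest extends MediaWikiTestCase {}"
    print(looks_like_unit_test(plain))    # True
    print(looks_like_unit_test(with_db))  # False
```

A heuristic like this can only produce candidates; whether a test really belongs in the unit suite is decided by whether it passes via vendor/bin/phpunit without MediaWiki installed.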
I’d like to thank everyone who has supported this enterprise so far—Amir (Ladsgroup) for creating a script to identify the initial set of tests that do not rely on a database and providing assistance throughout, Kosta Harlan for highlighting this old and forgotten issue and bringing it to the hackathon, Bartosz Dziewoński (MatmaRex) for providing a solution to scoping issues around MediaWiki core files, Michael Große for demonstrating the feasibility of this approach in the Wikibase extension test suite, and James Forrester for reviewing the changes and outlining possible next steps.
However, the work is far from done yet! The patch is not yet merged, so any reviews, comments and suggestions would be very welcome there! And if it does get merged, we will have to think about how to bring this separation to extensions’ test suites as well as potentially port more core tests to the unit test suite. So if you have any ideas on how to improve this patch or would like to add to the next steps, don’t hesitate to leave a note :)