Hi,

We've added an analysis of MediaWiki parser output to the content analysis report that the Reading Web team produced last quarter.

We also added an option (the select box at the top of the page) to view the report with gzip (level 6) sizes, which gives more realistic numbers since traffic is gzipped in production.

http://chimeces.com/loot-content-analysis/

No surprises: the results are pretty similar to the RESTBase analysis, with navboxes at around 14% of the content and references at around 50%.

Request: if you know of unnecessary HTML markup emitted by the MediaWiki parser and would like to see what percentage of the content it accounts for, please reply here or on the task with examples, and we'll add it to the report (as we did for RESTBase and its extraneous markup).
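
If you'd like a rough number before replying, here is a minimal sketch of one way to estimate the share locally. It is not the report's actual code: the function name `size_share` and the approach of removing literal fragments from the page are assumptions made for illustration. It uses only Python's standard `gzip` module.

```python
import gzip


def size_share(html: str, fragments: list[str], level: int = 6) -> dict[str, float]:
    """Estimate what percentage of `html` the given fragments account for.

    Hypothetical helper: it removes each literal fragment from the page and
    compares sizes before and after, both raw and gzipped at `level`.
    """
    if not html:
        raise ValueError("html must not be empty")

    # Remove every occurrence of each fragment from a copy of the page.
    stripped = html
    for fragment in fragments:
        stripped = stripped.replace(fragment, "")

    full_bytes = html.encode("utf-8")
    stripped_bytes = stripped.encode("utf-8")

    raw_total = len(full_bytes)
    raw_stripped = len(stripped_bytes)
    gz_total = len(gzip.compress(full_bytes, compresslevel=level))
    gz_stripped = len(gzip.compress(stripped_bytes, compresslevel=level))

    return {
        "raw_pct": 100 * (raw_total - raw_stripped) / raw_total,
        "gzip_pct": 100 * (gz_total - gz_stripped) / gz_total,
    }
```

The gzipped share is usually smaller than the raw one for repetitive markup, since repeated strings compress well. That is why the two views of the report can differ.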

Related phab task: https://phabricator.wikimedia.org/T123325

Thanks,

Joaquin