The list of articles was decided here: https://phabricator.wikimedia.org/T120504

There's a page for each of a few categories: a very short article (Campus honeymoon), a long one (Barack Obama), and so on.

As I mentioned in the task, it is trivial to run the report with a different sample set, and I'm happy to do so if somebody is interested in visualizing a different set of articles.

Adam, for example, posted in the task different sets of articles based on other criteria, like page views (https://phabricator.wikimedia.org/T120504#1900287); it'd be interesting to run those and see whether the trends in navbox and reference sizes hold up.

We're also thinking about running a similar report across a bigger dataset in a more aggregated way, maybe the top 100,000 articles by pageviews, with the sizes weighted by pageview counts to get a more global understanding. We haven't gotten around to it yet; it would be a new, more global, less per-page report.
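To make the weighting idea concrete, here's a minimal sketch of what "sizes weighted by pageview counts" would mean. The data is a toy example and the function names are hypothetical; the real report would pull sizes and pageview counts for the top articles from the actual dataset.

```python
def weighted_mean(sizes, pageviews):
    """Average size, weighted by pageview counts, so that heavily
    viewed articles dominate the aggregate (reflecting what users see)."""
    total_views = sum(pageviews)
    return sum(s * v for s, v in zip(sizes, pageviews)) / total_views

# Toy example: navbox bytes for three articles and their monthly pageviews.
sizes = [12000, 800, 4500]
views = [1_000_000, 5_000, 200_000]
print(round(weighted_mean(sizes, views)))  # far closer to 12000 than a plain mean
```

A plain mean of the same sizes would be about 5,767 bytes; the pageview-weighted mean lands near 10,709 because the heavily viewed article pulls it up, which is exactly the "what users actually see" perspective the aggregated report would aim for.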

On Jan 20, 2016 6:53 PM, "Federico Leva (Nemo)" <nemowiki@gmail.com> wrote:
Joaquin Oltra Hernandez, 20/01/2016 16:55:
We've added Mediawiki parser content analysis to the content analysis
report that the Reading web team performed last quarter.

Thanks. It would be useful to understand what your dataset is: I see 9 page titles, presumably fetched from the English Wikipedia. Is this your dataset? How did you ensure it's representative of what users see?

Nemo