The list of articles was decided here:
phabricator.wikimedia.org/T120504
There's a page for each of a few categories, including a very short article
(Campus honeymoon) and a long one (Barack Obama).
As I mentioned in the task, it is trivial to run the report with a different
sample set, and I'm happy to do so if somebody is interested in visualizing a
different set of articles.
In the task, for example, Adam posted different sets of articles based on
other criteria, like page views
(https://phabricator.wikimedia.org/T120504#1900287); it'd be interesting to
run those and see whether the trends in navbox and reference sizes hold up.
We're also thinking about running a similar report across a bigger dataset
in a more aggregated way, maybe the top 100,000 articles by pageviews, with
the sizes weighted by pageview counts to get a more global understanding,
but we haven't gotten around to it yet (it would be a new, more global, less
per-page report).
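To illustrate the weighting idea: each article would contribute its measured sizes in proportion to its share of total pageviews, so heavily-read pages dominate the aggregate. A minimal sketch (with made-up numbers, not real report data):

```python
def weighted_size(pages):
    """Pageview-weighted average size.

    pages: list of (pageviews, size_in_bytes) tuples, one per article.
    Each article's size is weighted by its share of total pageviews.
    """
    total_views = sum(views for views, _ in pages)
    return sum(views * size for views, size in pages) / total_views

# Toy numbers: one popular long article, two niche short ones. The popular
# article dominates, so the weighted average sits near its size.
pages = [(1_000_000, 500_000), (5_000, 20_000), (1_000, 5_000)]
print(round(weighted_size(pages)))
```

The same weighting would apply per-component (navbox bytes, reference bytes, and so on) rather than just total page size.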
On Jan 20, 2016 6:53 PM, "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
wrote:
Joaquin Oltra Hernandez, 20/01/2016 16:55:
We've added MediaWiki parser content analysis to the content analysis
report that the Reading web team performed last quarter.
Thanks. It would be useful to understand what your dataset is: I see 9
page titles, presumably fetched from the English Wikipedia. Is this your
dataset? How did you ensure it's representative of what users see?
Nemo