A month ago, the PageImages extension[1] was black-deployed; it is intended to automatically associate images with articles. It populates its data when LinksUpdate is run, i.e. when a page or a template it transcludes is edited or purged. Since then, most pages have been re-parsed; however, slightly less than a million English WP articles remain:
select count(*), avg(page_len) from page where page_namespace=0 and page_is_redirect=0 and page_touched < '20121229000000';
+----------+---------------+
| count(*) | avg(page_len) |
+----------+---------------+
|   977568 |     3172.0948 |
+----------+---------------+
1 row in set (5 min 59.55 sec)
Waiting for these pages to be updated naturally could take forever:
select min(page_touched) from page where page_namespace=0 and page_is_redirect=0;
+-------------------+
| min(page_touched) |
+-------------------+
| 20090714142954    |
+-------------------+
1 row in set (2 min 15.13 sec)
That oldest page was [2] before I purged it: an obscure topic with no templates.
Thus, I would like to populate this data with a script[3]. To allay concerns, note that these pages have almost no templates and are significantly smaller than average (3172 bytes vs. 5673), so they should mostly be fast to parse.
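For illustration only, the kind of batching such a run involves can be sketched in Python as below. This is not the code in [3]: the function names, the batch size, and the reparse callback are all hypothetical, and a real run would read rows from the page table rather than from an in-memory list.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# (page_id, page_touched) pairs. page_touched is a 14-digit timestamp
# string (YYYYMMDDHHMMSS), so plain string comparison orders correctly.
Page = Tuple[int, str]


def stale_batches(pages: Iterable[Page], cutoff: str,
                  batch_size: int = 100) -> Iterator[List[int]]:
    """Yield lists of page IDs whose page_touched is older than cutoff."""
    batch: List[int] = []
    for page_id, touched in pages:
        if touched < cutoff:
            batch.append(page_id)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Emit the final, possibly short, batch.
        yield batch


def populate(pages: Iterable[Page], cutoff: str,
             reparse: Callable[[int], object],
             batch_size: int = 100) -> int:
    """Call reparse(page_id) for every stale page; return how many were done."""
    done = 0
    for batch in stale_batches(pages, cutoff, batch_size):
        for page_id in batch:
            reparse(page_id)  # hypothetical hook standing in for the re-parse
            done += 1
        # A real script would likely pause between batches here
        # (assumption: e.g. to let database replicas catch up).
    return done
```

Working in small batches keeps each unit of work short, so the run can be throttled or stopped between batches if it turns out to be heavier than expected.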
Is running it a good idea?
-----
[1] https://www.mediawiki.org/wiki/Extension:PageImages
[2] https://en.wikipedia.org/wiki/City_of_Melbourne_election,_2008
[3] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/PageImages.git;...