A month ago, the PageImages extension [1] was dark-deployed; it is intended to
automatically associate images with articles. It populates its data
when LinksUpdate is run, i.e. when a page, or a template it transcludes,
is edited or purged. Since then, most pages have been re-parsed; however,
slightly fewer than a million English Wikipedia articles remain:
select count(*), avg(page_len) from page where page_namespace=0 and page_is_redirect=0 and
page_touched < '20121229000000';
+----------+---------------+
| count(*) | avg(page_len) |
+----------+---------------+
| 977568 | 3172.0948 |
+----------+---------------+
1 row in set (5 min 59.55 sec)
Waiting for these pages to be updated naturally could take forever:
select min(page_touched) from page where page_namespace=0 and page_is_redirect=0;
+-------------------+
| min(page_touched) |
+-------------------+
| 20090714142954 |
+-------------------+
1 row in set (2 min 15.13 sec)
That timestamp belonged to [2] before I purged it: an obscure topic with no templates.
I would therefore like to populate this data with a script [3]. To allay
concerns about load, note that these pages have almost no templates and
are significantly smaller than average (3172 bytes vs. 5673), so most of
them should be fast to parse.
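The actual script is the one in [3]. What follows is only a hypothetical Python sketch of a batching pattern such a backfill could use, not that script: it walks the stale articles in page_id order, resuming each query from the last page_id seen (rather than rescanning or using OFFSET), and sleeps between batches to limit load. sqlite3 stands in for MySQL, and the names stale_page_batches, backfill, make_demo_db and the reparse callback (a stand-in for whatever triggers LinksUpdate on a page) are all illustrative assumptions.

```python
import sqlite3
import time
from typing import Callable, Iterator, List

# Cutoff from the count query above: pages not touched since the deployment.
CUTOFF = "20121229000000"


def stale_page_batches(conn: sqlite3.Connection, cutoff: str,
                       batch_size: int = 1000) -> Iterator[List[int]]:
    """Yield batches of page_ids for non-redirect articles with page_touched < cutoff.

    Keyset pagination: each query resumes after the last page_id already seen,
    so re-parsed pages (whose page_touched changes) are never revisited.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT page_id FROM page"
            " WHERE page_id > ? AND page_namespace = 0 AND page_is_redirect = 0"
            " AND page_touched < ?"
            " ORDER BY page_id LIMIT ?",
            (last_id, cutoff, batch_size),
        ).fetchall()
        if not rows:
            return
        ids = [row[0] for row in rows]
        yield ids
        last_id = ids[-1]


def backfill(conn: sqlite3.Connection, cutoff: str, reparse: Callable[[int], None],
             batch_size: int = 1000, pause: float = 0.0) -> int:
    """Call the (hypothetical) reparse callback on every stale page; return the count."""
    done = 0
    for ids in stale_page_batches(conn, cutoff, batch_size):
        for page_id in ids:
            reparse(page_id)  # stand-in for re-running LinksUpdate on the page
        done += len(ids)
        time.sleep(pause)  # throttle between batches
    return done


def make_demo_db() -> sqlite3.Connection:
    """Tiny in-memory page table with made-up rows, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,"
                 " page_is_redirect INTEGER, page_touched TEXT)")
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 0, 0, "20090714142954"),  # stale article
        (2, 0, 0, "20130110000000"),  # already re-parsed
        (3, 0, 1, "20100101000000"),  # redirect: skipped
        (4, 1, 0, "20100101000000"),  # talk namespace: skipped
        (5, 0, 0, "20121228235959"),  # stale article
    ])
    return conn


if __name__ == "__main__":
    seen: List[int] = []
    total = backfill(make_demo_db(), CUTOFF, seen.append, batch_size=1)
    print(total, seen)  # prints: 2 [1, 5]
```

Keying on page_id instead of re-running the page_touched filter from scratch avoids repeating anything like the six-minute full scan above for every batch.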
Is running it a good idea?
-----
[1] https://www.mediawiki.org/wiki/Extension:PageImages
[2] https://en.wikipedia.org/wiki/City_of_Melbourne_election,_2008
[3] https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/PageImages.git…
--
Best regards,
Max Semenik ([[User:MaxSem]])