I'm designing an experiment and want a random sample of wiki articles. The 'Random article' link seems like a convenient way of generating these without having to compile a list of the population of articles myself.
My hunch (based on clicking it lots and very little else) is that 'Random article' is a uniform sampling of pages in the article namespace, excluding redirects but including disambiguation pages. As implemented on en.wiki (which is the wiki I'm starting on) it probably has a slight bias against very recently created pages (due to cross-server synchronization).
Has anyone looked into this?
cheers stuart
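For anyone wanting to draw such a sample programmatically rather than by clicking, here is a minimal sketch using the MediaWiki API's `list=random` query, restricted to the article namespace and to non-redirects to match the hunch above. The helper names (`random_query_params`, `build_url`, `fetch_random_titles`) and the User-Agent string are made up for illustration. Whether the API draws uniformly is the same open question as for the link itself, so this is only a convenience, not evidence of uniformity.

```python
import json
import urllib.parse
import urllib.request

# English Wikipedia API endpoint (en.wiki is the wiki named in the thread).
API_URL = "https://en.wikipedia.org/w/api.php"


def random_query_params(limit=10):
    """Parameters for a list=random query over non-redirect pages in namespace 0."""
    return {
        "action": "query",
        "list": "random",
        "rnnamespace": "0",               # article (main) namespace only
        "rnfilterredir": "nonredirects",  # skip redirects
        "rnlimit": str(limit),            # pages per request; the API caps this
        "format": "json",
    }


def build_url(limit=10):
    """Full request URL for one batch of random pages."""
    return API_URL + "?" + urllib.parse.urlencode(random_query_params(limit))


def fetch_random_titles(limit=10):
    """Fetch one batch of random article titles (performs a network request)."""
    request = urllib.request.Request(
        build_url(limit),
        # Hypothetical User-Agent; replace with one identifying your project.
        headers={"User-Agent": "random-sample-sketch/0.1 (research example)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [page["title"] for page in data["query"]["random"]]
```

Calling `fetch_random_titles(5)` should return a list of five titles. Disambiguation pages are not filtered out, so they would need to be removed separately if the experiment requires it.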
I don't know if anyone's looked into this, I'm afraid. I'd be interested to see what our replication lag on production is. I imagine it's pretty small, and so the impact would be negligible, but...
On 27 June 2014 23:24, stuart yeates syeates@gmail.com wrote:
Has anyone looked into this?
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
stuart yeates, 28/06/2014 05:24:
Has anyone looked into this?
https://bugzilla.wikimedia.org/show_bug.cgi?id=65366 was just fixed.
Nemo