On 23/08/13 10:48, Lars Aronsson wrote:
> But it is not obvious how a bug report or feature request should be written. A naive approach would be to ask for a random article that wasn't created by a bot, but this is not to the point.
That was my solution when this issue came up on the English Wikipedia:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/4256
The configured SQL excluded pages most recently edited by Rambot. Derek Ramsey was opposed to it, since he thought his US census stubs deserved eyeballs just as much as any hand-written article, but IIRC I managed to get this solution deployed, at least for a year or two.
> Users want bot generated articles to come up, only not so often. And some manually written article stubs are also less wanted. Perhaps the random function should be weighted by article length or by the number of page views? But is it practical to implement such a weighted random function? Are the necessary data in the database?
It would not be especially simple. The existing database schema does not allow weighted random selection. A special data structure could be used, or it could be implemented (inefficiently) in Lucene.
An approximation would be to select, say, 100 articles from the database using page_random, then calculate a weight for each of those 100 articles using complex criteria, then do a weighted random selection from those 100 articles.
Article length is in the database, but page view count is not.
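A rough sketch of the approximation described above, in Python, might look like the following. It is only an illustration under stated assumptions, not how MediaWiki does or would do it: an in-memory SQLite table stands in for the real page table (using its page_id, page_title, page_len and page_random columns), and the weight() function with its cap of 10000 bytes is a made-up placeholder for whatever "complex criteria" would actually be wanted.

```python
import random
import sqlite3

SAMPLE_SIZE = 100  # size of the candidate pool, as suggested above


def fetch_candidates(conn, rng, n=SAMPLE_SIZE):
    """Fetch up to n pages, starting at a random point in page_random order."""
    start = rng.random()
    rows = conn.execute(
        "SELECT page_id, page_title, page_len FROM page "
        "WHERE page_random >= ? ORDER BY page_random LIMIT ?",
        (start, n),
    ).fetchall()
    if len(rows) < n:
        # Ran off the end of the range: wrap around to the beginning.
        rows += conn.execute(
            "SELECT page_id, page_title, page_len FROM page "
            "WHERE page_random < ? ORDER BY page_random LIMIT ?",
            (start, n - len(rows)),
        ).fetchall()
    return rows


def weight(row):
    """Hypothetical weighting: favour longer pages, capped at 10000 bytes."""
    _page_id, _title, page_len = row
    return 1 + min(page_len, 10000)


def weighted_random_page(conn, rng=random):
    """Pick one page from the candidate pool, with probability proportional to weight()."""
    candidates = fetch_candidates(conn, rng)
    if not candidates:
        return None
    weights = [weight(row) for row in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Since only article length is available, that is the only input to weight() here. Note that the pool is a contiguous run in page_random order, so the result is only approximately a weighted selection over the whole wiki.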
-- Tim Starling