On Thu, 2003-05-01 at 08:53, Nick Reinking wrote:
> Off the top of my head, I can't think of any simple mathematical way to do what you want to do (that being, making articles selected randomly less likely to be selected randomly over time).
Not sure who wants to do that... For me it's quite sufficient to make each selection uniformly random, and let the size of the selection pool make repeat selections _reasonably_ rare.
The resetting of the random weight upon load is just meant to keep the pot stirring, and "in theory" shouldn't have any effect at all on randomness if the random indexes were random to begin with.
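(A minimal sketch of that reset, assuming the weight lives in a cur_random column holding a float between 0 and 1; the column name and the literal id are just illustrative:)

    -- Re-randomize the weight of the page we just served, so it drops
    -- back into the pool at a fresh position.
    UPDATE cur
       SET cur_random = RAND()
     WHERE cur_id = 12345;   -- id of the page that was just loaded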
> Seems to me that the best way to do this is either:
> - Be truly random. In PostgreSQL, this would be something like:
>     SELECT cur_id FROM cur LIMIT 1 OFFSET random(SELECT count(cur_id) FROM cur)
>   (this function should be really fast, not sure if faster than:)
MySQL doesn't seem to let you use a non-constant expression for 'LIMIT offset,count' (and besides, still no subqueries), but we can pick a random number easily enough in PHP. The problem is that we don't know what range of offsets to use; hence a random index attached to each article, whose range _is_ known (between 0 and 1).
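Roughly what I have in mind for the lookup, assuming an indexed cur_random column (the names are for illustration, not necessarily the final schema):

    -- 0.37 stands in for a float in [0,1) picked in PHP,
    -- e.g. mt_rand() / mt_getrandmax().
    SELECT cur_id, cur_title
      FROM cur
     WHERE cur_namespace = 0
       AND cur_is_redirect = 0
       AND cur_random >= 0.37
     ORDER BY cur_random
     LIMIT 1;
    -- If nothing comes back (we landed above the largest value),
    -- retry with the smallest cur_random instead.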
We don't want to select non-article pages or redirects, but we don't currently have a count of exactly that. There's the "good articles" count, but it has other criteria, so if we used it as the range we'd always crop off relatively recent articles (the offset would never reach past that count). If we inflated it by some estimate, we'd overemphasize the _most_ recent article whenever the estimate runs too high.
Sure, there's count(*), but that takes time to run... might be faster with an index specifically on cur_namespace,cur_is_redirect. Or we could maintain another column in the stats table which counts non-redirect article pages with _no_ other criteria (such as size or comma/bracket count). Rebuilding indexes on cur is _very_ slow, so if we need to change the indexes, best to plan ahead.
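For illustration, the kind of index and bare count I mean (again, the names are assumptions, not the current schema):

    -- Index covering exactly the two filter columns, so the count can
    -- be answered from the index without touching full rows.
    CREATE INDEX ns_redirect ON cur (cur_namespace, cur_is_redirect);

    -- Non-redirect article pages, with no size or comma/bracket criteria.
    SELECT COUNT(*)
      FROM cur
     WHERE cur_namespace = 0
       AND cur_is_redirect = 0;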
-- brion vibber (brion @ pobox.com)