Brion wrote:
On Tue, 3 Jun 2003, Hr. Daniel Mikkelsen wrote:
On Tue, 3 Jun 2003, Thomas Luft wrote:
The random page on the English one isn't random either. You will regularly get the same page again and again if you try a few times.
Grrr.... looks like the random indices are all off again; somehow MySQL's rand() function is biasing high the way we've been using it, and there are _very_ few articles set with lower indexes (<0.25), so those few get pulled up way too often. I've just told it to redo all the random indexes on the German wiki in a lump; I'll reset them on the English wiki later tonight when traffic is lower.
Unless you've already fixed it, the English cur_random column is still fine.
SELECT cur_random FROM cur WHERE cur_random>0.01 ORDER BY cur_random LIMIT 10
returns...
0.0100032491702617
0.010005122059961
0.010018127048405
0.0100242663268226
0.0100461980421526
0.0100568952132546
0.0100595876668204
0.0100729866047138
0.0100769354124339
0.0100776087586559
Sounds to me like you fixed the English one after I first described the cause of the problem a month ago, but you didn't fix the other languages. If in fact the English cur_random was stuffed up again, and you fixed it before I ran the above query, I want to know about it. I consider this a pet bug of mine now.
I've also gone ahead and replaced the random seed generator in the wiki and changed Special:Random to use its own random number instead of asking for one from MySQL. I don't trust MySQL anymore. :) And I took out the reset-index-on-load, which was probably trouble.
Why not just go the whole hog and use a noisy diode? ;)
I was pretty confident I worked out the problem last time around. I even wrote a little program simulating the behaviour of the previous version of Special:Randompage. It's attached. Compile it with "g++ drift_test.cc" and watch all those "random" numbers gravitate towards 1.0 like it's a hot woman at a party or something.
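The attachment is not reproduced here. As a rough stand-in (not drift_test.cc itself), here is a minimal Python sketch of the kind of drift described, under two assumptions pieced together from the rest of this thread: each row is tested against its own fresh random number in index order, and the row that gets picked is then given a new random value. The function names, row count and click count are made up for illustration.

```python
import bisect
import random

def click(values, rng):
    """One simulated click (assumed behaviour, see the note above).

    Walk the rows from the smallest value up, compare each row against a
    fresh random number, and give the first row that passes a new value.
    """
    for i, v in enumerate(values):
        if v > rng.random():                      # fresh number for every row
            del values[i]                         # "reset" the chosen row ...
            bisect.insort(values, rng.random())   # ... to a new random value
            return

def simulate_drift(rows=1000, clicks=10000, seed=1):
    """Return the mean of the random column before and after the clicks."""
    rng = random.Random(seed)
    values = sorted(rng.random() for _ in range(rows))
    before = sum(values) / rows
    for _ in range(clicks):
        click(values, rng)
    after = sum(values) / rows
    return before, after

before, after = simulate_drift()
print(f"mean before: {before:.3f}, mean after: {after:.3f}")
```

If those two assumptions hold, the mean should start near 0.5 and climb well above it, because low-valued rows keep getting moved while high-valued rows are almost never reached.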
Hr. Daniel Mikkelsen daniel@copyleft.no wrote: <snip>
The random page on the English one isn't random either. You will regularly get the same page again and again if you try a few times.
This should have been fixed a month ago, when Brion reset the index. Have you checked since then?
-- Tim Starling.
On Wednesday 04 June 2003 00:06, Tim Starling wrote:
Unless you've already fixed it, the English cur_random column is still fine.
SELECT cur_random FROM cur WHERE cur_random>0.01 ORDER BY cur_random LIMIT 10
Nope, it got broke again. Try restricting the query to the pages Special:Randompage is concerned with:
SELECT cur_random FROM cur WHERE cur_random>0.01 and cur_namespace=0 and cur_is_redirect=0 ORDER BY cur_random LIMIT 10
Not only did it take two minutes to return (mmm, partial table scans), but there are *huge* gaps:
0.010371912280444
0.0106440456683226
0.0130615625652127
0.0138753941411855
0.0403562412041763
0.0404494861517562
0.044624416199163
0.0523233311738105
0.0542095322666779
0.0586327482560954
When I checked this afternoon, there were only 82 article-space non-redirect pages with cur_random values less than 0.22something. 82 out of a hundred some thousand... so, no wonder that the page with the particular value was coming up repeatedly in my Random page clicks.
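To put a number on why gaps matter, here is a small Python sketch. It assumes the page is chosen by drawing one random threshold uniformly from [0, 1) and taking the first row whose cur_random is above it; under that assumption a row's chance of being picked is simply the size of the gap below it. The values are the ten from the query output above; the function name is made up for illustration.

```python
# cur_random values copied from the query output above
values = [
    0.010371912280444, 0.0106440456683226, 0.0130615625652127,
    0.0138753941411855, 0.0403562412041763, 0.0404494861517562,
    0.044624416199163, 0.0523233311738105, 0.0542095322666779,
    0.0586327482560954,
]

def hit_probabilities(sorted_values):
    """Pair each value (except the first) with the gap to its predecessor.

    With a single uniform threshold, that gap is the probability that
    this row is the first one above the threshold.
    """
    return [(cur, cur - prev)
            for prev, cur in zip(sorted_values, sorted_values[1:])]

probs = hit_probabilities(values)
worst_value, worst_gap = max(probs, key=lambda pair: pair[1])
print(f"row {worst_value} would be picked with probability {worst_gap:.4f}")
```

For comparison, with a hundred-some thousand evenly spread rows each gap would be on the order of 0.00001, so a single row sitting above a gap hundreds or thousands of times that size will come up again and again.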
I'm regenerating the index again now...
-- brion vibber (brion @ pobox.com)
Sorry to pine users...
Taking the SQL query used in Special:Randompage from CVS and modifying it very slightly...
SELECT cur_id,cur_title,cur_random FROM cur USE INDEX (cur_random) WHERE cur_namespace=0 AND cur_is_redirect=0 AND cur_random>RAND() ORDER BY cur_random LIMIT 20
returns...
cur_id  cur_title                     cur_random
124125  Pierce,_Nebraska              0.0030205277754185
205997  Wagh_el_Birket                0.00385735184313483
120605  Custer_Township,_Minnesota    0.00416424684614339
131375  Lorane,_Pennsylvania          0.00439120363853053
150887  Columbiana,_Ohio              0.00589350611520326
53913   Castle_Rock                   0.00614019670164231
10438   Komyo                         0.00616735406794339
131027  Newberg,_Oregon               0.00645017624502087
120060  Hartland_Township,_Minnesota  0.00903007575220435
126590  Osceola,_New_York             0.00905275718220766
It doesn't always return the same articles, but they're always very low-numbered. I don't know about you, but I would call that a MySQL bug.
May I make a suggestion, while we're on the topic? How about changing the query to:
SELECT cur_id,cur_title,cur_random FROM cur USE INDEX (cur_random) WHERE cur_namespace=0 AND cur_is_redirect=0 AND cur_random>{$rand} AND cur_user<>3903 AND cur_user<>6120 ORDER BY cur_random LIMIT 20
which will skip anything last edited by Ram-Man or Rambot. Like Daniel Mikkelsen said, the most important function for Special:Randompage is to impress passers-by. We should rig it any way we can to make Wikipedia look better.
-- Tim Starling.
On Wed, 4 Jun 2003, Tim Starling wrote:
Taking the SQL query used in Special:Randompage from CVS and modifying it very slightly...
[snip]
It doesn't always return the same articles, but they're always very low-numbered. I don't know about you, but I would call that a MySQL bug.
Hmm... let's simplify this further:
mysql> select cur_random from cur where cur_random > RAND() limit 20;
+---------------------+
| cur_random          |
+---------------------+
| 0.00257335324080042 |
| 0.00301321596187839 |
| 0.00409562141084636 |
| 0.00434284564512115 |
| 0.00447388831942704 |
| 0.00527506415292161 |
| 0.00677824021017015 |
| 0.00724384654987962 |
| 0.00791455340377479 |
| 0.00809311867513984 |
| 0.00832060632139501 |
| 0.00845975429607532 |
| 0.00916914828975606 |
| 0.00930567272874124 |
| 0.010219381200354   |
| 0.010613451721718   |
| 0.011154617193299   |
| 0.0122322952488738  |
| 0.0126715852065679  |
| 0.0127805173516092  |
+---------------------+
This quite consistently gives me results in the 0.001-0.017 range.
The results are presumably already sorted by the use of the index, but it's definitely odd that they seem to so consistently come out so small. Because, we *don't* see that if we grab a rand() value as a column:
mysql> select cur_random,rand() from cur where cur_random > RAND() limit 20;
+---------------------+-------------------+
| cur_random          | rand()            |
+---------------------+-------------------+
| 0.00180685892869426 | 0.059753977749268 |
| 0.00333090965014967 | 0.27524542461638  |
| 0.00345821027034727 | 0.083287411446951 |
| 0.00541453902182592 | 0.1797017885183   |
| 0.00616005901820963 | 0.10850227168622  |
| 0.00718451943917621 | 0.44687699754432  |
| 0.00725775678386703 | 0.24804242723439  |
| 0.0073513565653482  | 0.66955343696247  |
| 0.00753892400072787 | 0.58810817505057  |
| 0.00818642974662262 | 0.35786299627075  |
| 0.00856924430333939 | 0.92427461121629  |
| 0.00867950265172823 | 0.58906755278731  |
| 0.00916074124086717 | 0.16777823601642  |
| 0.00939816703032532 | 0.19605916291108  |
| 0.00979022216963603 | 0.10278878091163  |
| 0.0102785686126711  | 0.27433319694766  |
| 0.0103007052189677  | 0.76059719990995  |
| 0.0105801159614512  | 0.25284009636644  |
| 0.0111034736140663  | 0.53778274221139  |
| 0.0113199194998666  | 0.046922132416556 |
+---------------------+-------------------+
Note that the rand() we WHERE with and the rand() we SELECT are separate invocations of the function, and don't return the same result as each other.
Hmm, let's look at the docs:
Note that a RAND() in a WHERE clause will be re-evaluated every time the WHERE is executed. RAND() is not meant to be a perfect random generator, but instead a fast way to generate ad hoc random numbers that will be portable between platforms for the same MySQL version. -- http://www.mysql.com/doc/en/Mathematical_functions.html
*WHAM WHAM WHAM*
Now, let's think what that means. We're selecting for cur_random. It uses the index on cur_random, so it's going to sort starting from the infinitesimally small end, but can't use a constant to index by because the WHERE clause is a function -- we have to scan. For each row it makes up a random number, and sees if this row is at least as big. If yes, it puts the row in the return queue. If no, it goes to the next row and makes up another random number.
Some portion of those small-numbered rows are going to match, and at some point we fill up our quota and return the matching rows.
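The scan described above is easy to mimic outside the database. The following Python sketch is only a model of that description, not of MySQL's internals: the 100,000-row table of uniform values, the seed, and the function name are all assumptions chosen for illustration.

```python
import random

def scan_with_per_row_rand(sorted_values, limit, rng):
    """Model of WHERE cur_random > RAND() ... LIMIT n with RAND()
    re-evaluated for every row, scanning in index (ascending) order."""
    picked = []
    for value in sorted_values:       # smallest values come first
        if value > rng.random():      # a fresh random number per row
            picked.append(value)
            if len(picked) == limit:  # quota filled, stop scanning
                break
    return picked

rng = random.Random(2003)
table = sorted(rng.random() for _ in range(100_000))
rows = scan_with_per_row_rand(table, 20, rng)
print(f"largest of the {len(rows)} returned values: {max(rows):.4f}")
```

A back-of-envelope estimate for this model: a row with value v passes with probability v, so with N evenly spread rows the expected number of matches below x is about N*x^2/2. Filling a quota of 20 from 100,000 rows therefore takes x of roughly 0.02, which is the same ballpark as the ranges seen in the output above.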
AAAAAAGGGGGHHHHH!!!!! :)
It's not a MySQL _bug_, just a very non-intuitive behavior which leads to over-biasing to the low-end when we misuse it (and the updates to 'stir the pot' would thus tend to depopulate the low-end and bias the value distribution high). Generating one random number ourselves and giving it to mysql as a constant, as my recent update does, should solve this.
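For contrast, here is a minimal Python sketch of the constant-threshold approach: one random number is drawn up front and the lookup becomes "first value above this constant". Python's bisect module stands in for the index seek; the wrap-around to the first row when nothing is above the threshold is an assumption of this sketch, as are the table size, seed, and function name.

```python
import bisect
import random

def pick_with_constant(sorted_values, threshold):
    """Return the first value strictly greater than threshold.

    Wraps around to the smallest value if the threshold is above every
    row (an assumption of this sketch, not something stated in the thread).
    """
    i = bisect.bisect_right(sorted_values, threshold)
    return sorted_values[i % len(sorted_values)]

rng = random.Random(2003)
table = sorted(rng.random() for _ in range(100_000))

# One constant per request, generated outside the lookup.
picks = [pick_with_constant(table, rng.random()) for _ in range(10_000)]
mean_pick = sum(picks) / len(picks)
print(f"mean of picked values: {mean_pick:.3f}")
```

With an evenly populated table the picks in this model spread over the whole range instead of clustering at the low end.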
May I make a suggestion, while we're on the topic? How about changing the query to:
[snip]
AND cur_random>{$rand} AND cur_user<>3903 AND cur_user<>6120
[snip]
which will skip anything last edited by Ram-Man or Rambot. Like Daniel Mikkelsen said, the most important function for Special:Randompage is to impress passers-by. We should rig it any way we can to make Wikipedia look better.
That's awfully specific to be hard-coding. :)
-- brion vibber (brion @ pobox.com)
AND cur_random>{$rand} AND cur_user<>3903 AND cur_user<>6120
That's awfully specific to be hard-coding. :)
Add a settings variable that's a partial SQL query. I.e., in LocalSettings, put something like:
$wgRestrictRandom = "cur_user<>3903 AND cur_user<>6120";
then in DefaultSettings, put:
$wgRestrictRandom = "TRUE";
then the query becomes:
WHERE ($randomQuery) AND ($wgRestrictRandom)
Any individual wiki-specific hack can be done this way while keeping all source files identical except for LocalSettings.
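The same layering can be sketched in a few lines of Python (used here only for illustration; the wiki itself is PHP). The dictionary names and build_where() are hypothetical stand-ins for DefaultSettings, LocalSettings and the query-building code.

```python
# Shipped default: a condition that is always true, so it restricts nothing.
DEFAULT_SETTINGS = {"restrict_random": "TRUE"}

# Per-wiki override, as it might appear in a local settings file.
LOCAL_SETTINGS = {"restrict_random": "cur_user<>3903 AND cur_user<>6120"}

def build_where(random_query, local_settings=None):
    """Combine the random-page condition with the configured restriction.

    Local settings override the defaults. The restriction is raw SQL and is
    assumed to come from trusted configuration, never from user input.
    """
    settings = {**DEFAULT_SETTINGS, **(local_settings or {})}
    return f"WHERE ({random_query}) AND ({settings['restrict_random']})"

print(build_where("cur_random>0.5"))
print(build_where("cur_random>0.5", LOCAL_SETTINGS))
```

Wrapping both parts in parentheses keeps an OR inside a local restriction from changing the meaning of the rest of the WHERE clause.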
On Wed, 4 Jun 2003, Tim Starling wrote:
Hr. Daniel Mikkelsen daniel@copyleft.no wrote:
The random page on the English one isn't random either. You will regularly get the same page again and again if you try a few times.
This should have been fixed a month ago, when Brion reset the index. Have you checked since then?
Yes, this happened to me about a week ago.
-- Daniel
wikitech-l@lists.wikimedia.org