On Fri, Aug 23, 2013 at 9:24 AM, Lord_Farin <lord_farin@proofwiki.org> wrote:
The probability of displaying a "bad" page would be:
B q ((p B)^N - 1) / (p B - 1) + B (p B)^N
(modulo errors), where B is the fraction of bad pages, p is the probability of repeating, q is the probability of displaying (so p+q = 1), and N is the allowed number of repetitions.
I'm going to rewrite that as:

  B (1-p) ((p B)^N - 1) / (p B - 1) + B (p B)^N

...and I'm also going to take your word on the math, because my brain is lazy this morning.
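For anyone who would rather check the math than take it on trust, here is one reading under which the formula falls out as a geometric sum. The model is an assumption inferred from the shape of the formula, not something spelled out in the thread: each draw is bad with probability B; a bad draw is re-rolled with probability p and shown with probability q = 1-p; and once N re-rolls have been used, the next draw is shown whatever it is.

```latex
\begin{align*}
P(\text{bad page shown})
  &= \sum_{k=0}^{N-1} (pB)^k \, B\,(1-p) \;+\; (pB)^N B \\
  &= B\,(1-p)\,\frac{(pB)^N - 1}{pB - 1} \;+\; B\,(pB)^N
\end{align*}
```

The k-th term of the sum is "k bad draws re-rolled, then a bad draw shown"; the final term is "all N re-rolls used up, then one more bad draw". Two quick checks: p = 0 gives B, and p = 1 gives B^(N+1).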
Let's run the numbers, assuming the 500,000 articles Swedish wiki had in Sept 2012 were all good, and the million articles added since are all bad. Thus B = 2/3. Let's start with N at 5, so worst case we're going to be doing 5x as many SQL queries. p is the tunable parameter. So if:

  p = 0     prob of getting a bad page = 67%  (sanity check, this is what they've got now)
  p = 0.5   prob of getting a bad page = 50%
  p = 0.75  prob of getting a bad page = 34%
  p = 0.80  prob of getting a bad page = 30%
  p = 0.90  prob of getting a bad page = 20%
  p = 0.95  prob of getting a bad page = 15%
  p = 1.00  prob of getting a bad page = 9%   (this is set by N)
If you let N go up to 10, then:

  p = 0.90  prob of getting a bad page = 17%
  p = 0.95  prob of getting a bad page = 10%
  p = 1.00  prob of getting a bad page = 1%
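The numbers above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the term-by-term version assumes the formula is a geometric sum over "bad page drawn, re-rolled" steps, which is a reading of the formula rather than something stated in the thread.

```python
def bad_page_probability(B, p, N):
    """Closed form from the thread: B(1-p)((pB)^N - 1)/(pB - 1) + B(pB)^N."""
    r = p * B
    if r == 1.0:
        # Only reachable when p = B = 1; the geometric sum is then just N terms of 1.
        geometric = float(N)
    else:
        geometric = (r ** N - 1.0) / (r - 1.0)
    return B * (1.0 - p) * geometric + B * r ** N


def bad_page_probability_by_sum(B, p, N):
    """Same quantity summed term by term, as a cross-check on the closed form."""
    r = p * B
    # k re-rolled bad draws, then a bad draw that gets displayed (k = 0 .. N-1)
    shown_early = sum(r ** k * B * (1.0 - p) for k in range(N))
    # all N re-rolls used up, then one more bad draw that must be displayed
    forced = r ** N * B
    return shown_early + forced


if __name__ == "__main__":
    B = 2.0 / 3.0  # assumed fraction of bad pages, as in the example above
    for N in (5, 10):
        for p in (0.0, 0.5, 0.75, 0.80, 0.90, 0.95, 1.00):
            prob = bad_page_probability(B, p, N)
            print(f"N={N:2d}  p={p:.2f}  prob of bad page = {prob:.1%}")
```

The output agrees with the figures listed above to within a percentage point; for instance p = 0.90 with N = 5 comes out at about 20.5%.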
My expectation is that about a 10% chance of getting a 'bad page' would make Swedish wikipedians happy, so I'd recommend p=1, N=5. But the knobs can be twiddled. --scott