On Fri, Aug 23, 2013 at 9:24 AM, Lord_Farin <lord_farin(a)proofwiki.org> wrote:
The probability of displaying a "bad" page
would be:
B q ((p B)^N - 1) / (p B - 1) + B (p B)^N
(modulo errors), where B is the fraction of bad pages, p is the
probability of repeating, q is the probability of displaying (so p+q =
1), and N is the allowed number of repetitions.
I'm going to rewrite that as:
B (1-p) ((p B)^N - 1) / (p B - 1) + B (p B)^N
...and I'm also going to take your word on the math, because my brain is
lazy this morning.
Let's run the numbers, assuming the 500,000 articles Swedish wiki had in
Sept 2012 were all good, and the million articles added since are all bad.
Thus B = 2/3. Let's start with N at 5, so worst case we're going to be
doing 5x as many SQL queries. p is the tunable parameter. So if:
p = 0 prob of getting a bad page = 67% (sanity check, this is what
they've got now)
p = 0.5 prob of getting a bad page = 50%
p = 0.75 prob of getting a bad page = 34%
p = 0.80 prob of getting a bad page = 30%
p = 0.90 prob of getting a bad page = 21%
p = 0.95 prob of getting a bad page = 15%
p = 1.00 prob of getting a bad page = 9% (this is set by N)
If you let N go up to 10, then:
p = 0.90 prob of getting a bad page = 17%
p = 0.95 prob of getting a bad page = 10%
p = 1.00 prob of getting a bad page = 1%
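
For anyone who wants to re-run these numbers or try other knob settings, here is a minimal Python sketch of the formula above. The function name `bad_page_probability` is made up for illustration and is not part of any MediaWiki code; it just evaluates the expression term by term, assuming B, p and N mean what they mean above.

```python
def bad_page_probability(B, p, N):
    """Chance of ending up displaying a bad page.

    B: fraction of bad pages
    p: probability of redrawing when a bad page comes up (q = 1 - p)
    N: allowed number of repetitions
    """
    q = 1.0 - p
    r = p * B  # one draw is bad AND we decide to redraw
    # Sum of B*q*r**k for k = 0..N-1. This equals
    # B*q*(r**N - 1)/(r - 1) whenever r != 1; the loop form
    # avoids special-casing r == 1.
    shown_early = sum(B * q * r**k for k in range(N))
    # After N redraws, whatever comes up is displayed.
    forced = B * r**N
    return shown_early + forced


if __name__ == "__main__":
    B = 2 / 3  # assumption from above: 1M bad pages out of 1.5M
    for N in (5, 10):
        print(f"N = {N}")
        for p in (0, 0.5, 0.75, 0.80, 0.90, 0.95, 1.00):
            prob = bad_page_probability(B, p, N)
            print(f"  p = {p:.2f}  prob of getting a bad page = {prob:.1%}")
```

Swapping in a different B (say, if only some of the new articles are bad) is a one-line change.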
My expectation is that about a 10% chance of getting a 'bad page' would make
Swedish wikipedians happy, so I'd recommend p=1 N=5. But the knobs can be
twiddled.
--scott
--
(http://cscott.net)