Re: [Wikitech-l] Weighted random article

23 Aug 2013

On Fri, Aug 23, 2013 at 9:24 AM, Lord_Farin <lord_farin(a)proofwiki.org> wrote:

...
  The probability of displaying a "bad" page
would be:

 B q ((p B)^N - 1) / (p B - 1) + B (p B)^N

 (modulo errors), where B is the fraction of bad pages, p is the
 probability of repeating, q is the probability of displaying (so p+q =
 1), and N is the allowed number of repetitions.

I'm going to rewrite that as:
B (1-p) ((p B)^N - 1) / (p B - 1) + B (p B)^N
...and I'm also going to take your word on the math, because my brain is
lazy this morning.
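If you'd rather not take the math on faith either, here's a quick numerical check in Python. It is a minimal sketch that assumes my reading of the scheme: draw a random page; if it's bad, redraw with probability p (or show it with probability q = 1 - p); after N redraws, show whatever comes up. The function names are made up for illustration, and nothing here is MediaWiki code.

```python
import random


def p_bad_closed_form(B: float, p: float, N: int) -> float:
    """Lord_Farin's closed form: B q ((pB)^N - 1)/(pB - 1) + B (pB)^N, with q = 1 - p."""
    q = 1.0 - p
    r = p * B
    # (r^N - 1)/(r - 1) is the geometric sum 1 + r + ... + r^(N-1);
    # it is only undefined when r == 1 (p = B = 1), where the sum is just N.
    geometric = N if r == 1.0 else (r**N - 1) / (r - 1)
    return B * q * geometric + B * r**N


def p_bad_by_summation(B: float, p: float, N: int) -> float:
    """Same probability, summed draw by draw (hypothetical helper)."""
    q = 1.0 - p
    total = 0.0
    for i in range(N):
        # Reach draw i via i bad-and-redrawn draws, then draw bad and show it.
        total += (p * B) ** i * B * q
    # Draw N: no redraws left, so a bad page is shown regardless.
    total += (p * B) ** N * B
    return total


def simulate(B: float, p: float, N: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the fraction of requests that end on a bad page."""
    rng = random.Random(seed)
    bad_shown = 0
    for _ in range(trials):
        for draw in range(N + 1):
            if rng.random() >= B:
                break  # good page (probability 1 - B): show it
            if draw == N or rng.random() >= p:
                bad_shown += 1  # out of redraws, or chose to show it anyway
                break
            # otherwise: redraw (probability p)
    return bad_shown / trials


if __name__ == "__main__":
    B, N = 2 / 3, 5
    for p in (0.0, 0.5, 0.9, 1.0):
        print(f"p={p:.2f}  closed={p_bad_closed_form(B, p, N):.4f}  "
              f"sum={p_bad_by_summation(B, p, N):.4f}  sim={simulate(B, p, N):.4f}")
```

The closed form and the draw-by-draw sum should agree to floating-point precision, and the simulated column to within sampling noise.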

Let's run the numbers, assuming the 500,000 articles the Swedish wiki had in
Sept 2012 were all good, and the million articles added since are all bad.
Thus B = 2/3.  Let's start with N at 5, so worst case we're going to be
doing 6x as many SQL queries (the first draw plus 5 repeats).  p is the
tunable parameter.  So if:
 p = 0.00  prob of getting a bad page = 67% (sanity check: this is what they've got now)
 p = 0.50  prob of getting a bad page = 50%
 p = 0.75  prob of getting a bad page = 34%
 p = 0.80  prob of getting a bad page = 30%
 p = 0.90  prob of getting a bad page = 21%
 p = 0.95  prob of getting a bad page = 15%
 p = 1.00  prob of getting a bad page =  9% (this is set by N)
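The worst case isn't the whole cost story; for load, the expected number of draws per request matters more. A minimal Python sketch, assuming each draw costs exactly one SQL query and using the same redraw scheme as the formula: draw k+1 only happens if the k draws before it were all bad and redrawn, which has probability (p B)^k, so the expected count is the sum of those terms for k = 0..N. `expected_queries` is a hypothetical name.

```python
def expected_queries(B: float, p: float, N: int) -> float:
    """Expected draws per request, assuming one SQL query per draw."""
    r = p * B  # chance that a given draw is bad AND gets redrawn
    # Draw k+1 happens with probability r**k, for k = 0..N (at most N redraws).
    return sum(r**k for k in range(N + 1))


if __name__ == "__main__":
    B, N = 2 / 3, 5
    for p in (0.0, 0.5, 0.75, 0.80, 0.90, 0.95, 1.00):
        print(f"p = {p:.2f}  expected queries = {expected_queries(B, p, N):.2f}"
              f"  (worst case {N + 1})")
```

For p = 1, N = 5 this works out to (1 - (2/3)^6) / (1/3), about 2.74 queries per request on average.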

If you let N go up to 10, then:
 p = 0.90  prob of getting a bad page = 17%
 p = 0.95  prob of getting a bad page = 10%
 p = 1.00  prob of getting a bad page =  1%
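For anyone who wants to twiddle the knobs themselves, here's a small Python sketch that evaluates Lord_Farin's formula (with q = 1 - p) for B = 2/3 and prints both lists. It's an illustration only; `p_bad` is a hypothetical name, not anything in MediaWiki. To within rounding it should match the figures above (the p = 0.90, N = 5 case works out to about 20.6%).

```python
def p_bad(B: float, p: float, N: int) -> float:
    """Probability the page finally shown is bad: B q ((pB)^N - 1)/(pB - 1) + B (pB)^N."""
    q = 1.0 - p
    r = p * B
    # Geometric sum 1 + r + ... + r^(N-1); equals N in the degenerate r == 1 case.
    geometric = N if r == 1.0 else (r**N - 1) / (r - 1)
    return B * q * geometric + B * r**N


if __name__ == "__main__":
    B = 2 / 3  # 1M "bad" articles out of 1.5M total, per the assumption above
    settings = {
        5: (0.0, 0.5, 0.75, 0.80, 0.90, 0.95, 1.00),
        10: (0.90, 0.95, 1.00),
    }
    for N, ps in settings.items():
        print(f"N = {N}")
        for p in ps:
            print(f"  p = {p:.2f}  prob of getting a bad page = {p_bad(B, p, N):.0%}")
```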

My expectation is that about a 10% chance of getting a 'bad page' would make
Swedish Wikipedians happy, so I'd recommend p=1, N=5.  But the knobs can be
twiddled.
  --scott

-- 
(http://cscott.net)
