On Tue, Dec 20, 2016 at 12:45 AM, Quim Gil qgil@wikimedia.org wrote:
The questions for this session are being crowdsourced at http://www.allourideas.org/wikidev17-product-technology-questions. Anyone can propose questions and vote, anonymously, as many times as you want. At the moment, we have 25 questions and 451 votes.
An important technical detail: questions posted later also have a good chance of making it to the top of the list, as long as new voters select them. The ranking is built from pairwise comparisons between questions, not from accumulated vote totals. For instance, the current top question is in fact one of the last to have been submitted so far.
Right now the top question has a score of 70 based on 88 votes; the second question has a score of 67 based on 1 vote. (This is not some super-rare accident, either: numbers 8 and 9 on the popularity list both have 4 votes.) I argued that All Our Ideas was too experimental to be relied on back when it was considered as the voting tool for an early iteration of what ended up being the Community Tech Wishlist, and I still think that's the case.
The way their voting system works is roughly this: they assume each idea has some appeal (an arbitrary real number) for each voter, the appeals of a given idea are normally distributed across voters, and when a voter is shown a pair of questions, the probability of voting either way is a fixed function of the difference between the two appeals. They then use various statistical methods to generate random appeal values that match the observed votes, and from those values they can calculate, for each question, the probability that a randomly selected voter would prefer it to a randomly selected alternative; those probabilities are the questions' scores.
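To make that concrete, here is a minimal sketch of my reading of the model (the function names, the unit variance, and the example numbers are all mine, not anything from their code):

```python
import math

def pref_prob(mu_i, mu_j, sigma=1.0):
    """P(a random voter prefers idea i to idea j) when each voter's
    perceived appeal of an idea is Normal(mu, sigma)."""
    # The difference of two independent normals has sd sigma*sqrt(2),
    # and Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), which combine to:
    return 0.5 * (1 + math.erf((mu_i - mu_j) / (2 * sigma)))

def aoi_score(mus, i):
    """Score of idea i: its chance of beating a randomly chosen
    alternative, scaled to 0-100 like the scores on the site."""
    others = [pref_prob(mus[i], mus[j]) for j in range(len(mus)) if j != i]
    return 100 * sum(others) / len(others)

# Three ideas with appeal means 1.0, 0.0, -0.5: the first scores
# well above 50, the last well below.
mus = [1.0, 0.0, -0.5]
print(aoi_score(mus, 0), aoi_score(mus, 2))
```

The hard part, of course, is the step this sketch skips: inferring the appeal means from a handful of pairwise votes.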
That means the scores can be heavily underspecified for some questions (i.e. driven mostly by the random numbers their algorithm generates, not by actual votes). This is especially true for recently submitted questions, which have very few votes and so end up in an essentially random position in the ranking. As far as I can see, the journal article [1] where they present their method doesn't discuss this problem at all. This makes it not terribly useful as a real-world ranking model, IMO, so I hope that 1) there will be some human oversight when evaluating the results, and 2) we don't intend to use this system for any voting that actually matters (getting weirdly prioritized results for a Q&A session is, of course, not a huge deal).
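You can see the underspecification directly with a crude simulation (this is an ABC-style rejection sketch of my own, not their actual estimator; the logistic link and the unit-normal prior are my simplifications):

```python
import math
import random
import statistics

def posterior_sd(n_wins, n_votes, trials=20000, seed=0):
    """Spread of plausible appeal differences given vote data: draw an
    appeal difference from a wide prior, simulate n_votes pairwise
    outcomes, and keep only draws that reproduce the observed win count.
    With 1 vote, almost every draw is kept, so the 'estimate' is
    essentially the prior, i.e. a random number."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        delta = rng.gauss(0, 1)            # prior on the appeal difference
        p = 1 / (1 + math.exp(-delta))     # win probability (my logistic link)
        wins = sum(rng.random() < p for _ in range(n_votes))
        if wins == n_wins:
            kept.append(delta)
    return statistics.pstdev(kept)

# A question with 1 win out of 1 vote vs. one with 60 wins out of 88:
print(posterior_sd(1, 1), posterior_sd(60, 88))
```

The one-vote question's posterior spread stays close to the prior's, which is exactly the "random position in the ranking" problem above.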
[1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0123483