Carbonite wrote:
>I agree with Tony that it would be difficult to sustain a very large number
>of arbitrators. However, if we had an efficient system for replacing
>arbitrators, I could see maintaining a "steady state" of 25-35. Replacements
>could be appointed by Jimbo, be elected as alternates during the regular
>ArbCom elections or we could utilize a system like the one suggested by
>Dragons flight.
At present we have the election, then people drop out, then Jimbo
drafts people to cover until December. So far the draftees have been
appointed following detailed thought and discussion amongst the
existing and previous AC on the AC mailing list (which contains all
present AC and any past AC who want to be on it). About half shudder
in horror and say "No thank you!" which doesn't surprise me. I'm not
sure it's a sustainable method in the long run. Also, en: is far too
big for us to know everyone even a bit.
(This may create worries of cabalism amongst the AC. Let me assure
you, speaking from the view inside the sausage factory, that the AC
members frequently agree on things even less than admins do. Oh boy.
But we do respect each other's general cluefulness.)
- d.
> Let's not continue subject tags that imply someone is guilty. Most of
> the time it's a baseless complaint from someone who was banned or
> blocked for a very good reason.
>
> I have changed the Subject in the e-mail thread as follows:
>
> * Tag-teaming (was: Fvw's abuse of admin powers)
>
> And when you reply to _this_ message, please strip off the part in
> parentheses, so it looks like this:
>
> * Tag-teaming
Maybe the next reply can strip it off. By accusing Fvw of abuse I was making a point, and that point was not that Fvw is somehow worse than other admins, but rather that using a system to implement 24-hour blocks which has long been known to routinely block people for much longer is an abuse of admin powers. Regardless of the fact that some oppose the 3RR rule, when the prescribed punishment is a 24-hour block, putting someone into a system that will almost certainly result in a longer block, without careful monitoring and intervention by the admin, is abusive. Of course, having to track the times to make sure this abuse does not happen is a burden for the admins, and they should not be put in this position. The admins should insist the blocking software be fixed, or refuse to use it, rather than abuse it either by intent or oversight.
-- Silverback
Ray Saintonge
>I agree that most things won't draw such a debate, and even if a lot of
>the articles proposed for deletions really are deletable there is still
>that small obsessive group determined to preserve our "bodily humours"
>in the manner of Dr. Strangelove. They need to be more sensitive to the
>efforts of others, and to understand that many of these most bitter
>disputes are not about what's in the articles, but about a small group
>that wants to control the work of others.
Yes. The opposed groups are not "inclusionists" versus deletionists,
but *contributors* versus deletionists. (This is still the case even
though the deletionists contribute elsewhere.) Then the deletionists
wonder why the contributors get so damn stroppy.
- d.
On 10/5/05, geni <geniice(a)gmail.com> wrote:
>On 10/5/05, Phroziac <phroziac(a)gmail.com> wrote:
> >
> > Where and how do you run?
>
> Nominations have not yet opened. When they do they
> will be hard to miss.
Well I don't know about what has officially opened,
but there are already 7 people offering statements
about why they would be a good candidate:
http://en.wikipedia.org/wiki/Wikipedia:Arbitration_Committee_Elections_Dece…
-DF
Let's not continue subject tags that imply someone is guilty. Most of
the time it's a baseless complaint from someone who was banned or
blocked for a very good reason.
I have changed the Subject in the e-mail thread as follows:
* Tag-teaming (was: Fvw's abuse of admin powers)
And when you reply to _this_ message, please strip off the part in
parentheses, so it looks like this:
* Tag-teaming
Thank you.
Ed Poor
One of several mailing list admins
> -----Original Message-----
> From: Phroziac [mailto:phroziac@gmail.com]
> Sent: Wednesday, October 05, 2005 4:52 PM
> To: English Wikipedia
> Subject: Re: [WikiEN-l] Fvw's abuse of admin powers
>
>
> steve v wrote:
> > If the autoblocker actually adds additional time to a
> > block just because someone attempts to edit a page
> > before the block is up, then that is a completely
> > bullshit configuration for the block script.
> It does, and I agree. It blocks you for 24 hours, even if the
> block is less than 24 hours. I have no idea who thought this
> was a good idea...
>
> > 3RR was intended to be a protective measure against destructive
> > revert wars, and instead has been turned into a kind of punitive
> > measure which is often only unilaterally applied. Some people have
> > gotten into the habit of simply gaming the system, by getting a
> > buddy to assist in tag-teaming someone else -- producing an apparent
> > greater culpability on the part of the one individual over the
> > other group.
> If they are tag teaming, then the article can be protected to
> force them to discuss it. Additionally, gaming the system
> about 3RR is actually against the 3RR, if you read it
> closely.
Why? ArbCom largely decides how to handle personality
disputes, and in the process must deal somewhat with
applications of principle, neutrality being first
among them.
It seems to be a popular belief that the ArbCom is
understaffed relative to Wikipedia's size, anyway. As a
consequence, it seems to treat only thinly its duty to
address each specific point/issue/claim, and in turn
this means that its rulings offer only a decree, and
show little about the thinking process. Of course,
things are open to discussion, but IAC, NPOV is clear
enough for a committee to deal with as regards its
application to certain points, and is separate from
personality issues.
Growth often means diversification.
SV
--- Kelly Martin <kelly.lynn.martin(a)gmail.com> wrote:
> On 10/5/05, steve v <vertigosteve(a)yahoo.com> wrote:
> > Would this "lower court filter" resemble a
> > Wikipedia:NPOV committee?
> >
> > SV
>
> At this time, I'm opposed to either ArbCom or any
> body subordinate to
> it being involved in deciding content disputes.
>
> Kelly
>
Ray Saintonge wrote:
> Neil Harris wrote:
>
>> Phroziac wrote:
>>
>>> No way would we fit in the 30 volumes of Britannica for this
>>> hypothetical print release! Anyway, what if we had a feature in the
>>> Wikipedia 1.0 idea, where we could rate how useful the inclusion of
>>> an article in a print version would be? This would allow anyone
>>> making a print version, be it the Foundation or someone else, to
>>> trim Wikipedia more easily. Certainly you could do it by hand, but
>>> eek, that's huge. With our current database dumps, it would already
>>> not be unreasonable to make a script to automatically remove
>>> articles with stub tags in them. Obviously these would be worthless
>>> in a print version.
>>
>>
>> In my opinion, an article ranking system would be an ideal way to
>> start collecting data for trying to place articles in rank order for
>> inclusion in a fixed amount of space.
>>
>> One interesting possibility is, in addition to user rankings, using
>> the number of times the article's title is mentioned on the web --
>> the Google test -- as an extra input to any hypothetical ranking system.
>
>
> The thing to remember if a ranking system is used is that it is a tool
> rather than a solution. It can point to problem articles that need
> work. We don't need to be limited to a single algorithm for
> evaluating an article. The Google test can be added, but so can
> others too.
>
> Ec
>
That's right. The _gold standard_ for article assessment is peer review;
the next best is based on manual ranking by a sufficiently large and
well-distributed group of users; the next best is based on
carefully-chosen algorithms which blend together machine-generated
statistics and human-generated statistics.
Given that we have 750,000+ articles in the English-language Wikipedia
alone, it is likely to take some time for a reasonable number of votes
to accumulate for all articles. According to my earlier calculation, if
we wanted to trim en: Wikipedia into 32 volumes, we would need to keep
out five out of six articles. (We could keep Wikilinks in, thinly
underlined with page/volume references in the margin, for those in the
print version, and say dotted underlines for those which exist online
but are not in the print version, to let people know there is an online
article on that topic).
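As a quick sanity check on that five-out-of-six figure, here is a toy calculation; the articles-per-volume density is an invented number, purely for illustration:

```python
# Rough check of the "keep one article in six" estimate above.
# The articles-per-volume figure is invented, not measured.
total_articles = 750_000
volumes = 32
articles_per_volume = 4_000   # hypothetical print density per volume

kept = volumes * articles_per_volume          # 128,000 articles would fit
print(round(total_articles / kept, 1))        # roughly one article in six kept
```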
This raises the possibility of using machine-generated statistics to act
as a proxy for manual review where it is not yet available. Given a
sufficient number of human-rated articles, and a sufficient number of
machine-generated statistics for articles, we could use machine learning
(a.k.a. function approximation) algorithms to attempt to predict the
scores of as-yet-unranked articles. This could then be used as a "force
multiplier" for human-based ranking, to rank articles which have not yet
received sufficient human rankings to be statistically significant.
This approach could easily be sanity-checked by taking one random sample
of articles as a training set, and another disjoint random sample as a
testing set: the predictive power of a machine-learning algorithm
trained on the training set could be determined by measuring the quality
of its predictions of the true user rankings of the testing set. As the
number of articles with statistically significant human rankings
increases, the algorithm can be re-trained repeatedly; this would also
help resist attempts to "game" the ranking algorithm.
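The sanity-check protocol can be sketched in a few lines, with synthetic data standing in for real article metrics and human ratings, and a simple least-squares fit standing in for whatever learner is finally chosen:

```python
# Sketch of the train/test sanity check described above.
# All data here is synthetic; a real system would use actual
# article metrics (features) and accumulated human ratings (targets).
import numpy as np

rng = np.random.default_rng(0)
n_articles, n_metrics = 500, 10
X = rng.normal(size=(n_articles, n_metrics))        # machine-generated metrics
w_true = rng.normal(size=n_metrics)
y = X @ w_true + rng.normal(scale=0.1, size=n_articles)  # stand-in human ratings

# Disjoint random samples: one training set, one testing set
idx = rng.permutation(n_articles)
train, test = idx[:350], idx[350:]

# Fit a simple linear model on the training set (least squares);
# a real system might use an SVM or other learner instead.
w_hat, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Predictive power: correlation between predicted and true ratings
# on the held-out testing set
pred = X[test] @ w_hat
r = np.corrcoef(pred, y[test])[0, 1]
print(r > 0.9)
```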
What statistics could be used as input to this kind of approach? It's
not hard to think of possible measures:
0. any available user rankings, by value and number of rankings
0a. stub notices
0b. "disputed" notices, "cleanup" notices, "merge" notices, now,
0c. ...or in the past
0d. has it survived an AfD process? by what margin?
0e. what fraction of edits are reverts?
0f. has it been a featured article?
0g. has it ever been a featured article?
0h. log10(page view rate via logwood)
...and so on...
1. log10(total Google hits for exact phrase searches for title and
redirects)
1a. same as above, but limited to .gov or .edu sites
1b. same as above, but using matches _within en: Wikipedia itself_
1c. same as above, but using _the non-en: Wikipedias_
1d. same as above, but using matches _within the 1911 EB_
1e. same as above, but using matches _within the Brown corpus_
1f. ditto, but within the _NIMA placename databases_
1g, h. _Brewer's Dictionary of Phrase and Fable_, _Gray's Anatomy_
1i, j, k, l... the Bible, the Qur'an, the Torah, the Rig Veda...
1m, n... the collected works of Dickens, Shakespeare...
... and so on, for various other corpora...
2. log10(number of distinct editors for an article)
3. log10(total number of edits for this article, conflating sequential
edits by same user)
4. log10(age of the article)
5. size of the article text in words
6. size of the article source in bytes
7. approx. "reading age/ease" of the article, using...
7a. Flesch-Kincaid Grade Level
7b. Gunning-Fog Index
7c. SMOG Index
8. number of inter-language links from/to this article
9. inwards wikilinks, including via redirects, perhaps weighted by
referring article scores (although we should be careful not to infringe
the Google patents)
10. # of outwards wikilinks
11. # of categories assigned
12. # of redirects created for this article
13. Bayesian scoring for individual words, using the "bag of words" model
13a. as above, but using assigned categories as tokens
13b. as above, but for words in the article title
13c. as above, but for words in edit comments
13d. as above, but for text in talk pages
13e. as above, but for names of templates
13f. as above, but for _usernames of contributing authors_, ho ho ho!
14. shortest category distance from the "fundamental" category
15. shortest wikilink distance from various "seed" pages
16. length of article title, in characters (shorter is "more fundamental"?)
17. length of article title, in words
18. what fraction of the article text contains letters from which other
scripts?
19. does it contain images? how many?
19a. what is the images-to-words ratio?
20. what is the average paragraph length?
21. how many subheadings does it have?
22. how many external links does it have in its "external links" section?
23. how many inline links does it contain in the main article body?
24. how many "see also"s does it have?
25. what is the ratio of list items to overall words?
26. what is the ratio of proper nouns (crudely measured) to overall words?
...and so on, and so forth. Some of these are easy to calculate, some are
hard. Can anyone think of better ones?
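Several of the cheaper metrics in the list can be computed in a few lines. A rough sketch (the regexes are crude stand-ins, not a real wikitext parser; the function name and sample text are invented for illustration):

```python
# Crude extraction of a few of the metrics listed above from raw wikitext.
import math
import re

def article_metrics(wikitext, num_edits=1, num_editors=1):
    """Compute a handful of cheap article metrics.
    The regexes are rough approximations, not a real wikitext parser."""
    words = re.findall(r"[A-Za-z']+", wikitext)
    n_words = max(1, len(words))
    sentences = max(1, len(re.findall(r"[.!?]+", wikitext)))
    # Crude syllable count: runs of vowels per word
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return {
        "has_stub_tag": bool(re.search(r"\{\{[^{}]*stub[^{}]*\}\}",
                                       wikitext, re.I)),
        "log10_edits": math.log10(num_edits),
        "log10_editors": math.log10(num_editors),
        "word_count": len(words),
        "wikilinks_out": len(re.findall(r"\[\[", wikitext)),
        "subheadings": len(re.findall(r"^==+.*==+\s*$", wikitext, re.M)),
        # Flesch-Kincaid grade level:
        # 0.39 * words/sentence + 11.8 * syllables/word - 15.59
        "fk_grade": 0.39 * n_words / sentences
                    + 11.8 * syllables / n_words - 15.59,
    }

sample = ("{{physics-stub}}\n"
          "'''Foo''' is a [[bar]]. It has two sentences.\n"
          "== See also ==\n")
m = article_metrics(sample, num_edits=42, num_editors=7)
print(m["has_stub_tag"], m["wikilinks_out"], m["subheadings"])  # True 1 1
```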
Individually, I doubt whether any of these are a really good predictor
of article quality. However, learning algorithms are surprisingly good
at pattern recognition from very noisy multi-dimensional data. It's
quite possible that this approach would work with only a limited number
of reasonably statistically independent input metrics (ten?); the huge
list above is only to give an idea of the large number of possible
choices of article metrics, ranging from the simple to the complex.
The corpus-based measures are particularly interesting; they mean we
don't need to bug Google for a million search keys.
The machine learning algorithm of choice is probably a support vector
machine; they're powerful, simple to use, capable of learning highly
non-linear functions (for example, recognising handwritten Han
characters from preprocessed bitmaps), and there are numerous
pre-packaged GPL'd implementations available as tools.
No doubt there will be lots of academics who might be willing to assign
this as a project or PhD topic to one of their research students. ;-)
Before any of this could be possible, we would in any case need the
article ranking system to be up and running for some time, which we need
anyway for the manual approach.
-- Neil
It is frequently said that Wikipedia is not paper. Specifically,
"Wikipedia is not a paper encyclopedia. This means that there is no
practical limit to number of topics we can cover other than
verifiability and the other points presented on this page."
But paper is not paper, either. That is, paper encyclopedias are NOT
physically limited in size. Some encyclopedias (Columbia) have one
volume. Some have more. The first edition of the Encyclopedia
Britannica had three volumes; the Eleventh Edition had 29. The
current Britannica 3 has 32 volumes.
(By the way, the Britannica states, rather hyperbolically, that those
32 volumes offer "a boundless range of information.")
Is the print Britannica limited to 32 volumes by some kind of
physical law? Certainly not. In fact, tens of thousands of households
that purchase print encyclopedias wisely or foolishly subscribe to
yearbook programs, often for many years, until they get tired of
gluing little cross-reference stickers into their volumes. So the
number of books on the shelf actually grows.
But there is a practical limit of about thirty volumes for a print
publication, isn't there? No, there isn't. The existence proof is any
journal. Journals can and do grow linearly, year after year, into
long rows of bound volumes which libraries, if not homes, manage to
find room for on their shelves. I am sure that some homes have more
than 30 bound-volumes-worth of the National Geographic neatly stacked
up in attics or basements.
So what DOES set the limit to what an encyclopedia can include? It is
not any physical characteristic, whether measured in quarto leaves or
in bytes.
It is that little detail, "verifiability and the other points
presented on this page."
The limit to what an encyclopedia can include is governed basically
by the available labor of editors to integrate, synthesize, verify,
copy-edit, and fact-check.
What this tells me is that it should be possible to get some kind of
reasonable estimate of an appropriate size for Wikipedia by
estimating the number of work-hours Wikipedia's volunteers put in, and
comparing it with the number of work-hours available to the Britannica.
If we're putting in three times as much work, we should be able to
cover three times as much content.
If we try to cover more content than the Britannica without putting
in more work than the Britannica, then our reach is exceeding our grasp.
I have no idea how to even begin estimating these numbers, but I
think it would be instructive to try.
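A toy back-of-envelope version of that comparison (every number below is invented, simply to show the shape of the estimate):

```python
# Toy version of the work-hours comparison above.
# Every figure is invented; only the arithmetic is the point.
britannica_hours = 1_000_000    # hypothetical total editorial effort
britannica_size = 32            # volumes

wikipedia_hours = 3_000_000     # hypothetical volunteer effort

# If sustainable coverage scales linearly with editorial labour,
# three times the work supports three times the content.
ratio = wikipedia_hours / britannica_hours
sustainable_size = britannica_size * ratio
print(ratio, sustainable_size)  # 3.0 96.0
```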
I appreciate your help on peer review! I think there are plenty of fields
in which layman assistance is essential to keep Wikipedia's audience as
broad as possible, and I find thorough reviews by non-astronomers of my
astronomy articles very helpful in that respect. Just remember that
astronomy is the father of all science, and only bad writing can make an
astronomy article less than utterly fascinating :)
Cheers,
WT
-----Original Message-----
From: Ryan Norton [mailto:wxprojects@comcast.net]
Sent: 05 October 2005 4:11 PM
To: English Wikipedia
Subject: Re: [WikiEN-l] Re: Taking your eyes off the ball
Hi there :),
> Ouch! Do you mean astronomy is boring, or my writing about astronomy?
> I
> do try to write articles that are interesting for people who might only
> have a passing interest in astronomy but it's easy for a scientist to
> forget what's interesting to people not in the field. That's partly
> why I
> sought peer review for Herbig-Haro object, because I was half afraid I
> was
> going to spend lots of time on something no-one would be interested in.
> Astrocruft, if you like. Please tell me if I'm heading that way!
>
> The general point is that for many featured articles, only one person
> might be interested in writing about a subject, but its appeal should
> be
> broad based if it really represents the best of Wikipedia. I'm not
> particularly fussed about architects in colonial New Zealand, for
> example,
> but Giano's articles on Benjamin Mountfort etc are interesting enough
> to
> keep me reading right through.
>
> WT
Don't get discouraged! Your articles are some of the best around
here! I think, though, that with technical subjects like this it can
often be tough to write for the layman :). Anyway, I'm still offering
responses on your peer reviews :).
Thanks,
Ryan