Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page and its related pages. Is there a solution yet? Last month, when I brought up this subject again (it wasn't the first time), we were told not to be hasty. Well, there has been enough time not to be hasty now. I really think this issue should finally be resolved; otherwise I'll be so bold as to move everything to meta and let it wait there for a new home. There's no reason to keep these pages around any longer. Wikipedia is an encyclopedia.
Jeronimo
The Cunctator wrote:
By the way, with Ram-Man's automatic article addition, we've just hit 80,000 English entries
Yes. We're big like a telephone directory is big. Please let's not celebrate this milestone. When we got to 50k articles we were saying "but so many are stubs", and we're in a worse situation now, as I don't see how we can exclude these town pages from article counts. They really are stubs, maybe not in terms of length but in terms of content and usefulness.
It's not that I don't care about one-horse US towns (well, okay, I don't) but I feel the entire Wikipedia has tilted. We shouldn't delete them, but to balance we should maybe work on, say, adding 10,000 articles to the Tree of Life project?
tarquin wrote:
The Cunctator wrote:
By the way, with Ram-Man's automatic article addition, we've just hit 80,000 English entries
Yes. We're big like a telephone directory is big. Please let's not celebrate this milestone. When we got to 50k articles we were saying "but so many are stubs", and we're in a worse situation now, as I don't see how we can exclude these town pages from article counts. They really are stubs, maybe not in terms of length but in terms of content and usefulness.
It's not that I don't care about one-horse US towns (well, okay, I don't) but I feel the entire Wikipedia has tilted. We shouldn't delete them, but to balance we should maybe work on, say, adding 10,000 articles to the Tree of Life project?
It's seriously tipped the balance of the wikipedia. Five times out of ten hitting the 'random' button takes me to one of these new entries... I'm sure they're very useful if you need to find information on 'Dead Horse, Arizona', but they're not very relevant in the grand scheme of things. I'll be glad once they're all in and the recent changes menu can return to usefulness. Out of curiosity, what existing entries are these new articles linked to?
tarquin tarquin@planetunreal.com writes:
It's not that I don't care about one-horse US towns (well, okay, I don't)
Would it be possible to move the fantastic-waste-of-space that is Ram-Man's latest datadump to gazetteer.wikipedia.com, or somesuch, so they don't clutter up the *worthwhile* *encyclopedic* entries?
On 10/25/02 1:22 PM, "Gareth Owen" wiki@gwowen.freeserve.co.uk wrote:
tarquin tarquin@planetunreal.com writes:
It's not that I don't care about one-horse US towns (well, okay, I don't)
Would it be possible to move the fantastic-waste-of-space that is Ram-Man's latest datadump to gazetteer.wikipedia.com, or somesuch, so they don't clutter up the *worthwhile* *encyclopedic* entries?
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
If the problem is with the "Random page" function, then perhaps some tweaking there would be reasonable, as long as it's noted somewhere that "random page" isn't exactly random.
And though I'm not a huge fan of the entries, they are well done.
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages RecentChanges
And though I'm not a huge fan of the entries, they are well done.
What is the point of well-done bad articles?
On 25 Oct 2002, Gareth Owen wrote:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
RandomPages is one reason we really need article classification by topic; at the moment it effectively assumes that articles are spread equally across all topics. If that was ever true, it certainly isn't now. Even if autogenerated articles were excluded, in the long term some topics would have far more articles than others.
To avoid giving a skewed view of the 'pedia we need some sort of classification system, so that the RandomPage function first randomly chooses a topic and then returns a random article from that topic.
Imran
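Imran's two-stage scheme can be sketched as follows. The topic index, category names and article titles below are invented for illustration, not taken from any real Wikipedia feature:

```python
import random

# Hypothetical topic index: category name -> list of article titles.
topics = {
    "Geography": ["Lexington, Missouri", "Stotts City, Missouri", "Cleveland, Illinois"],
    "Science": ["Triboluminescence", "Topological group"],
    "Literature": ["Antony and Cleopatra"],
}

def random_page_flat(topics):
    """Plain random pick: large categories dominate the results."""
    all_articles = [a for arts in topics.values() for a in arts]
    return random.choice(all_articles)

def random_page_by_topic(topics):
    """Two-stage pick: first a topic, then an article within it,
    so every topic is equally likely regardless of its size."""
    topic = random.choice(list(topics))
    return random.choice(topics[topic])
```

With the flat pick, Geography (3 of 6 articles) comes up half the time; with the two-stage pick, each of the three topics comes up a third of the time.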
On 26-10-2002, Imran Ghory wrote thusly :
On 25 Oct 2002, Gareth Owen wrote:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
RandomPages is one reason we really need article classification by topic; at the moment it effectively assumes that articles are spread equally across all topics. If that was ever true, it certainly isn't now. Even if autogenerated articles were excluded, in the long term some topics would have far more articles than others. To avoid giving a skewed view of the 'pedia we need some sort of classification system, so that the RandomPage function first randomly chooses a topic and then returns a random article from that topic.
There was some discussion about classification, metadata and ways to tackle the growing volume of information back when there were 10,000-15,000 articles. It was either rebuked or its importance was minimized. Classification was thought unwiki and impracticable ("who will classify, according to what criteria, what about multiple categories and wrongly assigned articles, and who would correct them"). Manning was in favour, and I think it is thanks to him that we have Wikiprojects. Long ago I created metadata subpages for a few articles as an exercise in this direction, but they were probably voted for deletion as search-engine whoring.
Have you worked out any solutions? How do you think it could be accomplished?
Regards, Kpjas.
On Sat, 26 Oct 2002, Krzysztof P. Jasiutowicz wrote:
Have you worked out any solutions? How do you think it could be accomplished?
Whoever did [[Wikipedia:Category_experiment]] seems to have found a reasonably good solution, at least well enough for the purpose of random article selection.
Imran
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
-- Toby
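Toby's length-weighted selection could look something like this minimal sketch; the article names and byte sizes below are made up, and this is not how the actual wiki software worked:

```python
import random

# Hypothetical articles with sizes in bytes; an article's chance of
# being picked is proportional to its size, so short stubs come up
# less often without ever being excluded entirely.
articles = {
    "Whippet": 150,             # tiny stub
    "Topological group": 6000,  # long article
    "Lexington, Missouri": 2100,
}

def weighted_random_page(articles):
    """Pick one title, weighted by article length in bytes."""
    titles = list(articles)
    sizes = [articles[t] for t in titles]
    return random.choices(titles, weights=sizes, k=1)[0]
```

A 150-byte stub still gets picked occasionally, just roughly forty times less often than a 6,000-byte article.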
Toby Bartels wrote:
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
The Ram-bot city entries average somewhere around 2130 bytes each (standard deviation 113), which is *larger* than the English Wikipedia-wide average for article-space non-redirect pages (about 1900 bytes, standard deviation a whopping 3028).
As I recall, the median article size is smaller than the average; if you could cut out the Ram-bot cities by size, you'd cut out most of the rest of Wikipedia with them.
-- brion vibber (brion @ pobox.com)
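Brion's point, that in a right-skewed size distribution the median lies below the mean and a size cutoff near the mean would remove well over half of all articles, can be illustrated with toy numbers. The byte counts below are invented, not real Wikipedia statistics:

```python
# Many small pages plus a few huge ones: the mean gets dragged
# upward while the median stays with the bulk of the pages.
sizes = [200, 300, 400, 500, 600, 700, 800, 900, 1000, 12000]

mean = sum(sizes) / len(sizes)              # 1740.0
sizes_sorted = sorted(sizes)
median = (sizes_sorted[4] + sizes_sorted[5]) / 2  # 650.0

# Cutting every page smaller than the mean removes 9 of the 10 pages.
below_mean = sum(1 for s in sizes if s < mean)
```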
Brion VIBBER wrote:
Toby Bartels wrote:
Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
The Ram-bot city entries average somewhere around 2130 bytes each (standard deviation 113), which is *larger* than the English Wikipedia-wide average for article-space non-redirect pages (about 1900 bytes, standard deviation a whopping 3028).
I knew that they weren't short, but I was hoping that they weren't *long*. I guess that they are.
As I recall, the median article size is smaller than the average; if you could cut out the Ram-bot cities by size, you'd cut out most of the rest of Wikipedia with them.
(I assume that by "average" you mean <arithmetic mean>, then?) Weighting by size could never cut out articles larger than the mean, so the conclusion is not that most of Wikipedia would be cut out but instead that the Ram-bot entries would not be cut out.
In light of other responses, there seem to be two purposes to Randompage. One is to give visitors an idea of what Wikipedia is like, and if Ram-bot entries are both among the most common and most substantial of Wikipedia articles, then it's only fair that they show up often. But another is to give contributors ideas for new things to work on, and Ram-bot entries are generally useless for this purpose. Perhaps we need 2 versions of Randompage? (I shouldn't really talk, since I rarely use Randompage anyway.)
-- Toby
Toby Bartels wrote:
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
It's not the length of these machine-generated articles that's at issue. These particular ones are GOOD articles - they have all the information we need, they're automatically spellchecked etc. There isn't anything that needs to be done to them that's visible to someone from the other side of the world - and therein lies the reason why they're 'cluttering' the place up. Recent Changes and the Random button are the two primary methods of finding articles that need work, for me at least, and these articles cut down on the number of 'needing work' articles I can see.
I find the long and completed articles on the elements etc just as much of a nuisance, for the same reason. If anything, I'd like an option to search just the SHORT articles and to randomly pull up a stub that I could work on. I know the 'short articles' listing will do it, but I've worked through about the first six pages, and most of the remaining articles at the beginning of the list are mythology stubs that I know nothing about and can't extend. A 'random stub' button (choosing only articles under about 1,000 bytes) would be a handy maintenance tool for me, but I don't know if anyone else would want to use it.
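Karen's proposed "random stub" button amounts to filtering by size before picking. A minimal sketch, with invented titles and sizes and an assumed 1,000-byte threshold:

```python
import random

# Hypothetical articles with sizes in bytes.
articles = {
    "Whippet": 150,
    "Worcester Cathedral": 400,
    "Topological group": 6000,
    "Lexington, Missouri": 2100,
}

def random_stub(articles, max_bytes=1000):
    """Pick a random article strictly under the size threshold,
    or None if no article qualifies."""
    stubs = [title for title, size in articles.items() if size < max_bytes]
    return random.choice(stubs) if stubs else None
```

With these numbers, only "Whippet" and "Worcester Cathedral" can ever be returned, so every click lands on something that needs work.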
At 08:50 PM 10/26/02 -0700, Toby wrote:
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
That would be a good option, but shouldn't be the default--one of the ways I use "random page" is to find things to improve and edit, and stubs definitely qualify there.
"Gareth Owen" skribis:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
I just tested this ...
I clicked on "Random page" twenty times, and got:
- 3 times these "Village pages"
- 3 times year-pages (two of them 507)
- 14 times other articles (but not all of them being "worthwhile" articles IMHO).
Is this a big problem?
Paul
Paul Ebermann wrote:
"Gareth Owen" skribis:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
I just tested this ...
I clicked on "Random page" twenty times, and got:
- 3 times these "Village pages"
- 3 times year-pages (two of them 507)
- 14 times other articles (but not all of them being "worthwhile" articles IMHO).
Is this a big problem?
In itself, no, it's not. I did a random page search myself just now and here are my results. I didn't happen to get any year pages this time around, but six of my twenty clicks took me to a rambot town, and the last four of those came directly in a row. One year entry, or one town, or one country-demographics entry by itself isn't a problem, but when you click again and get another one, and again, and again, it can make people think that that's all we have in the wikipedia. I could see it dissuading people from trying to use the wikipedia, because they might think that their kind of information isn't wanted or needed, or that the wikipedia just doesn't have what they're looking for when it probably does.
MY RESULTS
1) Whippet - tiny stub
2) Master station - straightforward definition, drawn from Federal Standard 1037C
3) Topological group - lengthy mathematical article
4) Lexington, Missouri - Rambot entry
5) Worcester Cathedral - tiny stub
6) F-14 Tomcat - lengthy article, with written information and a long list of stats
7) Triboluminescence - stub, basically a definition
8) Antony and Cleopatra - stub on the Shakespeare play
9) Stotts City, Missouri - Rambot entry
10) History of Swaziland - CIA factbook entry, very stubbish
11) Arecales - microstub (family listing)
12) Ripe - disambiguation page, no actual info
13) Der er et yndigt land - (Danish national anthem) lyrics and English translation
14) Cow tipping - short article on an urban legend
15) Hub - definition
16) Feyenoord Rotterdam - short article on the Dutch football team
17) Cleveland, Illinois - rambot entry
18) Marion Heights, Pennsylvania - rambot entry
19) Spring Valley, Ohio - rambot entry
20) New Richmond, Wisconsin - rambot AGAIN
Well, a truly random page selector should be able to pick the same page 20 times, or rambot cities 20 times, or foods that start with the letter Q 20 times, because if it can't, then it's not truly random. So this is a bad way of testing it. There are 35k rambot articles out of 80k articles, which just means that a rambot article has a 7/16 probability of being picked on each click. But that doesn't rule out the possibility that if I click 50 times I'll get the years 2-51. Remember that with random picking, the last choice has no effect on the next one. So let's stop looking for patterns in a random page generator; we could be doing real work. It seems a lot of people sometimes get carried away in details. Don't worry, people: once Wikipedia grows, the random-page generator will even itself out.
Lightning
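Lightning's arithmetic checks out: 35,000 of 80,000 articles reduces to a 7/16 chance per click, and a run of four rambot hits in a row, like the one in Karen's results, is unlikely on any given four clicks but far from impossible. A quick check with exact fractions:

```python
from fractions import Fraction

# Per-click probability of hitting a rambot article.
p = Fraction(35000, 80000)      # reduces to 7/16

# Probability that four consecutive independent clicks all hit
# rambot articles: (7/16)^4 = 2401/65536, roughly 3.7%,
# i.e. about one chance in 27.
streak_of_four = p ** 4
```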
What space are they wasting? What things would be in that space if Ram-Man's articles weren't there? Why does it bother you so much?
Zoe
Gareth Owen wiki@gwowen.freeserve.co.uk wrote:
tarquin writes:
It's not that I don't care about one-horse US towns (well, okay, I don't)
Would it be possible to move the fantastic-waste-of-space that is Ram-Man's latest datadump to gazetteer.wikipedia.com, or somesuch, so they don't clutter up the *worthwhile* *encyclopedic* entries?
Zoe zoecomnena@yahoo.com writes:
What space are they wasting?
Have you looked at RecentChanges recently?
What things would be in that space if Ram-Man's articles weren't there? Why does it bother you so much?
Because they're not encyclopedia articles, they're distorting the article counts and, as anyone who's hit RandomPage recently will tell you, they're giving a skewed view of the project to anyone who's looking at it.
All this information is already available online from the proper places -- the US Census Bureau et al. Being information does not make it worthwhile information. There is little encyclopedic worth in knowing the elevation above sea level of Shithole, Indiana.
If you want wikiatlas.org or wikicensus.org or wikidomesdaybook.org, I recommend buying the domain name and setting the damn thing up. And I thought -- after the discussions about the Jargon File, the Catholic Encyclopedia and Britannica PD -- that we didn't want datadumps. It's the sellout of quality to quantity, with seemingly no human editing.
On top of that, it's slowing down responses and contributing to the overloading of the server.
Jeroen Heijmans wrote:
Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page cs. Is there yet a solution? Last month, when I brought up this subject again (it wasn't the first time), it was said that we should not be hasty. Well, there's been enough time not to be hasty now. I really think this issue should finally be resolved, otherwise I'll be so bold to move everything to meta to let it wait there for the new destination. There's no reason to let these pages be around anymore. Wikipedia is an encyclopedia.
I agree. I've been pondering this issue. Maybe a new "Memorial Wiki", not just for 9/11 but for future victims of war & terrorism too. (Remember to add the civilian casualties should Dubya bomb Iraq.)
--- tarquin tarquin@planetunreal.com wrote:
Jeroen Heijmans wrote:
Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page cs. Is there yet a solution? Last month, when I brought up this subject again (it wasn't the first time), it was said that we should not be hasty. Well, there's been enough time not to be hasty now. I really think this issue should finally be resolved, otherwise I'll be so bold to move everything to meta to let it wait there for the new destination. There's no reason to let these pages be around anymore. Wikipedia is an encyclopedia.
I agree.
Me too. I think we have been politically correct in that area for long enough; it's time to go back to our original mission.
Axel
Oh, good. Can we add the names of all of the Iraqi citizens who are dying because of Saddam's refusal to pass around internationally-provided food and medicine to anyone other than his elites?
Zoe
tarquin tarquin@planetunreal.com wrote:
Jeroen Heijmans wrote:
Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page cs. Is there yet a solution? Last month, when I brought up this subject again (it wasn't the first time), it was said that we should not be hasty. Well, there's been enough time not to be hasty now. I really think this issue should finally be resolved, otherwise I'll be so bold to move everything to meta to let it wait there for the new destination. There's no reason to let these pages be around anymore. Wikipedia is an encyclopedia.
I agree. I've been pondering this issue. Maybe a new "Memorial Wiki", not just for 9/11 but for future victims of war & terrorism too. (Remember to add the civilian casualties should Dubya bomb Iraq.)
Karen AKA Kajikit wrote:
tarquin wrote:
The Cunctator wrote:
By the way, with Ram-Man's automatic article addition, we've just hit 80,000 English entries
Yes. We're big like a telephone directory is big. Please let's not celebrate this milestone. When we got to 50k articles we were saying "but so many are stubs", and we're in a worse situation now, as I don't see how we can exclude these town pages from article counts. They really are stubs, maybe not in terms of length but in terms of content and usefulness.
It's not that I don't care about one-horse US towns (well, okay, I don't) but I feel the entire Wikipedia has tilted. We shouldn't delete them, but to balance we should maybe work on, say, adding 10,000 articles to the Tree of Life project?
It's seriously tipped the balance of the wikipedia. Five times out of ten hitting the 'random' button takes me to one of these new entries... I'm sure they're very useful if you need to find information on 'Dead Horse, Arizona', but they're not very relevant in the grand scheme of things. I'll be glad once they're all in and the recent changes menu can return to usefulness. Out of curiosity, what existing entries are these new articles linked to?
Can I suggest a "stub flag", so that machine-generated articles (such as Ram-Man's and hundreds of mine) remain marked as stubs until they have been edited by an actual human user. Articles with the explicit "stub flag" set should be:
* shown as stubs by the stub-detector no matter what the user's setting, if it is non-zero
* not shown as "Random articles"
* and '''not included in the main page article count'''
Indeed, perhaps all articles should be marked as stubs until they have at least one revision by a contributor who was not the original creator. This will automatically catch any auto-generated articles.
Neil
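Neil's rule (an article counts as a stub until someone other than its original creator has revised it) is easy to state in code. The data model below is hypothetical, invented just to show the check:

```python
# An article here is a dict with its creator and the list of editors
# who have made revisions, oldest first, including the creating edit.
def is_stub(article):
    """Stub until at least one revision comes from someone
    other than the original creator."""
    return all(editor == article["creator"] for editor in article["revisions"])

bot_page = {"creator": "Ram-Man", "revisions": ["Ram-Man"]}
edited_page = {"creator": "Ram-Man", "revisions": ["Ram-Man", "tarquin"]}
```

By this rule every auto-generated page starts out flagged, and a single human edit by anyone else clears the flag, which is exactly what makes the proposal catch bot dumps automatically.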
Indeed, perhaps all articles should be marked as stubs until they have at least one revision by a contributor who was not the original creator. This will automatically catch any auto-generated articles.
Neil
Maybe I have not understood *at all* what a stub is?
I thought it was a very short article, in bad need of being expanded?
Are you not confusing *stub* with *not-reviewed article* here?
If such a decision is taken, we'll have to set up a new page, "articles that absolutely need review, not to be confused with stubs". I have a handful of articles which I believe have never been edited by anybody else. I would certainly feel very insulted if they were listed as stubs.
There is a need to catch automatically generated articles, but not that way.