Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page and its related pages. Is there a solution yet? Last month, when I brought up this subject again (it wasn't the first time), we were told not to be hasty. Well, there has been enough time not to be hasty now. I really think this issue should finally be resolved; otherwise I'll be so bold as to move everything to meta and let it wait there for a new home. There's no reason to keep these pages around any longer. Wikipedia is an encyclopedia.
Jeronimo
The Cunctator wrote:
By the way, with Ram-Man's automatic article addition, we've just hit 80,000 English entries
Yes. We're big like a telephone directory is big. Please let's not celebrate this milestone. When we got to 50k articles we were saying "but so many are stubs", and we're in a worse situation now, as I don't see how we can exclude these town pages from article counts. They really are stubs, maybe not in terms of length but in terms of content and usefulness.
It's not that I don't care about one-horse US towns (well, okay, I don't) but I feel the entire Wikipedia has tilted. We shouldn't delete them, but to balance we should maybe work on, say, adding 10,000 articles to the Tree of Life project?
tarquin wrote:
The Cunctator wrote:
By the way, with Ram-Man's automatic article addition, we've just hit 80,000 English entries
Yes. We're big like a telephone directory is big. Please let's not celebrate this milestone. When we got to 50k articles we were saying "but so many are stubs", and we're in a worse situation now, as I don't see how we can exclude these town pages from article counts. They really are stubs, maybe not in terms of length but in terms of content and usefulness.
It's not that I don't care about one-horse US towns (well, okay, I don't) but I feel the entire Wikipedia has tilted. We shouldn't delete them, but to balance we should maybe work on, say, adding 10,000 articles to the Tree of Life project?
It's seriously tipped the balance of the wikipedia. Five times out of ten hitting the 'random' button takes me to one of these new entries... I'm sure they're very useful if you need to find information on 'Dead Horse, Arizona', but they're not very relevant in the grand scheme of things. I'll be glad once they're all in and the recent changes menu can return to usefulness. Out of curiosity, what existing entries are these new articles linked to?
tarquin tarquin@planetunreal.com writes:
It's not that I don't care about one-horse US towns (well, okay, I don't)
Would it be possible to move the fantastic-waste-of-space that is Ram-Man's latest datadump to gazetteer.wikipedia.com, or somesuch, so they don't clutter up the *worthwhile* *encyclopedic* entries?
On 10/25/02 1:22 PM, "Gareth Owen" wiki@gwowen.freeserve.co.uk wrote:
tarquin tarquin@planetunreal.com writes:
It's not that I don't care about one-horse US towns (well, okay, I don't)
Would it be possible to move the fantastic-waste-of-space that is Ram-Man's latest datadump to gazetteer.wikipedia.com, or somesuch, so they don't clutter up the *worthwhile* *encyclopedic* entries?
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
If the problem is with the "Random page" function, then perhaps some tweaking there would be reasonable, as long as it's noted somewhere that "random page" isn't exactly random.
And though I'm not a huge fan of the entries, they are well done.
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages RecentChanges
And though I'm not a huge fan of the entries, they are well done.
What is the point of well-done bad articles?
On 25 Oct 2002, Gareth Owen wrote:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
RandomPages is one reason we really need article classification by topic; at the moment it effectively assumes that articles are spread equally across all topics. If that was ever true, it certainly isn't now. Even if autogenerated articles were excluded, in the long term some topics would have far more articles than others.
To avoid giving a skewed view of the 'pedia we need some sort of classification system, so that the RandomPage function first randomly chooses a topic and then returns a random article from that topic.
Imran
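Imran's two-stage scheme can be sketched as follows. The topic index, category names and article titles below are invented for illustration, not taken from any real Wikipedia feature:

```python
import random

# Hypothetical topic index: category name -> list of article titles.
topics = {
    "Geography": ["Lexington, Missouri", "Stotts City, Missouri", "Cleveland, Illinois"],
    "Science": ["Triboluminescence", "Topological group"],
    "Literature": ["Antony and Cleopatra"],
}

def random_page_flat(topics):
    """Plain random pick: large categories dominate the results."""
    all_articles = [a for arts in topics.values() for a in arts]
    return random.choice(all_articles)

def random_page_by_topic(topics):
    """Two-stage pick: first a topic, then an article within it,
    so every topic is equally likely regardless of its size."""
    topic = random.choice(list(topics))
    return random.choice(topics[topic])
```

With the flat pick, Geography (3 of 6 articles) comes up half the time; with the two-stage pick, each of the three topics comes up a third of the time.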
On 26-10-2002, Imran Ghory wrote thusly :
On 25 Oct 2002, Gareth Owen wrote:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
RandomPages is one reason we really need article classification by topic; at the moment it effectively assumes that articles are spread equally across all topics. If that was ever true, it certainly isn't now. Even if autogenerated articles were excluded, in the long term some topics would have far more articles than others. To avoid giving a skewed view of the 'pedia we need some sort of classification system, so that the RandomPage function first randomly chooses a topic and then returns a random article from that topic.
There was some discussion about classification, metadata and ways to tackle the growing volume of information back when there were 10,000-15,000 articles. It was either rebuked or its importance was minimized. Classification was thought unwiki and impracticable ("who will classify, according to what criteria, what about multiple categories and wrongly assigned articles, and who would correct them"). Manning was in favour, and I think it is thanks to him that we have Wikiprojects. Long ago I created metadata subpages for a few articles as an exercise in this direction, but they were probably voted for deletion as search-engine whoring.
Have you worked out any solutions? How do you think it could be accomplished?
Regards, Kpjas.
On Sat, 26 Oct 2002, Krzysztof P. Jasiutowicz wrote:
Have you worked out any solutions? How do you think it could be accomplished?
Whoever did [[Wikipedia:Category_experiment]] seems to have found a reasonably good solution, at least well enough for the purpose of random article selection.
Imran
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
-- Toby
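Toby's length-weighted selection could look something like this minimal sketch; the article names and byte sizes below are made up, and this is not how the actual wiki software worked:

```python
import random

# Hypothetical articles with sizes in bytes; an article's chance of
# being picked is proportional to its size, so short stubs come up
# less often without ever being excluded entirely.
articles = {
    "Whippet": 150,             # tiny stub
    "Topological group": 6000,  # long article
    "Lexington, Missouri": 2100,
}

def weighted_random_page(articles):
    """Pick one title, weighted by article length in bytes."""
    titles = list(articles)
    sizes = [articles[t] for t in titles]
    return random.choices(titles, weights=sizes, k=1)[0]
```

A 150-byte stub still gets picked occasionally, just roughly forty times less often than a 6,000-byte article.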
Toby Bartels wrote:
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
The Ram-bot city entries average somewhere around 2130 bytes each (standard deviation 113), which is *larger* than the English Wikipedia-wide average for article-space non-redirect pages (about 1900 bytes, standard deviation a whopping 3028).
As I recall, the median article size is smaller than the average; if you could cut out the Ram-bot cities by size, you'd cut out most of the rest of Wikipedia with them.
-- brion vibber (brion @ pobox.com)
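Brion's point, that in a right-skewed size distribution the median lies below the mean and a size cutoff near the mean would remove well over half of all articles, can be illustrated with toy numbers. The byte counts below are invented, not real Wikipedia statistics:

```python
# Many small pages plus a few huge ones: the mean gets dragged
# upward while the median stays with the bulk of the pages.
sizes = [200, 300, 400, 500, 600, 700, 800, 900, 1000, 12000]

mean = sum(sizes) / len(sizes)              # 1740.0
sizes_sorted = sorted(sizes)
median = (sizes_sorted[4] + sizes_sorted[5]) / 2  # 650.0

# Cutting every page smaller than the mean removes 9 of the 10 pages.
below_mean = sum(1 for s in sizes if s < mean)
```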
Brion VIBBER wrote:
Toby Bartels wrote:
Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
The Ram-bot city entries average somewhere around 2130 bytes each (standard deviation 113), which is *larger* than the English Wikipedia-wide average for article-space non-redirect pages (about 1900 bytes, standard deviation a whopping 3028).
I knew that they weren't short, but I was hoping that they weren't *long*. I guess that they are.
As I recall, the median article size is smaller than the average; if you could cut out the Ram-bot cities by size, you'd cut out most of the rest of Wikipedia with them.
(I assume that by "average" you mean <arithmetic mean>, then?) Weighting by size could never cut out articles larger than the mean, so the conclusion is not that most of Wikipedia would be cut out but instead that the Ram-bot entries would not be cut out.
In light of other responses, there seem to be two purposes to Randompage. One is to give visitors an idea of what Wikipedia is like, and if Ram-bot entries are both among the most common and most substantial of Wikipedia articles, then it's only fair that they show up often. But another is to give contributors ideas for new things to work on, and Ram-bot entries are generally useless for this purpose. Perhaps we need 2 versions of Randompage? (I shouldn't really talk, since I rarely use Randompage anyway.)
-- Toby
Toby Bartels wrote:
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
It's not the length of these machine-generated articles that's at issue. These particular ones are GOOD articles - they have all the information we need, they're automatically spellchecked etc. There isn't anything that needs to be done to them that's visible to someone from the other side of the world - and therein lies the reason why they're 'cluttering' the place up. Recent Changes and the Random button are the two primary methods of finding articles that need work, for me at least, and these articles cut down on the number of 'needing work' articles I can see.
I find the long and completed articles on the elements etc just as much of a nuisance, for the same reason. If anything, I'd like an option to search just the SHORT articles and to randomly pull up a stub that I could work on. I know the 'short articles' listing will do it, but I've worked through about the first six pages, and most of the remaining articles at the beginning of the list are mythology stubs that I know nothing about and can't extend. A 'random stub' button (choosing only articles under about 1,000 bytes) would be a handy maintenance tool for me, but I don't know if anyone else would want to use it.
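Karen's proposed "random stub" button amounts to filtering by size before picking. A minimal sketch, with invented titles and sizes and an assumed 1,000-byte threshold:

```python
import random

# Hypothetical articles with sizes in bytes.
articles = {
    "Whippet": 150,
    "Worcester Cathedral": 400,
    "Topological group": 6000,
    "Lexington, Missouri": 2100,
}

def random_stub(articles, max_bytes=1000):
    """Pick a random article strictly under the size threshold,
    or None if no article qualifies."""
    stubs = [title for title, size in articles.items() if size < max_bytes]
    return random.choice(stubs) if stubs else None
```

With these numbers, only "Whippet" and "Worcester Cathedral" can ever be returned, so every click lands on something that needs work.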
At 08:50 PM 10/26/02 -0700, Toby wrote:
Gareth Owen wrote:
The Cunctator cunctator@kband.com wrote:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
An idea: Can Randompage weight the articles that it picks by their length? This will make *any* stub less likely to be chosen, arguably without giving a distorted picture of what Wikipedia is like.
That would be a good option, but shouldn't be the default--one of the ways I use "random page" is to find things to improve and edit, and stubs definitely qualify there.
"Gareth Owen" skribis:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
I just tested this ...
I clicked on "Random page" twenty times, and got:
- 3 times these "Village pages"
- 3 times year-pages (two of them 507)
- 14 times other articles (but not all of them being "worthwhile" articles IMHO).
Is this a big problem?
Paul
Paul Ebermann wrote:
"Gareth Owen" skribis:
The Cunctator cunctator@kband.com writes:
How do they clutter the "worthwhile" entries? Wikipedia is not paper.
RandomPages
I just tested this ...
I clicked on "Random page" twenty times, and got:
- 3 times these "Village pages"
- 3 times year-pages (two of them 507)
- 14 times other articles (but not all of them being "worthwhile" articles IMHO).
Is this a big problem?
In itself, no, it's not. I did a random page search myself just now and here are my results. I didn't happen to get any year pages this time around, but six of my twenty clicks took me to a rambot town, and the last four of those came directly in a row. One year entry, or one town, or one country-demographics entry by itself isn't a problem, but when you click again and get another one, and again, and again, it can make people think that that's all we have in the wikipedia. I could see it dissuading people from trying to use the wikipedia, because they might think that their kind of information isn't wanted or needed, or that the wikipedia just doesn't have what they're looking for when it probably does.
MY RESULTS
1) Whippet - tiny stub
2) Master station - straightforward definition, drawn from Federal Standard 1037C
3) Topological group - lengthy mathematical article
4) Lexington, Missouri - Rambot entry
5) Worcester Cathedral - tiny stub
6) F-14 Tomcat - lengthy article, with written information and a long list of stats
7) Triboluminescence - stub, basically a definition
8) Antony and Cleopatra - stub on the Shakespeare play
9) Stotts City, Missouri - Rambot entry
10) History of Swaziland - CIA factbook entry, very stubbish
11) Arecales - microstub (family listing)
12) Ripe - disambiguation page, no actual info
13) Der er et yndigt land - (Danish national anthem) lyrics and English translation
14) Cow tipping - short article on an urban legend
15) Hub - definition
16) Feyenoord Rotterdam - short article on the Dutch football team
17) Cleveland, Illinois - rambot entry
18) Marion Heights, Pennsylvania - rambot entry
19) Spring Valley, Ohio - rambot entry
20) New Richmond, Wisconsin - rambot AGAIN
Well, a truly random page selector should be able to pick the same page 20 times, or rambot cities 20 times, or foods that start with the letter Q 20 times, because if it can't, then it's not truly random. So this is a bad way of testing it. There are 35k rambot articles out of 80k articles, which just means that a rambot article has a 7/16 probability of being picked on each click. But that doesn't rule out the possibility that if I click 50 times I'll get the years 2-51. Remember that with random picking, the last choice has no effect on the next one. So let's stop looking for patterns in a random page generator; we could be doing real work. It seems a lot of people sometimes get carried away in details. Don't worry, people: once Wikipedia grows, the random-page generator will even itself out.
Lightning
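Lightning's arithmetic checks out: 35,000 of 80,000 articles reduces to a 7/16 chance per click, and a run of four rambot hits in a row, like the one in Karen's results, is unlikely on any given four clicks but far from impossible. A quick check with exact fractions:

```python
from fractions import Fraction

# Per-click probability of hitting a rambot article.
p = Fraction(35000, 80000)      # reduces to 7/16

# Probability that four consecutive independent clicks all hit
# rambot articles: (7/16)^4 = 2401/65536, roughly 3.7%,
# i.e. about one chance in 27.
streak_of_four = p ** 4
```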
What space are they wasting? What things would be in that space if Ram-Man's articles weren't there? Why does it bother you so much?
Zoe
Gareth Owen wiki@gwowen.freeserve.co.uk wrote:
tarquin writes:
It's not that I don't care about one-horse US towns (well, okay, I don't)
Would it be possible to move the fantastic-waste-of-space that is Ram-Man's latest datadump to gazetteer.wikipedia.com, or somesuch, so they don't clutter up the *worthwhile* *encyclopedic* entries?
Zoe zoecomnena@yahoo.com writes:
What space are they wasting?
Have you looked at RecentChanges recently?
What things would be in that space if Ram-Man's articles weren't there? Why does it bother you so much?
Because they're not encyclopedia articles, they're distorting the article counts and, as anyone who's hit RandomPage recently will tell you, they're giving a skewed view of the project to anyone who's looking at it.
All this information is already available online from the proper places -- the US Census Bureau et al. Being information does not make it worthwhile information. There is little encyclopedic worth in knowing the elevation above sea level of Shithole, Indiana.
If you want wikiatlas.org or wikicensus.org or wikidomesdaybook.org, I recommend buying the domain name and setting the damn thing up. And I thought -- after the discussions about the Jargon File, the Catholic Encyclopedia and Britannica PD -- that we didn't want datadumps. It's the sellout of quality to quantity, with seemingly no human editing.
On top of that, it's slowing down responses and contributing to the overloading of the server.
Jeroen Heijmans wrote:
Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page cs. Is there yet a solution? Last month, when I brought up this subject again (it wasn't the first time), it was said that we should not be hasty. Well, there's been enough time not to be hasty now. I really think this issue should finally be resolved, otherwise I'll be so bold to move everything to meta to let it wait there for the new destination. There's no reason to let these pages be around anymore. Wikipedia is an encyclopedia.
I agree. I've been pondering this issue. Maybe a new "Memorial Wiki", not just for 9/11 but for future victims of war & terrorism too. (Remember to add the civilian casualties should Dubya bomb Iraq.)
--- tarquin tarquin@planetunreal.com wrote:
Jeroen Heijmans wrote:
Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page cs. Is there yet a solution? Last month, when I brought up this subject again (it wasn't the first time), it was said that we should not be hasty. Well, there's been enough time not to be hasty now. I really think this issue should finally be resolved, otherwise I'll be so bold to move everything to meta to let it wait there for the new destination. There's no reason to let these pages be around anymore. Wikipedia is an encyclopedia.
I agree.
Me too. I think we have been politically correct in that area for long enough; it's time to go back to our original mission.
Axel
Oh, good. Can we add the names of all of the Iraqi citizens who are dying because of Saddam's refusal to pass around internationally-provided food and medicine to anyone other than his elites?
Zoe
tarquin tarquin@planetunreal.com wrote:
Jeroen Heijmans wrote:
Yes, again about the [[September 11, 2001 Terrorist Attack/In Memoriam]] page cs. Is there yet a solution? Last month, when I brought up this subject again (it wasn't the first time), it was said that we should not be hasty. Well, there's been enough time not to be hasty now. I really think this issue should finally be resolved, otherwise I'll be so bold to move everything to meta to let it wait there for the new destination. There's no reason to let these pages be around anymore. Wikipedia is an encyclopedia.
I agree. I've been pondering this issue. Maybe a new "Memorial Wiki", not just for 9/11 but for future victims of war & terrorism too. (Remember to add the civilian casualties should Dubya bomb Iraq.)
Karen AKA Kajikit wrote:
tarquin wrote:
The Cunctator wrote:
By the way, with Ram-Man's automatic article addition, we've just hit 80,000 English entries
Yes. We're big like a telephone directory is big. Please let's not celebrate this milestone. When we got to 50k articles we were saying "but so many are stubs", and we're in a worse situation now, as I don't see how we can exclude these town pages from article counts. They really are stubs, maybe not in terms of length but in terms of content and usefulness.
It's not that I don't care about one-horse US towns (well, okay, I don't) but I feel the entire Wikipedia has tilted. We shouldn't delete them, but to balance we should maybe work on, say, adding 10,000 articles to the Tree of Life project?
It's seriously tipped the balance of the wikipedia. Five times out of ten hitting the 'random' button takes me to one of these new entries... I'm sure they're very useful if you need to find information on 'Dead Horse, Arizona', but they're not very relevant in the grand scheme of things. I'll be glad once they're all in and the recent changes menu can return to usefulness. Out of curiosity, what existing entries are these new articles linked to?
Can I suggest a "stub flag", so that machine-generated articles (such as Ram-Man's and hundreds of mine) remain marked as stubs until they have been edited by an actual human user. Articles with the explicit "stub flag" set should be:
* shown as stubs by the stub-detector no matter what the user's setting, if it is non-zero
* not shown as "Random articles"
* and '''not included in the main page article count'''
Indeed, perhaps all articles should be marked as stubs until they have at least one revision by a contributor who was not the original creator. This will automatically catch any auto-generated articles.
Neil
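Neil's rule (an article counts as a stub until someone other than its original creator has revised it) is easy to state in code. The data model below is hypothetical, invented just to show the check:

```python
# An article here is a dict with its creator and the list of editors
# who have made revisions, oldest first, including the creating edit.
def is_stub(article):
    """Stub until at least one revision comes from someone
    other than the original creator."""
    return all(editor == article["creator"] for editor in article["revisions"])

bot_page = {"creator": "Ram-Man", "revisions": ["Ram-Man"]}
edited_page = {"creator": "Ram-Man", "revisions": ["Ram-Man", "tarquin"]}
```

By this rule every auto-generated page starts out flagged, and a single human edit by anyone else clears the flag, which is exactly what makes the proposal catch bot dumps automatically.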
Indeed, perhaps all articles should be marked as stubs until they have at least one revision by a contributor who was not the original creator. This will automatically catch any auto-generated articles.
Neil
Maybe I have not understood *at all* what a stub is?
I thought it was a very short article, in bad need of being expanded?
Are you not confusing *stub* with *not-reviewed article* here?
If such a decision is taken, we'll have to set up a new page, "articles that absolutely need review, not to be confused with stubs". I have a handful of articles which I believe have never been edited by anybody else. I would certainly feel very insulted if they were listed as stubs.
There is a need to catch automatically generated articles, but not that way.