[also originally sent yesterday but didn't seem to get through the moderator, perhaps due to my mail server?]
Hi all, I'd like to undertake a more thorough survey of Wikipedia referencing standards, but I've started with a quick "pilot" study.
Methodology: Click "random article". Discard results which are not articles. Count the number of "external links", "references", "paragraphs".
Terms: A bit fuzzy, I'm treating a web page which gives more information as an "external link", and a page or book or whatever which is claimed to be the source of the information (or is clearly the source) as a "reference". Paragraphs are, well, paragraphs, but it must be said that longer articles generally have longer paragraphs than shorter ones do. So lines would probably be better...
Preliminary results:
Sample size: 30 pages, of which 17 were stubs.
Number with no links: 21
Number with no references: 24
Average number of links: 0.67
Average number of references: 0.54
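For anyone wanting to repeat the count, here is a minimal sketch of how the tallies could be turned into the summary figures. The names (ArticleTally, summarize) and the three example rows are invented for illustration and are not the survey data; the counting itself is still assumed to be done by hand after clicking "random article".

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ArticleTally:
    """Hand-counted figures for one randomly chosen article (hypothetical structure)."""
    title: str
    external_links: int
    references: int
    is_stub: bool


def summarize(tallies):
    """Compute the same kind of summary figures as the pilot study reports."""
    n = len(tallies)
    if n == 0:
        raise ValueError("need at least one article")
    return {
        "sample_size": n,
        "stubs": sum(t.is_stub for t in tallies),
        "no_links": sum(t.external_links == 0 for t in tallies),
        "no_references": sum(t.references == 0 for t in tallies),
        "avg_links": mean(t.external_links for t in tallies),
        "avg_references": mean(t.references for t in tallies),
    }


# Hypothetical tallies, invented for illustration only (NOT the survey data).
example = [
    ArticleTally("Article A", external_links=2, references=0, is_stub=True),
    ArticleTally("Article B", external_links=0, references=1, is_stub=False),
    ArticleTally("Article C", external_links=0, references=0, is_stub=True),
]
summary = summarize(example)
print(summary)
```

Keeping one record per article, rather than running totals, would also make it easy to add a "lines" or "paragraphs" field later.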
I found very few book references, one of which was patently false ("James Maxwell's book of James Maxwells not as cool as me, by James Maxwell"). Similarly a list of newspaper articles turned out to all have been written by the subject (a journalist). One page (out of 30) actually gave ISBN references (Chepstow Bridge).
Conclusions: None yet, really, since the methodology isn't very solid and the sample set is small. But notably: More than half the articles were stubs. Hardly any articles had any real "references". Most of the external links were band websites, company websites etc. Of the few references, one was blatantly false and a few were "bad". So it's probably a little early to be claiming that all material added to Wikipedia MUST be sourced or it will be removed, because based on this, only around 15% of Wikipedia would survive (which is more than I would have predicted).
Any suggestions for improved methodology? It might be nice to harness the Wikipedia population to collect some more general article quality metrics...
Steve
Here's another challenge. If you keep a list of the articles you started, go through them and make sure that they have external links or references, even if they are stubs. Gotta start somewhere - might as well start on your own stuff.
Ian (Guettarda)
On 1/30/06, Guettarda guettarda@gmail.com wrote:
Here's another challenge. If you keep a list of the articles you started, go through them and make sure that they have external links or references, even if they are stubs. Gotta start somewhere - might as well start on your own stuff.
Well, I did keep the list. I'll keep my reasons for not following your suggestion to myself, to avoid offending anyone. Presumably you would like me to email the list to you so you can carry out this work? If you're impatient, feel free to start hitting that "random article" button - you'll find lots more like I did.
Steve
On 1/30/06, Steve Bennett stevage@gmail.com wrote:
Well, I did keep the list. I'll keep my reasons for not following your suggestion to myself, to avoid offending anyone. Presumably you would like me to email the list to you so you can carry out this work? If you're impatient, feel free to start hitting that "random article" button - you'll find lots more like I did.
I think you misunderstood Guettarda - he didn't mean improving the list of random articles you created, he meant that each of us should check all the articles we've created (or significantly rewritten, I'd add) and ensure that they are up to modern referencing standards.
After all, if you created the article, you probably know where to look to reference it. Many of us created articles "back in the day" before strict referencing was being encouraged as strongly as it is now, and they could probably do with some work.
-Matt
I think we can have quite a simple solution to this. Do the same as we did to the images. Stuff which isn't cited within a week can be speedy deleted like we do with images now. Just make sure it only applies to articles created after date X.
Mgm
I wonder if this won't encourage the addition of crap references, with people throwing up the first google hit they find on a subject as a "reference" in order to save a stub from deletion.
From: wikien-l-bounces@Wikipedia.org [mailto:wikien-l-bounces@Wikipedia.org] On Behalf Of MacGyverMagic/Mgm
I think we can have quite a simple solution to this. Do the same as we did to the images. Stuff which isn't cited within a week can be speedy deleted like we do with images now. Just make sure it only applies to articles created after date X.
That sounds useful. I once created an article, saved it as one or two sentences and went googling for a source and when I came back to the article, it had, like Mr Wales' bandaid, been ripped off.
May I suggest that if a new article is not sourced within a day, then a warning template be placed on it, and if nothing is forthcoming six days later, then delete it.
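The timing in that suggestion could be sketched roughly as follows. This is only an illustration of the rule as worded above (a warning after one day, deletion six days after that); the function name, the returned labels and the has_source flag are assumptions, not any existing Wikipedia tooling.

```python
from datetime import datetime, timedelta

# Thresholds taken from the suggestion: warn after 1 day, delete 6 days later.
WARN_AFTER = timedelta(days=1)
DELETE_AFTER = WARN_AFTER + timedelta(days=6)


def action_for(created, now, has_source):
    """Return a hypothetical action label for a new article.

    created, now: datetime objects; has_source: whether any source is cited.
    """
    if has_source:
        return "keep"
    age = now - created
    if age >= DELETE_AFTER:
        return "delete"
    if age >= WARN_AFTER:
        return "warn"
    return "wait"


created = datetime(2006, 1, 30)
print(action_for(created, created + timedelta(hours=12), has_source=False))
print(action_for(created, created + timedelta(days=2), has_source=False))
print(action_for(created, created + timedelta(days=7), has_source=False))
```

Restricting the rule to articles created after some date X, as proposed earlier in the thread, would only need one extra comparison on created.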
Peter (Skyring)
On 1/30/06, Steve Bennett stevage@gmail.com wrote:
Hi all, I'd like to undertake a more thorough survey of wikipedia referencing standards, but I've started with a quick "pilot" study.
Methodology: Click "random article". Discard results which are not articles. Count the number of "external links", "references", "paragraphs".
Terms: A bit fuzzy, I'm treating a web page which gives more information as an "external link", and a page or book or whatever which is claimed to be the source of the information (or is clearly the source) as a "reference". Paragraphs are, well, paragraphs, but it must be said that longer articles generally have longer paragraphs than shorter ones do. So lines would probably be better...
Preliminary results:
Sample size: 30 pages, of which 17 were stubs.
Number with no links: 21
Number with no references: 24
Average number of links: 0.67
Average number of references: 0.54
I found very few book references, one of which was patently false ("James Maxwell's book of James Maxwells not as cool as me, by James Maxwell"). Similarly a list of newspaper articles turned out to all have been written by the subject (a journalist). One page (out of 30) actually gave ISBN references (Chepstow Bridge).
This compares reasonably well with the results of my survey (http://en.wikipedia.org/wiki/User:Carnildo/The_100): about 15% of all articles are sourced. My methodology was a little different, as I counted anything in a "references" section as a reference, while inline links were collectively counted as one reference.
I'm still working on a survey of biography articles, but preliminary results are that slightly fewer biographies are sourced, but ones that are sourced tend to have more sources.
-- Mark [[User:Carnildo]]
MacGyverMagic/Mgm wrote:
Any suggestions for improved methodology? It might be nice to harness the Wikipedia population to collect some more general article quality metrics...
I think we can have quite a simple solution to this. Do the same as we did to the images. Stuff which isn't cited within a week can be speedy deleted like we do with images now. Just make sure it only applies to articles created after date X.
This isn't simple; it's simplistic.
While an image is a relatively uneditable form, the same is not true of text. Any edit can easily invalidate the supporting reference. Also, any source that roughly deals with the subject can be slapped in, and verifying all the details will not always be easy. I strongly support the need for references, but policy needs to adapt to circumstances. Some facts (such as those about living persons) need to be referenced more urgently than others. Article-by-article negotiations are more appropriate than arbitrary deadlines.
Ec
MacGyverMagic/Mgm wrote:
I think we can have quite a simple solution to this. Do the same as we did to the images. Stuff which isn't cited within a week can be speedy deleted like we do with images now. Just make sure it only applies to articles created after date X.
Articles without references are simply works in progress most of the time, and I definitely don't support the deletion of works in progress. Images are different since it's far harder to check them for copyright issues and such.
Instead of deleting such articles, how about simply withholding "stable" flags from all unreferenced versions (ie, all of them)?
On 31 Jan 2006, at 00:52, Bryan Derksen wrote:
MacGyverMagic/Mgm wrote:
I think we can have quite a simple solution to this. Do the same as we did to the images. Stuff which isn't cited within a week can be speedy deleted like we do with images now. Just make sure it only applies to articles created after date X.
Articles without references are simply works in progress most of the time, and I definitely don't support the deletion of works in progress. Images are different since it's far harder to check them for copyright issues and such.
Instead of deleting such articles, how about simply withholding "stable" flags from all unreferenced versions (ie, all of them)?
Another idea would be to simply add an {{unreferenced}} or {{unsourced}} tag, to alert readers.
Deleting articles because they have no references is draconian.
15 years ago or so I worked on a specialist encyclopaedia. It was undersourced but was the standard reference in the field. As with Wikipedia, you could contact the authors of the articles if you wanted better references. I did encourage the addition of sources a bit, I seem to remember (even though I was doing proofreading mostly). It is not that important to have a reference for every fact - eventually people will dispute the ones that are unclear. Having some sources is really helpful, and I agree that there are far too few references, but this is a transitional issue - not enough of the real sources are online and books are not available to all. There are some great online projects (like http://www.british-history.ac.uk/) but most of the stuff I know about isn't online yet. Many things people write are memories from sources that they no longer have access to. There are also some bits of original research around that are not that problematic - one day they will be published elsewhere, and I am not going to complain before that.
We also need to be prepared for the way the internet is changing availability of primary sources. It used to be that access to these was rare and available to only a few people. Encyclopaedias were written from secondary or tertiary sources. The "fancruft" of WP is perhaps a primer for what happens when primary sources are available without an academic commentary (webcomics or whatever). Fields that have been less restricted in access have always been more dominated by "non-professionals", for example [[Ian Nairn]] as an important architectural critic.
Amateurism (in the old sense) and expertise are very important. "Academicism", where people need to have PhDs, use the Chicago Manual of Style and cite a reference for every sentence, is not where we need to be going. It represents a kind of geeky view of knowledge as a pure collection of facts where synthesis is ignored; do we have to reference the order in which we present these referenced facts?
Justinc
On Mon, 30 Jan 2006, Matt Brown wrote:
On 1/30/06, Steve Bennett stevage@gmail.com wrote:
Well, I did keep the list. I'll keep my reasons for not following your suggestion to myself, to avoid offending anyone. Presumably you would like me to email the list to you so you can carry out this work? If you're impatient, feel free to start hitting that "random article" button - you'll find lots more like I did.
I think you misunderstood Guettarda - he didn't mean improving the list of random articles you created, he meant that each of us should check all the articles we've created (or significantly rewritten, I'd add) and ensure that they are up to modern referencing standards.
Er, what is this "modern referencing standards" you are talking about? Is this some new thing that was voted on & made policy without much fanfare, or are you referring to some academic standard like the _MLA Handbook_? If it is the latter, I can show you many examples of unsatisfactory referencing in modern publications, & excellent examples in older publications. If it is the former, I need to point out that in the last month I have encountered a wide enough variety of opinions from other Wikipedians on how this should be done to conclude that most believe that "proper referencing standards" is another way of saying "this is how I think it should be done".
(As for me, I like footnotes & use them when in doubt. But I won't footnote a statement if it is reasonably close to the section I have just footnoted, & try not to overuse them. And I wish the MLA hadn't deprecated the use of "ibid." & "op. cit.")
After all, if you created the article, you probably know where to look to reference it. Many of us created articles "back in the day" before strict referencing was being encouraged as strongly as it is now, and they could probably do with some work.
For the first couple of years I contributed to Wikipedia (up to about the time David Gerard came on board) I was more concerned with avoiding copyright infringements than proper research & reference practices. (For example, if pressed about some of the material I contributed to the Classical & Roman Empire topics, I doubt I could do much more than to point at some of the standard references like the Oxford Classical Dictionary or some other books on my shelves & say "I think it came from somewhere in there.") I know I'm not the only person who did this: I remember encountering the hundreds of mythology-related articles that Tuf-Kat added in those days, & wondering which PD encyclopedia he had copied the information out of. (Not to pick on Tuf-Kat; AFAICS, all of that material looked reliable. And I could mention other, less salubrious examples of material copied into Wikipedia from those days.)
Geoff
-----Original Message----- From: wikien-l-bounces@Wikipedia.org [mailto:wikien-l-bounces@Wikipedia.org] On Behalf Of Geoff Burling
For the first couple of years I contributed to Wikipedia (up to about the time David Gerard came on board)
Ah yes, the good old days!
Pete, hoping someone is writing a history of this marvellous enterprise
On 1/30/06, Geoff Burling llywrch@agora.rdrop.com wrote:
Er, what is this "modern referencing standards" you are talking about? Is this some new thing that was voted on & made policy without much fanfare, or are you referring to some academic standard like the _MLA Handbook_?
Referring to a purely Wikipedia definition of 'modern' - in other words, up until about 2 years ago there was little push to give references. Sometimes we did, but not all that often. More often, we just added a few relevant external links that were probably among the places we got the info from.
There's a lot of articles I wrote in my first year on Wikipedia that I'd extensively reference if I did them now.
-Matt
This compares reasonably well with the results of my survey (http://en.wikipedia.org/wiki/User:Carnildo/The_100): about 15% of all articles are sourced. My methodology was a little different, as I counted anything in a "references" section as a reference, while inline links were collectively counted as one reference.
I'm still working on a survey of biography articles, but preliminary results are that slightly fewer biographies are sourced, but ones that are sourced tend to have more sources.
Nice! Are you interested in expanding it a bit, perhaps to include grades in various categories like adherence to MoS, PoV etc? It would be very interesting to get a few people together and regularly take samples of 1000 or so articles, to see whether things are getting better, worse etc.
Steve
Nice is right; I love this discussion. We should start a proper article improvement project, run spot checks like this every few weeks, highlight 'good articles' and not just FAs, have regular ref and stub fixing contests...
SJ
Hi, There is of course [[WP:GA]] but it's very painful to nominate an article at the moment - you have to add a template to the article talk page, edit the GA page, potentially edit a template referenced in the GA page, and update the count at the top of GA. Simply to say that IMHO this article is "not bad".
Is there a list of reviewing proposals anywhere? I had a thought for one today whereby any user with at least 100 edits can add a review to any page they have never edited, giving scores out of 5 for comprehensiveness, sourcing, NPOV, and MoS adherence. Not a very thoroughly thought-through thought though...:)
(wow...what a last sentence...it must be late...)
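To make the review idea a little more concrete, here is a rough sketch of the eligibility rule and the score check. Everything named here (Reviewer, may_review, validate_scores, the category keys) is hypothetical, and the assumption that scores run from 0 to 5 inclusive is mine; the proposal only says "out of 5".

```python
from dataclasses import dataclass

# The four categories named in the proposal (keys are hypothetical).
CATEGORIES = ("comprehensiveness", "sourcing", "npov", "mos")
MIN_EDITS = 100   # "at least 100 edits"
MAX_SCORE = 5     # "scores out of 5"; a 0-5 range is an assumption


@dataclass(frozen=True)
class Reviewer:
    name: str
    edit_count: int
    pages_edited: frozenset


def may_review(reviewer, page):
    """A user may review a page only with enough edits and no edits to that page."""
    return reviewer.edit_count >= MIN_EDITS and page not in reviewer.pages_edited


def validate_scores(scores):
    """Check that every category has exactly one integer score in range."""
    if set(scores) != set(CATEGORIES):
        raise ValueError("scores must cover exactly: " + ", ".join(CATEGORIES))
    for category, value in scores.items():
        if not isinstance(value, int) or not 0 <= value <= MAX_SCORE:
            raise ValueError(f"{category} must be an integer from 0 to {MAX_SCORE}")
    return scores


alice = Reviewer("Alice", edit_count=250, pages_edited=frozenset({"Chepstow Bridge"}))
print(may_review(alice, "Ian Nairn"))
print(may_review(alice, "Chepstow Bridge"))
```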
Steve
Softwear exists. Currently not in use for a number of reasons.
-- geni
"geni" geniice@gmail.com wrote in message news:f80608430601311503x28b13a09v85f5172993c15c5d@mail.gmail.com... [snip]
Softwear exists. Currently not in use for a number of reasons.
I håd a nïce jümper büt the möõse åte it.
Phil Boswell wrote:
"geni" geniice@gmail.com wrote in message news:f80608430601311503x28b13a09v85f5172993c15c5d@mail.gmail.com... [snip]
Softwear exists. Currently not in use for a number of reasons.
I h嶟 a n騃e jper b the m皭se 廞e it.
The moose put pictograms in your jumper?
"Alphax (Wikipedia email)" alphasigmax@gmail.com wrote in message news:43E0A142.2010709@gmail.com...
Phil Boswell wrote:
"geni" geniice@gmail.com wrote in message news:f80608430601311503x28b13a09v85f5172993c15c5d@mail.gmail.com... [snip]
Softwear exists. Currently not in use for a number of reasons.
I h? a n?e j?per b? the m?se ?e it.
The moose put pictograms in your jumper?
Near enough for jazz, or indeed for government work :-)
The people responsible for encoding the previous message have been fired, BTW :-)