[WikiEN-l] Thousands of *awful* articles on websites

Ray Saintonge saintonge at telus.net
Sat Jan 6 22:44:45 UTC 2007


Stan Shebs wrote:

>Steve Block wrote:
>  
>
>>Stan Shebs wrote:
>>    
>>
>>>>How do we know reliable sources aren't lying?  Every newspaper and 
>>>>source in town reports Colin Montgomerie holed the winning putt in the 
>>>>2004 Ryder Cup, but it isn't true, Ian Poulter struck the putt that 
>>>>mathematically won the cup.  Monty's story simply made better press.  If 
>>>>we do, as you say, withhold judgement on whether a source is correct, 
>>>>why do you then say we can't use some sources because they may be lying? 
>>>>  Obviously some judgement is at play.
>>>>        
>>>>
>>>Well, if the source is lying, then by definition it's not reliable. Your 
>>>example is an object lesson in how newspapers are intermediate in 
>>>reliability; they are usually better than Joe Random's blog, but not as 
>>>good as a scholarly monograph that has had multiple layers of review 
>>>spread over multiple years.
>>>      
>>>
>>No, you haven't as yet stated how we define which source is telling the 
>>"truth".  My understanding is that we don't do that.  If we don't do 
>>that, it stands to reason we don't determine which source "lies", since 
>>doing so indicates we have determined one source is telling the truth. 
>>My understanding is that we weigh each source in relation to the topic, 
>>and present it in a NPV.  This means we must judge whether the 
>>information is relevant, not if it is true.  It is relevant to note that 
>>Montgomerie is credited as hitting the winning putt.
>>    
>>
>Dangerously close to epistemology, but I'm not afraid. :-) I think we 
>work with a fundamental underlying assumption that the facts are 
>knowable, even if we don't know them at the moment. Otherwise, if 
>somebody finds a hundred blogs stating that the sky is really green 
>(perhaps as part of a prank), and runs to WP to add stuff about how the 
>color of the sky is in dispute, we wouldn't have any basis for saying 
>"dude, it was an April Fool's joke, we're not going to add it". We've 
>actually had a couple people in the past adding totally fictional 
>events, sourced from websites that reported the fictional events as if 
>they were real, and they got irate when we deleted their material, 
>because they were convinced it was all real - or that was part of their 
>prank, or possibly performance art, to this day I can't say for sure. So 
>I exaggerated when I said we're just stenographers; we do have to do 
>"research", if not "original research".
>
The assumption that the facts are knowable is necessary; without it we 
would be paralyzed.  When we say they are known we often give our 
statements an absolutism that may not be warranted; we need to accept 
that some facts may never be known.  There are degrees of knowability 
and balances of probabilities.  The savvy reader needs to be able to 
weigh that information himself.

The sky colour argument is not helpful.  It is based on personal 
empiricism, and the more abstract process of how we define colours.  In 
the context of the philosophy of science definitions are not falsifiable 
statements.

In the example about adding fictional events as if they were real we 
need to assume good faith, but that gives us more worrisome results than 
using Ockham's razor to justify bad faith.  Assuming bad faith makes the 
problem a lot simpler than it actually is; it allows us to say we _know_ 
what happened and to feel smug in that knowledge.  Urban legends are 
often based on good faith interpretations of bad faith comments.  The 
dangers of falling victim to such illusions and delusions are more 
obvious on the internet where the physical act of publishing anything is 
inexpensive, and the quantity of such publication is massive.  We should 
not be lulled into a sense of false security about everything in print.  
"The Da Vinci Code" is a published book.

I regret that we have many editors for whom "original research" is 
interpreted as "any research".  Both require hard work, but that 
similarity does not make them synonymous.  We have yet to face the full 
impact of a race of idiots solely dependent on the internet for all 
their information.

>>>Seeing the references to that article, I can see where you're coming 
>>>from. I personally would be very reluctant to, for instance, use 
>>>rootsweb to source a death date, because not only do we have the problem 
>>>of identifying *which* person of a given name is meant, but genealogy 
>>>sites include all the standard howlers, like descents traced to Julius 
>>>Caesar, which is only plausible if you don't realize how utterly corrupt 
>>>the primary sources are for the Dark Ages. The "California Death 
>>>Records" link comes up blank, the oldpoetry.com link has a single 
>>>unsourced line affecting to be written in the first person, post-mortem 
>>>("I lived from 1887-1972.") - kind of spooky actually. So some of your 
>>>references illustrate well the reasons to be wary of primary sources.
>>>      
>>>
>>No, they reflect your bias regarding those sources.  What you cite as 
>>spooky is simply the presentation style of a website.  What you write 
>>about the rootsweb site doesn't apply to the California Death Records, 
>>which are state records and not genealogical research which links me to 
>>Julius Caesar.  The rootsweb link is to show where I accessed the record 
>>from, not the source.  The source is the death record itself.  I'm open 
>>to hearing someone argue that I haven't used the source properly, but 
>>I'm not open to people declaring it is out of order because it was found 
>>on the web.
>>    
>>
>That's why I said "personally reluctant", not "against policy". After 
>thirty years of doing scholarship, I thought I was pretty savvy, but 
>even so have been caught out a bunch of times while working on WP. Just 
>recently I used a web article that looked good, and it cited four print 
>references, but then we got a note from the webmaster of the site that 
>he was pulling the article because it was inaccurate, and making claims 
>not supported by those references. Ironically, the corresponding de: 
>article is sourced from newspapers of the time, so now I'm planning a 
>little quality time in the microfilm department of the library to try to 
>sort it all out.
>
It's all a matter of how effectively we and the end user can apply 
critical judgement.  No matter how hard we try to avoid it we all 
sometimes get caught up using sources that later prove to be 
unreliable.  No blame should attach to this, and the best we can hope 
for is a graceful exit.  Genealogical sites are a hotbed of good-faith 
incompetence where some "researchers" are pleased with a big collection 
of names.  If someone wants to use Rootsweb as a source that's fine.  
One still has to be mindful that there is a difference between a 
reference in Rootsweb to the California Death Records and actually 
reviewing the relevant record yourself.  Even official records can be 
wrong.  When an older person dies the people reporting the death may not 
have accurate information about the decedent's origins.  I'm sure that 
many birth records include questionable paternity.

I've had occasion to review the efforts of a recognized "professional" 
genealogist who got the generations mixed up.  The report had been 
prepared to support a claim by an individual to status as a Native 
American.  The information was used by a federal museum in Laramie, WY 
to misidentify a person in a photograph.

>Despite all our warnings, people do take WP at face value; I found it 
>very sobering to see some of my early mistakes propagated onto websites 
>all over the net. Maybe I'm the only one bothered by that.
>
This is much deeper than just a Wikipedia problem.  Erroneous reportings 
of information can endure for centuries.  When The Borg assimilates a 
whole civilization it also assimilates its errors.  Can you personally 
afford to be bothered by this?

>>>I bring these up not to try to disparage you, but because to me it's 
>>>what is interesting about scholarship and Wikipedia. Which sources of 
>>>information are good, which not so good, and why? If they're 
>>>inconsistent, which is true, or are they both true if you interpret in a 
>>>different way? I imagine that some day, if it hasn't already happened, a 
>>>heated talk-page argument over some factual detail will inspire an 
>>>expert to do some original research and then publish the findings - 
>>>which we can then incorporate into the article originally in dispute.
>>>      
>>>
>>I find this somewhat at odds with your thrust up until this point.  It 
>>seems to me this debate started when I asserted that web sources have 
>>value since the OED uses them.  If it is not your intention to declare 
>>web sources as inappropriate, then I fail to see how we have ended up 
>>where we are.  And if you didn't intend to disparage me, then I fail to 
>>see what your edit history has to do with anything.  I'm quite capable 
>>of judging an argument on its merits, thanks, and would hope my argument 
>>would be judged similarly too.  Let's not forget, these are only 
>>opinions, not facts, and we shouldn't be basing strict rules on 
>>subjective opinions.  I'm of the opinion sourcing is a "horses for 
>>courses" issue.
>>    
>>
>My argument is that web sources *may* have value, but people need to 
>keep their hands on their wallets - we have a lot of editors getting 
>their pockets picked and they don't even know it. Have you ever looked 
>at the fine print in an unabridged OED? It's great stuff - those guys 
>know about not only the English language proper, but every other human 
>language, and all the historical context too. To them, a web usage is a 
>raw data point, just like somebody's diary, or graffiti seen on a subway 
>wall. You will never find an OED entry that cites a web page as an 
>authority, it will always be in quoted form, as an example of observed 
>usage. That is the crucial difference.
>
I'm sure that the OED will find ways of dealing with words which have 
joined the vocabulary.  I certainly had the impression from chatting 
with Erin McKean at Wikimania that she is fully aware of the issues.  
Her "More Weird and Wonderful Words" (published by OUP) includes a 
"Webliography" as well as a more usual bibliography.  The practice of 
adding quotes as evidence did not begin with the OED.  My 1847 Webster 
and my 1817 Johnson have them.

I don't think that any strict general rule about either using or not 
using web sources would be meaningful.  This will be very disappointing 
to those who can't live without clear answers, and who will latch on to 
specific answers as a reason for excluding the opposing POV.

Ec



