Re: [WikiEN-l] Thousands of awful articles on websites

6 Jan 2007


Stan Shebs wrote:
...
Steve Block wrote:
...
Stan Shebs wrote:
...
...
How do we know reliable sources aren't lying?  Every newspaper and 
source in town reports Colin Montgomerie holed the winning putt in the 
2004 Ryder Cup, but it isn't true; Ian Poulter struck the putt that 
mathematically won the cup.  Monty's story simply made better press.  If 
we do, as you say, withhold judgement on whether a source is correct, 
why do you then say we can't use some sources because they may be lying? 
 Obviously some judgement is at play.
Well, if the source is lying, then by definition it's not reliable. Your 
example is an object lesson in how newspapers are intermediate in 
reliability; they are usually better than Joe Random's blog, but not as 
good as a scholarly monograph that has had multiple layers of review 
spread over multiple years.
No, you haven't as yet stated how we define which source is telling the 
"truth".  My understanding is that we don't do that.  If we don't do 
that, it stands to reason we don't determine which source "lies", since 
doing so indicates we have determined one source is telling the truth. 
My understanding is that we weigh each source in relation to the topic, 
and present it from an NPOV.  This means we must judge whether the 
information is relevant, not if it is true.  It is relevant to note that 
Montgomerie is credited as hitting the winning putt.
Dangerously close to epistemology, but I'm not afraid. :-) I think we 
work with a fundamental underlying assumption that the facts are 
knowable, even if we don't know them at the moment. Otherwise, if 
somebody finds a hundred blogs stating that the sky is really green 
(perhaps as part of a prank), and runs to WP to add stuff about how the 
color of the sky is in dispute, we wouldn't have any basis for saying 
"dude, it was an April Fool's joke, we're not going to add it". We've 
actually had a couple people in the past adding totally fictional 
events, sourced from websites that reported the fictional events as if 
they were real, and they got irate when we deleted their material, 
because they were convinced it was all real - or that was part of their 
prank, or possibly performance art, to this day I can't say for sure. So 
I exaggerated when I said we're just stenographers; we do have to do 
"research", if not "original research".
The assumption that the facts are knowable is necessary; without it we 
would be paralyzed.  When we say they are known we often give our 
statements an absolutism that may not be warranted; we need to accept 
that some facts may never be known.  There are degrees of knowability 
and balances of probabilities.  The savvy reader needs to be able to 
weigh that information himself.
The sky colour argument is not helpful.  It is based on personal 
empiricism, and the more abstract process of how we define colours.  In 
the context of the philosophy of science definitions are not falsifiable 
statements.
In the example about adding fictional events as if they were real we 
need to assume good faith, but that gives us more worrisome results than 
using Ockham's razor to justify bad faith.  Assuming bad faith makes the 
problem a lot simpler than it actually is; it allows us to say we _know_ 
what happened and to feel smug in that knowledge.  Urban legends are 
often based on good faith interpretations of bad faith comments.  The 
dangers of falling victim to such illusions and delusions are more 
obvious on the internet where the physical act of publishing anything is 
inexpensive, and the quantity of such publication is massive.  We should 
not be lulled into a sense of false security about everything in print.  
"The Da Vinci Code" is a published book.
I regret that we have many editors for whom "original research" is 
interpreted as "any research".  Both require hard work, but that 
similarity does not make them synonymous.  We have yet to face the full 
impact of a race of idiots solely dependent on the internet for all 
their information.
...
...
...
Seeing the references to that article, I can see where you're coming 
from. I personally would be very reluctant to, for instance, use 
rootsweb to source a death date, because not only do we have the problem 
of identifying *which* person of a given name is meant, but genealogy 
sites include all the standard howlers, like descents traced to Julius 
Caesar, which is only plausible if you don't realize how utterly corrupt 
the primary sources are for the Dark Ages. The "California Death 
Records" link comes up blank, the oldpoetry.com link has a single 
unsourced line affecting to be written in the first person, post-mortem 
("I lived from 1887-1972.") - kind of spooky actually. So some of your 
references illustrate well the reasons to be wary of primary sources.
No, they reflect your bias regarding those sources.  What you cite as 
spooky is simply the presentation style of a website.  What you write 
about the rootsweb site doesn't apply to the California Death Records, 
which are state records and not genealogical research which links me to 
Julius Caesar.  The rootsweb link is to show where I accessed the record 
from, not the source.  The source is the death record itself.  I'm open 
to hearing someone argue that I haven't used the source properly, but 
I'm not open to people declaring it is out of order because it was found 
on the web.
That's why I said "personally reluctant", not "against policy". After 
thirty years of doing scholarship, I thought I was pretty savvy, but 
even so have been caught out a bunch of times while working on WP. Just 
recently I used a web article that looked good, and it cited four print 
references, but then we got a note from the webmaster of the site that 
he was pulling the article because it was inaccurate, and making claims 
not supported by those references. Ironically, the corresponding de: 
article is sourced from newspapers of the time, so now I'm planning a 
little quality time in the microfilm department of the library to try to 
sort it all out.
It's all a matter of how effectively we and the end user can apply 
critical judgement.  No matter how hard we try to avoid it we all 
sometimes get caught up using sources that later prove to be 
unreliable.  No blame should attach to this, and the best we can hope 
for is a graceful exit.  Genealogical sites are a hotbed of good-faith 
incompetence where some "researchers" are pleased with a big collection 
of names.  If someone wants to use Rootsweb as a source that's fine.  
One still has to be mindful that there is a difference between a 
reference in Rootsweb to the California Death Records and actually 
reviewing the relevant record yourself.  Even official records can be 
wrong.  When an older person dies the people reporting the death may not 
have accurate information about the decedent's origins.  I'm sure that 
many birth records include questionable paternity.
I've had occasion to review the efforts of a recognized "professional" 
genealogist that got the generations mixed up.  The report had been 
prepared to support a claim by an individual to status as a Native 
American.  The information was used by a federal museum in Laramie, WY 
to misidentify a person in a photograph.
...
Despite all our warnings, people do take WP at face value; I found it 
very sobering to see some of my early mistakes propagated onto websites 
all over the net. Maybe I'm the only one bothered by that.
This is much deeper than just a Wikipedia problem.  Erroneous reportings 
of information can endure for centuries.  When The Borg assimilates a 
whole civilization it also assimilates its errors.  Can you personally 
afford to be bothered by this?
...
...
...
I bring these up not to try to disparage you, but because to me it's 
what is interesting about scholarship and Wikipedia. Which sources of 
information are good, which not so good, and why? If they're 
inconsistent, which is true, or are they both true if you interpret in a 
different way? I imagine that some day, if it hasn't already happened, a 
heated talk-page argument over some factual detail will inspire an 
expert to do some original research and then publish the findings - 
which we can then incorporate into the article originally in dispute.
I find this somewhat at odds with your thrust up until this point.  It 
seems to me this debate started when I asserted that web sources have 
value since the OED uses them.  If it is not your intention to declare 
web sources as inappropriate, then I fail to see how we have ended up 
where we are.  And if you didn't intend to disparage me, then I fail to 
see what your edit history has to do with anything.  I'm quite capable 
of judging an argument on its merits, thanks, and would hope my argument 
would be judged similarly too.  Let's not forget, these are only 
opinions, not facts, and we shouldn't be basing strict rules on 
subjective opinions.  I'm of the opinion sourcing is a "horses for 
courses" issue.
My argument is that web sources *may* have value, but people need to 
keep their hands on their wallets - we have a lot of editors getting 
their pockets picked and they don't even know it. Have you ever looked 
at the fine print in an unabridged OED? It's great stuff - those guys 
know about not only the English language proper, but every other human 
language, and all the historical context too. To them, a web usage is a 
raw data point, just like somebody's diary, or graffiti seen on a subway 
wall. You will never find an OED entry that cites a web page as an 
authority, it will always be in quoted form, as an example of observed 
usage. That is the crucial difference.
I'm sure that the OED will find ways of dealing with words which have 
joined the vocabulary.  I certainly had the impression from chatting 
with Erin McKean at Wikimania that she is fully aware of the issues.  
Her "More Weird and Wonderful Words" (published by OUP) includes a 
"Webliography" as well as a more usual bibliography.  The practice of 
adding quotes as evidence did not begin with the OED.  My 1847 Webster 
and my 1817 Johnson have them.
I don't think that any strict general rule about either using or not 
using web sources would be meaningful.  This will be very disappointing 
to those who can't live without clear answers, and who will latch on to 
specific answers as a reason for excluding the opposing POV.
Ec
