[WikiEN-l] WP and Deep Web, was Re: Age fabrication and original research
Charles Matthews
charles.r.matthews at ntlworld.com
Thu Oct 8 18:40:10 UTC 2009
David Goodman wrote:
> Quite apart from the incredible range available from a research
> library, the great majority of Wikipedians, even experienced ones, do
> not use even those sources which are made available free from local
> public libraries to residents. Many seem not to even think about using
> anything free on the internet except that reachable through the
> Googles. If Google News reports a newspaper or magazine behind a pay
> wall, they do not even think of looking for it in other databases or
> web sites that they may have available.
David's issue here is something he describes as familiar generally to
librarians. It does seem to me to be a hybrid of that one (leading the
horse to the reference library water is not the same as having the horse
drink), with another one. Tim Berners-Lee is apparently interested in
the [[Deep Web]], which is to a first approximation what you can't
Google for, but is out there. One clear cause is online databases: if
the web crawler can't formulate a suitable query, the page that would
answer it never gets generated, and so never gets indexed.
I was thinking about this more obliquely, because of my current
interests: another couple of causes occur to me. There are texts online
which are reference material, but need proof-reading (tell me about it)
before the text is accurate enough for the search term to be there "in
clear". And (as I found out just now) there are texts online that are
huge downloads. I've just looked at a PDF that is over 500 MB. Both
these issues are obvious to me as a user of archive.org.
There is a route for information to migrate onto the Web as
book -> scan -> post to archive.org.
This is fruitful and gets it "out there". It happens that for reference
information our model is more useful by a factor of at least 1000 (you
can check the figures for archive.org downloads).
So, the deeper Web needs "dredging" work before such things turn up on
most people's first page of search engine hits. I'd quite agree with
David that simply using the "shallow Web" and moving information from
one part of it to another is not the only thing research for WP should
be about. It seems to me that during Wikipedia's second decade we'll
need to become more thoughtful about what is involved. (In Wikisource
terms, for example, it would be great to see development of that project
as the "reference Commons", matching the function the Commons serves for
media files. But that's a potentially divisive idea, since it is already
a "free library" with its own mission.)
Charles