[WikiEN-l] WP and Deep Web, was Re: Age fabrication and original research

Thu Oct 15 21:42:46 UTC 2009

Sorry to keep the thread going, but there are a couple of things that I
am going to long-windedly expound on... apologies in advance to people
who don't care.

On Sun, Oct 11, 2009 at 11:31 PM, stevertigo <stvrtg at gmail.com> wrote:
> phoebe ayers <phoebe.wiki at gmail.com> wrote:
>> *cough* librarians? *cough*
>> anyway, the way the page above is framed betrays the assumption that
>> finding sources is a much more clear-cut process than it is,
>> and that the only expertise required in neutrally evaluating a wide range of
>> texts about a particular (often obscure) topic is access to a
>> particular database of articles.
>
> Hm. May I also "betray the assumption" that we may want some input
> from those experienced in using some particular copyright access
> search engine? I mean, "clear-cut process" it may or may not be.
> Searching through piles of books is certainly never easy, but are you
> saying that searching *cough* online databases of digitized text isn't
> any easier? I have to assume good faith, that people can in a certain
> way act online in a "librarian"-like capacity, or else learn by their
> mistakes how to do so.

They aren't "copyright access search engines". Gah. Library databases
are typically:

* digitized indexes of citations to materials in a particular field of
study, e.g. engineering, law, biology, etc.
* those citations are typically to magazine articles, newspaper
articles, journal articles, and sometimes book chapters, conference
proceedings, government reports, and more. The materials referenced
might be in copyright, out of copyright, or somewhere nebulously in
between. Obviously, more recent stuff is generally copyrighted.
* The reason that for-pay library databases are useful is that they
represent the collective, long-term effort to capture citations to
literature in a particular area, often with human indexers manually
going through journal tables of contents and the like. Large-scale
automatic efforts like Google Scholar are also useful, but they return
different results because they automatically index only what is
available online, which isn't everything. They also lack things like
manually added indexing terms.
* For-pay library databases may or may not include the full text of
the item to which the citation refers [they usually do not]; the item
may or may not be available in any form to the person doing the
search.

Therefore, as David said, there are a lot of steps involved in doing
good research:

* figuring out what the question is
* figuring out what sort of source might have the answer to that
question (a newspaper article, a book, a handbook?)
* figuring out where to look (e.g., I need books, therefore I shall
check worldcat)
* figuring out if you have access to the best place to look (worldcat
is free to all, hurrah!) [this can be more complicated than it sounds.
My library has access to ~500 databases, plus print stuff, plus the
internet].
* searching in that place; figuring out how to best search for that
question in that particular database (in worldcat, I can use LC
subject headings! but I sure can't in google scholar); iterating your
search until you find something, or not;
* rinse, wash and repeat for each possible database;
* figuring out if you can obtain your results (if the best result is a
book in the Swedish national library, I'm going to have a tough time
getting it; but hey, if this is for a Wikipedian in Sweden, they might
have better luck).

Add in Wikipedia criteria:
* prefer free online sources
* prefer npov sources
* prefer widely accessible sources

And repeat, for each and every referencing question. Which is why it
takes a while to source an article *properly*. And also, why the
answer to our sourcing problems is not really "let's buy some library
databases for everyone to access" [which ones? for what questions?
will it help at all, if you can't then get the materials referenced?].

>> Which is not to say that I wouldn't love to see a broad network of people
>> who love to work on sourcing problems, much in the same way we have a
>> broad network of copyeditors and speedy-deleters.
>> Perhaps trying to reinvigorate WikiProject Fact & Reference Check would
>> be a good idea.
>
> You appear to be involved at Project:Resource Exchange, which looks
> like it's well on the right track, even though it also seems to be
> somewhat inactive. I and others might have a few ideas for how to
> tweak that project a little bit, and get it up and running. Some of
> the same points I've made above about availability and private
> communications are the obvious requirements -- open availability,
> private requests, code of conduct (works both ways), private returns.

I actually helped start the resource exchange, but am no longer active
because I have no interest in violating the license agreements of the
various publishers I work with [which generally e.g. prohibit doing
research for off-campus people]. I wish there were a better way to
deal with this. Besides pushing for open access, I haven't thought of
anything.

Maybe you could call it WikiProject:Research exchange, instead.

> The thing I suggest is to nuke Project:Librarians (which you also
> appear to be involved with) and merge those people into
> Project:Resources (note, move WP:LIB to ~Project:Resources). The
> reason being is that such a well-qualified group of people needs an
> actual purpose. WP:LIB/WP:REX seem like just that.

Well, WP:Librarians is pretty inactive, but it's also not really meant
as a research service; it's instead a group of people with a common
interest and profession (would you ask the wikiproject:lawyers to do
all of the legal support for wikipedia?) So while I'm sure many folks
there would be interested in your ideas and the proposed projects,
there's not 100% overlap.

> Also, these people obviously need a name. And "librarians" just might
> work, assuming that this new meaning can be integrated, or else the
> other meaning deprecated.

I hope the other meaning isn't deprecated. I'd be even closer to being
out of a job than I am already :P

-- phoebe

p.s. as Charles said, "How Wikipedia Works" is all GFDL and should be
100% available online. I got to it fine earlier; let me know if you
had trouble accessing it.

-- 
* I use this address for lists; send personal messages to phoebe.ayers
<at> gmail.com *