At 06:29 PM 8/7/02 +0100, Neil Harris wrote:
This is a trimmed-down version of my earlier over-length post.
All in all, a fine thing; below are a couple of specific suggestions for
further improvement.
I've now added an extra filter, so that entries whose titles occur in a
very large list of common words are rejected.
That sounds useful--it'll save us from having to rewrite articles about
spices and such, and provide some useful stuff on less common topics.
Thus, the script will now not attempt to transfer the entry for "Wheel"
or "Silk" or other common words, regardless of whether Wikipedia has an
entry for that word. This is in addition to the check for not clobbering
existing articles.
I have also eliminated any articles containing the words "modern" or
"current", which seems to catch a lot of stuff that refers to the
author's contemporary information.
Again, a good idea: the Easton stuff seems more useful as a source of
information about the Bible as a document than about the contemporary
Middle East.
I have also pushed the length filter up to 500 characters.
Doing all of these takes the list down to around 640 filtered articles.
Wiki links to the non-imported topics still remain, inviting Wikipedians
to write new articles about these topics. These remaining articles are
almost entirely about obscure figures and places from the Bible.
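
For anyone who wants to see the filters side by side, here is a rough sketch
in Python of the checks described above. It is not the actual import script:
the function name, the two-word stand-in for the "very large list of common
words", whole-word matching on "modern"/"current", and reading the length
filter as a 500-character minimum are all assumptions.

```python
import re

# Hypothetical stand-in for the "very large list of common words".
COMMON_WORDS = {"wheel", "silk"}

# Assumption: match "modern"/"current" as whole words, ignoring case.
DATED_WORDS = re.compile(r"\b(modern|current)\b", re.IGNORECASE)

# Assumption: the length filter is a minimum article length in characters.
MIN_LENGTH = 500


def should_import(title, text, existing_titles, common_words=COMMON_WORDS):
    """Return True if an entry passes every filter described in the post."""
    if title.lower() in common_words:
        return False  # title is a common word such as "Wheel" or "Silk"
    if title in existing_titles:
        return False  # never clobber an existing article
    if DATED_WORDS.search(text):
        return False  # likely refers to the author's contemporary world
    if len(text) < MIN_LENGTH:
        return False  # too short to be worth importing
    return True
```

Something like should_import("Amon", entry_text, existing_titles) would then
decide whether a given entry goes onto the filtered list.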
I intend to add a header to each imported article, reading something like:
''This is an entry from Easton's Bible Dictionary. The material in it
is written from the viewpoint of the 19th century, and may be
out-of-date or biased. Please review and edit this article to bring it
up to date''
Maybe that could be expanded to note what sort of 19th century viewpoint:
it's clearly Christian, but if it's a particular denomination, that's
relevant (I can tell immediately that it wasn't put together by a
19th-century Jew, let alone a Buddhist or atheist).
and a trailer:
From [[Easton's Bible Dictionary (1897)]]
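
As a small illustration, the header and trailer could be attached along
these lines. The wording is the provisional text quoted above; the function
name wrap_article and the blank-line spacing are assumptions, not taken from
the real script.

```python
# Provisional wording from the post ("reading something like").
HEADER = (
    "''This is an entry from Easton's Bible Dictionary. The material in it "
    "is written from the viewpoint of the 19th century, and may be "
    "out-of-date or biased. Please review and edit this article to bring it "
    "up to date''"
)
TRAILER = "From [[Easton's Bible Dictionary (1897)]]"


def wrap_article(body):
    """Put the header above and the trailer below an article body."""
    # Assumption: one blank line separates header, body, and trailer.
    return f"{HEADER}\n\n{body.strip()}\n\n{TRAILER}\n"
```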
I intend to drip-feed the finished articles in at a rate of one every 20
minutes, allowing lots of time for human review and assimilation, once
I think that there is a consensus that this is OK.
Can anyone suggest any further improvements, short of proof-reading all
1200 articles?
One practical fix: when I was editing the [[Amon]] article this morning, I
found that it linked to itself in a couple of places. Can you tweak the
script to not create self-links?
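
A rough sketch of what that tweak might look like, in Python. It is only an
illustration: the function name remove_self_links is made up, and it assumes
links are written as [[Title]] or [[Title|label]] and that titles should be
compared without regard to case.

```python
import re


def remove_self_links(title, wikitext):
    """Replace links that point at the article's own title with plain text."""
    # Match [[Title]] or [[Title|label]]; other links are left untouched.
    pattern = re.compile(
        r"\[\[\s*(" + re.escape(title) + r")\s*(?:\|([^\]|]*))?\]\]",
        re.IGNORECASE,  # assumption: title comparison ignores case
    )

    def unlink(match):
        # Keep the visible label if there is one, else the title as written.
        return match.group(2) or match.group(1)

    return pattern.sub(unlink, wikitext)
```

Run on the [[Amon]] article with the title "Amon", this would leave links to
other articles alone and turn the self-links into ordinary text.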
Also, someone's going to have to proofread the articles--and I'd rather it
be someone with more of an interest in the matter than I have, if only
because such a person is more likely to catch misspellings of names.
And why not resume at Q or something, instead of back at the beginning of the
alphabet?
--
Vicki Rosenzweig
vr(a)redbird.org
http://www.redbird.org