[Wikipedia-l] Easton's Bible Dictionary (resend)

Neil Harris usenet at tonal.clara.co.uk
Wed Aug 7 17:29:16 UTC 2002

This is a trimmed-down version of my earlier over-length post.

I've now added an extra filter, so that entries whose titles occur in a
very large list of common words are rejected.

Thus, the script will no longer attempt to transfer the entry for "Wheel"
or "Silk" or other common words, regardless of whether Wikipedia has an
entry for that word. This is in addition to the existing check that
prevents clobbering existing articles.

I have also eliminated any articles containing the words "modern" or
"current", which seems to catch a lot of material that describes
conditions as they were in the author's own day.

I have also pushed the length filter up to 500 characters.
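
A minimal sketch of how the filters described above might fit together,
not the actual script. The function name, the order of the checks, the
whole-word case-insensitive matching and the "already exists" label are
all assumptions; only the 500-character threshold, the common-word check,
the "modern"/"current" check and the no-clobber check come from this post.

```python
import re
from typing import Optional

MIN_LENGTH = 500  # length threshold stated in the post
# Assumption: whole-word, case-insensitive match on the two stated words.
DATED_RE = re.compile(r"\b(modern|current)\b", re.IGNORECASE)


def reject_reason(title: str, body: str,
                  common_words: set[str],
                  existing_titles: set[str]) -> Optional[str]:
    """Return a reason string if the entry should be skipped, else None."""
    if title.lower() in common_words:
        return "familiar word"
    if title in existing_titles:
        return "already exists"  # hypothetical label for the no-clobber check
    if len(body) < MIN_LENGTH:
        return "too short"
    if DATED_RE.search(body):
        return "modern"
    return None
```

Here common_words would be the large word list loaded in lower case, and
existing_titles the set of article titles already on Wikipedia.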

Doing all of these takes the list down to around 640 filtered articles.
Wiki links to the non-imported topics still remain, inviting Wikipedians
to write new articles about these topics. The remaining articles are
almost entirely about obscure figures and places from the Bible.

I intend to add a header to each imported article, reading something like:

     ''This is an entry from Easton's Bible Dictionary. The material in it
is written from the viewpoint of the 19th century, and may be
out-of-date or biased. Please review and edit this article to bring it
up to date''

and a trailer:

     From [[Easton's Bible Dictionary (1897)]]
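
As a rough sketch, wrapping an article body in the header and trailer
above could look like this. The function name and the blank-line spacing
are assumptions; the header and trailer strings are copied from this post.

```python
HEADER = (
    "''This is an entry from Easton's Bible Dictionary. The material in it "
    "is written from the viewpoint of the 19th century, and may be "
    "out-of-date or biased. Please review and edit this article to bring it "
    "up to date''"
)
TRAILER = "From [[Easton's Bible Dictionary (1897)]]"


def wrap_article(body: str) -> str:
    """Return the body with the header above it and the trailer below it."""
    # Assumption: one blank line separates header, body and trailer.
    return f"{HEADER}\n\n{body.strip()}\n\n{TRAILER}\n"
```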

I intend to drip-feed the finished articles in at a rate of one every 20
minutes, allowing lots of time for human review and assimilation, once I
think that there is a consensus that this is OK.
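
The drip-feed could be sketched as a simple loop. This is illustrative
only: drip_feed and the upload callback are hypothetical names, and the
real upload mechanism is not described here. The sleep function is passed
in so the loop can be tried out without actually waiting.

```python
import time
from typing import Callable, Iterable

INTERVAL_SECONDS = 20 * 60  # one article every 20 minutes, as proposed


def drip_feed(articles: Iterable[tuple[str, str]],
              upload: Callable[[str, str], None],
              sleep: Callable[[float], None] = time.sleep,
              interval: float = INTERVAL_SECONDS) -> int:
    """Upload (title, text) pairs one at a time, pausing between uploads."""
    count = 0
    for title, text in articles:
        if count:
            sleep(interval)  # wait before every upload except the first
        upload(title, text)  # hypothetical callback that posts one article
        count += 1
    return count
```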

Can anyone suggest any further improvements, short of proof-reading all
1200 articles?



Here are some of the results of the filtering of the original Easton's:

Lines with the word TITLE at the start denote articles that passed;
lines with the word BAD represent articles that failed to pass the
filter, with the reason for failure.

BAD A = familiar word
BAD A type Adam = no comma
BAD AEnon = too short
BAD Aaron = familiar word
BAD Aaronites = too short
BAD Abaddon = too short
BAD Abagtha = no comma
BAD Abana = modern
TITLE 1 510 Abarim
BAD Abba = familiar word
BAD Abda = too short
BAD Abdeel = too short
BAD Abdi = no he
BAD Abdiel = familiar word
TITLE 2 745 Abdon
BAD Abednego = too short
BAD Abel = familiar word
BAD Abel-beth-maachah = modern
BAD Abel-cheramim = too short
TITLE 3 551 Abel-meholah
BAD Abel-mizraim = too short
BAD Abel-shittim = too short
BAD Abez = too short
BAD Abi-albon = too short
BAD Abia = too short
BAD Abiasaph = too short
TITLE 4 1872 Abiathar
BAD Abib = too short
BAD Abida = too short
BAD Abidan = too short
BAD Abieezer = too short
BAD Abiel = too short
BAD Abiezrite = too short
BAD Abigail = familiar word
BAD Abihail = too short
TITLE 5 891 Abihu
BAD Abihud = too short
TITLE 6 2766 Abijah
BAD Abijam = too short
BAD Abilene = too short
BAD Abimael = too short
TITLE 7 3025 Abimelech
TITLE 8 817 Abinadab
BAD Abinoam = too short
TITLE 9 574 Abiram
TITLE 10 502 Abishag
TITLE 11 911 Abishai
BAD Abishua = too short
BAD Abishur = too short
BAD Abital = too short
BAD Abitub = too short
BAD Abjects = too short
BAD Ablution = familiar word
BAD Abner = familiar word
BAD Abomination = familiar word
BAD Abomination of Desolation = too short
BAD Abraham = familiar word
BAD Abraham's bosom = too short
BAD Abram = familiar word
BAD Abronah = too short
BAD Absalom = familiar word
BAD Acacia = familiar word
TITLE 12 1943 Accad
TITLE 13 574 Accho
BAD Accuser = familiar word
BAD Aceldama = modern
TITLE 14 767 Achaia
BAD Achaichus = too short
TITLE 15 823 Achan
BAD Achbor = too short
TITLE 16 1026 Achish
BAD Achmetha = familiar word
BAD Achor = familiar word
BAD Achsah = too short
BAD Achshaph = modern
BAD Achzib = modern
BAD Acre = familiar word
TITLE 17 5435 Acts of the Apostles
BAD Adah = too short
BAD Adam = familiar word
BAD Adamah = modern
BAD Adamant = familiar word
BAD Adar = familiar word
BAD Adbeel = too short
BAD Addar = no he
BAD Adder = familiar word
BAD Addi = too short
BAD Addon = too short
BAD Adiel = familiar word
BAD Adin = familiar word
BAD Adina = no he
BAD Adino = too short
BAD Adjuration = familiar word
BAD Admah = too short
BAD Adnah = too short
TITLE 18 1328 Adoni-zedec
BAD Adonibezek = too short
TITLE 19 984 Adonijah
BAD Adonikam = too short
BAD Adoniram = familiar word
BAD Adoption = familiar word
BAD Adoram = no he
BAD Adore = familiar word
BAD Adrammelech = familiar word
BAD Adramyttium = too short
BAD Adria = modern
BAD Adriel = too short
BAD Adullam = familiar word
BAD Adullamite = familiar word
BAD Adultery = familiar word
TITLE 20 571 Adummim
BAD Adversary = familiar word
BAD Advocate = familiar word
BAD Affection = familiar word
BAD Affinity = familiar word
BAD Afflictions = too short
BAD Agabus = too short
BAD Agag = familiar word
BAD Agagite = too short
BAD Agate = familiar word
BAD Age = familiar word
BAD Agee = familiar word
BAD Agony = familiar word
BAD Agriculture = familiar word
TITLE 21 615 Agrippa I
TITLE 22 655 Agrippa II
BAD Ague = familiar word
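
A listing in this format can be tallied with a short script such as the
sketch below. It assumes that the two numbers on a TITLE line are a
running count and the article length in characters, which is how they
read but is not stated above. The function and pattern names are made up
for illustration.

```python
import re
from collections import Counter

# Assumed layout: "TITLE <running count> <length> <title>"
TITLE_RE = re.compile(r"^TITLE (\d+) (\d+) (.+)$")
# Layout as shown in the listing: "BAD <title> = <reason>"
BAD_RE = re.compile(r"^BAD (.+) = (.+)$")


def summarize(lines):
    """Return (passed, reasons): passed titles with lengths, and a tally of failure reasons."""
    passed = []
    reasons = Counter()
    for line in lines:
        line = line.strip()
        match = TITLE_RE.match(line)
        if match:
            passed.append((match.group(3), int(match.group(2))))
            continue
        match = BAD_RE.match(line)
        if match:
            reasons[match.group(2)] += 1
    return passed, reasons
```

Calling reasons.most_common() on the result then shows which filter
rejects the most entries.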
