This is a trimmed-down version of my earlier over-length post.
I've now added an extra filter, so that entries whose titles occur in a very large list of common words are rejected.
Thus, the script will no longer attempt to transfer the entry for "Wheel", "Silk", or any other common word, regardless of whether Wikipedia already has an article on that word. This is in addition to the check against clobbering existing articles.
I have also excluded any article containing the words "modern" or "current". This seems to catch a lot of material that describes conditions in the author's own time.
I have also raised the minimum length to 500 characters.
Together, these filters cut the list down to around 640 articles. Wiki links to the topics that are not imported still remain, inviting Wikipedians to write new articles on them. The articles that survive the filters are almost entirely about obscure figures and places from the Bible.
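A minimal Python sketch of the filter chain described above follows. The function name `filter_entry`, the order of the checks, the reason strings, and the whole-word, case-insensitive match on "modern"/"current" are all assumptions for illustration, not the actual script; only the checks described in this post are modelled.

```python
import re
from typing import Optional

MIN_LENGTH = 500  # minimum article length in characters
# Assumption: whole-word, case-insensitive match, so "currently" does not trigger it.
DATED_WORDS = re.compile(r"\b(modern|current)\b", re.IGNORECASE)


def filter_entry(title: str, text: str,
                 common_words: set[str],
                 existing_titles: set[str]) -> Optional[str]:
    """Return a rejection reason, or None if the entry passes every check.

    Hypothetical helper: check order and reason strings are illustrative.
    """
    if title in existing_titles:
        return "already exists"   # never clobber an existing article
    if title.lower() in common_words:
        return "familiar word"    # e.g. "Wheel", "Silk"
    if DATED_WORDS.search(text):
        return "modern"           # likely describes the author's own era
    if len(text) < MIN_LENGTH:
        return "too short"
    return None
```

Keeping the common-word list as a lower-cased `set` makes each lookup constant-time, which matters when the list is very large.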
I intend to add a header to each imported article, reading something like:
''This is an entry from Easton's Bible Dictionary. The material in it is written from the viewpoint of the 19th century, and may be out-of-date or biased. Please review and edit this article to bring it up to date.''
and a trailer:
From [[Easton's Bible Dictionary (1897)]]
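Wrapping each entry in that header and trailer is a simple string operation. The sketch below uses the draft wording from this post; the name `wrap_article` and the blank-line separation (which MediaWiki renders as paragraph breaks) are assumptions.

```python
# Draft header and trailer wording from this proposal (may still change).
HEADER = ("''This is an entry from Easton's Bible Dictionary. The material in it is "
          "written from the viewpoint of the 19th century, and may be out-of-date or "
          "biased. Please review and edit this article to bring it up to date.''")
TRAILER = "From [[Easton's Bible Dictionary (1897)]]"


def wrap_article(body: str) -> str:
    """Hypothetical helper: surround an imported entry with the header and trailer."""
    # Blank lines keep the header, body, and trailer as separate wiki paragraphs.
    return f"{HEADER}\n\n{body.strip()}\n\n{TRAILER}\n"
```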
Once I think there is a consensus that this is OK, I intend to drip-feed the finished articles in at a rate of one every 20 minutes, allowing plenty of time for human review and assimilation.
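A rate-limited upload loop could be sketched as follows. `upload` is a hypothetical callback standing in for whatever actually saves the page, and `sleep` is injectable only so the loop can be tried without waiting; neither reflects the real script.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, Iterable

INTERVAL = timedelta(minutes=20)  # one article every 20 minutes, as proposed


def schedule(titles: list[str], start: datetime,
             interval: timedelta = INTERVAL) -> list[tuple[datetime, str]]:
    """Pair each title with the time at which it would be uploaded."""
    return [(start + i * interval, title) for i, title in enumerate(titles)]


def drip_feed(titles: Iterable[str], upload: Callable[[str], None],
              interval: timedelta = INTERVAL,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Upload titles one at a time, pausing `interval` between uploads."""
    for i, title in enumerate(titles):
        if i:  # no pause before the first upload
            sleep(interval.total_seconds())
        upload(title)
```

At this rate, around 640 articles take roughly 640 × 20 = 12,800 minutes, or just under nine days, to feed in.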
Can anyone suggest any further improvements, short of proof-reading all 1200 articles?
Neil
--------------------------------
Here are some of the results of filtering the original Easton's topics.
Lines beginning with TITLE denote articles that passed; lines beginning with BAD denote articles that failed the filter, together with the reason for failure.
BAD A = familiar word
BAD A type Adam = no comma
BAD AEnon = too short
BAD Aaron = familiar word
BAD Aaronites = too short
BAD Abaddon = too short
BAD Abagtha = no comma
BAD Abana = modern
TITLE 1 510 Abarim
BAD Abba = familiar word
BAD Abda = too short
BAD Abdeel = too short
BAD Abdi = no he
BAD Abdiel = familiar word
TITLE 2 745 Abdon
BAD Abednego = too short
BAD Abel = familiar word
BAD Abel-beth-maachah = modern
BAD Abel-cheramim = too short
TITLE 3 551 Abel-meholah
BAD Abel-mizraim = too short
BAD Abel-shittim = too short
BAD Abez = too short
BAD Abi-albon = too short
BAD Abia = too short
BAD Abiasaph = too short
TITLE 4 1872 Abiathar
BAD Abib = too short
BAD Abida = too short
BAD Abidan = too short
BAD Abieezer = too short
BAD Abiel = too short
BAD Abiezrite = too short
BAD Abigail = familiar word
BAD Abihail = too short
TITLE 5 891 Abihu
BAD Abihud = too short
TITLE 6 2766 Abijah
BAD Abijam = too short
BAD Abilene = too short
BAD Abimael = too short
TITLE 7 3025 Abimelech
TITLE 8 817 Abinadab
BAD Abinoam = too short
TITLE 9 574 Abiram
TITLE 10 502 Abishag
TITLE 11 911 Abishai
BAD Abishua = too short
BAD Abishur = too short
BAD Abital = too short
BAD Abitub = too short
BAD Abjects = too short
BAD Ablution = familiar word
BAD Abner = familiar word
BAD Abomination = familiar word
BAD Abomination of Desolation = too short
BAD Abraham = familiar word
BAD Abraham's bosom = too short
BAD Abram = familiar word
BAD Abronah = too short
BAD Absalom = familiar word
BAD Acacia = familiar word
TITLE 12 1943 Accad
TITLE 13 574 Accho
BAD Accuser = familiar word
BAD Aceldama = modern
TITLE 14 767 Achaia
BAD Achaichus = too short
TITLE 15 823 Achan
BAD Achbor = too short
TITLE 16 1026 Achish
BAD Achmetha = familiar word
BAD Achor = familiar word
BAD Achsah = too short
BAD Achshaph = modern
BAD Achzib = modern
BAD Acre = familiar word
TITLE 17 5435 Acts of the Apostles
BAD Adah = too short
BAD Adam = familiar word
BAD Adamah = modern
BAD Adamant = familiar word
BAD Adar = familiar word
BAD Adbeel = too short
BAD Addar = no he
BAD Adder = familiar word
BAD Addi = too short
BAD Addon = too short
BAD Adiel = familiar word
BAD Adin = familiar word
BAD Adina = no he
BAD Adino = too short
BAD Adjuration = familiar word
BAD Admah = too short
BAD Adnah = too short
TITLE 18 1328 Adoni-zedec
BAD Adonibezek = too short
TITLE 19 984 Adonijah
BAD Adonikam = too short
BAD Adoniram = familiar word
BAD Adoption = familiar word
BAD Adoram = no he
BAD Adore = familiar word
BAD Adrammelech = familiar word
BAD Adramyttium = too short
BAD Adria = modern
BAD Adriel = too short
BAD Adullam = familiar word
BAD Adullamite = familiar word
BAD Adultery = familiar word
TITLE 20 571 Adummim
BAD Adversary = familiar word
BAD Advocate = familiar word
BAD Affection = familiar word
BAD Affinity = familiar word
BAD Afflictions = too short
BAD Agabus = too short
BAD Agag = familiar word
BAD Agagite = too short
BAD Agate = familiar word
BAD Age = familiar word
BAD Agee = familiar word
BAD Agony = familiar word
BAD Agriculture = familiar word
TITLE 21 615 Agrippa I
TITLE 22 655 Agrippa II
BAD Ague = familiar word
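A listing in this format could be produced with a small sketch like the one below. It assumes each entry has already been run through the filters and arrives as a hypothetical `(title, length, reason)` tuple, with `reason` set to `None` for entries that pass. It also assumes the two numbers on each TITLE line are a running count of passing articles and the article length in characters.

```python
from collections import Counter
from typing import Optional


def report(entries: list[tuple[str, int, Optional[str]]]) -> tuple[list[str], Counter]:
    """Format filter results as TITLE/BAD lines and tally the rejection reasons."""
    lines: list[str] = []
    reasons: Counter = Counter()
    passed = 0
    for title, length, reason in entries:
        if reason is None:
            passed += 1  # running count of articles that survive the filters
            lines.append(f"TITLE {passed} {length} {title}")
        else:
            reasons[reason] += 1
            lines.append(f"BAD {title} = {reason}")
    return lines, reasons
```

The final running count gives the number of surviving articles, and the `Counter` shows which filter is doing most of the work.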