On Thu, Jul 24, 2008 at 3:20 PM, John Vandenberg <jayvdb@gmail.com> wrote:
Hi,
I've been told that a large percentage of EB1911 sits within the history of the English Wikipedia. During a recent discussion about EB1911 here, a few checks indicated that this is possibly true, and that the EB1911 text imported into Wikipedia came from a decent transcription. In the following very long discussion there are two tables of five Wikipedia articles each (articles starting with "A" and with "B"). Each row gives a link to the Wikipedia revision containing the EB1911 text, a link to the copy now on Wikisource, and a link to the page scan (set up by Tim Starling):
http://en.wikipedia.org/wiki/Wikipedia_talk:Plagiarism
I am interested in piecing together the history of the EB1911 import, because if this was as extensive as some claim, hidden in Wikipedia is possibly the best and most complete available transcription of EB1911, and I would like to work out a good algorithm to pull it out and put it on Wikisource, which has slowly been building an online copy that is true to the original. Or maybe we can find whoever imported it, and re-use the import files.
I think most EB1911 text was copied from http://www.1911encyclopedia.org before it became a wiki.
[[Category:Wikipedia articles incorporating text from the 1911 Encyclopædia Britannica]] contains more than 12,000 articles; that's a start.
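For the category route, here is a minimal sketch in Python. It assumes the present-day MediaWiki web API (`action=query`, `list=categorymembers`) rather than anything that was available in 2008. It pages through the category 500 titles at a time by merging each `continue` block the API returns into the next request. The names `fetch_json` and `category_members` and the User-Agent string are illustrative. The category title is copied from the message above and may have been renamed since.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API = "https://en.wikipedia.org/w/api.php"
# Category title as given in the message; it may have changed since 2008.
CATEGORY = ("Category:Wikipedia articles incorporating text from the "
            "1911 Encyclopædia Britannica")


def fetch_json(params: dict) -> dict:
    """Perform one GET request against the MediaWiki API and decode the JSON reply."""
    url = API + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "eb1911-survey-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_members(category: str,
                     fetch: Callable[[dict], dict] = fetch_json) -> Iterator[str]:
    """Yield article titles in `category`, following the API's continuation tokens."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",   # main (article) namespace only
        "cmlimit": "500",     # maximum batch size for ordinary clients
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break  # no continuation block: this was the last batch
        # Merge cmcontinue/continue into the parameters of the next request.
        params = {**params, **data["continue"]}
```

Calling `list(category_members(CATEGORY))` would issue the real requests. The `fetch` parameter exists so the paging logic can be exercised without network access.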
To find every such article, get the full dump, extract the earliest version of each article, and grep for "1911" :-)
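The dump approach might look roughly like the following sketch. It assumes a full-history XML dump in the standard MediaWiki export format: `<page>` elements, each containing a `<title>` and one or more `<revision>` elements with `<timestamp>` and `<text>`. The code streams the file with `xml.etree.ElementTree.iterparse`, keeps only the earliest revision of each page, and applies the same crude "1911" substring test. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def earliest_revisions(stream: IO[bytes]) -> Iterator[Tuple[str, str, str]]:
    """Yield (title, timestamp, text) for the earliest revision of every page."""
    title: Optional[str] = None
    best: Optional[Tuple[str, str]] = None  # (timestamp, text) of earliest revision so far
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = local(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            ts = text = ""
            for child in elem:
                if local(child.tag) == "timestamp":
                    ts = child.text or ""
                elif local(child.tag) == "text":
                    text = child.text or ""
            # ISO 8601 timestamps in the same format compare correctly as strings.
            if best is None or ts < best[0]:
                best = (ts, text)
            elem.clear()  # drop the revision body to limit memory use
        elif tag == "page":
            if title is not None and best is not None:
                yield title, best[0], best[1]
            title, best = None, None
            elem.clear()  # emptied page elements still stay attached to the root


def candidates(stream: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (title, timestamp) of pages whose first revision mentions '1911'."""
    for title, ts, text in earliest_revisions(stream):
        if "1911" in text:
            yield title, ts
```

For a compressed dump, one would open it with `bz2.open(path, "rb")` and pass the handle to `candidates`. The substring test also matches first revisions that merely mention the year 1911, so the output is a candidate list to check by hand rather than a finished answer.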
Magnus