[WikiEN-l] EB1911 in Wikipedia

Magnus Manske magnusmanske at googlemail.com
Thu Jul 24 14:36:42 UTC 2008


On Thu, Jul 24, 2008 at 3:20 PM, John Vandenberg <jayvdb at gmail.com> wrote:
> Hi,
>
> I've been told that a large percentage of the EB1911 sits within the
> history of English Wikipedia, and during a recent discussion about
> EB1911 here, a few checks indicated that this is possibly true, and that
> the EB1911 text imported into Wikipedia is from a decent
> transcription.  In the following very long discussion, there are two
> tables of five Wikipedia articles starting with "A" and "B", each with
> a link to the Wikipedia revision containing the EB1911 text, a
> link to the copy now on Wikisource, and a link to the page scan (set up
> by Tim Starling):
>
> http://en.wikipedia.org/wiki/Wikipedia_talk:Plagiarism
>
> I am interested in piecing together the history of the EB1911 import,
> because if this was as extensive as some claim, hidden in Wikipedia is
> possibly the best and most complete available transcription of EB1911,
> and I would like to work out a good algorithm to pull it out and put
> it on Wikisource, which has slowly been building an online copy that
> is true to the original.  Or maybe we can find whoever imported it,
> and re-use the import files.

I think most EB1911 text was copied from
http://www.1911encyclopedia.org before it became a wiki.

[[Category:Wikipedia articles incorporating text from the 1911
Encyclopædia Britannica]] contains more than 12,000 articles; that's a start.

To find every such article, get the full dump, extract the earliest
version of each article, and grep for "1911" :-)
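The dump recipe above can be sketched in Python with only the standard library. This is a minimal illustration, assuming a full-history MediaWiki XML dump (the "pages-meta-history" file) whose pages contain `<title>` and `<revision>` elements with `<timestamp>` and `<text>` children; the namespace is stripped because the export schema version varies between dumps. The function names and the sample data are hypothetical. The "1911" check is a plain substring test, so it yields candidates rather than confirmed EB1911 imports: it will also match articles that merely mention the year.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def earliest_revisions(source):
    """Yield (title, text) for the earliest revision of every page.

    `source` is a filename or a binary file-like object holding a
    MediaWiki XML dump. Pages are streamed one at a time, so the whole
    history never has to fit in memory.
    """
    title, best = None, None  # best = (timestamp, text) of oldest revision
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = local(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            fields = {local(child.tag): child.text for child in elem}
            ts, text = fields.get("timestamp"), fields.get("text") or ""
            # ISO 8601 UTC timestamps sort correctly as plain strings.
            if ts is not None and (best is None or ts < best[0]):
                best = (ts, text)
            elem.clear()  # free the revision text once it has been read
        elif tag == "page":
            if best is not None:
                yield title, best[1]
            title, best = None, None
            elem.clear()


def find_1911_candidates(source):
    """Titles whose earliest revision contains "1911" (the grep step)."""
    return [t for t, text in earliest_revisions(source) if "1911" in text]


# Made-up two-page dump, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Article A</title>
    <revision>
      <timestamp>2002-02-25T15:43:11Z</timestamp>
      <text>Text copied from the 1911 encyclopaedia.</text>
    </revision>
    <revision>
      <timestamp>2008-07-01T10:00:00Z</timestamp>
      <text>Rewritten text.</text>
    </revision>
  </page>
  <page>
    <title>Article B</title>
    <revision>
      <timestamp>2004-01-01T00:00:00Z</timestamp>
      <text>Original text.</text>
    </revision>
  </page>
</mediawiki>"""

print(find_1911_candidates(io.BytesIO(SAMPLE)))  # ['Article A']
```

Selecting the oldest revision by timestamp, rather than trusting document order, keeps the sketch correct even if a dump does not list revisions chronologically. A compressed .bz2 dump can be streamed by passing `bz2.open(path)` instead of a filename.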

Magnus

