[Foundation-l] Re: Hosting scans of the 1911 Britannica on Wikimedia
Robert Scott Horning
robert_horning at netzero.net
Wed Nov 9 18:32:43 UTC 2005
Anthony DiPierro wrote:
>On 11/9/05, Robert Scott Horning <robert_horning at netzero.net> wrote:
>
>
>>Anthony DiPierro wrote:
>>
>>
>>>Wikimedia Commons is the best place for images of text? If that's what
>>>you're saying, I disagree. I think maybe we were talking about two
>>>different things, though.
>>>
>>No, this is still the same issue. I'm not exactly sure where the best
>>place for scanned pages of historical text ought to go in this case.
>>The images themselves should be in commons, and perhaps as a temporary
>>"Wikiproject" within commons to extract those images might be useful to
>>have the full scanned pages available. Wikisource also has an image
>>repository independent of commons, so that may be more appropriate, but
>>that is something that ought to be decided within the Wikisource
>>community itself. Figures and engravings do need to go to Commons.
>>
>>
>
> I think we're probably all in agreement that the processed images should go
>in the Commons. And the processed text should go into Wikisource. In the
>mean time, well, I don't think it really matters that much.
>
>
>
>
>>>AFAIK Distributed Proofreaders hasn't released the raw images out to the
>>>public. If that's still the case, I'd say *that* is the reason for the
>>>slow going. The wiki process would be much more efficient.
>>>
>>It is not that difficult to get the raw image scans from Distributed
>>Proofreaders if you really want them.
>>
>>
>
> How? I've looked for this before and couldn't find them. I just looked
>again a half hour ago and couldn't find them. If they're easy to get from
>DP, well, then I don't see the point in hosting them somewhere else. I guess
>there's the index files, which apparently DP doesn't have?
>
>>They are not of the best quality
>>(DP has other goals in mind) but they are usable for the purpose of
>>transcription of the text. I also fail to see how using a Wiki for
>>proofreading is going to be any better than what DP is doing.
>>
>>
>
> There's a much lower threshold for editing on a wiki. You don't even have
>to create an account. It might not be better, but I think it would be much
>faster.
> I've used DP before and it seems to be a very closed project compared to
>Wikipedia. Like I said, I can't even find out where I can download a dump of
>all the data.
>
>
It isn't as closed a project as you think. There is just a
hierarchy that you may not understand, and you earn "privileges" to
do different aspects of DP much more slowly than comparable privileges
on Wikimedia projects. I've been able to participate very well and have
made some meaningful contributions, although I am not even close to an
uber geek type of person on the project.
I didn't say that it was easy to find the DP scanned images for
Encyclopedia Britannica, but they are there if you want to get at them.
I don't completely understand the reasoning for why DP doesn't want to
let the scanned images be available widely and easily, but I suspect it
is something to do with bandwidth issues rather than copyright status.
The scanned pages take up quite a bit of server space, as you could
imagine. If you try to do a contribution for one of the current books
being proofread, you can see just how easy it is to get at the scanned
images in general, as that is precisely how they get you to contribute:
You look at the scanned page and then you either transcribe the text or
work on OCR'd content and try to make corrections. It is very tedious
work but something that is rewarding in its own way.
If you dig around a bit more, you can look up any project that is
currently going through review, including Volumes 2-5 of the
Encyclopedia Britannica. By finding the project page, you can then find
not only the proofread text but also the original scanned pages. It is
not going to be in a nice clean zipped bundle for you to use at random,
but you could download all of the scans onto your own hard drive if you
wanted. Nobody is going to stop you from doing that. Getting access to
these files is simply a matter of registering as a user and then looking
around. You don't even have to make an edit if you don't want to in
order to see these scans. This takes no "permission" from any admin or
any human-to-human interaction. That is why I was saying that they
weren't too difficult to get at if you really wanted them.
--
Robert Scott Horning