[Foundation-l] Re: Hosting scans of the 1911 Britannica on Wikimedia

Wed Nov 9 18:32:43 UTC 2005

Anthony DiPierro wrote:

>On 11/9/05, Robert Scott Horning <robert_horning at netzero.net> wrote:
>
>>Anthony DiPierro wrote:
>>
>>>Wikimedia Commons is the best place for images of text? If that's what
>>>you're saying, I disagree. I think maybe we were talking about two
>>>different things, though.
>>>
>>No, this is still the same issue. I'm not exactly sure where the best
>>place for scanned pages of historical text ought to go in this case.
>>The images themselves should be in Commons, and perhaps a temporary
>>"Wikiproject" within Commons to extract those images would be useful, so
>>that the full scanned pages are available. Wikisource also has an image
>>repository independent of Commons, so that may be more appropriate, but
>>that is something that ought to be decided within the Wikisource
>>community itself. Figures and engravings do need to go to Commons.
>>
>
> I think we're probably all in agreement that the processed images should go
>in the Commons. And the processed text should go into Wikisource. In the
>meantime, well, I don't think it really matters that much.
>
>>>AFAIK Distributed Proofreaders hasn't released the raw images out to the
>>>public. If that's still the case, I'd say *that* is the reason for the
>>>slow going. The wiki process would be much more efficient.
>>>
>>It is not that difficult to get the raw image scans from Distributed
>>Proofreaders if you really want them.
>>    
>>
>
> How? I've looked for this before and couldn't find them. I just looked
>again a half hour ago and couldn't find them. If they're easy to get from
>DP, well, then I don't see the point in hosting them somewhere else. I guess
>there are the index files, which apparently DP doesn't have?
>
>>They are not of the best quality
>>(DP has other goals in mind) but they are usable for the purpose of
>>transcription of the text. I also fail to see how using a Wiki for
>>proofreading is going to be any better than what DP is doing.
>>
>
> There's a much lower threshold for editing on a wiki. You don't even have
>to create an account. It might not be better, but I think it would be much
>faster.
> I've used DP before and it seems to be a very closed project compared to
>Wikipedia. Like I said, I can't even find out where I can download a dump of
>all the data.
>  
>
It isn't as closed a project as you think.  There is just a
hierarchy that you may not understand: you earn "privileges" to
do different aspects of DP much more slowly than you earn comparable
privileges on Wikimedia projects.  I've been able to participate very
well and have made some meaningful contributions, although I am not
even close to an uber geek type of person on the project.

I didn't say that it was easy to find the DP scanned images for
Encyclopedia Britannica, but they are there if you want to get at them.
I don't completely understand why DP doesn't want to make the scanned
images widely and easily available, but I suspect it has more to do
with bandwidth issues than with copyright status.  The scanned pages
take up quite a bit of server space, as you can imagine.  If you try
to contribute to one of the current books being proofread, you can see
just how easy it is to get at the scanned images in general, as that is
precisely how they get you to contribute: you look at the scanned page
and then you either transcribe the text or work on OCR'd content and
try to make corrections.  It is very tedious work, but it is rewarding
in its own way.

If you dig around a bit more, you can look up any project that is 
currently going through review, including Volumes 2-5 of the 
Encyclopedia Britannica.  By finding the project page, you can then find
not only the proofread text but also the original scanned pages.  It is 
not going to be in a nice clean zipped bundle for you to use at random, 
but you could download all of the scans onto your own hard drive if you 
wanted.  Nobody is going to stop you from doing that.  Getting access to 
these files is simply a matter of registering as a user and then looking 
around.  You don't even have to make an edit if you don't want to in 
order to see these scans.  This takes no "permission" from any admin or 
any human-to-human interaction.  That is why I was saying that it wasn't 
too difficult to get at if you really wanted to get at them.

-- 
Robert Scott Horning