[Foundation-l] Re: Hosting scans of the 1911 Britannica on Wikimedia

Wed Nov 9 20:14:33 UTC 2005

Anthony DiPierro wrote:

>On 11/9/05, Robert Scott Horning <robert_horning at netzero.net> wrote:
>  
>
>>Anthony DiPierro wrote:
>>
>>>There's a much lower threshold for editing on a wiki. You don't even have
>>>to create an account. It might not be better, but I think it would be much
>>>faster.
>>>I've used DP before and it seems to be a very closed project compared to
>>>Wikipedia. Like I said, I can't even find out where I can download a dump
>>>of all the data.
>>>
>>It isn't as closed a project as you think. There is just a
>>hierarchy that you may not understand, and you earn "privileges" to
>>do different aspects of DP much more slowly than comparable privileges on
>>Wikimedia projects. I've been able to participate very well and have
>>made some meaningful contributions, although I am not even close to an
>>uber-geek type of person on the project.
>>    
>>
>
> If you're talking about hierarchies and privileges and such I think it's
>clear that it's a much more closed project than a wiki.
> You may very well have been able to make meaningful contributions even
>though you're not an uber-geek. I still think more people would be able to
>contribute more if you didn't even have to, for instance, create an account.
>
>>I didn't say that it was easy to find the DP scanned images for
>>Encyclopedia Britannica, but they are there if you want to get at them.
>>    
>>
>
> What does that mean? I want to get at them. They are "there". Where? I want
>them. That's what interested me in the post by Brian. He's got these scans
>that I've wanted for a long time.
>  
>
Nope, they are not "there". Only up to C is there. IIRC, there are 23 
other letters in the alphabet, and most of them won't be there for quite 
a long while.

>>I don't completely understand the reasoning for why DP doesn't want to
>>let the scanned images be available widely and easily, but I suspect it
>>is something to do with bandwidth issues rather than copyright status.
>>The scanned pages take up quite a bit of server space, as you could
>>imagine. If you try to do a contribution for one of the current books
>>being proofread, you can see just how easy it is to get at the scanned
>>images in general, as that is precisely how they get you to contribute:
>>You look at the scanned page and then you either transcribe the text or
>>work on OCR'd content and try to make corrections. It is very tedious
>>work but something that is rewarding in its own way.
>>    
>>
>
> I've used the project before, and from my understanding you could only look
>at the scanned images that they chose to send you. You couldn't download
>whatever you wanted. Maybe that's changed, though.
>
>>If you dig around a bit more, you can look up any project that is
>>currently going through review, including Volumes 2-5 of the
>>Encyclopedia Britannica. By finding the project page, you can then find
>>not only the proofread text but also the original scanned pages. It is
>>not going to be in a nice clean zipped bundle for you to use at random,
>>but you could download all of the scans onto your own hard drive if you
>>wanted. Nobody is going to stop you from doing that. Getting access to
>>these files is simply a matter of registering as a user and then looking
>>around. You don't even have to make an edit if you don't want to in
>>order to see these scans. This takes no "permission" from any admin or
>>any human-to-human interaction. That is why I was saying that it wasn't
>>too difficult to get at if you really wanted to get at them.
>>
>>--
>>Robert Scott Horning
>>    
>>
>
> I'll give it a try. Can you get all the images, or just the books still in
>progress, or just the pages from the books still in progress that haven't
>yet been processed?
> Actually, I'll probably just wait until this stuff gets put somewhere on
>Wikimedia.
>  
>
I disagree that the images should be on Commons. I was originally going 
to do that, but when someone on wikimedia-tech suggested that they 
should be put on a separate subdomain, that seemed best. There will 
be a set of 30,000 images, with HTML pages indexing every page in 
1911EB and linking to every scanned page. There will probably also be 
the option to download each volume. Dumping 30,000 unsorted images on 
Commons is not the solution. Presenting them on their own subdomain in 
an easily accessible, highly sorted manner is the solution.