[Foundation-l] Re: Hosting scans of the 1911 Britannica on Wikimedia

Lars Aronsson lars at aronsson.se
Wed Nov 9 23:34:54 UTC 2005


Tim Starling wrote:

> When Brian came on to IRC and asked us "What is the best way to 
> upload 30,000 images requiring 6 GB to commons?" the reaction 
> from Brion and I was a groan. The hardware requirements for 
> commons are rapidly increasing, and uploading and storing such 
> content in MediaWiki is inefficient and non-portable.

While I can understand your reaction, I think we should fix our 
systems so they can handle these volumes without having to create 
new exceptions.  The existence of Wikisource could be questioned, 
since there are already other projects (such as Project Gutenberg 
and Distributed Proofreaders) that do this kind of work.  But if 
Wikisource is to exist, it should be capable of handling large 
volumes (terabytes) of digitized text (and scanned images).  It 
cannot be that every new book requires a new project, because 
Wikisource is unable to handle its size.  Encyclopaedia Britannica 
might be bigger than anything that is currently in Wikisource, but 
just wait until someone suggests we digitize the Spanish 
"Enciclopedia Universal Ilustrada" (70 fat volumes, 1908-1930), 
which makes EB look tiny.

Andreas Grosz' scans of EB1911 have been available on DVD for more 
than two years, so I see no immediate hurry for us to host it.  
As far as I know, PGDP is doing good work proofreading it, and 
we could benefit from waiting for them to finish more of the work.

> If we had them in a separate directory on a separate domain,

Or if MediaWiki could handle separate directories on the same 
domain...

The recent donation (and import) of 10,000 art images from 
Directmedia GmbH to Wikimedia Commons pushed the system to its 
limits.  What if the next donation consists of a million images?  
Or a million audio recordings?  Dump them in a directory, supply 
an index description in XML, and let MediaWiki use the data where 
it is, instead of trying to stuff it into the MySQL database 
through the wiki upload form.
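
To make the "directory plus XML index" idea concrete, here is a 
minimal sketch in Python.  Everything in it is an assumption for 
illustration only: the index file name (index.xml), the element and 
attribute names (file, name, title, license) and the read_index 
helper are hypothetical, and nothing here is existing MediaWiki 
functionality.  It only shows that the files can be described and 
checked where they lie, without passing through an upload form.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_index(directory, index_name="index.xml"):
    """Return one record per <file> entry whose file exists in `directory`.

    The index layout is hypothetical:
      <index>
        <file name="page0001.jpg">
          <title>...</title>
          <license>...</license>
        </file>
      </index>
    """
    root_dir = Path(directory)
    tree = ET.parse(root_dir / index_name)
    records = []
    for entry in tree.getroot().iter("file"):
        name = entry.get("name")
        if not name:
            continue  # entry without a file name, nothing to look up
        path = root_dir / name
        if not path.is_file():
            continue  # listed in the index but missing on disk
        records.append({
            "name": name,
            "title": entry.findtext("title", default=""),
            "license": entry.findtext("license", default=""),
            "size": path.stat().st_size,  # read from disk, file stays in place
        })
    return records
```

A caller would run read_index("/some/donation/directory") and get 
back a list of records that could be used to build description 
pages, while the image files themselves are never moved or copied.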

> Wikisource has suffered so far due to a lack of specialised 
> software.  This kind of initiative could see it become more 
> usable generally.

Or the specialization could be added to MediaWiki, so anybody 
could benefit from it, not just Wikisource.


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se


