[Foundation-l] Re: Hosting scans of the 1911 Britannica on Wikimedia
Lars Aronsson
lars at aronsson.se
Wed Nov 9 23:34:54 UTC 2005
Tim Starling wrote:
> When Brian came on to IRC and asked us "What is the best way to
> upload 30,000 images requiring 6 GB to commons?" the reaction
> from Brion and I was a groan. The hardware requirements for
> commons are rapidly increasing, and uploading and storing such
> content in MediaWiki is inefficient and non-portable.
While I can understand your reaction, I think we should fix our
systems so they can handle these volumes without having to create
new exceptions. The existence of Wikisource could be questioned,
since there are already other projects (such as Project Gutenberg
and Distributed Proofreaders) that do this kind of work. But if
Wikisource is to exist, it should be capable of handling large
volumes (terabytes) of digitized text (and scanned images). It
cannot be that every new book requires a new project, because
Wikisource is unable to handle its size. Encyclopaedia Britannica
might be bigger than anything that is currently in Wikisource, but
just wait until someone suggests we digitize the Spanish
"Enciclopedia Universal Ilustrada" (70 fat volumes, 1908-1930),
which makes EB look tiny.
Andreas Grosz' scans of EB1911 have been available on DVD for more
than two years, so I see no immediate hurry for us to host it.
As far as I know, PGDP is doing good work proofreading it, and
we could benefit from waiting for them to finish more of the work.
> If we had them in a separate directory on a separate domain,
Or if MediaWiki could handle separate directories on the same
domain...
The recent donation (and import) of 10,000 art images from
Directmedia GmbH to Wikimedia Commons pushed the system to its
limits. What if the next donation consists of a million images?
Or a million audio recordings? Dump them in a directory, supply
an index description in XML, and let MediaWiki use the data where
it is, instead of trying to stuff it into the MySQL database
through the wiki upload form.
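
As a rough illustration of the "index in XML, data stays where it is"
idea, here is a minimal sketch in Python. Everything in it is an
assumption made for the example: the file name index.xml, the <file>
and <title> elements, and the load_index function are hypothetical
and are not an existing MediaWiki feature or format. It only shows
that the metadata can be read from the index while the files
themselves are never copied or uploaded.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_index(index_path):
    """Read a hypothetical index.xml and describe the files next to it.

    Assumed (invented) format:
        <index>
          <file name="page001.png"><title>Page 1</title></file>
        </index>
    The files are only inspected in place, never moved or copied.
    """
    index_path = Path(index_path)
    base_dir = index_path.parent
    entries = []
    for item in ET.parse(index_path).getroot().iter("file"):
        name = item.get("name")
        if not name:
            # Skip entries that do not name a file.
            continue
        path = base_dir / name
        present = path.is_file()
        entries.append({
            "name": name,
            "title": item.findtext("title", default=""),
            "path": path,
            "exists": present,
            # Size is taken from the file system, not from the index.
            "size": path.stat().st_size if present else None,
        })
    return entries
```

Calling load_index("/some/dump/index.xml") would return one
dictionary per listed file, which a wiki could then use to build
description pages that point at the files in that directory.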
> Wikisource has suffered so far due to a lack of specialised
> software. This kind of initiative could see it become more
> usable generally.
Or the specialization could be added to MediaWiki, so anybody
could benefit from it, not just Wikisource.
--
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se