Tim Starling wrote:
I wrote:
When Brian came on to IRC and asked us "What is the best way to upload 30,000 images requiring 6 GB to commons?" the reaction from Brion and me was a groan. The hardware requirements for commons are rapidly increasing, and uploading and storing such content in MediaWiki is inefficient and non-portable. If we had them in a separate directory on a separate domain, we could copy them from server to server, make tarballs, run batch conversion jobs -- all with a minimal amount of programming and system administration work. And it wouldn't require writing a bot to create 30,000 index pages; we could just write a hundred lines of PHP to index the whole lot. The collection would be easier to use and more reliable, and it would be easy to maintain and update the index pages.
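To give a rough idea of how small such an indexer could be, here is a minimal sketch. It is in Python rather than the PHP mentioned above, and the directory layout, the `index-N.html` naming, the page size of 100 and all function names are assumptions made for illustration, not part of any existing tool.

```python
import html
from pathlib import Path
from urllib.parse import quote

# Assumed set of scan formats; adjust to whatever the collection contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff"}


def list_images(image_dir):
    """Return the image file names found directly in image_dir, sorted."""
    return sorted(
        p.name
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def build_index_pages(names, per_page=100, header="", footer=""):
    """Split names into chunks and return one HTML string per index page."""
    pages = []
    for start in range(0, len(names), per_page):
        chunk = names[start:start + per_page]
        items = "\n".join(
            f'<li><a href="{quote(n)}">{html.escape(n)}</a></li>' for n in chunk
        )
        pages.append(f"{header}\n<ul>\n{items}\n</ul>\n{footer}")
    return pages


def write_index(image_dir, out_dir, per_page=100, header="", footer=""):
    """Write index-1.html, index-2.html, ... and return the page count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pages = build_index_pages(list_images(image_dir), per_page, header, footer)
    for i, page in enumerate(pages, start=1):
        (out / f"index-{i}.html").write_text(page, encoding="utf-8")
    return len(pages)
```

Because the header and footer are passed in as plain strings, changing them only means regenerating static files, not re-rendering 30,000 wiki pages.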
All of the navigation text, the headers and footers, could be editable in wiki fashion. You could let anyone change the header that will be displayed on 30,000 pages, with no server strain whatsoever. This is in stark contrast to the system requirements of templates which are used on large numbers of wiki pages.
Wikisource has suffered so far due to a lack of specialised software. This kind of initiative could see it become more usable generally.
Come to think of it, I could probably do it as a MediaWiki extension, and embed this content in en.wikisource.org. You'd get all of the same features, but it would also appear to be integrated with the wiki. You wouldn't be able to edit the page images, but I don't think that's a desirable property anyway. It would be easy for someone to download the whole collection, run a processing script (say, automated correction of the scanning quality), and then upload the whole new collection and incorporate it into the wiki. Easy as in no bots, no screen scrapers, no server strain, just a tarball download and a tarball upload.
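The tarball round trip described above could look something like the following sketch, again in Python and purely illustrative: `process_tarball` and the `transform` callback are hypothetical names, and the actual correction step (whatever image clean-up is wanted) is left to the caller.

```python
import io
import tarfile


def process_tarball(src_path, dst_path, transform):
    """Copy each regular file from src_path into a new gzip tarball at
    dst_path, passing its bytes through transform(data) -> bytes.

    Returns the number of files processed.
    """
    count = 0
    with tarfile.open(src_path, "r:*") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            data = transform(src.extractfile(member).read())
            info = tarfile.TarInfo(name=member.name)
            info.size = len(data)
            info.mtime = member.mtime
            dst.addfile(info, io.BytesIO(data))
            count += 1
    return count
```

Everything happens on local files, so no bot, screen scraper or wiki request is involved until the finished tarball is handed back.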
-- Tim Starling
That sounds like a good alternative to a separate domain or sticking it on Commons, as long as it doesn't require the tech crew to put in too many extra hours.