Samuel Klein wrote:
I think we agree on what needs to happen. The only thing I am not sure of is where you would like to see the work take place.
I'm not so sure we agree. I think we're talking about two different things.
This thread started out with a discussion of why it is so hard to start new projects within the Wikimedia Foundation. My stance is that projects like OpenStreetMap.org and OpenLibrary.org are doing fine as they are, and there is no need to duplicate their effort within the WMF. The example you gave was this:
*A wiki for book metadata, with an entry for every published work, statistics about its use and siblings, and discussion about its usefulness as a citation (a collaboration with OpenLibrary, merging WikiCite ideas)
To me, that sounds exactly like what OpenLibrary already does (or could be doing in the near future), so why even set up a new project that would collaborate with it? Later you added:
I could see this happening on Wikisource.
That's when I asked why this couldn't be done inside OpenLibrary.
I added:
(Plus you would have to motivate why a copy of OpenLibrary should go into the English Wikisource and not the German or French one.)
You replied:
I don't understand what you mean -- English source materials and metadata go on en:ws, German on de:ws, &c. How is this different from what happens today?
I was talking about the metadata for all books ever published, including the Swedish translations of Mark Twain's works, which are part of Mark Twain's bibliography, of the translator's bibliography, of American literature, and of Swedish-language literature. In OpenLibrary all of these are contained in one project. In Wikisource, they are split into one section for English and another for Swedish. That division makes sense for the contents of the book, but not for the book metadata.
Now you write:
However, the project I have in mind for OCR cleaning and translation needs to
That is a change of subject. That sounds just like what Wikisource (or PGDP.net) is about. OCR cleaning is one thing, but it is an entirely different thing to set up "a wiki for book metadata, with an entry for every published work". So which of these two project ideas are we talking about?
Every book ever published means more than 10 million records. (It probably means more than 100 million records.) OCR cleaning attracts hundreds or a few thousand volunteers, which is sufficient to take on thousands of books, but not millions.
Google has already scanned millions of books, but I haven't heard of any plans for cleaning all that OCR text.
Let's take a practical example. A classics professor I know (Greg Crane, copied here) has scans of primary source materials, some with approximate or hand-polished OCR, waiting to be uploaded and converted into a useful online resource for editors, translators, and classicists around the world.
Where should he and his students post that material?
On Wikisource. What's stopping them?