On 11/23/2014 02:55 AM, Wiki Billinghurst wrote:
> What do we see as the next components for Wikisource?
> What are our major hurdles for system development?
> If we were offered development help where do people think that we should be making use of that help? Is it incremental fixes, transactional changes, or are we wanting transformational changes, completely new features, and new opportunities?
Ten years ago, Wikipedia was already an established success, and we started to branch out into projects like Wikisource, Wikinews and so on. That was also when Google Book Search started, and when the Internet Archive put its current book-scanning practices (with the "Scribe" scanning stations) in place. Ten years before that, in the mid-1990s, the first large-scale book scanning projects appeared. In the two decades from 1990 to 2010, several books were published on the future of digital libraries. But what has happened in the last decade? What is really new? Has anything changed in Google Book Search or the Internet Archive in the five years 2010-2014? Yes, more books have been digitized, but are they presented or used any differently?
I think a lot more can be done, e.g. algorithmic improvement of OCR engines. Wikisource hasn't looked into that, nor has the Internet Archive, and nobody outside Google knows much about what Google does internally. This isn't necessarily "wiki" work, so it's not clear that it's a task for the WMF and its projects. Another possibility is "gamification" of proofreading, or of the mark-up, categorization and analysis of scanned books.
As for new kinds of content, digitizing entire newspapers is still a new area, where the National Library of Australia was a pioneer some years ago, but what has happened since then? Potentially, it could become a cross-over between Wikisource and Wikinews, since each event can be found on the same day in many different newspapers. How do we link them together? The problem: if we get scanned images + OCR text of 10 different newspapers, for 10 years, at 10 pages each day, that is 365 × 10 × 10 × 10 = 365,000 large pages to proofread before we can do any serious analysis. How do we proofread so many pages in any reasonable time? We don't have enough volunteers for that.
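To put the volunteer problem in numbers, here is a rough back-of-envelope sketch. The page count follows the estimate above; the throughput figures (50 active volunteers, 5 large newspaper pages each per day) are purely hypothetical assumptions chosen for illustration, not measured Wikisource data.

```python
# Corpus size from the estimate above: 10 newspapers, 10 years, 10 pages/day.
DAYS_PER_YEAR = 365  # ignores leap days, as the original estimate does


def total_pages(newspapers, years, pages_per_day, days_per_year=DAYS_PER_YEAR):
    """Number of newspaper pages in the scanned corpus."""
    return newspapers * years * days_per_year * pages_per_day


def days_to_proofread(pages, volunteers, pages_per_volunteer_per_day):
    """Days needed to proofread every page once (rounded up)."""
    daily_capacity = volunteers * pages_per_volunteer_per_day
    return -(-pages // daily_capacity)  # ceiling division on integers


pages = total_pages(newspapers=10, years=10, pages_per_day=10)
print(pages)  # 365000

# HYPOTHETICAL throughput: 50 active volunteers, 5 large pages each per day.
days = days_to_proofread(pages, volunteers=50, pages_per_volunteer_per_day=5)
print(days, "days, about", round(days / DAYS_PER_YEAR, 1), "years")
```

Under these assumptions a single proofreading pass takes 1460 days, about four years of sustained effort. Wikisource's usual two-step workflow (proofread, then validated by a second user) would roughly double that.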