On 07/05/2010 10:45 AM, ThomasV wrote:
I do not agree with you on namespaces.
I think that the "Page" namespace is the best way to handle the separation between the physical object (a book and its pages) and the logical object that we present to readers (the text, divided in sections or chapters).
I agree that having two structures (physical pages and chapters) is a challenge (I even wrote a paper about this, eleven years ago), but the introduction of the Page: namespace is not without problems.
Moving the physical structure to its own namespace is based on the assumption that a separate presentation structure (the chapters of the book) exists and is the more important one. But for odd formats such as dictionaries or newspapers, this is far from obvious.
Should each dictionary entry become a chapter of its own? One dictionary might have 150,000 entries.
For newspapers, it's easy to agree that each major news article can be a chapter of its own. But maybe not each small advertisement? Should the whole ad section be one chapter? People might want to search these ads. They can be far more important to current readers than the news. So treating them as whitespace is not a solution.
In such cases, it might be best to just proofread the physical page and keep it as it is. Even for ordinary books, while they are being proofread, which can extend for months or years, only the physical structure exists. But then the Page: namespace is what will be exposed to the public reader. So maybe it should be dressed up with the green {{header}} to look nicer?
Already today, we have the problem that searches using the site's own search box will show content in the Page: namespace, rather than the transcluded chapters in the main namespace (related to https://bugzilla.wikimedia.org/show_bug.cgi?id=18861 ). There are also no links from the found Page: to the chapters that transclude its text, unless you bother to use "what links here".
Wikistats (Erik Zachte) also reports user activity based on the main namespace. It's odd that on Wikisource, the "other" namespaces have far more editing activity than the main one.
For a dictionary, creating 150,000 tiny pages that each transclude 2 lines of text is not a good match for the current wiki technology. Having dozens of <section.../> tags in each page will also look very clumsy. It would be more convenient if the section markers were much smaller, and treated like anchor points. Search should also return the closest preceding anchor point (even if that is on a preceding page), rather than the page URL.
The Bible, being one of the oldest texts on Wikisource, is a good test case. It consists of 2 testaments, 66 books, 1189 chapters and 31,103 verses. When printed on paper, it typically fits on 1200 physical pages. Today we typically create one wiki page per book or per chapter, e.g. http://en.wikisource.org/wiki/Bible_(King_James)/Matthew and this is what turns up in searches, since the text was imported from existing e-texts, rather than being proofread in Wikisource. These 66 or 1189 wiki pages have headlines for each chapter and anchor points for each verse, but these are not presented in the search results. Imagine you could search "candle under bushel" and up comes "Matthew 5:15", even if you had a proofread but not yet transcluded version divided into 1200 wiki pages in the Page: namespace. Today search turns up things such as "Page:The Granite Monthly Volume 5.djvu/82", which simply isn't pretty.
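To make the anchor idea concrete, here is a rough Python sketch of "return the closest preceding anchor point" for a search hit. Nothing like this exists in MediaWiki or ProofreadPage as far as I know; the function name, the (page, offset) positions and the page numbers are all made up for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical anchor index: (physical page number, character offset on that
# page, label), kept sorted in reading order. Page numbers and offsets are
# invented for illustration only.
Anchor = Tuple[int, int, str]

ANCHORS: List[Anchor] = [
    (11, 0, "Matthew 5:13"),
    (11, 140, "Matthew 5:14"),
    (11, 230, "Matthew 5:15"),
    (12, 60, "Matthew 5:16"),
]


def closest_preceding_anchor(anchors: List[Anchor], page: int, offset: int) -> Optional[str]:
    """Return the label of the last anchor at or before (page, offset).

    The anchor may sit on an earlier physical page than the search hit.
    Returns None if the hit comes before the first anchor.
    """
    positions = [(p, o) for p, o, _ in anchors]
    # bisect_right counts how many anchors are positioned at or before the hit.
    i = bisect_right(positions, (page, offset))
    return anchors[i - 1][2] if i else None


# A hit in the middle of page 11 resolves to the verse it falls in.
print(closest_preceding_anchor(ANCHORS, 11, 300))
# A hit at the top of page 12, before that page's first anchor, still
# resolves to the verse that started on page 11.
print(closest_preceding_anchor(ANCHORS, 12, 10))
```

The point is only that a search hit gets labelled by its logical position (the verse) instead of by the physical page title, and that this works even when the text has not been transcluded into chapters.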
In my eyes, this means: 1) many problems (e.g. search) are generic problems, not connected with ProofreadPage, and 2) the existing ProofreadPage (PR2) may work okay for traditional books with chapters, but it can also co-exist with an alternative ProofreadPage that works better for dictionaries and newspapers.
Next, consider digitizing old maps with Wikisource, and matching them (through coordinate transformation) with OpenStreetMap.