[Wikisource-l] [Wikitech-l] Wikisource books and web 1.0 pages (was: pas de sujet)

Fri Aug 13 08:27:35 UTC 2010

On 08/11/2010 09:46 PM, Aryeh Gregor wrote:
> This seems like a very weird way to do things.  Why is the book being
> split up by page to begin with?  For optimal reading, you should put a
> lot more than one book-page's worth of content on each web page.

ThomasV will give the introduction to ProofreadPage and its
purpose. I will take a step back. A book is typically 40-400 pages,
because that is how much you can comfortably bind in one
volume (one spine) and sell as a commercial product. A web 1.0
(plain HTML + HTTP) page is typically a smaller chunk of
information, say 1-100 kbytes. To match (either in Wikisource
or Wikibooks) the idea of a book with web technology, the book
needs to be split up, either according to physical book pages
(Wikisource with the ProofreadPage extension) or chapters
(Wikisource without ProofreadPage or Wikibooks).

In either case, the individual pages have a sequential relationship.
If you print the pages, you can glue them together and the sequence
makes sense, which is not the case with Wikipedia. Such pages have
links to the previous and next page in sequence (which Wikipedia
articles don't have).

Wikipedia, Wikibooks and Wikisource mostly use web 1.0 technology.
A very different approach to web browsing was taken by Google
Maps, launched in 2005 and the poster project for "web 2.0".
You arrive at the map site with a coordinate. From there, you can
pan in any direction and new parts of the map (called "tiles") are
downloaded by asynchronous JavaScript and XML (AJAX) calls as
you go. Your browser will never hold the entire map. It doesn't
matter how big the entire map is, just like it doesn't matter how
big the entire Wikipedia website is. The unit of information to fetch
is the "tile", just like the web 1.0 unit was the HTML page.

If we applied this web 2.0 principle to Wikibooks and Wikisource,
we wouldn't need to have pages with previous/next links. We could
just have smooth, continuous scrolling in one long sequence. Readers
could still arrive at a given coordinate (chapter or page), but
continue from there in any direction.
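
To make the idea concrete, here is a minimal sketch (in Python rather
than browser JavaScript, only to show the logic) of a reader that
arrives at one page and keeps just a small window of neighbouring
pages loaded as it scrolls. The class name PageWindow, the fetch_page
callback and the radius parameter are all hypothetical; nothing like
this exists in MediaWiki or ProofreadPage as far as this post knows.

```python
class PageWindow:
    """Hold only a few consecutive pages of a long book at a time."""

    def __init__(self, fetch_page, total_pages, start, radius=1):
        # fetch_page is a hypothetical callback: page number -> page text.
        self.fetch_page = fetch_page
        self.total_pages = total_pages
        self.radius = radius
        self.loaded = {}
        self.move_to(start)

    def move_to(self, page):
        """Scroll to `page`, loading new neighbours and dropping distant pages."""
        page = max(1, min(self.total_pages, page))
        first = max(1, page - self.radius)
        last = min(self.total_pages, page + self.radius)
        wanted = range(first, last + 1)
        # Forget pages that scrolled out of the window.
        for number in list(self.loaded):
            if number not in wanted:
                del self.loaded[number]
        # Fetch only the pages that are not already in the window.
        for number in wanted:
            if number not in self.loaded:
                self.loaded[number] = self.fetch_page(number)
        self.current = page
```

Arriving at page 14 of a 400-page book with radius=1 would load pages
13-15; scrolling down to page 15 would fetch only page 16 and drop
page 13. The whole book is never held at once.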

Examples of such user interfaces for books are Google Books and the
Internet Archive online reader. You can link to page 14 like this:
http://books.google.com/books?id=Z_ZLAAAAMAAJ&pg=PA14
and then scroll up (to page 13) or down (to page 15). The whole
book is never in your browser. New pages are AJAX loaded as they
are needed. It's like Google Maps except that you can only pan in
two directions (one dimension), not in the four cardinal directions.
And the zoom is more primitive here. After you have scrolled to page
19, you need to use the "Link" tool to know the new URL to link to.

At the Internet Archive, the user interface is similar, but the URL
in your browser is updated as you scroll (for better or worse),
http://www.archive.org/stream/devisesetembleme00lafeu#page/58/mode/1up
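
The "coordinate in the URL" idea can be sketched as follows. This is
only an illustration based on the shape of the one example URL above;
the function names are made up, and it assumes the fragment always has
the form page/<number>/mode/1up, which may not hold for every book at
the Internet Archive.

```python
from urllib.parse import urlsplit


def page_url(item_id, page):
    """Build a reader URL in the shape of the example above (assumed format)."""
    return "http://www.archive.org/stream/%s#page/%d/mode/1up" % (item_id, page)


def page_from_url(url):
    """Return the page number found in the URL fragment, or None if absent."""
    parts = urlsplit(url).fragment.split("/")
    if len(parts) >= 2 and parts[0] == "page" and parts[1].isdigit():
        return int(parts[1])
    return None
```

A reader would call something like page_url() whenever the visible
page changes, and page_from_url() when a visitor arrives through a
link, to decide where to start scrolling.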

If we only have scanned images of book pages, this is simple enough,
because each scanned image is like a "tile" in Google Maps. But in
Wikisource, we have also run OCR software to extract a text layer for
each page, and we have proofread that text to make it searchable.
I still have not learned JavaScript, but I guess you could make AJAX
calls for a chunk of text and add that to the scrollable web page, just
like you can add tiled images. Google has not done this, however. If
you switch to "plain text" viewing mode,
http://books.google.com/books?pg=PA14&id=Z_ZLAAAAMAAJ&output=text
you get traditional web 1.0 "pages" with links to the previous and
next web page. (Each of Google's text pages contains text from 5 book
pages, e.g. pages 11-15, only to make things more confusing.)
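
The arithmetic behind that grouping might look like the sketch below.
It assumes the text pages are aligned in fixed groups of five starting
at book page 1 (1-5, 6-10, 11-15, ...), which is only inferred from
the single 11-15 example; Google's actual rule is not known here, and
the function name is hypothetical.

```python
def text_chunk_for(page, chunk_size=5):
    """Return (first, last) book page of the text chunk containing `page`.

    Assumes chunks are aligned at page 1 with a fixed size; this is a
    guess based on one observed example, not a documented rule.
    """
    first = ((page - 1) // chunk_size) * chunk_size + 1
    return first, first + chunk_size - 1
```

Under that assumption, a link to page 14 lands on the text page
covering book pages 11-15, and "next" leads to the one covering 16-20.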

But the real challenge comes when you want to wiki edit one such
chunk of scrollable text. I think it could work similarly to our section
editing of a long Wikipedia article. But to be really elegant, I should
be able, when editing a section, to scroll up or down beyond the current
section, in an endless textarea.
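
One possible way to think about it, as a rough sketch only: join the
loaded pages into a single editable buffer with a marker line before
each page, and after editing split the buffer back to see which pages
need to be saved. The marker format "=== page N ===" and all function
names are invented for this illustration; a real implementation would
need a marker that cannot occur in the page text, and this version
drops blank lines at the very start or end of a page.

```python
import re

# Hypothetical marker line that separates pages in the edit buffer.
MARKER = re.compile(r"^=== page (\d+) ===$", re.MULTILINE)


def join_pages(pages):
    """Join {page_number: text} into one buffer, one marker line per page."""
    return "\n".join(
        "=== page %d ===\n%s" % (n, pages[n]) for n in sorted(pages)
    )


def split_pages(buffer):
    """Split an edited buffer back into {page_number: text}."""
    parts = MARKER.split(buffer)
    # parts looks like [preamble, "13", text, "14", text, ...]
    return {
        int(number): text.strip("\n")
        for number, text in zip(parts[1::2], parts[2::2])
    }


def changed_pages(before, after):
    """List the page numbers whose text differs after editing."""
    return sorted(n for n in after if after[n] != before.get(n))
```

With this, an edit that starts on page 14 and spills over into page 15
would show up as two changed pages, each of which could then be saved
as an ordinary wiki edit.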

If we can solve this, "section editing 2.0" that goes outside of the box
(or maybe we should skip directly to WYSIWYG editing), then we can
have the beginning of a whole new Wikisource interface.

-- 
   Lars Aronsson (lars at aronsson.se)
   Aronsson Datateknik - http://aronsson.se