I would like to extend the syntax of the <ref> tag (Cite extension) in order to deal with footnotes that are spread across several transcluded pages. Since the Cite extension is widely used, I figured I had better ask here first.
Here is an illustration of the problem: http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sco...
At the bottom of the scan you can see the second half of a footnote. That footnote begins on the previous page: http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sco...
Wikisourcers currently have no clean way to deal with these cases. I have written a patch for this (the code is here: http://dpaste.org/QOMH/ ). This patch extends the "ref" syntax by adding a "follow" parameter, like this:
<ref follow="foo">bar</ref>
After two pages are transcluded, the wikitext passed to the parser will look like this :
blah blah blah blah blah blah<ref name="note1">beginning of note 1</ref> blah blah blah blah blah blah blah blah blah<ref follow="note1">end of note</ref> blah blah blah
This wikitext is rendered as a single footnote, located in the text at the position of the parent <ref>. If the parent <ref> is not found (as is the case when you render only the second page), then the text inside the tag is rendered at the beginning of the list of references, with no number and no link.
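Here is a rough sketch of the merging logic, written in TypeScript rather than the PHP of the actual patch, purely for illustration (RefChunk and buildReferenceList are made-up names, not code from the patch):

interface RefChunk {
  name?: string;    // <ref name="...">
  follow?: string;  // <ref follow="...">
  text: string;
}

// Build the reference list in rendering order. A "follow" chunk is
// appended to the body of the named parent ref; if no parent exists
// (e.g. only the second page is rendered), the orphaned text goes to
// the top of the list, with no number and no backlink.
function buildReferenceList(chunks: RefChunk[]): { text: string; orphan: boolean }[] {
  const byName = new Map<string, { text: string; orphan: boolean }>();
  const orphans: { text: string; orphan: boolean }[] = [];
  const numbered: { text: string; orphan: boolean }[] = [];
  for (const chunk of chunks) {
    if (chunk.follow !== undefined) {
      const parent = byName.get(chunk.follow);
      if (parent) {
        parent.text += " " + chunk.text;  // continue the parent footnote
      } else {
        orphans.push({ text: chunk.text, orphan: true });
      }
    } else {
      const entry = { text: chunk.text, orphan: false };
      if (chunk.name !== undefined) byName.set(chunk.name, entry);
      numbered.push(entry);
    }
  }
  return [...orphans, ...numbered];  // orphans first, unnumbered and unlinked
}

With the example wikitext above, this yields a single footnote reading "beginning of note 1 end of note".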
Does this make sense?
Thomas
That seems like a pretty radical modification for a relatively rare use case.
Would it make more sense to just refer to multiple pages with a single URL?
e.g. use this as the reference for the whole lot:
http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sco...
As for what you get when you click on that ref link, the simplest thing is to just do nothing with the pp=48-49, since you already have a "next page" link. It's effectively just an indication to the reader.
If you want to get extra fancy, perhaps you can use the value of pp= to insert a template above the page that says "Reference is spread over pages _48_, _49_". The simplest thing would be just to list out the pages, and if you ever get more than 20 or so you can start doing the kind of pagination links you see on sites like Google. For obvious reasons only consecutive pages should be allowed here.
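For instance, a sketch of expanding such a hint into page numbers for the banner (pp= is only a proposal here; nothing in MediaWiki parses it today, and expandPageRange is a made-up name):

// Expand a hypothetical pp=48-49 hint into the list of page numbers
// for a "Reference is spread over pages ..." banner.
function expandPageRange(pp: string): number[] {
  const m = /^(\d+)-(\d+)$/.exec(pp);
  if (m === null) throw new Error("malformed pp= value: " + pp);
  const start = Number(m[1]);
  const end = Number(m[2]);
  if (end < start) throw new Error("pp= range must be ascending");
  // Consecutive pages only, so the range expands to every page in between.
  return Array.from({ length: end - start + 1 }, (_, i) => start + i);
}

// expandPageRange("48-49") => [48, 49]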
"Neil Kandalgaonkar" neilk@wikimedia.org wrote in message news:4C62E7E2.3040904@wikimedia.org...
For obvious reasons only consecutive pages should be allowed here.
Not true. Sometimes a footnote will start on one page, the next page will be a full-page illustration, and the footnote will then continue on the subsequent pages. In books where all the photographic plates are next to each other (which is very common, due to the printing/binding process), the gap could be of arbitrary length.
- Mark Clements (HappyDog)
On Wed, Aug 11, 2010 at 11:43 AM, ThomasV thomasV1@gmx.de wrote:
Here is an illustration of the problem: http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sco...
At the bottom of the scan you can see the second half of a footnote. That footnote begins on the previous page: http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sco...
This seems like a very weird way to do things. Why is the book being split up by page to begin with? For optimal reading, you should put a lot more than one book-page's worth of content on each web page. It's hard to say what an appropriate fix is without knowing why this is being done.
Is the idea that the pages should later be transcluded into one big page, and they're only temporarily on separate pages for proofreading purposes? If so, why not just have the extension that displays the wikitext and Djvu pages side-by-side (ProofreadPage?) display a bunch of pages at once? You could then put all indivisible content on the page where it begins, so put the full ref text on the first page. And likewise you could put a word that's hyphenated across pages on the page where it begins. If you can see multiple pages at once, this isn't much harder to proofread, since you can just look down a bit.
This seems like a very weird way to do things. Why is the book being split up by page to begin with? For optimal reading, you should put a lot more than one book-page's worth of content on each web page. It's hard to say what an appropriate fix is without knowing why this is being done.
This is being done in order to proofread the text, and to make it more trustworthy than text without scans.
Is the idea that the pages should later be transcluded into one big page, and they're only temporarily on separate pages for proofreading purposes? If so, why not just have the extension that displays the wikitext and Djvu pages side-by-side (ProofreadPage?) display a bunch of pages at once?
Yes, this is precisely what the extension does. It has a tag that transcludes a contiguous list of pages. Single pages displayed with scans are used for proofreading, but they are not the final result.
You could then put all indivisible content on the page where it begins, so put the full ref text on the first page.
This is one of the workarounds we have been using; in the example I posted, you can see that we did exactly that.
However, this solution is not satisfactory, and it is too difficult for many users. Even without this multiple-page problem, it is difficult to convince contributors to use the <ref> tag, because it means moving the footnote's text away from its original location (remember that we start from OCR text). When they see a footnote that spans two or more pages, they tend to reject <ref> and favor templates instead, combined with section transclusion (which makes the transclusion work very complicated).
Some contributors have designed tricks that combine the <ref> tag with section transclusion, in order to leave the footnote text in front of the scan. However, this results in complicated markup that is unacceptably difficult for most users.
If you can see multiple pages at once, this isn't much harder to proofread, since you can just look down a bit.
I guess you also mean "see multiple scans at once". This would be another workaround, but it would not deter contributors from trying to leave the text where it was in the first place, and where we think it belongs.
We have been dealing with this problem for several years now, and all the solutions that we have found have drawbacks. I do not think that we can solve this without extending the tool that manages references.
Thomas
On 08/11/2010 09:46 PM, Aryeh Gregor wrote:
This seems like a very weird way to do things. Why is the book being split up by page to begin with? For optimal reading, you should put a lot more than one book-page's worth of content on each web page.
ThomasV will give the introduction to ProofreadPage and its purpose. I will take a step back. A book is typically 40-400 pages, because that is how much you can comfortably bind in one volume (one spine) and sell as a commercial product. A web 1.0 (plain HTML + HTTP) page is typically a smaller chunk of information, say 1-100 kbytes. To fit the idea of a book to web technology (either in Wikisource or Wikibooks), the book needs to be split up, either according to physical book pages (Wikisource with the ProofreadPage extension) or chapters (Wikisource without ProofreadPage, or Wikibooks).
In either case, the individual pages have a sequential relationship. If you print the pages, you can glue them together and the sequence makes sense, which is not the case with Wikipedia. Such pages have links to the previous and next page in the sequence (which Wikipedia articles don't have).
Wikipedia, Wikibooks and Wikisource mostly use web 1.0 technology. A very different approach to web browsing was taken when Google Maps was launched in 2005, the poster project for the "web 2.0". You arrive at the map site with a coordinate. From there, you can pan in any direction, and new parts of the map (called "tiles") are downloaded by asynchronous JavaScript and XML (AJAX) calls as you go. Your browser will never hold the entire map. It doesn't matter how big the entire map is, just like it doesn't matter how big the entire Wikipedia website is. The unit of information to fetch is the "tile", just like the web 1.0 unit was the HTML page.
If we applied this web 2.0 principle to Wikibooks and Wikisource, we wouldn't need to have pages with previous/next links. We could just have smooth, continuous scrolling in one long sequence. Readers could still arrive at a given coordinate (chapter or page), but continue from there in any direction.
Examples of such user interfaces for books are Google Books and the Internet Archive online reader. You can link to page 14 like this: http://books.google.com/books?id=Z_ZLAAAAMAAJ&pg=PA14 and then scroll up (to page 13) or down (to page 15). The whole book is never in your browser. New pages are AJAX loaded as they are needed. It's like Google Maps, except that you can only pan in two directions (one dimension), not in the four cardinal directions. And the zoom is more primitive here. After you have scrolled to page 19, you need to use the "Link" tool to know the new URL to link to.
At the Internet Archive, the user interface is similar, but the URL in your browser is updated as you scroll (for better or worse), http://www.archive.org/stream/devisesetembleme00lafeu#page/58/mode/1up
If we only have scanned images of book pages, this is simple enough, because each scanned image is like a "tile" in Google Maps. But in Wikisource, we have also run OCR software to extract a text layer for each page, and we have proofread that text to make it searchable. I still have not learned JavaScript, but I guess you could make AJAX calls for a chunk of text and add that to the scrollable web page, just like you can add tiled images. Google has not done this, however. If you switch to "plain text" viewing mode, http://books.google.com/books?pg=PA14&id=Z_ZLAAAAMAAJ&output=text you get traditional web 1.0 "pages" with links to the previous and next web page. (Each of Google's text pages contains text from 5 book pages, e.g. pages 11-15, only to make things more confusing.)
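A minimal sketch of such a call, in modern TypeScript (untested and illustrative only; the API parameters action=query, prop=revisions and rvprop=content are the standard MediaWiki ones, while appendToScroller is hypothetical):

// Fetch the proofread wikitext of one Page: as the reader scrolls near it.
// origin=* allows anonymous cross-site requests to the API.
async function fetchPageText(pageTitle: string): Promise<string> {
  const url = "https://en.wikisource.org/w/api.php"
    + "?action=query&prop=revisions&rvprop=content&format=json&origin=*"
    + "&titles=" + encodeURIComponent(pageTitle);
  const res = await fetch(url);
  const data = await res.json();
  const pages = data.query.pages;
  const first = pages[Object.keys(pages)[0]];
  return first.revisions[0]["*"];  // raw wikitext of that single page
}

// e.g. fetchPageText("Page:Example.djvu/49").then(appendToScroller);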
But the real challenge comes when you want to wiki-edit one such chunk of scrollable text. I think it could work similarly to our section editing of a long Wikipedia article. But to be really elegant, I should be able, when editing a section, to scroll up or down beyond the current section, in an endless textarea.
If we can solve this "section editing 2.0" that goes outside of the box (or maybe we should skip directly to WYSIWYG editing), then we can have the beginning of a whole new Wikisource interface.
On 13 August 2010 10:27, Lars Aronsson lars@aronsson.se wrote: ...
If we applied this web 2.0 principle to Wikibooks and Wikisource, we wouldn't need to have pages with previous/next links. We could just have smooth, continuous scrolling in one long sequence. Readers could still arrive at a given coordinate (chapter or page), but continue from there in any direction.
Examples of such user interfaces for books are Google Books and the Internet Archive online reader. You can link to page 14 like this: http://books.google.com/books?id=Z_ZLAAAAMAAJ&pg=PA14 and then scroll up (to page 13) or down (to page 15). The whole book is never in your browser. New pages are AJAX loaded as they are needed.
You are not thinking "web" here.
The "web" way to solve a problem like easy access to "next page" or "different chapters" is to have a "next page" link or have all the chapters as tabs, or something like that. Make the wiki aware of the structure of a book, and make it render these nextpage link / chapters tabs.
Web 2.0 is obsolete now; the future is Web 3.5 (CSS3, HTML5) (-:
On 13 August 2010 12:23, Tei oscar.vives@gmail.com wrote:
What you suggest seems to be implemented already in some browsers: http://www.apple.com/safari/whats-new.html#reader
It creates a distraction-free environment to read, and only read. Much like an ebook reader on your computer! :-)
On Fri, Aug 13, 2010 at 07:23, Tei oscar.vives@gmail.com wrote:
Well, "to make the wiki aware of the structure of a book" is essentially what is requested in bug 15071 [1], which is open since 2008 and blocking 6 other requests which would solve Wikisource/Wikibooks (but non-Wikipedia) specific issues...
Helder
[1] Wikibooks/Wikisource needs means to associate separate pages with books: https://bugzilla.wikimedia.org/show_bug.cgi?id=15071
Interesting.
You might want to have a look at Microsoft's Seadragon technology: http://www.dailymotion.com/video/x2738e_sea-dragon-and-photosynth-demo_tech (check at 1min20s if you don't want to watch the whole video)
Now, getting back to your proposal: a JavaScript interface similar to the ones at IA or Google Books, one that downloads only the few scans that need to be shown to the user, would be fairly easy to write using the API. We could even do it for text, as long as it is rendered as well-separated physical pages.
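A minimal sketch of the "download only what is shown" part, in TypeScript (illustrative only; it assumes each scan placeholder carries its image URL in a hypothetical data-scan-url attribute):

// Load each scan image only when its placeholder nears the viewport.
// IntersectionObserver is a modern convenience; a scroll-event handler
// would do the same job.
function lazyLoadScans(placeholders: NodeListOf<HTMLImageElement>): void {
  const observer = new IntersectionObserver((entries) => {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue;
      const img = entry.target as HTMLImageElement;
      img.src = img.dataset.scanUrl!;  // swap in the real scan URL
      observer.unobserve(img);         // each scan is fetched only once
    }
  }, { rootMargin: "500px" });  // start fetching one screen ahead
  placeholders.forEach((img) => observer.observe(img));
}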
However, it would be more complicated to apply the same principle if the text is to be rendered without page separations, preserving its logical structure. We would need either to pre-parse the whole document and develop an API that lets us download small bits of it, or to parse the current page together with the previous and next pages. I am not sure it is really worth the effort; the bandwidth saving would be less significant than for scans.
Thomas
On Fri, Aug 13, 2010 at 4:27 AM, Lars Aronsson lars@aronsson.se wrote:
Wikipedia, Wikibooks and Wikisource mostly use web 1.0 technology. A very different approach to web browsing was taken when Google Maps was launched in 2005, the poster project for the "web 2.0". You arrive at the map site with a coordinate. From there, you can pan in any direction, and new parts of the map (called "tiles") are downloaded by asynchronous JavaScript and XML (AJAX) calls as you go. Your browser will never hold the entire map. It doesn't matter how big the entire map is, just like it doesn't matter how big the entire Wikipedia website is. The unit of information to fetch is the "tile", just like the web 1.0 unit was the HTML page.
I have doubts about whether this is the right approach for books. Offering the book as plain HTML pages, one for each chapter and also one for the whole book (for printing and searching), seems more useful. Browsers can cope with such long pages just fine, and it preserves all the functionality people are used to: links work as expected, back and forward work as expected, all without extra work on our part. This isn't an option for Google Maps, because
1) They're dealing with way more data. They can't possibly send you a map of the whole world in full detail on every page load. It would be many megabytes even compressed.
2) The page model doesn't fit their needs. Even if they served the whole map, they'd need complicated JavaScript to have it scroll and zoom and so forth as the user expects. This isn't needed for transcribed text, and trying to reimplement all the scrolling/bookmarking/navigation/search/etc. functionality that users are used to for regular web pages would be counterproductive.
Traditional web pages are designed to present formatted text, possibly with some interspersed images, without particular layout precision. Anything in that format is probably best distributed as plain HTML, not some fancy web app thing. Not to mention, the latter is a lot more work to implement.
On 08/13/2010 07:36 PM, Aryeh Gregor wrote:
I have doubts about whether this is the right approach for books. Offering the book as plain HTML pages, one for each chapter and also one for the whole book (for printing and searching), seems more useful. Browsers can cope with such long pages just fine,
One web page per chapter, yes, but not for whole books, especially not for the thicker and larger books. Web pages beyond 100 kbytes still load slowly, especially when you're on a wireless network in a crowded conference room. The problem is, after you scan a book you only know where the physical pages begin and end. The chapter boundaries can only be detected by manual proofreading and markup. The sequence from beginning to end of the book is the same for both pages and chapters (except for complicated cases with footnotes, as discussed recently). A smooth web 2.0, map-style scrolling through that sequence can be a way to overcome the delay between fast mechanical scanning and slow manual proofreading.
Why not make a demo? I think this idea has come up a couple of times here in the last year. People find it easy to argue about mere proposals, but an actual demo gives people a vision of what you are going for. Just look at what's happened with WYSIWYG in the last week.
OpenLayers seems like a good place to start, although it's obviously designed more for maps.
http://wiki.openstreetmap.org/wiki/OpenLayers
There you go then: http://toolserver.org/~magnus/book2scroll/index.html
Just one demo book for the moment, but it'll be easy to allow all Wikisource books tomorrow, when I have had some sleep...
(page jumping is broken too, but scrolling works just fine...)
Cheers, Magnus