Hi all,
after some discussion on wikitech-l, I made a Google Books-like display demo for wikisource content. It should work on any multipage djvu or PDF. To link to it, you'll need:
- The file name for the original (needs to be on commons)
- The total number of pages (couldn't find a way to get that automatically anywhere...)
- The page to start on
Thus armed, you can construct a URL like this: http://toolserver.org/~magnus/book2scroll/index.html?file=Transactions_of_th...
The default parameter-less URL will fall back on the DNB vol. 11: http://toolserver.org/~magnus/book2scroll/index.html
Note that this is HTML/CSS/JS only; no toolserver backend script/database is involved.
Awaiting onslaught of critique, Magnus
2010/8/14 Magnus Manske magnusmanske@googlemail.com:
Hi all,
after some discussion on wikitech-l, I made a Google Books-like display demo for wikisource content.
Please do not confuse en Wikisource with Wikisource. BTW: I don't understand the sense of this viewer.
Klaus Graf
On Sat, Aug 14, 2010 at 7:26 PM, Klaus Graf klausgraf@googlemail.com wrote:
2010/8/14 Magnus Manske magnusmanske@googlemail.com:
Hi all,
after some discussion on wikitech-l, I made a Google Books-like display demo for wikisource content.
Please do not confuse en Wikisource with Wikisource.
Is that your way of saying "please enable other languages"?
BTW: I don't understand the sense of this viewer.
To browse books more fluently.
Cheers, Magnus
nice demo ; it would be nice to make it more efficient.
I see that you are generating images with a size that is adapted to the user's screen. If you do this, hundreds of thumbnails of all possible sizes will be generated at commons for each page. To avoid this, you should quantize the size : use a width that is a multiple of 100 pixels, as ProofreadPage does (Tim asked me to do this). In addition, if you use this restricted set of widths, then thumbnails will be more likely to already exist and will load faster.
To gain some speed, you could also preload pages p+1 and p-1, as google books does.
Also, in on_body_scroll, you could avoid the for loop : divide $('#body').position()['scrollTop'] by the height of an image
Thomas
-------- Original Message --------
Date: Sat, 14 Aug 2010 19:15:47 +0100 From: Magnus Manske magnusmanske@googlemail.com To: wikisource-l@lists.wikimedia.org Subject: [Wikisource-l] Back to the Scroll
Hi all,
after some discussion on wikitech-l, I made a Google Books-like display demo for wikisource content. It should work on any multipage djvu or PDF. To link to it, you'll need:
- The file name for the original (needs to be on commons)
- The total number of pages (couldn't find a way to get that automatically anywhere...)
- The page to start on
Thus armed, you can construct a URL like this: http://toolserver.org/~magnus/book2scroll/index.html?file=Transactions_of_th...
The default parameter-less URL will fall back on the DNB vol. 11: http://toolserver.org/~magnus/book2scroll/index.html
Note that this is HTML/CSS/JS only; no toolserver backend script/database is involved.
Awaiting onslaught of critique, Magnus
On Sat, Aug 14, 2010 at 8:00 PM, thomasV1@gmx.de wrote:
nice demo ; it would be nice to make it more efficient.
I see that you are generating images with a size that is adapted to the user's screen. If you do this, hundreds of thumbnails of all possible sizes will be generated at commons for each page. To avoid this, you should quantize the size : use a width that is a multiple of 100 pixels, as ProofreadPage does (Tim asked me to do this). In addition, if you use this restricted set of widths, then thumbnails will be more likely to already exist and will load faster.
To gain some speed, you could also preload pages p+1 and p-1, as google books does.
Both good ideas, I'll do that.
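For illustration, the preloading idea could be sketched roughly like this. It is only a sketch under assumptions: `neighbourPages`, `preloadNeighbours` and the `urlForPage` callback are hypothetical names, not taken from the demo's source.

```javascript
// Pages adjacent to the current one, clamped to the book's range [1, totalPages].
function neighbourPages(page, totalPages) {
  return [page - 1, page + 1].filter((p) => p >= 1 && p <= totalPages);
}

// urlForPage is a hypothetical callback that maps a page number to a thumbnail URL.
function preloadNeighbours(page, totalPages, urlForPage) {
  const urls = neighbourPages(page, totalPages).map(urlForPage);
  // Only touch Image when running in a browser; elsewhere just return the URLs.
  if (typeof Image !== 'undefined') {
    urls.forEach((url) => { new Image().src = url; });
  }
  return urls;
}
```

Assigning the URL to a detached image is enough to get the browser to fetch and cache it, so the page is likely to be ready by the time the reader scrolls to it.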
Also, in on_body_scroll, you could avoid the for loop : divide $('#body').position()['scrollTop'] by the height of an image
'fraid not - sometimes the rendered text runs longer than the image, so the "row" can be higher than the image. Example: http://toolserver.org/~magnus/book2scroll/index.html (scroll down and you'll see it)
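Since the rows do not all have the same height, a plain division will not find the current page. One possible alternative to a linear loop, sketched here under assumptions and not necessarily what the demo does, is to keep a running total of row heights and binary-search it. The names `buildOffsets` and `rowAtScrollTop` are made up for this example.

```javascript
// rowHeights[i] is the rendered height in pixels of page row i (hypothetical input).
// Returns offsets where offsets[i] is the top of row i and the last entry is the total height.
function buildOffsets(rowHeights) {
  const offsets = [0];
  for (const h of rowHeights) {
    offsets.push(offsets[offsets.length - 1] + h);
  }
  return offsets;
}

// Index of the row that contains the given scroll position (binary search).
function rowAtScrollTop(offsets, scrollTop) {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last row
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (offsets[mid] <= scrollTop) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}
```

The offsets would have to be rebuilt whenever a row changes height, for example after its text has loaded.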
Cheers, Magnus
Also, in on_body_scroll, you could avoid the for loop : divide
$('#body').position()['scrollTop'] by the height of an image
'fraid not - sometimes the rendered text runs longer than the image, so the "row" can be higher than the image. Example: http://toolserver.org/~magnus/book2scroll/index.html (scroll down and you'll see it)
hmm, you are right ; I had a "pure scan" version in mind.
But it would be nice to have a version that does not load the text, just in order to see if the WMF servers are fast enough to provide the same fluidity as in the Google Books interface.
For the size quantization, I think it is better to request a desired width than a desired height ; the API does not exactly give you the height you request. In addition, if you quantize the width you will be likely to request thumbs that are already created by ProofreadPage.
Also, for the text, I just had a crazy idea : instead of requesting the text of each page, you can do a single request for the whole book, using &action=parse (pass the <pages/> command to it, as in this script : http://wikisource.org/wiki/MediaWiki:Dictionary.js ).
Then we can split the returned string with a regexp that detects the page breaks (they are in a special span element), and place it in the corresponding divs ; things will break whenever a html formatting element ends on a different page than where it begins, but we could write a function that balances the missing elements.
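A rough sketch of the splitting step, under a loud assumption: the page-break marker is taken here to be an empty `<span class="pagenum" id="...">` element, which is a guess at the markup and should be checked against the real parser output. The function name `splitByPageBreaks` is also made up, and no tag balancing is attempted.

```javascript
// Splits rendered HTML into per-page chunks keyed by the marker's id.
// ASSUMPTION: page breaks look like <span class="pagenum" id="N" ...></span>.
function splitByPageBreaks(html) {
  const marker = /<span class="pagenum" id="([^"]+)"[^>]*><\/span>/g;
  const pages = {};
  let current = null; // id of the page currently being collected
  let last = 0;       // index just after the previous marker
  let m;
  while ((m = marker.exec(html)) !== null) {
    if (current !== null) {
      pages[current] = html.slice(last, m.index);
    }
    current = m[1];
    last = marker.lastIndex;
  }
  if (current !== null) {
    pages[current] = html.slice(last);
  }
  // Anything before the first marker is dropped.
  return pages;
}
```

Each chunk could then be placed in the div for its page; elements that open on one page and close on another would still need the balancing function mentioned above.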
Thomas
On Sat, Aug 14, 2010 at 8:49 PM, Thomas Voegtlin thomasV1@gmx.de wrote:
Also, in on_body_scroll, you could avoid the for loop : divide
$('#body').position()['scrollTop'] by the height of an image
'fraid not - sometimes the rendered text runs longer than the image, so the "row" can be higher than the image. Example: http://toolserver.org/~magnus/book2scroll/index.html (scroll down and you'll see it)
hmm, you are right ; I had a "pure scan" version in mind.
But it would be nice to have a version that does not load the text, just in order to see if the WMF servers are fast enough to provide the same fluidity as in the Google Books interface.
I don't think the text retrieval is the slow step here...
For the size quantization, I think it is better to request a desired width than a desired height ; the API does not exactly give you the height you request. In addition, if you quantize the width you will be likely to request thumbs that are already created by ProofreadPage.
I've switched to specifying width rounded to 100s; however, the API still gives me one-off images (599 instead of 600 px). I could hack the API thumbnail URL, though. Better yet, I can probably skip that step entirely after the first one...
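As a minimal sketch of the rounding, assuming round-to-nearest with a floor of 100 px (the thread only says "rounded to 100s", so the exact rounding mode is a guess, and `quantizeWidth` is a made-up name):

```javascript
// Snap a desired pixel width to the nearest multiple of `step`, never below `step`.
function quantizeWidth(desiredWidth, step = 100) {
  const snapped = Math.round(desiredWidth / step) * step;
  return Math.max(step, snapped);
}
```

With this, a 1234 px wide viewport would request a 1200 px thumbnail, so far fewer distinct sizes are ever requested for a given page.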
Also, for the text, I just had a crazy idea : instead of requesting the text of each page, you can do a single request for the whole book, using &action=parse (pass the <pages/> command to it, as in this script : http://wikisource.org/wiki/MediaWiki:Dictionary.js ).
Then we can split the returned string with a regexp that detects the page breaks (they are in a special span element), and place it in the corresponding divs ; things will break whenever a html formatting element ends on a different page than where it begins, but we could write a function that balances the missing elements.
Why load a giant text and then hack around on broken HTML, when I can just query each page individually? It's not really slow, at least not in Google Chrome.
Meanwhile, I added a feature to hide "header elements" like the proofread line, which kind of disrupts the reading flow. There's a checkbox to toggle header display.
And for Klaus, I added de.wikisource: http://toolserver.org/~magnus/book2scroll/index.html?lang=de&numlen=3&am...
Cheers, Magnus
2010/8/15 Magnus Manske magnusmanske@googlemail.com:
And for Klaus, I added de.wikisource: http://toolserver.org/~magnus/book2scroll/index.html?lang=de&numlen=3&am...
Thank you!
Klaus Graf
On Sun, Aug 15, 2010 at 4:20 PM, Klaus Graf klausgraf@googlemail.com wrote:
2010/8/15 Magnus Manske magnusmanske@googlemail.com:
And for Klaus, I added de.wikisource: http://toolserver.org/~magnus/book2scroll/index.html?lang=de&numlen=3&am...
Thank you!
And now with "search in this book" function (abusing page search through toolserver), with in-text highlighting, page markers, etc. ! :-)
Cheers, Magnus
Magnus Manske wrote:
On Sat, Aug 14, 2010 at 8:49 PM, Thomas Voegtlin thomasV1@gmx.de wrote:
Also, in on_body_scroll, you could avoid the for loop : divide
$('#body').position()['scrollTop'] by the height of an image
'fraid not - sometimes the rendered text runs longer than the image, so the "row" can be higher than the image. Example: http://toolserver.org/~magnus/book2scroll/index.html (scroll down and you'll see it)
hmm, you are right ; I had a "pure scan" version in mind.
But it would be nice to have a version that does not load the text, just in order to see if the WMF servers are fast enough to provide the same fluidity as in the Google Books interface.
I don't think the text retrieval is the slow step here...
No, but the for loop in the scroll handler makes it a bit slow.
Another problem occurs when you are viewing page p, and when p-1 is not loaded yet : if you scroll up, at the moment where p-1 is loaded, the size of its container div increases, and the text you are viewing (page p) is pushed towards the bottom. On the Dictionary of National Biography this offset can be quite big, so you lose track of the text you are viewing.
I don't really know how to solve this ; but it seems to me that using divs with variable size is part of the problem here too.
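One common way to deal with this kind of jump, sketched here without having been tried against the demo, is to measure how much a row above the viewport grew and shift the scroll position by the same amount. The function and parameter names are hypothetical.

```javascript
// Returns the scroll position that keeps the visible text in place after a row resized.
// rowTop: top of the resized row; oldHeight/newHeight: its height before and after loading.
function compensatedScrollTop(scrollTop, rowTop, oldHeight, newHeight) {
  const rowWasFullyAbove = rowTop + oldHeight <= scrollTop;
  if (!rowWasFullyAbove) {
    return scrollTop; // growth happens at or below the viewport top; nothing to correct
  }
  return scrollTop + (newHeight - oldHeight);
}
```

The result would be written back to the scrolling container's scrollTop right after the row's content has been inserted.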
I've switched to specifying width rounded to 100s; however, the API still gives me one-off images (599 instead of 600 px). I could hack the API thumbnail URL, though. Better yet, I can probably skip that step entirely after the first one...
I can see that too (599 instead of 600); but that's not a problem, because the filename does not change, it is "600px-"
Why load a giant text and then hack around on broken HTML, when I can just query each page individually? It's not really slow, at least not in Google Chrome.
oh, that was in order to display the text without headers, footers and page breaks ; but I guess it's ok to show headers, because they are in the scans too. (here I'm not talking about the headers that you hide with your button ; I mean the other elements that are in this field : running title, references, etc.)
Thomas
On Sun, Aug 15, 2010 at 6:46 PM, ThomasV thomasV1@gmx.de wrote:
Magnus Manske wrote:
On Sat, Aug 14, 2010 at 8:49 PM, Thomas Voegtlin thomasV1@gmx.de wrote:
Also, in on_body_scroll, you could avoid the for loop : divide
$('#body').position()['scrollTop'] by the height of an image
'fraid not - sometimes the rendered text runs longer than the image, so the "row" can be higher than the image. Example: http://toolserver.org/~magnus/book2scroll/index.html (scroll down and you'll see it)
hmm, you are right ; I had a "pure scan" version in mind.
But it would be nice to have a version that does not load the text, just in order to see if the WMF servers are fast enough to provide the same fluidity as in the Google Books interface.
I don't think the text retrieval is the slow step here...
No, but the for loop in the scroll handler makes it a bit slow.
Another problem occurs when you are viewing page p, and when p-1 is not loaded yet : if you scroll up, at the moment where p-1 is loaded, the size of its container div increases, and the text you are viewing (page p) is pushed towards the bottom. On the Dictionary of National Biography this offset can be quite big, so you lose track of the text you are viewing.
I don't really know how to solve this ; but it seems to me that using divs with variable size is part of the problem here too.
I tried to solve that by fixing the div height to the same as the image div and using overflow-y to have per-page "sub-scroll". However, it does not seem to work with "display:table-cell" divs, and altering that breaks the entire layout. I suppose I could go back to a good ol' table, but that would be a shame...
I've switched to specifying width rounded to 100s; however, the API still gives me one-off images (599 instead of 600 px). I could hack the API thumbnail URL, though. Better yet, I can probably skip that step entirely after the first one...
I can see that too (599 instead of 600); but that's not a problem, because the filename does not change, it is "600px-"
Why load a giant text and then hack around on broken HTML, when I can just query each page individually? It's not really slow, at least not in Google Chrome.
oh, that was in order to display the text without headers, footers and page breaks ; but I guess it's ok to show headers, because they are in the scans too. (here I'm not talking about the headers that you hide with your button ; I mean the other elements that are in this field : running title, references, etc.)
Yes, I know what you mean, but they're not really in that way...
Anyway, I've added a permalink.
Also, I've fiddled with the scrolling for loop; it should be much quicker now.
Cheers, Magnus
Last one for today : Automatic retrieval of max page number and "number length" (e.g. "001" instead of just "1"). Manual parameters will override (and save a query).
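To illustrate what "number length" means here, a small sketch of the zero padding (the helper name `padPageNumber` is made up; only the "001" versus "1" behaviour comes from the message above):

```javascript
// Left-pad a page number with zeros to at least `numlen` digits.
function padPageNumber(page, numlen) {
  return String(page).padStart(numlen, '0');
}
```

Numbers that are already longer than `numlen` are left untouched.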
Cheers, Magnus
Thanks!
I just tried to build a template like this:
[[File:Library-logo-blue-outline.png|30px|link=http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file={{urlencode:{{PAGENAME}}}}]], to be used on Index: pages.
But urlencode converts spaces into +, and your script doesn't like the output of urlencode... it only accepts names where spaces are replaced by underscores.
How can I obtain this transformation by a parser function/by a template? Or: can you modify the scripts, so that url encoded titles are accepted too?
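For the second option, a script could in principle accept either form by normalising the file parameter before use. This is only a sketch of that idea; `normaliseFileParam` is a hypothetical helper, not something from the tool.

```javascript
// Turn a raw query-string value into a title with underscores instead of spaces.
function normaliseFileParam(raw) {
  // In query strings "+" conventionally stands for a space.
  const decoded = decodeURIComponent(raw.replace(/\+/g, ' '));
  return decoded.replace(/ /g, '_');
}
```

Titles that already use underscores pass through unchanged apart from percent-decoding.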
I apologize if my question is banal.
Alex
On Mon, Aug 16, 2010 at 1:02 PM, Alex Brollo alex.brollo@gmail.com wrote:
Thanks!
I just tried to build a template like this:
[[File:Library-logo-blue-outline.png|30px|link=http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file={{urlencode:{{PAGENAME}}}}]], to be used on Index: pages.
But urlencode converts spaces into +, and your script doesn't like the output of urlencode... it only accepts names where spaces are replaced by underscores.
How can I obtain this transformation by a parser function/by a template? Or: can you modify the scripts, so that url encoded titles are accepted too?
I apologize if my question is banal.
Not at all, though http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=De%27+... seems to work fine. You can also try {{PAGENAMEE}} (note the two E).
Note that sometimes the Index: name is not the same as the actual .djvu file, so you should allow parameter {{{1}}} to be the filename (PAGENAME by default), and maybe the start page as {{{2}}}, default "1". That would also allow the template to be used anywhere, not just on the index page.
Cheers, Magnus
2010/8/16 Magnus Manske magnusmanske@googlemail.com
Not at all, though
http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=De%27+matematici+italiani+anteriori+all%27invenzione+della+stampa.djvu seems to work fine. You can also try {{PAGENAMEE}} (note the two E).
My question was definitely banal. With {{PAGENAMEE}} the template http://it.wikisource.org/wiki/Template:BackToScroll runs perfectly: see http://it.wikisource.org/wiki/Indice:Rime_%28Vittorelli%29.djvu . Thanks Magnus (even though some of our it.source users had pointed out my mistake, I very much appreciated your attention and suggestions!)
Alex
On Mon, Aug 16, 2010 at 5:35 PM, Alex Brollo alex.brollo@gmail.com wrote:
2010/8/16 Magnus Manske magnusmanske@googlemail.com
Not at all, though
http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=De%27+... seems to work fine. You can also try {{PAGENAMEE}} (note the two E).
My question was definitely banal. With {{PAGENAMEE}} the template http://it.wikisource.org/wiki/Template:BackToScroll runs perfectly: see http://it.wikisource.org/wiki/Indice:Rime_%28Vittorelli%29.djvu . Thanks Magnus (even though some of our it.source users had pointed out my mistake, I very much appreciated your attention and suggestions!)
Nice! Thanks!
Magnus
Hi Magnus,
The tool has choked on something for http://toolserver.org/~magnus/book2scroll/index.html?lang=en&file=Mrs_Ca... might it be the apostrophe in the url? This url just shows the one page, and no subsequent pages, and that is for whichever starting page one feeds into the url.
Regards Andrew
On 16 Aug 2010 at 13:22, Magnus Manske wrote:
On Mon, Aug 16, 2010 at 1:02 PM, Alex Brollo alex.brollo@gmail.com wrote:
Thanks!
I just tried to build a template like this:
[[File:Library-logo-blue-outline.png|30px|link=http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file={{urlencode:{{PAGENAME}}}}]], to be used on Index: pages.
But urlencode converts spaces into +, and your script doesn't like the output of urlencode... it only accepts names where spaces are replaced by underscores.
How can I obtain this transformation by a parser function/by a template? Or: can you modify the scripts, so that url encoded titles are accepted too?
I apologize if my question is banal.
Not at all, though http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=De%27+... seems to work fine. You can also try {{PAGENAMEE}} (note the two E).
Note that sometimes the Index: name is not the same as the actual .djvu file, so you should allow parameter {{{1}}} to be the filename (PAGENAME by default), and maybe the start page as {{{2}}}, default "1". That would also allow the template to be used anywhere, not just on the index page.
Cheers, Magnus
2010/8/17 Billinghurst billinghurst@gmail.com
Hi Magnus,
The tool has choked on something for
http://toolserver.org/~magnus/book2scroll/index.html?lang=en&file=Mrs_Caudle%2527s_curtain_lectures.djvu&startpage=1 might it be the apostrophe in the url? This url just shows the one page, and no subsequent pages, and that is for whichever starting page one feeds into the url.
Regards Andrew
I had a similar problem, solved simply by *avoiding urlencode* and using the plain output of {{PAGENAMEE}} or similar variables.
http://toolserver.org/~magnus/book2scroll/index.html?lang=en&file=Mrs_Caudle%27s_curtain_lectures.djvu&startpage=1 runs.
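The difference between the two URLs is %2527 versus %27: the first looks like an apostrophe that was percent-encoded twice. Whether that is what actually tripped the tool is an assumption, but a script could guard against it along these lines (`decodeFully` is a hypothetical helper):

```javascript
// Repeatedly percent-decode until the value stops changing (bounded by maxRounds).
function decodeFully(value, maxRounds = 3) {
  let current = value;
  for (let i = 0; i < maxRounds; i++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch (e) {
      break; // malformed escape such as a lone "%": keep what we have
    }
    if (next === current) {
      break;
    }
    current = next;
  }
  return current;
}
```

A title that legitimately contains a percent escape as literal text would be over-decoded by this, so it is a trade-off rather than a safe default.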
Alex
2010/8/17 Alex Brollo alex.brollo@gmail.com
Magnus, a challenge for you...
http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=Hymnus_in_Romam.djvu&startpage=9
This is a text from it.source using {{Iwpage}} ThomasV's trick, linking many pages into la.source . It would be great to see the content coming from Iwpage interwiki transclusion!
Alex
On Tue, Aug 17, 2010 at 2:52 PM, Alex Brollo alex.brollo@gmail.com wrote:
2010/8/17 Alex Brollo alex.brollo@gmail.com
Magnus, a challenge for you...
http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=Hymnus...
This is a text from it.source using {{Iwpage}} ThomasV's trick, linking many pages into la.source . It would be great to see the content coming from Iwpage interwiki transclusion!
I'm not sure how to detect interwiki transclusion universally (in all languages) with a single system, so I fall back to "manual": http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=Hymnus...
Cheers, Magnus
On Tue, Aug 17, 2010 at 3:34 PM, Magnus Manske magnusmanske@googlemail.com wrote:
On Tue, Aug 17, 2010 at 2:52 PM, Alex Brollo alex.brollo@gmail.com wrote:
2010/8/17 Alex Brollo alex.brollo@gmail.com
Magnus, a challenge for you...
http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=Hymnus...
This is a text from it.source using {{Iwpage}} ThomasV's trick, linking many pages into la.source . It would be great to see the content coming from Iwpage interwiki transclusion!
I'm not sure how to detect interwiki transclusion universally (in all languages) with a single system, so I fall back to "manual": http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=Hymnus...
ARGH! Only /some/ of the pages are transcluded!
On Tue, Aug 17, 2010 at 2:52 PM, Alex Brollo alex.brollo@gmail.com wrote:
2010/8/17 Alex Brollo alex.brollo@gmail.com
Magnus, a challenge for you...
http://toolserver.org/~magnus/book2scroll/index.html?lang=it&file=Hymnus...
This is a text from it.source using {{Iwpage}} ThomasV's trick, linking many pages into la.source . It would be great to see the content coming from Iwpage interwiki transclusion!
Now reloads transcluded pages from the correct language automatically, no textlang parameter necessary!
Magnus
2010/8/17 Magnus Manske magnusmanske@googlemail.com
Now reloads transcluded pages from the correct language automatically, no textlang parameter necessary!
Magnus
Great! :-) Alex