Just a reminder.... this is in about 11 hours :)
Philippe
On Aug 30, 2010, at 10:08 AM, Philippe Beaudette wrote:
> Hi all,
>
> Sue Gardner, the Executive Director of the Wikimedia Foundation,
> will be having office hours this Tuesday (Aug 31) at 23:00 UTC
> (16:00 PT, 19:00 ET) on IRC in #wikimedia-office.
>
> If you do not have an IRC client, there are two ways you can come chat
> using a web browser: First is using the Wikizine chat gateway at
> <http://chatwikizine.memebot.com/cgi-bin/cgiirc/irc.cgi>. Type a
> nickname, select irc.freenode.net from the top menu and
> #wikimedia-office from the following menu, then log in to join.
>
> Also, you can access Freenode by going to
> http://webchat.freenode.net/, typing in the nickname of your choice
> and choosing wikimedia-office as the channel. You may be prompted to
> click through a security warning, which you can accept.
>
> Please feel free to forward (and translate!) this email to any other
> relevant email lists you happen to be on.
>
> ____________________
> Philippe Beaudette
> Head of Reader Relations
> Wikimedia Foundation
>
> philippe(a)wikimedia.org
>
> Imagine a world in which every human being can freely share in
> the sum of all knowledge. Help us make it a reality!
>
> http://wikimediafoundation.org/wiki/Donate
Hi all,
Sue Gardner, the Executive Director of the Wikimedia Foundation, will
be having office hours this Thursday at 17:00 UTC (10:00 PT, 13:00 ET)
on IRC in #wikimedia-office.
If you do not have an IRC client, there are two ways you can come chat
using a web browser: First is using the Wikizine chat gateway at
<http://chatwikizine.memebot.com/cgi-bin/cgiirc/irc.cgi>. Type a
nickname, select irc.freenode.net from the top menu and
#wikimedia-office from the following menu, then log in to join.
Also, you can access Freenode by going to http://webchat.freenode.net/,
typing in the nickname of your choice and choosing wikimedia-office as
the channel. You may be prompted to click through a security warning,
which you can accept.
Please feel free to forward (and translate!) this email to any other
relevant email lists you happen to be on.
____________________
Philippe Beaudette
Head of Reader Relations
Wikimedia Foundation
philippe(a)wikimedia.org
Imagine a world in which every human being can freely share in
the sum of all knowledge. Help us make it a reality!
http://wikimediafoundation.org/wiki/Donate
On 08/07/2010 02:23 AM, Andreas Kolbe wrote:
> Word-processing the Google output to arrive at a readable, written text creates more work than it saves.
This is where our experience differs. I'm working faster with the Google
Translator Toolkit than without.
> If Google want to build up their translation memory, I suggest they pay publishers for permission to analyse existing, published translations, and read those into their memory. This will give them a database of translations that the market judged good enough to publish, written by people who (presumably) understood the subject matter they were working in.
If we forget Google for a while, this is actually something that we could do
on our own. There are enough texts in Wikisource (out-of-copyright books)
that are available in more than one language. In some cases, we will run
into old spelling and use of language, but it will be better than nothing.
The result could be good input to Wiktionary.
Here is the Norwegian original of Nansen's Eskimoliv,
http://no.wikisource.org/wiki/Indeks:Nansen-Eskimoliv.djvu
And here is the Swedish translation, both from 1891,
http://sv.wikisource.org/wiki/Index:Eskimålif.djvu
Norwegian: Grønland er paa en eiendommelig vis knyttet til vort land og
folk.
Swedish: Grönland är på ett egendomligt sätt knutet till vårt land och
vårt folk.
As you can see, there is one difference already in this first
sentence: The original ends "to our country and people",
while the translation ends "to our country and our people".
Is there any good free software for aligning parallel texts and
extracting translations? Looking around, I found NAtools,
TagAligner, and Bitextor, but they require texts to be marked
up already. Are these the best and most modern tools available?
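Before reaching for NAtools, TagAligner or Bitextor, one can get surprisingly far with a length-based aligner in the spirit of Gale and Church: sentences whose character lengths correlate across the two texts are assumed to translate each other. The following is a toy sketch of that idea, not any of the tools named above; all function names are my own.

```python
# Toy length-based sentence aligner (Gale-Church spirit): pick the
# alignment of 1-1, 1-2 and 2-1 sentence groups that minimizes the
# total character-length mismatch, via dynamic programming.

def align(src, tgt):
    """Return a list of (src_sentences, tgt_sentences) pairs."""
    INF = float("inf")
    n, m = len(src), len(tgt)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def penalty(a, b):
        # Normalized difference between total character lengths.
        la = sum(len(s) for s in a)
        lb = sum(len(s) for s in b)
        return abs(la - lb) / (la + lb + 1)

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            # Try matching one or two sentences on either side.
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ii, jj = i + di, j + dj
                if ii > n or jj > m:
                    continue
                extra = 0.2 if (di, dj) != (1, 1) else 0.0  # prefer 1-1
                c = cost[i][j] + penalty(src[i:ii], tgt[j:jj]) + extra
                if c < cost[ii][jj]:
                    cost[ii][jj] = c
                    back[ii][jj] = (i, j)

    # Walk the backpointers to recover the alignment path.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return list(reversed(pairs))
```

Run on the Nansen example, `align(["Grønland er paa en eiendommelig vis knyttet til vort land og folk."], ["Grönland är på ett egendomligt sätt knutet till vårt land och vårt folk."])` pairs the two sentences up; the real tools add sentence splitting, anchor points and a proper probabilistic cost, which this sketch deliberately omits.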
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Hi all,
after some discussion on wikitech-l, I made a Google Books-like
display demo for Wikisource content. It should work on any multi-page
DjVu or PDF. To link to it, you'll need:
* The file name for the original (needs to be on Commons)
* The total number of pages (couldn't find a way to get that
automatically anywhere...)
* The page to start on
Thus armed, you can construct a URL like this:
http://toolserver.org/~magnus/book2scroll/index.html?file=Transactions_of_t…
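If you generate many such links by script, the three ingredients combine with a standard URL encoder. A sketch follows; note that only the `file` parameter is visible in the (truncated) example URL above, so `pages` and `page` are hypothetical names standing in for the total-page-count and start-page values:

```python
from urllib.parse import urlencode

BASE = "http://toolserver.org/~magnus/book2scroll/index.html"

def book2scroll_url(filename, total_pages, start_page=1):
    # "file" is the documented parameter; "pages" and "page" are
    # guessed names for the other two required values.
    query = {"file": filename, "pages": total_pages, "page": start_page}
    return BASE + "?" + urlencode(query)
```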
The default parameter-less URL will fall back on the DNB vol. 11:
http://toolserver.org/~magnus/book2scroll/index.html
Note that this is HTML/CSS/JS only; no toolserver backend
script/database is involved.
Awaiting onslaught of critique,
Magnus
On 08/13/2010 07:36 PM, Aryeh Gregor wrote:
> I have doubts about whether this is the right approach for books.
> Offering the book as plain HTML pages, one for each chapter and also
> one for the whole book (for printing and searching), seems more
> useful. Browsers can cope with such long pages just fine,
One web page per chapter, yes, but not for whole books,
especially not for the thicker and larger books.
Web pages beyond 100 kbytes still load slowly, especially
when you're on a wireless network in a crowded conference
room. The problem is, after you scan a book you only
know where the physical pages begin and end. The chapter
boundaries can only be detected by manual proofreading
and markup. The sequence from beginning to end of the
book is the same for both pages and chapters (except for
complicated cases with footnotes, as discussed recently).
Smooth, web-2.0, map-style scrolling through that sequence
can be a way to bridge the delay between fast mechanical
scanning and slow manual proofreading.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
On 08/11/2010 09:46 PM, Aryeh Gregor wrote:
> This seems like a very weird way to do things. Why is the book being
> split up by page to begin with? For optimal reading, you should put a
> lot more than one book-page's worth of content on each web page.
ThomasV will give the introduction to ProofreadPage and its
purpose. I will take a step back. A book is typically 40-400 pages,
because that is how much you can comfortably bind in one
volume (one spine) and sell as a commercial product. A web 1.0
(plain HTML + HTTP) page is typically a smaller chunk of
information, say 1-100 kbytes. To match (either in Wikisource
or Wikibooks) the idea of a book with web technology, the book
needs to be split up, either according to physical book pages
(Wikisource with the ProofreadPage extension) or chapters
(Wikisource without ProofreadPage or Wikibooks).
In either case, the individual pages have a sequential relationship.
If you print the pages, you can glue them together and the sequence
makes sense, which is not the case with Wikipedia. Such pages have
links to the previous and next page in sequence (which Wikipedia
articles don't have).
Wikipedia, Wikibooks and Wikisource mostly use web 1.0 technology.
A very different approach to web browsing was taken when Google
Maps was launched in 2005, the poster project for the "web 2.0".
You arrive at the map site with a coordinate. From there, you can
pan in any direction and new parts of the map (called "tiles") are
downloaded by asynchronous JavaScript and XML (AJAX) calls as
you go. Your browser will never hold the entire map. It doesn't
matter how big the entire map is, just like it doesn't matter how
big the entire Wikipedia website is. The unit of information to fetch
is the "tile", just like the web 1.0 unit was the HTML page.
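The tile mechanics are plain arithmetic: the viewer only ever requests the tiles that intersect the current viewport. A sketch (256-pixel square tiles, as Google Maps uses; the function name is mine):

```python
TILE = 256  # Google Maps-style 256x256-pixel tiles

def visible_tiles(left, top, width, height):
    """(column, row) indices of the tiles intersecting the viewport.

    Coordinates are non-negative pixel offsets into the full map,
    which is never held in the browser in its entirety.
    """
    first_col, first_row = left // TILE, top // TILE
    last_col = (left + width - 1) // TILE
    last_row = (top + height - 1) // TILE
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

An 800 x 600 viewport needs at most a few dozen tiles regardless of how big the whole map is, which is exactly the property Lars describes.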
If we applied this web 2.0 principle to Wikibooks and Wikisource,
we wouldn't need to have pages with previous/next links. We could
just have smooth, continuous scrolling in one long sequence. Readers
could still arrive at a given coordinate (chapter or page), but
continue from there in any direction.
Examples of such user interfaces for books are Google Books and the
Internet Archive online reader. You can link to page 14 like this:
http://books.google.com/books?id=Z_ZLAAAAMAAJ&pg=PA14
and then scroll up (to page 13) or down (to page 15). The whole
book is never in your browser. New pages are AJAX loaded as they
are needed. It's like Google Maps, except that you can only pan in
two directions (one dimension), not in all four cardinal directions.
And the zoom is more primitive here. After you have scrolled to page
19, you need to use the "Link" tool to know the new URL to link to.
At the Internet Archive, the user interface is similar, but the URL
in your browser is updated as you scroll (for better or worse),
http://www.archive.org/stream/devisesetembleme00lafeu#page/58/mode/1up
If we only have scanned images of book pages, this is simple enough,
because each scanned image is like a "tile" in Google maps. But in
Wikisource, we have also run OCR software to extract a text layer for
each page, and we have proofread that text to make it searchable.
I still have not learned JavaScript, but I guess you could make AJAX
calls for a chunk of text and add that to the scrollable web page, just
like you can add tiled images. Google has not done this, however. If
you switch to "plain text" viewing mode,
http://books.google.com/books?pg=PA14&id=Z_ZLAAAAMAAJ&output=text
you get traditional web 1.0 "pages" with links to the previous and
next web page. (Each of Google's text pages contains text from 5 book
pages, e.g. pages 11-15, only to make things more confusing.)
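The "load text as you scroll" bookkeeping sketched above can be modelled without any JavaScript: keep a window of pages around the reading position loaded, fetch whatever enters the window, and drop whatever leaves it. A Python sketch, with `fetch_page` standing in for the real AJAX call (all names are mine, not from any existing viewer):

```python
class PageWindow:
    """Keeps only the pages near the current one loaded."""

    def __init__(self, fetch_page, margin=2):
        self.fetch_page = fetch_page  # callable: page number -> content
        self.margin = margin          # pages to keep on each side
        self.loaded = {}              # page number -> fetched content

    def scroll_to(self, page, last_page):
        """Update the window around `page`; return loaded page numbers."""
        lo = max(1, page - self.margin)
        hi = min(last_page, page + self.margin)
        wanted = set(range(lo, hi + 1))
        for p in wanted - self.loaded.keys():
            self.loaded[p] = self.fetch_page(p)  # AJAX call in a browser
        for p in set(self.loaded) - wanted:
            del self.loaded[p]                   # free memory
        return sorted(self.loaded)
```

The browser never holds the whole book, only the window, which is the one-dimensional analogue of the map tiles above.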
But the real challenge comes when you want to wiki edit one such
chunk of scrollable text. I think it could work similarly to our section
editing of a long Wikipedia article. But to be really elegant, I should
be able, when editing a section, to scroll up or down beyond the current
section, in an eternal textarea.
If we can solve this, "section editing 2.0" that goes outside of the box
(or maybe we should skip directly to WYSIWYG editing), then we can
have the beginning of a whole new Wikisource interface.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Sorry, I forgot a subject line; reposting the previous message:
I would like to extend the syntax of the <ref> tag (Cite extension), in
order to deal with footnotes that are spread on several transcluded
pages. Since the Cite extension is widely used, I guess I better ask
here first.
Here is an illustration of the problem:
http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sc…
At the bottom of the scan you can see the second half of a footnote.
That footnote begins on the previous page:
http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sc…
Wikisourcers currently have no way to deal with these cases in a clean
way. I have written a patch for this (the code is here:
http://dpaste.org/QOMH/). This patch extends the "ref" syntax by adding
a "follow" parameter, like this:
<ref follow="foo">bar</ref>
After two pages are transcluded, the wikitext passed to the parser will
look like this:
blah blah blah
blah blah blah<ref name="note1">beginning of note 1</ref>
blah blah blah
blah blah blah
blah blah blah<ref follow="note1">end of note</ref>
blah blah blah
This wikitext is rendered as a single footnote, located in the text at
the position of the parent <ref>. If the parent <ref> is not found (as
is the case when you render only the second page), then the text inside
the tag is rendered at the beginning of the list of references, with no
number and no link.
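That merge rule can be modelled as a single pass over the collected refs: a "follow" fragment is appended to its named parent if one was seen, otherwise its text is emitted at the head of the references list, unnumbered and unlinked. The following is a toy model of the intended behaviour, not the actual patch:

```python
def merge_refs(refs):
    """refs: list of dicts like {"name": ..., "follow": ..., "text": ...}.
    Returns the footnote texts in the order they would be listed."""
    notes = []    # footnotes in document order
    by_name = {}  # ref name -> index into notes
    orphans = []  # follow-fragments whose parent was not transcluded
    for ref in refs:
        follow = ref.get("follow")
        if follow is not None:
            if follow in by_name:
                # Parent present: append to the parent footnote.
                notes[by_name[follow]] += " " + ref["text"]
            else:
                # Parent absent (e.g. only the second page is rendered):
                # show the fragment first, with no number and no link.
                orphans.append(ref["text"])
        else:
            if ref.get("name"):
                by_name[ref["name"]] = len(notes)
            notes.append(ref["text"])
    return orphans + notes
```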
Does this make sense?
Thomas
And what about the LST template combined with section tags? One example:
- Page 1 is
http://ca.wikisource.org/wiki/P%C3%A0gina:Cansons_de_la_terra_%281866%29.dj…
and contains some text with one full reference, plus the beginning of a second
ref; we can also see the transclusion of the following portion of the second
ref (from Page 2, thanks to the "Lst" template). We see the full second ref, but
only the part on its "physical" page is proofread there.
- Page 2 is
http://ca.wikisource.org/wiki/P%C3%A0gina:Cansons_de_la_terra_%281866%29.dj…
and contains two sections: some text with one reference, and the following portion
of the reference (transcluded at Page 1).
So, the final transclusion uses the "Page" template with the "section" parameter
when necessary:
http://ca.wikisource.org/wiki/Cansons_de_la_terra_-_Volum_I/Introducci%C3%B2
It's a bit awkward, but the same is true of books made of paper! Why do
they break refs at all?
Regards,
Aleator
"Neil Kandalgaonkar" <neilk(a)wikimedia.org> wrote in message
news:4C62E7E2.3040904@wikimedia.org...
> For obvious reasons only consecutive pages should be allowed here.
>
Not true. Sometimes a footnote will start on one page, the next page
will have a full-page illustration, and the footnote will continue on
the subsequent pages. In books where all the photographic plates are
next to each other (which is very common, due to the printing/binding
process) the gap could be of arbitrary length.
- Mark Clements (HappyDog)