Maybe we'll see a good free software OCR solution yet...
---------- Forwarded message ----------
From: Bill Janssen <bill(a)janssen.org>
Date: Mar 10, 2007 10:51 AM
Subject: [BP] OCRopus OCR shows up on Google Code...
To: Book People Mailing List <bookpeople(a)pobox.upenn.edu>
Those of us interested in OCR will want to keep an eye on
http://code.google.com/p/ocropus/.
From that Web page: ``We are planning on an alpha-release of the
system in Q1. It's a technology preview that shows the architecture
and is already quite useful...''
This is the follow-on to the Tesseract work that came out last year.
Bill
-----------------------------------------------------------------------------
This message was sent via the Book People mailing list.
Posting address: bookpeople(a)pobox.upenn.edu
Admin. & unsubscribe address: bookpeople-request(a)pobox.upenn.edu
Charter & archive: http://onlinebooks.library.upenn.edu/bplist/
--
Peace & Love,
Erik
DISCLAIMER: This message does not represent an official position of
the Wikimedia Foundation or its Board of Trustees.
"An old, rigid civilization is reluctantly dying. Something new, open,
free and exciting is waking up." -- Ming the Mechanic
---------- Forwarded message ----------
From: Juliet Sutherland <vze3rknp(a)verizon.net>
Date: Mar 10, 2007 2:12 AM
Subject: [BP] DP Releases 10,000th Book
To: Book People <bookpeople(a)pobox.upenn.edu>
Today Distributed Proofreaders <http://www.pgdp.net> (DP) posted a
package of texts that takes us over 10,000 completed titles. I'm very
proud of our community of volunteers who have reached this
beautiful number. The 10K package (listed below) showcases the wide
range of our volunteers' interests and talents.
There are examples of a number of our ongoing large projects, including
the Slave Narratives (now over half done), the Bureau of American
Ethnology reports, periodicals (this issue of Punch is the 280th that
we've done), and the beginning of a new project, Linnaeus' Species
Plantarum. Children's literature and Science Fiction are two popular
areas and both are represented on this list. Shakespeare in French and
John Evelyn's classic work on the trees of England represent the
classics. The Shanty Book has music to listen to, and the Encyclopedia
of Needlework deserves its name, being so large and full of illustrations.
It's hard to argue which of these titles is the most significant, but I
lean towards The Annals of the Cakchiquels. Here is what one of the
people involved with its production told me about it: "The Annals of the
Cakchiquels contains the text and translation of a document written in
the 17th century in Cakchiquel Maya, a Mayan language of highland
Guatemala. The document begins with a history of the Cakchiquel people
and continues with the history of the writer's family through the
arrival of the Spanish in the area and to the date of the writing. It is
one of the few documents written in Mesoamerican indigenous languages
that contain the history of the people before the Spanish Conquest. The
author, Daniel G. Brinton, was an early archaeologist, ethnologist, and
linguist."
There are thousands of DP volunteers who have made this milestone
possible. They all deserve thanks for their contributions!
JulietS
The DP 10,000 Collection
20771 Species Plantarum: Monandria, Diandria and Triandria by Carolus
Linnaeus (Carl von Linné) 1753 Latin
20772 Agriculture for beginners, Rev. ed. by Charles William Burkett,
Frank Lincoln Stevens, and Daniel Harvey Hill. 1914
20773 Marchand de Venise by Shakespeare, trans. by M. Guizot. original
1821, ed. transcribed 1862 French
20774 The Shanty Book, Part I, Sailor Shanties by Richard Runciman Terry
(1864-1938) 1921
20775 The annals of the Cakchiquels: The original text, with a
translation, notes, and introduction by Francisco Ernantez Arana (fl.
1582), trans. by and edit. by Daniel G. Brinton (1837-1899) 1885
English/Cakchiquel Mayan
20776 Encyclopedia of Needlework, by Therese de Dillmont originally
from 1884
20777 R. Caldecott's First Collection of Pictures and Songs by Randolph
Caldecott. [1900-1909?]
20778 Sylva, or, A Discourse of Forest-Trees by John Evelyn
(1620-1706) 1664
20779 Heimatlos by Johanna Spyri, 1890 German
20780 Punch, or the London Charivari, Volume 159, October 27, 1920
20781 Heidi by Johanna Spyri, trans. Elisabeth P. Stork, with an intro
by Charles Wharton Stork, A.M. PhD, Illustrations by Maria L. Kirk. Gift
edition. 1919
20782 Triplanetary, by E.E. Smith 1934
20783 Como atravessei África (v. II), by Alexandre Alberto da Rocha de
Serpa Pinto (aka Serpa Pinto) 1881 Portuguese
20784 Eighth Annual Report of the Bureau of Ethnology to the Secretary
of the Smithsonian Institution, 1886-1887, ed. John Wesley Powell
20785 Slave Narratives, Oklahoma (A Folk History of Slavery in the
United States From Interviews with Former Slaves) Works Project
Administration, Federal Writers' Project, 1936-1938
--
Peace & Love,
Erik
We have just launched http://planet.wikimedia.org/ , which is an
aggregator for all on-topic wiki-related weblog (blog) posts by
participants in Wikimedia projects. The planet can be found at:
http://planet.wikimedia.org/
To get added, please follow the instructions at:
http://meta.wikimedia.org/wiki/Planet_Wikimedia
This is a kind of beta test, and right now the planet is
English-only; however, I have set up a process for requesting
new languages here:
http://meta.wikimedia.org/wiki/Planet_Wikimedia/New_language
So please add your support if you wish to aggregate blog posts in
another language.
Again, this is for on-topic posts, not for diary entries. All feeds
must either point to a blog which is almost exclusively about wikis,
or be filtered (WordPress, Blogger and other common blog engines all
support filtered feeds by categorizing your posts, e.g., adding the
"wiki" category to all posts which you want to be included in the
planet). If this makes you feel uncomfortable, you can (in addition or
instead) add your blog to http://wikiblogplanet.com/ , which
does not filter posts for on-topicness. WikiBlogPlanet is run
independently by Nick Jenkins.
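To make the idea of a filtered feed concrete, here is a rough sketch of
what category filtering amounts to. It is only an illustration, not how
the planet software or any blog engine actually does it: the sample feed,
the function name on_topic_titles and the "wiki" category value are all
assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with per-post <category> elements (invented data).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>New gadget on my wiki</title><category>wiki</category></item>
    <item><title>My holiday</title><category>diary</category></item>
    <item><title>Wikisource proofreading</title><category>Wiki</category><category>books</category></item>
  </channel>
</rss>"""


def on_topic_titles(rss_text, wanted="wiki"):
    """Return titles of RSS items tagged with the wanted category (case-insensitive)."""
    root = ET.fromstring(rss_text)
    titles = []
    for item in root.iter("item"):
        # Collect this item's categories, normalised to lower case.
        categories = {(c.text or "").strip().lower() for c in item.findall("category")}
        if wanted.lower() in categories:
            titles.append(item.findtext("title", default=""))
    return titles


if __name__ == "__main__":
    for title in on_topic_titles(SAMPLE_RSS):
        print(title)
```

In practice you would not run anything like this yourself; you would
simply submit the per-category feed URL that your blog engine already
provides.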
I hope that this new tool will allow us to share useful and
interesting information, as well as opinions, more effectively across
project boundaries.
--
Peace & Love,
Erik
[Crossposted on many lists]
Hello,
The press team of the Foundation is currently building a press list,
gathering contact information for journalists and editors at all the major
media (newspapers, magazines, radio, TV) around the world. The aim is to
improve our communication and be able to reach people all over the
world (not only people in the US and not only people in English-speaking
countries).
We are asking the whole community to help gather this information.
We are fortunate to have a global community of users and volunteers, and we
should make the most of it. Please help us by sending contact
information for journalists and editors you know or who work in your country.
The basic information needed is:
Last name of the journalist:
First name:
Email:
Name and type of the media: (for instance "CNN, worldwide news TV channel")
Any other information about each contact is welcome (area of concentration,
coverage, schedule...).
Please send whatever information you have to: press at wikimedia dot org and
*do not reply to this email*; I will probably not see the replies since I am
not subscribed to all the lists I have posted to.
Thanks for your help!
--
Guillaume Paumier
[[m:User:guillom]]
http://www.wikimedia.org
Wikimania, Wikimedia's annual global conference for the community, has
today released its call for participation.
http://wikimania2007.wikimedia.org/wiki/Call_for_Particpation
We will shortly be accepting submissions for presentations, panels,
discussions, posters, and more. I would like to encourage all
community members to make a submission. Wikimania is the wiki
conference aimed at the Wikimedia community, so it's essential that
you are represented here. If there is something interesting about your
wiki that you want to share with 400 Wikimedians in Taipei this
summer, please respond to this call for participation.
We are also looking for volunteers to review the submissions made and
help us to choose what will be presented at Wikimania. Depending on
what you would like to review, please contact one of the following
people to let them know you would like to help.
Wikimedia Communities presentations: Angela Beesley:
http://wikimania2007.wikimedia.org/wiki/User:Angela
Free Content presentations: Phoebe Ayers:
http://wikimania2007.wikimedia.org/wiki/User:Brassratgirl
Technical infrastructure presentations: James Forrester:
http://wikimania2007.wikimedia.org/wiki/User:Jdforrester
Posters and Panels (all themes): Jakob Voss:
http://wikimania2007.wikimedia.org/wiki/User:JakobVoss
Artistic artifacts, Workshops, and Birds-of-a-Feather (all themes):
TzuChiang Liou:
http://wikimania2007.wikimedia.org/wiki/User:TzuChiang_Liou
Angela Beesley
---------- Forwarded message ----------
From: David Monniaux <David.Monniaux(a)free.fr>
Date: Feb 28, 2007 8:04 PM
Subject: [Commons-l] recent legal issues in France
To: wikipedia-l(a)lists.wikimedia.org
Cc: ca(a)wikimedia.fr, Wikimedia Commons Discussion List <
commons-l(a)lists.wikimedia.org>
Two legal changes in France, one working for us, one against us:
1) Wartime copyright extensions
Before 1995, the normal duration of copyright in France was 50 years
after the death of the author, or after publication in the case of
collective or anonymous works. However, there were special extensions
meant to compensate for the world wars. Due to European "harmonization" of
laws, the normal duration was extended to 70 years (following Germany, I
think). The lingering question was whether the war extensions still
applied. The French Court of Cassation
<http://en.wikipedia.org/wiki/Court_of_Cassation> said they didn't (at
least in the case where extensions didn't start to elapse before 1995,
but those that did will be over in 2009 or so). A still unresolved question
is the case of the 30-year extensions for authors killed in action (the
only major author that comes to mind is Antoine de Saint-Exupéry
<http://en.wikipedia.org/wiki/Antoine_de_Saint_Exup%C3%A9ry>, of /Little
Prince/ fame).
Wikimedia France is having its counsel investigate the exact
implications of this evolution. In any case, it seems that the situation
is better for us.
For the curious, the relevant articles are in the Code of intellectual
property, L123-1, -8, -9 and -10, and in rulings 280 and 281 of the
Court of Cassation, first civil chamber, for 2007.
2) Repression of 'happy slapping'
The French parliament has just passed a law aimed at preventing
delinquency. Among a gazillion measures on diverse issues more or
less related to delinquency, the Senate passed an amendment aimed at
repressing 'happy slapping'. 'Happy slapping' is basically youngsters
beating up people, filming the scene with cell phones, and broadcasting
the movie in order to humiliate the victim.
Unfortunately, the wording of the amendment was broad, and basically
criminalized filming, or broadcasting footage of, certain kinds of
violence, unless one does so to gather evidence for legal
proceedings, or as part of the normal work of a *profession* whose goal
is to inform the public.
In short, they have outlawed video reporting on certain kinds of
violence by ordinary citizens (as opposed to professional journalists).
This could prove a problem for Wikipedia, Commons and Wikinews
contributors; for instance, when reporting on police violence.
Wikimedia France contacted officials, who claimed that this was not the
intent, that the only target was happy slapping, and that if we had called
them earlier they would have had the amendment altered.
Now, several good things can still happen:
* The opposition has had the law sent for constitutional review. It is
possible that this article will get constitutional "reservations of
interpretation" that will clarify the situation of non-professionals.
* The government will perhaps clarify the issue; that is, make it clear
that the intent of this article is not to punish non-professionals
reporting on events with a goal to inform the public.
* Even if the law is accepted as is, prosecutors may get orders not to
enforce it against non-professionals who merely meant to inform.
* Judges may also decide the same, if they feel that freedom of
speech is the overriding need.
Finally, do not forget that there are legislative elections 3 months
away or so, and this law may go 'pschitt' if the current opposition wins.
In any case, I don't expect actual prosecution of "citizen journalists",
though I envision as a credible possibility that overzealous police may
want to get rid of undesirable reporters using this law. Even if you are
not prosecuted or sentenced in the end, being taken into police custody is
intimidating enough.
Wikimedia France is trying to get more information on this issue.
_______________________________________________
Commons-l mailing list
Commons-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/commons-l
Jim Hu wrote:
> For example, the web service at PubMed provides the abstract and
> links to full text (at yet another website) for a publication.
> My users would want to add things like: "This paper describes a
> resource that turned out to be useful for doing X" or "Figure 1
> in this paper shows this thing that the authors didn't notice"
> or "The xxx gene described in this paper is also known as yyy;
> they were shown to be the same 10 years later" etc.
I have a similar problem. At http://runeberg.org/ I digitize old
books, among them several encyclopedias. For the sake of
familiarity, you can think about scanned books in Wikisource
rather than my website.
In many cases an encyclopedia from 1889 is useful for knowing the
population of Aberdeen in 1889. It could be nice to report what
the current population is, but in some cases it is also important
to point out that the reported number for 1889 was indeed wrong.
But if scanning and OCRing one page takes 3 seconds and
proofreading takes 3 minutes, how long does it take to check all
the facts? Not knowing how this should best be addressed, it
seemed like a stupid idea to digitize more old works that are full
of errors.
When Wikipedia was started in 2001 and started to get off the
ground, this became the obvious place to put information on the
current and historic population of Aberdeen. The scanning of old
texts no longer had to carry this role. It was really only in
2002 and 2003 that I got the energy to scan more works for my own
site, and in 2005 I scanned this for Wikisource,
http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work
Turns out Aberdeen's population in 1911 was 163,084,
http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work/1-0016
http://en.wikisource.org/wiki/The_New_Student%27s_Reference_Work/Aberdeen
but this bit of information is not linked to or included in
http://en.wikipedia.org/wiki/Aberdeen#Population
So one problem still exists: From the scanned book page, there is
no link to the Wikipedia article that provides more up-to-date
information. The reader of the scanned page can of course use a
search engine, and will often find the Wikipedia article. But is
this really the ultimate solution? And even if the Wikipedia
article is found, the other scanned pages that link to the same
article are not found from there.
Should each scanned book page include a list of links to Wikipedia
articles that are relevant for the page? Could such lists be
compiled (or suggested) automatically?
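As a very rough illustration of what "suggested automatically" could
mean, the sketch below matches a list of known Wikipedia article titles
against the text of a scanned page, and then inverts the result so that
each article lists the pages that mention it. Everything here is
assumed for the sake of the example: the function names
suggest_articles and backlinks, the page identifiers and the sample
page texts are invented. Plain title matching like this would also
need disambiguation before it was useful in practice.

```python
import re
from collections import defaultdict


def suggest_articles(page_text, article_titles):
    """Return the titles that occur as whole words or phrases in page_text."""
    suggestions = []
    for title in article_titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, page_text, flags=re.IGNORECASE):
            suggestions.append(title)
    return suggestions


def backlinks(pages, article_titles):
    """Map each matched article title to the ids of the pages that mention it."""
    index = defaultdict(list)
    for page_id, text in pages.items():
        for title in suggest_articles(text, article_titles):
            index[title].append(page_id)
    return dict(index)


# Invented sample data: two "scanned pages" and a handful of article titles.
TITLES = ["Aberdeen", "Scotland", "Ottoman Empire", "Orient Express"]
PAGES = {
    "p1": "Aberdeen, a city of Scotland, had a population of 163,084.",
    "p2": "The Orient Express ran through the Ottoman Empire.",
}

if __name__ == "__main__":
    print(suggest_articles(PAGES["p1"], TITLES))
    print(backlinks(PAGES, TITLES))
```

The second function addresses the other half of the problem: finding,
from an article, all the scanned pages that point to it.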
Should Wikisource have a [[category:Aberdeen]] that collects all
pages, chapters and books that pertain to this town? Today the
English Wikisource has one [[Category:Works by subject]], but
under this is a very small tree, compared to all articles in
Wikipedia. There is no category for Aberdeen, but one for
Scotland that has 15 links of which 4 are to articles in the 1911
Encyclopaedia Britannica. The 1911 EB article "Aberdeen (burgh)"
is not among these four,
http://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Aberdeen_%2…
Wikisource also has a [[Category:Ottoman Empire]] that contains
four articles from the 1911 Encyclopaedia Britannica, one other
chapter and two other works. But the corresponding category on
the English Wikipedia has 56 pages and 12 immediate subcategories.
Even the sub-subcategory Ottoman railways has 6 Wikipedia
articles. On Wikisource there seem to be 6 mentions of the
"Orient Express", but these are found through Google and not
through links on the website,
http://www.google.com/search?q=%22orient+express%22+site%3Aen.wikisource.org
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
I went through the application information and I think we could
definitely be a strong contender, if we had a good partner and a sound
project.
Please contribute ideas here:
http://meta.wikimedia.org/wiki/NEH_Advancing_Knowledge_grant
Basically, we can't use it to purchase existing collections, and we
need to have a specific project with a partner such as a library,
museum or archive. So we need to brainstorm on either what good
projects would be (and then, who would be good to approach as a
partner?) or on who good partners would be (and then, what would make
sense as a collaborative project?).
After mulling over this a bit, it just screams LIBRARY + WIKISOURCE to
me. We need a partner who is sympathetic to open content and not
putting restrictions on public domain material.
If you have ideas for good partners and/or projects, please put them
on the meta page!
thanks,
Brianna
user:pfctdayelise
Hi,
I think that there are several areas where the projects would benefit
from more proactive help from the Foundation. I will speak about two here:
1. Legal counselling.
Some projects (mainly Commons and Wikisource) need more input about
copyright issues from knowledgeable persons. I have twice asked
juriwiki-l about specific issues for Wikisource without receiving an answer,
or even an acknowledgement that my request was received. Another concrete
example: Commons and Wikisource would benefit greatly from a comparison table
of copyright rules, showing which countries use "most favourable rule applies",
etc.
2. Helping small projects.
It seems that some projects have a difficult
start. Indian-language projects are my first example. I will go to
India in January, and I would like to use this opportunity to recruit
new contributors. The Foundation could help coordinate this kind of
recruitment.
Best regards,
Yann
--
http://www.non-violence.org/ | Site collaboratif sur la non-violence
http://www.forget-me.net/ | Alternatives sur le Net
http://fr.wikipedia.org/ | Encyclopédie libre
http://fr.wikisource.org/ | Bibliothèque libre
http://wikilivres.info | Documents libres