On wikipedia-l Florence Devouard wrote:
> During that event, I mentioned that the French chapter has
> several ongoing discussions with various museums to set up
> content partnerships.
Wikisource is really a much larger project than Wikipedia.
Consider any public library: The encyclopedia shelf or quick
reference section (Wikipedia) is less than one percent of the
whole library (Wikisource). After seven years of writing
Wikipedia, we are now getting useful results in many languages.
Wikisource might take 70 years.
What we can expect during 2009 is some small step forward on this
longer path. Taking a single step might sound easy, but it's hard
enough to know which direction is forward.
If you can achieve real, practical, pragmatic cooperation that
actually results in more free content, even if it is not very much,
that is probably the best step forward. But be prepared: infighting
and prestige politics among public institutions can be tough,
especially when it comes to competing for funding.
> In Europe, at least in some countries, we meet several problems
> * many scholars have a rather bad image of Wikipedia (because
> written by amateurs, anonymous members, plagued by vandals
> etc...)
There is a clear risk that this bad image is reinforced. Our
message that "anybody can contribute" is hard to reconcile with the
prestige-conscious thinking of the institutions where you seek
cooperation.
----
I'd like to recommend an article in the October 2008 issue of the
open access journal "First Monday", "Mass book digitization: The
deeper story of Google Books and the Open Content Alliance" by
Kalev Leetaru,
http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2101/2037
This article is just one of the many publications on how to scan
(or microfilm) books that have appeared in the last 20 years. But it
is interesting because it evaluates two large-scale projects of
the last few years, and compares them to each other. Even though
"digital libraries" is a new science, it is already full of
established truths. Perhaps this is due to the high involvement
of public institutions. One such truth is that image compression
(with JPEG artifacts) must be avoided at all cost.
Both Google Books and the Open Content Alliance (Internet Archive)
break this rule, by using consumer-grade digital cameras and JPEG
compression, and should thus be considered a waste of time,
according to conventional wisdom (or "best current practices").
Still, nobody can avoid being impressed by their results, and so
the scientific world needs to revise its understanding of the
current state of the art. The author of this article goes to
great lengths (in the "Discussion" section) to explain that what
these projects do is "access digitization", which is described as
something completely different than traditional book scanning:
"Before one can compare the two projects, it is important to
first realize that both projects are really only access
digitization projects, despite the common assertion of OCA
captures as preservation digitization. Neither initiative uses
an imaging pipeline or capture environment suitable for true
preservation scanning. The OCA project outputs
variable–resolution JPEG2000 files built from lossy
camera–generated JPEG files. A consumer area array digital
camera is used to produce images ..."
Needless to say, neither Project Gutenberg nor Wikisource is
mentioned in this article. Their goals are just too different
(what? free content?), their achievements not impressive enough.
They are not potential future employers of "digital library"
scholars. If you help them or cooperate with them, you will only
help mankind in an altruistic fashion (what fools!), you will not
help your own professional or academic career.
In the article, the Open Content Alliance already plays the role
of the fools. They have only (!) digitized 100,000 books, while
Google Books has millions. They do not provide the same search
capability. And so it goes on. The next time the Internet Archive
(OCA) applies for funding or tries to establish cooperation with
more institutions, such arguments might be used against them.
----
What Wikisource really needs to do is provide an explanation
of what it does, and how this goes beyond Google Books' "access
digitization". In Europe, this must be set in the perspective of
ongoing French, German and EU initiatives (Gallica, Theseus,
Quaero, Europeana, ...). When one of those projects applies for
funding, it will need to show that it is successful in attracting
cooperation partners and that it is a leader among similar
projects. We should be prepared that they take any opportunity to
define Wikisource as a loser, amateurish, clueless project. This
is not because they are evil, only because they do what they can
to get the funding they need.
Why should museum X or library Y or archive Z cooperate with
Wikisource, when it risks being associated with such descriptions
of failure? The alternative for that institution might be to
cooperate with the successful Google or Gallica. So why is
Wikisource superior? This is what we need to explain.
> * develop arguments for museums etc...
Exactly.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
---------- Forwarded message ----------
From: John at Darkstar <vacuum(a)jeb.no>
Date: Wed, Sep 24, 2008 at 10:27 PM
Subject: [Foundation-l] Old newspapers going to destruction
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>
In Norway, a university has a large collection of newspapers; the
collection is claimed to cover around 3,000 running meters in the
storehouse. Setting aside the Norwegian and Nordic newspapers, what's
left is international newspapers from the last 150 years. If no one
comes up with a solution, the collection is going to be destroyed
(actually burned).
I think the best thing to do is to scan them and make them publicly
available. Of course neither I nor WM Norway can set out to do such a
task, but if there is some wealthy person out there who might be able
to involve himself in such a task, I think it would be a very worthy
gift to mankind (where are the women!) to do such a thing.
When I heard of this I was shocked. Most of us are. I have in fact
studied at the university that attempted to burn the newspapers. The
plans have been stalled for now, but some permanent solution has to
be found.
John
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
> From: "John Vandenberg" <jayvdb(a)gmail.com>
> To: "discussion list for Wikisource, the free library" <wikisource-l(a)lists.wikimedia.org>
> Subject: Re: [Wikisource-l] Changing the Wikisource main page
> Date: Thu, 11 Sep 2008 09:40:56 +1000
> On Thu, Sep 11, 2008 at 4:04 AM, Syagrius <syagrius(a)caramail.com> wrote:
> > Hello,
> >
> >
> > As Wikipedia decided to change its main page presentation, I think
> > that Wikisource should maybe do the same. The "war" between the
> > Spanish and the Chinese Wikisources demonstrates that the article
> > count does not reflect the true depth of a Wikisource, since someone
> > can create thousands of very small articles. What should the main
> > page present? I don't think that choosing the number of visitors, as
> > Wikipedia did, would be a good idea, since Wikisource is much less
> > well known than Wikipedia, and the figures may not be reliable. If
> > these stats from Erik Zachte can be trusted (and kept up to date), I
> > would suggest that the number of words
> > http://stats.wikimedia.org/wikisource/EN/TablesDatabaseWords.htm
> > may be the fairest figure to present. The only problem would be: is
> > the number of words given for the Chinese (and Japanese, also)
> > Wikisource correct?
>
> It is an interesting statistic, and I haven't investigated the
> algorithm being used. Is there a mathematical description of the
> algorithm used?
>
> My first guess is that we would need to weight it according to the
> entropy of each language. For example, Chinese and Japanese have a
> much higher entropy per character, so they need to be weighted higher.
>
> > What do you think of this proposition ?
>
> If we are going to change the front page of the portal, it is these
> stats that I would like to see used and improved:
>
> http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics
>
> We need to present ourselves as a _serious_ project, doing top quality
> work.
>
> I suggest that we also feature two texts on the main portal each month:
> - one work that is hosted on wikisource.org - i.e. from a language
> which is _not_ on a subdomain
> - one work from a subdomain, from a different sub-domain each month,
> _after_ it has been selected as a featured text on the subdomain.
>
> --
> John Vandenberg
>
> _______________________________________________
> Wikisource-l mailing list
> Wikisource-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
For more information about the statistics, I think you could contact
Erik Zachte himself. I still think that the number of words is the
best measure, because a Wikisource could easily create a lot of
"pages" with no correction at all. I don't really understand what you
mean by "entropy of each language", but the number of words for the
Chinese Wikisource seems plausible: 29M, versus 42M for the Spanish
Wikisource.
Xavier
(Enmerkar on French wikisource)
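[Editor's note: the "entropy of each language" idea from John's mail can
be made concrete: estimate bits per character from a text sample of each
language, then weight the wiki's raw character count by it. The sketch
below is a hypothetical illustration under that assumption; it is not
how Erik Zachte's stats.wikimedia.org figures are actually computed, and
the samples are toy stand-ins for real corpus extracts.]

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy in bits per character, estimated from a sample."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def weighted_size(char_count: int, sample: str) -> float:
    """Weight a wiki's raw character count by its language's estimated
    entropy. Hypothetical weighting, not the Wikistats method."""
    return char_count * char_entropy(sample)

# Toy samples standing in for real corpus extracts.
english = "the quick brown fox jumps over the lazy dog"
repetitive = "ababab"

print(char_entropy(repetitive))  # 1.0 bit/char: two equally likely symbols
# A text drawn from a richer symbol inventory carries more bits per
# character, so the same character count represents more content.
print(char_entropy(english) > char_entropy(repetitive))  # True
```

On real data, the sample would have to be a sizeable extract of each
wiki's own text, and for Chinese or Japanese one would compare
characters rather than whitespace-delimited "words", which barely exist
in those scripts.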
---------- Forwarded message ----------
From: Rebecca Hargrave Malamud <webchick(a)invisible.net>
Date: Thu, Sep 11, 2008 at 10:16 AM
Subject: [ol-discuss] Announcement: Internet Archive "Using Digital
Collections" Conference, October 27-28, 2008
To: ol-discuss(a)archive.org
On October 27-28 the Internet Archive will host its annual conference
in San Francisco with the theme "Using Digital Collections". The
meeting is being expanded from its previous one-day format in response
to participant feedback from the 2007 meeting.
This year we will build on prior themes of creating and accessing
digital content by spotlighting how digital collections are being used
to promote scholarship and to bridge institutional and geographic
boundaries. The agenda will also include technical updates on
scanning projects and a meeting of the Open Content Alliance.
A detailed agenda will be distributed in September; in the meantime,
please block out the dates and plan to join us in San Francisco!
Meeting Specifics:
Monday, October 27 - Tuesday, October 28
Golden Gate Club, Presidio of San Francisco
If you have questions or have suggestions for others who should be
invited, please contact Casey Nelson at casey(a)archive.org or Linda
Frueh at linda(a)archive.org.
RSVP by October 1 to casey(a)archive.org
_______________________________________________
Ol-discuss mailing list
Ol-discuss(a)archive.org
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
Dear everyone.
I am in possession of a book, "The University of Glasgow Old and New".
(Partial details:
http://special.lib.gla.ac.uk/exhibns/month/july2008.html). My copy is
#22 of 50, I believe there was also a cheaper edition of 150
or 250. I really want to make this available to Wikipedia, but need help:
The book is relatively tightly bound, and the spine gets in the way of
the scanner. I need someone with either a scanner that doesn't have a
one-inch strip of plastic between the scanner bed and the edge of the
scanner (which cuts off the photos), or a photographer able to take
high-quality photographs of the photos.
I would also like to donate copies of some very large (A0 and A1) medical
posters, from the series "Supplement to the Anatomy of Labour". These are a
bit delicate, though not ridiculously so, but I'd be uncomfortable running
them through a strip scanner. If someone has a sufficiently large flatbed,
or can take photos of them, let's do this.
Please help, I've been trying to get these to Wikipedia for 6 months now.
Thank you,
Adam Cuerden