Gregory Maxwell wrote:
> For digitizing what?
Exactly, that's the first question.
> Archive.org digitizes books using a pair of canon 1Ds (? perhaps
> it was a 5D? In any case the 5DII would be sufficient now) on a
> custom stand with a hacked up copy of gphoto2 to actuate the
> cameras.
That's Brewster Kahle doing things many years ago (2002? 2003?).
Today, a much cheaper low-end digital SLR, or even a compact camera,
will give you the 10 or so megapixels you need. But again, if you
need to pay your staff, a camera that costs ten times as much might
easily pay for itself through increased speed or a longer shutter
lifespan.
> I'm not sure how they're dealing with curvature (I think they
> just may lay a glass plate on the pages), but it would be easy
> enough to solve using a laser pointer with a pattern generating
> holographic grating and a second exposure to capture the page
> distortion and some fairly simple software processing after the
> fact.
The Internet Archive apparently uses a fixed glass, and lowers the
book cradle to turn pages, http://aipengineering.com/scribe/
Other designs have a fixed book cradle and lift the glass, e.g.
the Atiz DIY, http://diy.atiz.com/
I thought the Internet Archive design was very clever, since it
keeps a fixed distance from lens to book surface (beneath the
glass), until I saw the bkrpr.org design, where you just lift everything.
That's a design for 2009! I haven't tried to build one myself yet.
----
However, you can capture lots of books (that can be opened fully)
with a single camera, laying the book flat on a table with a glass
on top. That's just like a flatbed scanner (but much faster)
turned upside down.
In January 2008, I used a 10 megapixel Canon EOS 400D (Digital
Rebel XTi) with a 50 mm lens to shoot this, laying flat on a table
under a glass, http://runeberg.org/stridfin/0226.html
On that webpage, the image is reduced to 120 dpi (1.2 megapixel),
but the original is 300 dpi (7.5 megapixel). The map shown is
reused in http://en.wikipedia.org/wiki/Battle_of_Alavus
That's an example of how one specialized book can be very useful
for a limited Wikiproject. This book was published in 1909 for the
100th anniversary of the Finnish War (1808-1809), and digitized in
2008 for the 200th anniversary.
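As a quick check of the 120 dpi and 300 dpi figures above, here is a minimal sketch of the arithmetic. It assumes both images cover the same physical page area, so the pixel count scales with the square of the resolution; the function name is made up for this illustration.

```python
def megapixels_at_dpi(known_megapixels, known_dpi, target_dpi):
    """Scale a pixel count to another resolution for the same page area.

    Width and height in pixels both grow linearly with dpi, so the
    total pixel count grows with the square of the dpi ratio.
    """
    scale = target_dpi / known_dpi
    return known_megapixels * scale ** 2


# The web image is 1.2 megapixels at 120 dpi; estimate the 300 dpi original.
original = megapixels_at_dpi(1.2, 120, 300)
print(round(original, 2))
```

The result agrees with the 7.5 megapixel figure, which fits comfortably inside a 10 megapixel frame once the margins around the page are cropped away.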
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
It concerns mainly WS. Yann
-------- Original Message --------
Subject: Re: [Foundation-l] Universal Library
Date: Sat, 5 Sep 2009 02:29:28 -0400
From: David Goodman
To: Yann Forget
More accurately, the number of items there with adequate and complete
information comes close to zero--or perhaps is actually zero-- One
example for now, the first I looked at:
http://fr.wikisource.org/wiki/Vie_d’Alexandre_le_Grand
Plutarque traduction Ricard, 1840
There were many eds. of his translation. The 1840 is not the first ed,
which was 1798-1803, but presumably the 1837-1841 published by F.A.
Dubois, Vol. and pages not specified.
It makes a difference whether a translation was done in the 18th c. or
the 19th.
http://en.wikisource.org/wiki/Lives_(Dryden_translation)/Alexander
Lives by Plutarch , translated by John Dryden
-- but it doesn't specify the edition at all.
The French comes out ahead, but not by much.
Neither specified just what copy was used, or even what printing, a
basic necessity for checking the transcription.
IAS and Google Book Search do. Not in the metadata, unfortunately, but
they do show it in the scan. They also scan multiple copies from
multiple libraries, a basic necessity for scholarly work. No
responsible academic would prepare a text from a single copy.
As for translation links, the enWS links to the frWS, the frWS links
to the enWS, but incorrectly. They both link to the Greek, which gives
no indication at all of which edition is being followed. It is very
unlikely to be the one used by Ricard or the one used by Dryden.
If you want to know, I do not work for the enWS because the accepted
standards are so low I have no hope of fixing it; for the enWP I can
at least have some effect.
As for the frWP sourcing, I checked 20 articles. Half were unsourced
entirely, or to primary sources from the subject of the article only.
The frWP does an excellent job of sourcing to primary documentary
sources--much better than the en. Neither does all that well otherwise,
except for scattered articles worked on by good people. The deWP is
the one that comes closer to adequacy.
David Goodman, Ph.D, M.L.S.
http://en.wikipedia.org/wiki/User_talk:DGG
On Thu, Sep 3, 2009 at 5:19 PM, Yann Forget<yann(a)forget-me.net> wrote:
> David Goodman wrote:
> ...
>> information) The accuracy & adequacy -- let alone completeness-- of
>> the bibliographic information in WS is close to zero,
>
> This alone shows that you know very little of this project, where I have
> never seen you. You claim to be an expert, but you talk about things
> which you don't know. So I won't pursue this discussion, it is quite
> useless in that context.
>
> ...
>> David Goodman, Ph.D, M.L.S.
>> http://en.wikipedia.org/wiki/User_talk:DGG
>
> Regards,
>
> Yann
--
http://www.non-violence.org/ | Site collaboratif sur la non-violence
http://www.forget-me.net/ | Alternatives sur le Net
http://fr.wikisource.org/ | Bibliothèque libre
http://wikilivres.info | Documents libres
You two seem to be talking past each other. Might I suggest that
perhaps the quality of information on OPL and/or Wikipedia/Wikisource
sites is rather different depending on whether you are reading in
French or English? I don't know if this is the case, but it could
explain the discrepancies between your experiences.
Birgitte SB
--- On Thu, 9/3/09, David Goodman <dgoodmanny(a)gmail.com> wrote:
> From: David Goodman <dgoodmanny(a)gmail.com>
> Subject: Re: [Foundation-l] Universal Library
> To: "Wikimedia Foundation Mailing List" <foundation-l(a)lists.wikimedia.org>
> Date: Thursday, September 3, 2009, 2:19 PM
> I have been re-reading their documentation, and they have it well in
> hand. We would do very well to confine ourselves to matching up the
> entries in the WMF projects alone. Some of the data in WMF is more
> accurate than some of the OL data, but I would not say this to be a
> general rule. Far from it: the proportion of incomplete or inaccurate
> entries in enWP is probably well over 50% for books. (for journal
> articles it is better, because of a project to link to the pubmed
> information) The accuracy & adequacy -- let alone completeness-- of
> the bibliographic information in WS is close to zero, except where
> there is an IA scan of the cover and title page, from which full
> bibliographic information might be derived, but cannot necessarily be
> taken at face value.
>
> The unification of editions is non-trivial, as using the algorithm you
> suggest, you will also have all works related to Verne, and
> additionally a combination of general and partial translations,
> children's books, comic adaptation, and whatever.
> Modern library metadata provides for this to a certain limited
> extent--unfortunately most of the entries in current online catalogs
> do not show full modern data--many catalogs never had more than
> minimal records; Dublin core is probably not generally considered to
> be fully up to the problem either, at least in any current
> implementation.
>
> Those working on the OL side are fully aware of this. They have made
> the decision to work towards inclusion of all usable & obtainable data
> sets, rather than only the ones that can be immediately fully
> harmonized. This was a very wise decision, as the way in which the
> information is to be combined & related is not fully developed, and,
> if they were to wait for that, nothing would be entered. There will
> therefore be the problem of upgrading the records and the record
> structure in place--a problem that no large bibliographic system has
> ever fully handled properly--not that this incarnation of OL is likely
> to either. Bibliographers work for their time, not for all time to
> come.
>
>
> David Goodman, Ph.D, M.L.S.
> http://en.wikipedia.org/wiki/User_talk:DGG
On Fri, Sep 4, 2009 at 8:58 PM, Nikola Smolenski<smolensk(a)eunet.yu> wrote:
> John Vandenberg wrote:
>>> I want to encourage wikipedians and wikisourcerers to join the
>>> OpenLibrary project, just like you should also join OpenStreetMap
>>> and other good projects for free knowledge and information. Bring
>>> your experience. If you get tired of one project, as I do
>>> sometimes, work on another one for a while.
>>
>> Tell me _one_ thing that I can do at OpenLibrary that I can not do at
>> Wikisource.
>
> Are you suggesting that in addition to collecting free texts, Wikisource
> should also collect information about texts, free and nonfree, like
> OpenLibrary does? If so, that is a very interesting suggestion, and I
> support it.
Yes, that is my vision. We should have bibliographic information,
copyright details, lists of chapters and summaries, lists of older works
which are referenced and lists of later works which reference it, etc.
However, the Wikisource community is not yet large enough to manage
that. A year ago the English Wikisource community changed the
restrictions on who can have an Author page.
Previously our rule was: the author must have at least one "free" work.
It changed to: the author must either have one "free" work, or they
must be deceased.
English Wikisource often includes modern works on the Author page of
deceased people, listing biographies, posthumous collections, etc.
As our community grows, managed by people who are focused on old
works, we can relax the inclusion criteria.
This is like the English Wikipedia becoming more inclusive as it has
grown, because there are more people policing the edges.
Organic growth.
If this doesn't happen, I won't fret as there are more than enough
public domain works to keep me learning for a few lifetimes. :-) I
think it is much more important that we revive interest in old works
which don't have a commercial publisher pushing new copies into
bookstores.
--
John Vandenberg
Yann -
A nice draft. You might want to add "collaborative, editable,
versioned, multilingual, annotated" database of all published "works"
(you may want more than just books).
To Lars and DGG: OL is doing just fine for some definitions of the
terms involved. But it needs ways for crowds to help, and larger
crowds helping. Short pithy statements about 'a universal library'
and how this will help research, education, and other projects will
help.
To general skepticism that Wikimedia should consider trying to
collaborate with or help other projects that are doing 'just fine on
their own' : helping others learn what we have learned is a good way
to improve the world of free knowledge. Most projects struggle with
some of the elements of public collaboration that have come swiftly to
Wikipedia. [and vice versa: there are many things our projects can
learn from others, if we develop better relationships of sharing ideas
and expertise without sticking on points where we disagree]
SJ
On Tue, Sep 1, 2009 at 4:40 PM, Mary Murrell<murrell(a)berkeley.edu> wrote:
> I totally dig it.
>
>> Hello,
>>
>> I started a proposal on the Strategy Wiki:
>> http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_book…
>>
>> IMO this should be a joint project between Openlibrary and Wikimedia.
>> Both have an interest and a capacity to work on this.
>>
>> Regards,
>>
>> Yann
>> --
>> http://www.non-violence.org/ | Site collaboratif sur la non-violence
>> http://www.forget-me.net/ | Alternatives sur le Net
>> http://fr.wikisource.org/ | Bibliothèque libre
>> http://wikilivres.info | Documents libres
David Goodman wrote:
> I have read your proposal. I continue to be of the opinion that we are
> not competent to do this. Since the proposal says, that "this project
> requires as much database management knowledge as librarian
> knowledge," it confirms my opinion. You will never merge the data
> properly if you do not understand it.
That is the whole point: it needs to be a joint project, database gurus
together with librarians. What I see is that OpenLibrary lacks some basic
features that Wikimedia projects have had for a long time (on the
Internet's time scale): easy redirects, interwikis, merges, a deletion
process, etc. Some of these are planned for the next version of their
software, but I still feel that they sometimes try to reinvent the wheel
we already have.
OL claims to have 23 million book and author entries. However many
entries are duplicates of the same edition, not to mention the same
book, so the real number of unique entries is much lower. I also see
that Wikisource has data which are not included in their database (and
certainly also Wikipedia, but I didn't really check).
> You suggest 3 practical steps
> 1. an extension for finding a book in OL is certainly doable--and it
> has been done, see
> [http://en.wikipedia.org/wiki/Wikipedia:Book_sources].
> 2. an OL field, link to WP -- as you say, this is already present.
> 3. An OL field, link to Wikisource. A very good project. It will be
> they who need to do it.
Yes, but I think we should go further than that. OpenLibrary has an API
which would allow any relevant wiki article to be dynamically linked to
their data, or an entry to be created every time new relevant data is
added to a Wikipedia project. This is all about avoiding duplicate work
between Wikimedia and OpenLibrary. It could also increase accuracy by
double-checking facts (dates, name and title spelling, etc.) between
our projects.
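To make the double-checking idea concrete, here is a rough sketch of comparing one wiki-side record with one OpenLibrary-side record. Everything in it is an assumption made for illustration: the field names, the sample records, and the normalization rules are invented, and nothing here calls the actual OpenLibrary API.

```python
import unicodedata


def normalize(value):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(text.split())


def find_discrepancies(wiki_record, ol_record, fields=("title", "author", "year")):
    """Return {field: (wiki_value, ol_value)} for fields that disagree.

    Only fields present in both records are compared, and values are
    compared after normalization so that spelling variants such as a
    typographic apostrophe do not count as a mismatch.
    """
    differences = {}
    for field in fields:
        if field in wiki_record and field in ol_record:
            if normalize(wiki_record[field]) != normalize(ol_record[field]):
                differences[field] = (wiki_record[field], ol_record[field])
    return differences


# Hypothetical records, invented for this example.
wiki = {"title": "Vie d'Alexandre le Grand", "author": "Plutarque", "year": 1840}
ol = {"title": "Vie d\u2019Alexandre le Grand", "author": "Plutarque", "year": 1798}

print(find_discrepancies(wiki, ol))
```

A mismatch found this way only flags a record for a human to look at. It cannot say which side is right, and two different editions of the same work will legitimately disagree on the year.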
> Agreed we need translation information--I think this is a very
> important priority. It's not that hard to do a list or to add links
> that will be helpful, though not exact enough to be relied on in
> further work. That's probably a reasonable project, but it is very
> far from "a database of all books ever published"
>
> But some of this is being done--see the frWP page for Moby Dick:
> http://fr.wikipedia.org/wiki/Moby_Dick
> (though it omits a number of the translations listed in the French Union
> Catalog, http://corail.sudoc.abes.fr/xslt/DB=2.1/CMD?ACT=SRCHA&IKT=8063&SRT=RLV&TRM=…]
> I would however not warrant without seeing the items in hand, or
> reading an authoritative review, that they are all complete
> translations.
> The English page on the novel lists no translations; perhaps we could
> in practice assume that the interwiki links are sufficient. Perhaps
> that could be assumed in Wikisource also?
That's another possible benefit: automatic list of
works/editions/translations in a Wikipedia article.
You could add {{OpenLibrary|author=Jules Verne|lang=English}} and you
have a list of English translations of Jules Verne's works directly
imported from their database. The problem is that, right now, Wikimedia
projects often have more accurate and more detailed information than
OpenLibrary.
> David Goodman, Ph.D, M.L.S.
> http://en.wikipedia.org/wiki/User_talk:DGG
Regards,
Yann
Hello, I have already answered some of these arguments earlier.
David Goodman wrote:
> Not only can the OpenLibrary do it perfectly well without us.
> considering our rather inconsistent standards, they can probably do it
> better without us. We will just get in the way.
The issue is not whether OpenLibrary is "doing it perfectly well without
us", even if that were true. Currently what OpenLibrary does is not very
useful for Wikimedia, and partly duplicates what we do. Wikimedia also
has important assets which OL doesn't have, and therefore a
collaboration seems obviously beneficial for both.
> There is sufficient missing material in every Wikipedia, sufficient
> lack of coverage of areas outside the primary language zone and in
> earlier periods, sufficient unsourced material; sufficient need for
> updating articles, sufficient potentially free media to add,
> sufficient needed imagery to get; that we have more than enough work
> for all the volunteers we are likely to get.
>
> To duplicate an existing project is particularly unproductive when the
> other project is doing it better than we are ever going to be able to.
> Yes, there are people here who could do it or learn to do it--but I
> think everyone here with that degree of bibliographic knowledge would
> be much better occupied in sourcing articles.
It is clear that you didn't even read my proposal.
Please do so before raising objections.
http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_book…
I specifically wrote that my proposal does not necessarily mean starting
a new project. I agree that working with Open Library is necessary for
such a project, but I also say that if Wikimedia gets involved, it would
be much more successful.
What you say here is the complete opposite of how Wikimedia projects
work, i.e. openness, and that's just what is missing in Open Library.
> David Goodman, Ph.D, M.L.S.
Regards,
Yann
Lars,
I think we agree on what needs to happen. The only thing I am not
sure of is where you would like to see the work take place. I have
raised versions of this issue with the Open Library list, which I copy
again here (along with the people I know who work on that fine project
- hello, Peter and Rebecca). This is why I listed it below as a good
group to collaborate with.
However, the project I have in mind for OCR cleaning and translation needs to
- accept public comments and annotation about the substance or use of
a work (the wiki covering their millions of metadata entries is very
low traffic and used mainly to address metadata issues in their
records)
- handle OCR as editable content, or translations of same
- provide a universal ID for a work, with which comments and
translations can be associated (see
https://blueprints.launchpad.net/openlibrary/+spec/global-work-ids)
- handle citations, with the possibility of developing something like WikiCite
Let's take a practical example. A classics professor I know (Greg
Crane, copied here) has scans of primary source materials, some with
approximate or hand-polished OCR, waiting to be uploaded and converted
into a useful online resource for editors, translators, and
classicists around the world.
Where should he and his students post that material?
Wherever they end up, the primary article about each work would
surely link out to the OL and WS pages for that work (where one
exists).
> (Plus you would have to motivate why a copy of OpenLibrary should
> go into the English Wikisource and not the German or French one.)
I don't understand what you mean -- English source materials and
metadata go on en:ws, German on de:ws, &c. How is this different from
what happens today?
SJ
On Mon, Aug 3, 2009 at 1:18 PM, Lars Aronsson<lars(a)aronsson.se> wrote:
> Samuel Klein wrote (in two messages):
>
>> >> *A wiki for book metadata, with an entry for every published
>> >> work, statistics about its use and siblings, and discussion
>> >> about its usefulness as a citation (a collaboration with
>> >> OpenLibrary, merging WikiCite ideas)
>
>> I could see this happening on Wikisource.
>
> Why could you not see this happening within the existing
> OpenLibrary? Is there anything wrong with that project? It sounds
> to me as you would just copy (fork) all their book data, but for
> what gain?
>
> (Plus you would have to motivate why a copy of OpenLibrary should
> go into the English Wikisource and not the German or French one.)
>
>
> --
> Lars Aronsson (lars(a)aronsson.se)
> Aronsson Datateknik - http://aronsson.se
>