You two seem to be talking past each other. Might I suggest that perhaps the quality of information on OPL and/or Wikipedia/Wikisource sites is rather different depending on whether you are reading in French or English? I don't know if this is the case but it could explain the discrepancies between your experiences.
Birgitte SB
--- On Thu, 9/3/09, David Goodman dgoodmanny@gmail.com wrote:
From: David Goodman dgoodmanny@gmail.com
Subject: Re: [Foundation-l] Universal Library
To: "Wikimedia Foundation Mailing List" foundation-l@lists.wikimedia.org
Date: Thursday, September 3, 2009, 2:19 PM
I have been re-reading their documentation, and they have it well in hand. We would do very well to confine ourselves to matching up the entries in the WMF projects alone. Some of the data in WMF is more accurate than some of the OL data, but I would not say this to be a general rule. Far from it: the proportion of incomplete or inaccurate entries in enWP is probably well over 50% for books. (For journal articles it is better, because of a project to link to the PubMed information.) The accuracy & adequacy--let alone completeness--of the bibliographic information in WS is close to zero, except where there is an IA scan of the cover and title page, from which full bibliographic information might be derived, but cannot necessarily be taken at face value.
The unification of editions is non-trivial: using the algorithm you suggest, you will also pick up all works related to Verne, and additionally a combination of general and partial translations, children's books, comic adaptations, and whatever. Modern library metadata provides for this to a certain limited extent--unfortunately most of the entries in current online catalogs do not show full modern data--many catalogs never had more than minimal records; Dublin Core is probably not generally considered to be fully up to the problem either, at least in any current implementation.
Those working on the OL side are fully aware of this. They have made the decision to work towards inclusion of all usable & obtainable data sets, rather than only the ones that can be immediately fully harmonized. This was a very wise decision, as the way in which the information is to be combined & related is not fully developed, and, if they were to wait for that, nothing would be entered. There will therefore be the problem of upgrading the records and the record structure in place--a problem that no large bibliographic system has ever fully handled properly--not that this incarnation of OL is likely to either. Bibliographers work for their time, not for all time to come.
David Goodman, Ph.D, M.L.S. http://en.wikipedia.org/wiki/User_talk:DGG
On Thu, Sep 3, 2009 at 6:38 AM, Yann Forget yann@forget-me.net wrote:
David Goodman wrote:
I have read your proposal. I continue to be of the opinion that we are not competent to do this. Since the proposal says, that "this project requires as much database management knowledge as librarian knowledge," it confirms my opinion. You will never merge the data properly if you do not understand it.
That's the whole point: it needs to be a joint project of database gurus with librarians. What I see is that OpenLibrary lacks some basic features that Wikimedia projects have had for a long time (on the Internet scale): easy redirects, interwikis, mergings, deletion process, etc. Some of these are planned for the next version of their software, but I still feel that sometimes they try to reinvent the wheel we already have.
OL claims to have 23 million book and author entries. However, many entries are duplicates of the same edition, not to mention the same book, so the real number of unique entries is much lower. I also see that Wikisource has data which are not included in their database (and certainly also Wikipedia, but I didn't really check).
You suggest 3 practical steps:
1. An extension for finding a book in OL is certainly doable--and it has been done, see [http://en.wikipedia.org/wiki/Wikipedia:Book_sources].
2. An OL field, link to WP -- as you say, this is already present.
3. An OL field, link to Wikisource. A very good project. It will be they who need to do it.
Yes, but I think we should go further than that. OpenLibrary has an API which would allow any relevant wiki article to be dynamically linked to their data, or an entry to be created every time new relevant data is added to a Wikipedia project. This is all about avoiding duplicate work between Wikimedia and OpenLibrary. It could also increase accuracy by double checking facts (dates, name and title spelling, etc.) between our projects.
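The double checking of facts mentioned above could look roughly like the following. This is only a minimal sketch: the record layout (plain dictionaries with `title`, `author` and `year` keys), the function names and the sample values are all assumptions made for illustration, and nothing here uses the actual OpenLibrary API or any real record.

```python
import unicodedata


def normalize(value):
    """Lowercase, drop accents, and collapse whitespace so that
    purely cosmetic differences are not reported as conflicts."""
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().split())


def cross_check(wiki_record, ol_record, fields=("title", "author", "year")):
    """Return {field: (wiki_value, ol_value)} for fields that disagree.

    Fields missing from either record are skipped rather than
    reported, since absence is not a contradiction.
    """
    conflicts = {}
    for field in fields:
        if field not in wiki_record or field not in ol_record:
            continue
        if normalize(wiki_record[field]) != normalize(ol_record[field]):
            conflicts[field] = (wiki_record[field], ol_record[field])
    return conflicts


# Hypothetical records, invented for this example only.
wiki = {"title": "Vingt mille lieues sous les mers", "author": "Jules Verne", "year": 1870}
ol = {"title": "Vingt mille lieues sous les mers", "author": "Jules  VERNE", "year": 1871}

print(cross_check(wiki, ol))
```

A report like this only flags disagreements; deciding which side is right would still need a person with the book or an authoritative catalog in hand.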
Agreed we need translation information--I think this is a very important priority. It's not that hard to do a list or to add links that will be helpful, though not exact enough to be relied on in further work. That's probably a reasonable project, but it is very far from "a database of all books ever published".
But some of this is being done--see the frWP page for Moby Dick: http://fr.wikipedia.org/wiki/Moby_Dick (though it omits a number of the translations listed in the French Union Catalog, http://corail.sudoc.abes.fr/xslt/DB=2.1/CMD?ACT=SRCHA&IKT=8063&SRT=R...] I would however not warrant without seeing the items in hand, or reading an authoritative review, that they are all complete translations.
The English page on the novel lists no translations; perhaps we could in practice assume that the interwiki links are sufficient. Perhaps that could be assumed in Wikisource also?
That's another possible benefit: automatic list of works/editions/translations in a Wikipedia article. You could add {{OpenLibrary|author=Jules Verne|lang=English}} and you have a list of English translations of Jules Verne's works directly imported from their database. The problem is that, right now, Wikimedia projects often have more accurate and more detailed information than OpenLibrary.
David Goodman, Ph.D, M.L.S. http://en.wikipedia.org/wiki/User_talk:DGG
Regards,
Yann
http://www.non-violence.org/ | Site collaboratif sur la
non-violence
http://www.forget-me.net/ | Alternatives sur le Net http://fr.wikisource.org/ | Bibliothèque libre http://wikilivres.info | Documents libres
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
On Fri, Sep 4, 2009 at 6:58 AM, Birgitte SB birgitte_sb@yahoo.com wrote:
You two seem to be talking past each other. Might I suggest that perhaps the quality of information on OPL and/or Wikipedia/Wikisource sites is rather different depending on whether you are reading in French or English? I don't know if this is the case but it could explain the discrepancies between your experiences.
That could be it. We can't hide the fact that the French Wikisource is leaps and bounds ahead of English Wikisource. ;-)
I also suspect that David is heavily biased due to his predominately English Wikipedia experience.
The underlying problem is that OL is approaching this from a traditional library perspective, and so is opening up slowly, and progress is slow and methodical. Wikisource is approaching the same goal with openness as a core philosophy, and progress is rapidly increasing.
To some, it seems that OL will reach the holy grail first; however, they have seeded their database with lots of junk records, and they don't have digital items for these. The reality is that there are a lot of bibliographic entries which are wrong, and this data is usually fixed once the object represented has been reviewed. Without digital objects, there is no way for the world to know which are duplicates and which are slightly different editions which should have different records. Even if someone out in the real world knows that there was only one edition in a given year, there is no mechanism for the "community" to merge records. Without digital objects, OL is a _directory_ of works held in other locations; but it is not a library.
OTOH, Wikisource only has records for items that it has the full text for, which means it rarely has duplicates, and is much more like a "library" because people can actually read the text. And of course it has already figured out a lot of the community process problems.
I don't think Wikisource is on a logarithmic growth yet overall, however there are spurts of logarithmic growth like you can see on the Hebrew Wikisource.
http://stats.wikimedia.org/wikisource/EN/PlotsPngArticlesTotal.htm
Keep in mind that the stats for Wikisource domains need to be _combined_, as French works are on the French WS, and English works are on the English WS. The total growth is the sum of all of the projects - this isn't like Wikipedia where each project is intending to have the same content in different languages.
-- John Vandenberg
John Vandenberg wrote:
The underlying problem is that OL is approaching this from a traditional library perspective, and so is opening up slowly, and progress is slow and methodical.
But they are not. They are starting from the Internet Archive (Brewster Kahle) perspective. "Real" archivists and librarians have complained that the Internet Archive is not enough of an archive, and OpenLibrary is not enough of a library. This is of course very similar to people complaining that Wikipedia is not enough of an encyclopedia. Both OpenLibrary and Wikipedia are primarily Internet projects. Perhaps the most interesting criticism of OpenLibrary was launched by Tim Spalding, founder of LibraryThing.com (another Internet project, but a commercial one, albeit with some volunteer vibes). He meant (my interpretation) that OpenLibrary asks a lot from libraries (a copy of their catalog database) but doesn't give much back, and giving something back would help OpenLibrary to win more allies among libraries, http://mail.archive.org/pipermail/ol-discuss/2009-August/000638.html
The first website to appear on the domain www.openlibrary.org was an online viewer for books scanned by/for the Internet Archive, so if "being able to read" is a requirement for a library, then it did have that function from the start. Later another website appeared on demo.openlibrary.org, containing catalog records. The demo website is what you now find as openlibrary.org. It is as if the online viewer and the bibliographic database are two different projects, and the Internet Archive put the new project under the old domain. But the online viewer is still there, for the books that have been digitized.
To some, it seems that OL will reach the holy grail first,
The OpenLibrary has a head start. Any project started now will have to spend much time to catch up. Any good ideas that might go into a new project could be used in the existing OpenLibrary.
For example, a new project might download the database dump from OpenLibrary and start to weed out the "junk records". But that junk sorting could also take place inside OpenLibrary. Why not?
If a new project goes to a library to ask for a copy of their catalog, they might get the question "we already gave (or didn't give) that to OpenLibrary, so how is your project any different?" And what should the new project answer to that?
I want to encourage wikipedians and wikisourcerers to join the OpenLibrary project, just like you should also join OpenStreetMap and other good projects for free knowledge and information. Bring your experience. If you get tired of one project, as I do sometimes, work on another one for a while.
OpenLibrary has author pages for 6.5 million author names. Some of these are "junk" duplicates that should be merged, but still there are quite a large number of authors there. These have a field for a Wikipedia URL, but only 1100 records have a value. Connecting author pages in OpenLibrary to Wikipedia biographies is just one way where we can do a lot, without needing to start a new project.
On Fri, Sep 4, 2009 at 7:21 PM, Lars Aronsson lars@aronsson.se wrote:
... For example, a new project might download the database dump from OpenLibrary and start to weed out the "junk records". But that junk sorting could also take place inside OpenLibrary. Why not?
Because metadata without digital objects are next to useless. Worldcat already provides a directory of where physical books are held.
A database of metadata with lots of duplicates and no means for the reader to fix them, and discuss them, is disrespectful.
If a new project goes to a library to ask for a copy of their catalog, they might get the question "we already gave (or didn't give) that to OpenLibrary, so how is your project any different?" And what should the new project answer to that?
See above. I don't see any value in going back to the libraries. Doing that would only end up with the same result that OpenLibrary has; it would be simpler to take the OpenLibrary dump.
I want to encourage wikipedians and wikisourcerers to join the OpenLibrary project, just like you should also join OpenStreetMap and other good projects for free knowledge and information. Bring your experience. If you get tired of one project, as I do sometimes, work on another one for a while.
Tell me _one_ thing that I can do at OpenLibrary that I can not do at Wikisource.
OpenLibrary has author pages for 6.5 million author names. Some of these are "junk" duplicates that should be merged, but still there are quite a large number of authors there. These have a field for a Wikipedia URL, but only 1100 records have a value. Connecting author pages in OpenLibrary to Wikipedia biographies is just one way where we can do a lot, without needing to start a new project.
_Most_ of them are duplicates.
http://openlibrary.org/search?q=Jules+Gabriel+Verne
I have an account at OpenLibrary, and I am responsible for 0.2% of the Wikipedia links :P
I am not keen on becoming attached to a project that is littered with so much crap, especially when I am not given the tools required to fix the crap, nor do I have any say in whether more crap can be imported.
http://openlibrary.org/user/jayvdb
These two need to be merged.
http://openlibrary.org/a/OL2296708A/Charles-C.-Nott
http://openlibrary.org/a/OL2544127A/Charles-Cooper-Nott
Both of them look terrible, because I have no control over the presentation of the pages. Dups, lack of sorting, etc.
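For what it is worth, pairs like the two Nott entries above are the kind of thing a simple heuristic could flag for human review. The following is a rough sketch only, not anything OpenLibrary is known to do: the function names are made up, and it assumes names are written given-names-first with the same number of parts, treating a single-letter initial as compatible with any full name starting with that letter.

```python
def name_tokens(name):
    """Split a name into lowercase parts, dropping periods after initials."""
    parts = (token.strip(".").lower() for token in name.replace(",", " ").split())
    return [part for part in parts if part]


def compatible(a, b):
    """True if two name parts are equal, or one is an initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def may_be_same_author(name_a, name_b):
    """Flag two author names as merge candidates for a human to review."""
    tokens_a, tokens_b = name_tokens(name_a), name_tokens(name_b)
    if not tokens_a or len(tokens_a) != len(tokens_b):
        return False
    return all(compatible(x, y) for x, y in zip(tokens_a, tokens_b))


print(may_be_same_author("Charles C. Nott", "Charles Cooper Nott"))
print(may_be_same_author("Charles C. Nott", "Charles D. Nott"))
```

The heuristic is deliberately conservative: it will not pair "Jules Verne" with "Jules Gabriel Verne" because the number of parts differs, and a match is only a suggestion, since two different people can share a name.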
I haven't found the OpenLibrary Kool-Aid; I'll stick with Wikisource, for good or ill.
-- John Vandenberg
John Vandenberg wrote:
I haven't found the OpenLibrary coolaid; I'll stick with Wikisource, for good or ill.
If that makes you happy, that's good for you. But now we were talking about the need for a project (either OpenLibrary or a new project) to list all the books ever published. When and how will Wikisource contain that? After every book has been scanned?
wikisource-l@lists.wikimedia.org