Hi,
Maybe this is interesting as an import source for bibliographic info http://blogs.ifla.org/bibliography/2013/08/06/german-national-library-offers...
Cheers, Micru
This seems very interesting: maybe there will be a time when Wikidata (or something similar) will host the bibliographic records of thousands of libraries...
Right now, I'm not sure we want to discuss a massive upload of these records into WD, because:
- they are in MARC, which is far more complex than the list of bibliographic properties currently in WD
- they will be in German
- do we really need so many records of books, articles and works at this early stage?
Aubrey
No, we don't need to import them all, but there was always the question of whether we were allowed to import such data from external sources. At least for the DNB that question has been settled.
Cheers, Micru
There are many such CC0 national bibliographies available, and other large datasets, if the decision is made to import them.
See here for a list:
http://datahub.io/group/bibliographic
Here is a direct link to the British National Bibliography page:
For further information on the open bibliographic project and the principles of open bibliographic metadata, check here:
http://openbiblio.net http://openbiblio.net/principles
Mark MacGillivray
If Wikidata has the ambition to be a really reliable database, we should do everything we can to make it easy for users to use any source they want. From this perspective, if we get data of guaranteed high quality, it becomes easy for Wikidatians to find and use these references for users. Entering a reference into the database by hand seems to me a highly tedious, boring, and easily automated task.
With that in mind, any reference that the user does not have to enter by hand is a good thing, and importing high-quality source data should pass every Wikidata community barrier easily. If the software has no problem handling that much information, I say we really have no reason not to do the imports.
Tom
If the problem is automating bibliographic data imports, one solution is what you propose: import everything. Another is to have an import tool that automatically fetches the data for the item that needs it. That is what they do on WP: there is a tool to import book/journal info by ISBN/DOI. The same can be done in WD.
Micru
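A minimal sketch of the kind of on-demand import tool Micru describes, assuming Python with the requests library and the public OpenLibrary Books and Crossref APIs; the flat dict returned is an illustrative shape, not an agreed Wikidata mapping, and the identifiers in the usage lines are placeholders.

```python
# Hedged sketch: fetch bibliographic metadata on demand by ISBN (OpenLibrary
# Books API) or DOI (Crossref). The returned dict is illustrative only.
import requests

def fetch_by_isbn(isbn):
    """Look up title/authors/publish date for an ISBN on OpenLibrary."""
    r = requests.get(
        "https://openlibrary.org/api/books",
        params={"bibkeys": "ISBN:" + isbn, "format": "json", "jscmd": "data"},
        timeout=10,
    )
    r.raise_for_status()
    data = r.json().get("ISBN:" + isbn, {})
    return {
        "title": data.get("title"),
        "authors": [a["name"] for a in data.get("authors", [])],
        "published": data.get("publish_date"),
    }

def fetch_by_doi(doi):
    """Look up title/authors/year for a DOI on Crossref."""
    r = requests.get("https://api.crossref.org/works/" + doi, timeout=10)
    r.raise_for_status()
    msg = r.json()["message"]
    return {
        "title": (msg.get("title") or [None])[0],
        "authors": [(a.get("given", "") + " " + a.get("family", "")).strip()
                    for a in msg.get("author", [])],
        "year": msg.get("issued", {}).get("date-parts", [[None]])[0][0],
    }

if __name__ == "__main__":
    # Placeholder identifiers; substitute real ones when running.
    print(fetch_by_isbn("9780141439563"))
    print(fetch_by_doi("10.1000/182"))
```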
I know; I started a discussion about porting the bot to Wikidata in the scientific journal WikiProject. One answer I got: the bot owner had other things to do in his life than run the bot and was not around very often any more. Having everything in Wikidata already will be a lot more reliable and lower-effort: no tool that works one day but not the next, no effort spent telling newbies that they should go to another website, no significant problem.
Maybe one objection would be that the data could easily be vandalised, but perhaps we should find a way to handle imported, sourced data that has no real reason to be modified: it would only be marked deprecated or updated by another import from the same source.
Just found this thread while browsing my email archives (I'm/was inactive on Wikimedia for at least 2 years).
IMHO it would be very helpful if a central place hosting metadata from digitized works were created.
In my past experience, I've found lots of PD-old books in languages like French, Spanish and English in repositories from Brazil and Portugal, with the UI mostly in Portuguese (i.e. with a very low probability of being found by volunteers from those languages' subdomains), for example.
I particularly love validating metadata, more than proofreading books. Perhaps a tool/place like this would open up new ways to contribute to Wikisource and help with user retention (based on some Wikipedians who have fun making good articles but also sometimes love simply making trivial changes in their spare time)?
I know the thread was focused on general metadata from books of all kinds and ages, but I had this idea while reading it.
[[:m:User:555]]
Thanks for reviving this thread, Luiz. I also wanted to ask whether we should be updating parts of DNB and similar data. Maybe not create new entries, but for those that we already have, add some of the available data and point to the DNB dataset?
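A hedged sketch of what Denny suggests, assuming pywikibot and a pre-matched list of item/GND pairs; P227 (the GND identifier property) is what such a bot would fill, and the Q-id and GND value below are placeholders for illustration.

```python
# Hedged sketch: enrich existing Wikidata items with a GND (DNB authority)
# identifier rather than creating new items. Assumes a configured pywikibot
# account; a real run would read the item/identifier pairs from a
# pre-matched list.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def add_gnd(qid, gnd_id):
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    if "P227" in item.claims:  # item already has a GND ID; leave it alone
        return
    claim = pywikibot.Claim(repo, "P227")
    claim.setTarget(gnd_id)
    item.addClaim(claim, summary="Add GND identifier from DNB dataset")

add_gnd("Q42", "119033364")  # placeholder item/identifier pair
```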
There are reasons for editing and there are also reasons for not editing.
One big reason *for* editing arises whenever data from new sources is being imported.
We in librarianship/information science make decisions about how data will be made available to our users/customers. Take an author name: there are many ways to write the same name for the same individual. The same individual can adopt dozens of pen names over a lifetime, change their last name on getting married, and so on. The rule chosen in a particular library may be the same as in other libraries, or entirely different (based on how that library's community of users searches for and wants the data), or no rule is chosen at all and the data is recorded "as is", as it appears in the publication. Some libraries keep additional records specifically devoted to the synonyms of the same name; some do not.
Google Book Search simply imported data from many libraries without making any attempt to standardize it, resulting in the large number of duplicates and junk records found in some searches (especially where the imprints hadn't standardized the data themselves).
Some special kinds of data about the same work can also be stored in different sets of "fields" and "subfields" of MARC21 records across different libraries, again because what users/clients need to know about a work varies from place to place (i.e. you get data duplication within a record if you simply merge records from several libraries).
The MARC21 specification also has an overall design that IMHO is impossible to reflect in the current MediaWiki schema, even with Semantic MediaWiki.
And sometimes libraries say that their data is stored in MARC21 fields when it is actually in USMARC ones (yep, there are as many flavours of MARC as there are flavours of Ubuntu). Or it is *based* on MARC21 fields, with dozens of local adaptations.
I've just finished an internship in a library with 45k records that was migrating data from REPIDISCA-*based* fields (let's call that a FreeBSD flavour) to MARC21-*based* fields (in this comparison, an Ubuntu flavour; and yep, *based*, with local adaptations; we need those changes). The migration is automated, but the data still needs to be validated record by record if the library wants those records in proper MARC21 fields.
What I'm saying is:
1) You can't simply import data from many sources without validation and expect a good-quality end product. You will get "search engine"-quality data (tons of random information that will make sense only with a continuously developed set of algorithms, maybe more time- and resource-consuming than standardizing the data);
2) Standardizing data is an epic work, dozens of times more epic than writing a comprehensive encyclopedia on all subjects in all languages. Institutional support will be needed, and in more comprehensive ways than just releasing data to play around with (i.e. with additional hands for the standardization).
[[Paul Otlet]] (1868-1944) attempted it, in efforts that led some to argue he was the concept designer of the Internet and hypertext. Without success, which is very unfortunate. Will the Wikimedians achieve any level of success at it?
[[:m:User:555]]
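To make the field-mapping problem described above concrete, here is a hedged sketch assuming the pymarc library and a file of binary MARC21 records (records.mrc is a placeholder; real data may be MARC-XML or a local variant). It checks both 260$c and 264$c, two of the places the same publication date can live depending on the catalogue.

```python
# Hedged sketch of why merging library records is hard: the same fact can
# live in different MARC fields across catalogues. Reads binary MARC21
# records with pymarc and looks for a publication date in 260$c and 264$c.
from pymarc import MARCReader

with open("records.mrc", "rb") as fh:  # placeholder filename
    for record in MARCReader(fh):
        title = None
        for field in record.get_fields("245"):
            subs = field.get_subfields("a")
            if subs:
                title = subs[0]
                break
        date = None
        for tag in ("260", "264"):  # pre- and post-RDA imprint/publication fields
            for field in record.get_fields(tag):
                subs = field.get_subfields("c")
                if subs:
                    date = subs[0]
                    break
            if date:
                break
        print(title, date)
```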
Denny Vrandečić, 07/12/2013 00:59:
Thanks for reviving this thread, Luiz. I also wanted to ask whether we should be updating parts of DNB and similar data. Maybe not create new entries, but for those that we already have, add some of the available data and point to the DNB dataset?
Or maybe use openlibrary.org as a staging area for such data and fetch it from there? I'm not sure Wikidata should "compete" with OpenLibrary: it's a huge undertaking and they already have an infrastructure for it; Wikidata/Wikimedia could "just" let users easily import the data when it's needed. An obvious example is pre-filling book/work metadata on Wikipedia articles, Wikisource books, and Commons files (and the associated Wikidata entries).
Nemo
It's a good idea to seek cooperation with projects that share the same goals as the Wikimedia projects, and we would probably gain a lot by seeking integration with them. One of the big driving forces of Wikidata is the community, which does terrific work cleaning the data and interlinking identifiers. One of the big future work areas for Wikidata is bibliographic data, as the Wikipedias and the other Wikimedia projects, including Wikidata itself, are huge consumers of such data.
That's why I think we must do a lot more with such data than just importing it from OpenLibrary: it is really important to Wikimedia in general, and the community as a whole is a powerful driving force for bibliographic data. I'm not against cooperating with OpenLibrary, but we should seek deep cooperation and integration with them so both projects can benefit from each other's community.
+1 on this
openlibrary.org has a limited set of fields.
Moreover, simply importing data from random records at random times will benefit neither OpenLibrary nor Wikimedia.
You would first need to check whether Wikidata already has the needed information, search for it again in OpenLibrary, create the content in OpenLibrary, import the content into Wikidata, make the desired local changes, and send back to OpenLibrary any locally relevant changes.
But I had an idea: a MediaWiki user interface to OpenLibrary data.
openlibrary.org offers access to records in 3 ways:
- read/write of individual records through the API;
- read of individual records through RDF and JSON;
- bulk download of the entire dataset
So it's possible to:
1) Import the bulk data;
2) Catch all changes from openlibrary.org in real time;
3) Allow the synced data to be browsable and editable at any time on MediaWiki/Wikidata instances;
4) Send back to OpenLibrary the changes, storing locally the data from custom fields in the MediaWiki instance (allowing a later import on the OpenLibrary side if they create the corresponding fields in their DB);
5) Send back to OpenLibrary all new book records created on MediaWiki instances.
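A minimal sketch of the first two access paths listed above, assuming Python with requests and the tab-separated layout OpenLibrary documents for its bulk dumps (type, key, revision, last_modified, JSON record); verify the layout against a current dump before relying on it.

```python
# Hedged sketch: fetch one record as JSON over HTTP, and stream records out
# of a bulk dump. The dump parsing assumes the documented tab-separated
# layout (type, key, revision, last_modified, JSON record).
import gzip
import json
import requests

def fetch_record(key):
    """Fetch one record via the JSON endpoint, e.g. key='/books/OL1M'."""
    r = requests.get("https://openlibrary.org" + key + ".json", timeout=10)
    r.raise_for_status()
    return r.json()

def iter_dump(path):
    """Yield (type, key, record) tuples from a dump like ol_dump_latest.txt.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            rec_type, key, revision, modified, blob = line.rstrip("\n").split("\t", 4)
            yield rec_type, key, json.loads(blob)
```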
Bibliographical properties on Wikidata are listed here: https://www.wikidata.org/wiki/Wikidata:Books_task_force
In the last months, we tried to create a metadata scheme to "cover" the main elements of book classification. It is not MARC21, of course, but I think that pretty much all of simple Dublin Core is covered. At the beginning, I drafted a mapping between different Wikimedia project templates (Wikipedia's book infobox, Commons' Book template, Wikisource's Index metadata form): https://docs.google.com/spreadsheet/ccc?key=0AlPNcNlN2oqvdFQyR2F5YmhrMWpXaUF... It is far from perfect, but it gives an idea of which things could be missing.
I'd love to collaborate with OpenLibrary too, but at the beginning of our IEG project, Micru and I contacted them in the person of Karen Coyle (User:Kcoyle), a very famous and skilled metadata librarian who is somehow in charge of the project now. She told us that OpenLibrary is frozen at the moment, and there is no staff nor funds to get it going. OpenLibrary was previously funded by the Internet Archive.
If someone could build the tool you proposed, Luiz, that would be awesome, but I'm not a technical person and I'm not able to understand whether it is feasible or not. If we get more feedback on it, we could propose it as a project for the next Google Summer of Code: that is a great way to get technical things done.
Aubrey
+1 for Open Library
I’ve heard that OpenLibrary is in stasis right now too—but also from Karen :-)
It should be doable to write a loader that can read in the latest OpenLibrary data dump and write out data of interest [1] to Wikidata. Ideally the loader should be designed so that it can be re-run when new OpenLibrary data dumps become available. However, I think it would be important to treat OpenLibrary as a data source, but not as a master. We would need to make sure that data that has changed on Wikidata isn't stomped on by an OpenLibrary load.
If OpenLibrary gets active again, and they are interested in what Wikidata has, they can always write an equivalent loader that takes data from Wikidata and loads it into their database.
If this sounds useful I’m willing to help out with the work. I guess it would involve a Wikidata RFC? //Ed
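A sketch of the non-stomping rule Ed describes, assuming pywikibot: write a value only when Wikidata has no statement for that property yet, so re-running the loader on a newer dump never overwrites community edits. The OpenLibrary-field-to-property mapping is illustrative only.

```python
# Hedged sketch of the re-runnable, non-destructive loader outlined above.
# Assumes pywikibot with a configured bot account; the property map is
# illustrative, not an agreed schema.
import pywikibot

PROP_MAP = {"title": "P1476", "isbn_13": "P212"}  # illustrative mapping only

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def load_record(qid, ol_record):
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    for ol_field, prop in PROP_MAP.items():
        value = ol_record.get(ol_field)
        if isinstance(value, list):  # OpenLibrary stores some fields as lists
            value = value[0] if value else None
        if value is None or prop in item.claims:  # never overwrite existing data
            continue
        claim = pywikibot.Claim(repo, prop)
        if prop == "P1476":  # Wikidata titles are monolingual text values
            value = pywikibot.WbMonolingualText(value, "en")
        claim.setTarget(value)
        item.addClaim(claim, summary="Import from OpenLibrary dump (no overwrite)")
```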
Edward Summers, 09/12/2013 12:18:
If OpenLibrary gets active again, [...]
Definition of active? The fact that there's no software development/investment doesn't mean it's inactive. Are there stats on user activity there, and can it be compared in some way to ours for that kind of data?
Nemo
I think there is only one (paid) user working on the site right now. That's my definition of inactive :-)
Aubrey
I exhibit a counterexample: https://openlibrary.org/recentchanges shows at least 2 users editing in the last hour. Your lemma is disproven. ;-)
Nemo
That’s a good question. By active I meant a situation where OpenLibrary has the resources (and the interest) to write software to synchronize OpenLibrary with Wikidata. I was suggesting that Wikidata doesn’t need to take on the additional burden of writing Wikidata updates back to OpenLibrary.
The reason why I said the OpenLibrary dump loader should be able to run more than once was the assumption that OpenLibrary is being updated periodically by editors. Their latest dump was generated a week or so ago, which makes OpenLibrary look promising as an ongoing data source.
//Ed
The main issue with OL is that its status as an entity is not clear. It is both part of IA and somewhat independent, but neither integrated enough into IA nor independent enough to tell which direction it is going in.
In an ideal world OL could run an instance of a Wikibase repo and use Wikidata entities whenever relevant (authors, locations, etc.). I don't know if it would be possible for their wiki software to evolve in that direction, nor if there would be enough resources to do so, but if Commons is going to do just that, maybe it is possible for OL too?
Using an OL dump loader seems the easiest option for the time being.
Cheers, Micru
David Cuenca, 09/12/2013 13:44:
The main issue with OL is that its status as an entity is not clear. It is both part of IA and somewhat independent, but neither integrated enough into IA nor independent enough to tell which direction it is going in.
Sounds like the definition of any non-Wikipedia Wikimedia project. :)
Nemo
This doesn't reflect my understanding of the situation at OpenLibrary.
On Mon, Dec 9, 2013 at 5:31 AM, Andrea Zanni zanni.andrea84@gmail.comwrote:
I'd love to collaborate with OpenLibrary too, but at the beginning of our IEG project, Micru and I contacted them in the person of Karen Coyle (User:Kcoyle), a very famous and skilled metadata librarian who is somehow in charge of the project now. She told us that OpenLibrary is frozen at the moment, and there is no staff nor funds to get it going. OpenLibrary was previously funded by the Internet Archive.
Karen has worked for OpenLibrary in the past, but I don't know if she currently does and she's certainly not in charge.
OpenLibrary is owned and funded by the Internet Archive (i.e. Brewster Kahle). It is funded at a much lower level than it has been historically.
If someone could build the tool you proposed, Luiz, that would be awesome, but I'm not a technical person and I'm not able to understand whether it is feasible or not.
I'm not sure I agree. There's a lot of good data in OpenLibrary, but there's also a lot of junk. Freebase imported a bunch of OpenLibrary data, after winnowing it to what they thought was the good stuff, and still ended up deleting a bunch of the supposedly "good" stuff later because they found their goodness criteria hadn't been strict enough.
One of the reasons OpenLibrary is such a mess is because *they* arbitrarily imported junky data (e.g. Amazon scraped records). The last thing the world needs is more duplicate copies of random junk. We've already got the DPLA for that. :-)
Another issue with the OpenLibrary metadata is that there's no clear license associated with it. IA's position is that they got it from wherever they got it from and you're on your own if you want to reuse it, which isn't very helpful. The provenance of major chunks of it is traceable, and new contributions by users are nominally made under CC0, so the licensing could probably be sorted out with enough effort (although the same is true of the data quality issues).
If Ed Summers (or any other capable programmer) is going to sign up to solve this problem for you guys, I'm happy to help with my knowledge of the state of play, but it's a *very* sizeable project.
Tom
Gosh, I withdraw my support for full reuse of Open Library data.
That was probably the best effort they could make in past years, before well-known library catalogues started mass-releasing data dumps directly, but now we are in a very different scenario.
Even a simple mass import from the already mentioned Datahub [1] into the openlibrary engine (open-source software), without further editing, would generate better-quality data.
On 07.12.2013 15:34, Federico Leva (Nemo) wrote:
Or maybe use openlibrary.org as a staging area for such data and fetch it from there? I'm not sure Wikidata should "compete" with OpenLibrary: it's a huge undertaking and they already have an infrastructure for it; Wikidata/Wikimedia could "just" let users easily import the data when it's needed. An obvious example is pre-filling book/work metadata on Wikipedia articles, Wikisource books, and Commons files (and the associated Wikidata entries).
+1
-- daniel