There are reasons for editing and there are also reasons for not editing.

One big reason *for* editing is when data from new sources is being imported.

We in Librarianship/Information Science make decisions about how the data will be made available to our users. Take an author name: there are many ways to write the same name of the same individual. The same individual can adopt dozens of pen names over a lifetime, change their last name upon getting married, and so on. The rule chosen by a particular library may be the same as in other libraries, or entirely different (based on how that library's local community of users searches for and wants the data), or no rule is chosen at all and the name is recorded "as is" from the publication. Some libraries keep additional records specially devoted to the synonyms of the same name; some do not.
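
To make that concrete, here is a minimal Python sketch of the idea behind that kind of name control; the variant table is invented for illustration (though Twain's pen names are real), and no real library system works from a flat dict like this:

# Hypothetical authority-control sketch: map name variants to a single
# canonical heading. The variant table below is invented for illustration.
AUTHORITY = {
    "clemens, samuel langhorne": "Twain, Mark, 1835-1910",
    "twain, mark": "Twain, Mark, 1835-1910",
    "snodgrass, thomas jefferson": "Twain, Mark, 1835-1910",  # one of his pen names
}

def canonical_heading(raw_name: str) -> str:
    """Return the controlled heading, or the name "as is" when no
    rule covers it (the no-rule case described above)."""
    return AUTHORITY.get(raw_name.strip().lower(), raw_name)

print(canonical_heading("Twain, Mark"))   # Twain, Mark, 1835-1910
print(canonical_heading("Doe, Jane"))     # falls through unchanged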

Google Book Search simply imported data from many libraries without making any attempt to standardize it, resulting in the large number of duplicates and garbage found in some searches (especially those where the imprints didn't standardize the data themselves).

Some kinds of data about the same work can also be stored in different sets of "fields" and "subfields" of MARC21 records across different libraries, again because users' needs for information about the works vary from place to place (i.e., you get data duplication within a single record if you simply merge records from different libraries).
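
A toy example of that merge problem (the tags are real MARC21 tags: 245 = title, 500 = general note, 504 = bibliography note, but the flat dict structure is a deliberate oversimplification of real MARC):

# Two toy records for the same work: each library filed the same note
# under a different field.
record_a = {"245": "An Example Title", "500": "Includes bibliography."}
record_b = {"245": "An Example Title", "504": "Includes bibliography."}

# Naive merge: take the union of fields. The same note now shows up
# twice in one record, under two different tags.
merged = {**record_a, **record_b}
print(merged)
# {'245': 'An Example Title', '500': 'Includes bibliography.',
#  '504': 'Includes bibliography.'}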

The MARC21 specification also has an overall design that IMHO is impossible to reflect in the current MediaWiki schema, even with Semantic MediaWiki.

And sometimes libraries say that their data is stored in MARC21 fields when it is actually in USMARC ones (yep, there are as many flavours of MARC as there are flavours of Ubuntu). Or it is *based* on MARC21 fields, with dozens of local adaptations.

I've just finished an internship in a library with 45k records that was migrating data from REPIDISCA-*based* fields (let's call it a FreeBSD flavour) to MARC21-*based* fields (in this comparison, an Ubuntu flavour; and yep, *based*, with local adaptations: we need those changes). The data is migrated in an automated fashion, but it still needs to be validated record by record if the library wants those records to sit in the MARC21 fields properly; a rough sketch of that kind of pass is below.
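
Something like this, with an invented mapping table (REPIDISCA's actual field numbers differ; the MARC21 targets are real: 100 = personal name, 245 = title, 300 = physical description) and a per-record issue list for the cataloguers to work through:

FIELD_MAP = {"16": "100", "18": "245", "30": "300"}  # source tag -> MARC21 tag

def migrate(record: dict) -> tuple[dict, list[str]]:
    """Map source fields to MARC21-based fields; anything unmapped
    goes on an issue list for human review, because automation alone
    can't certify the result."""
    migrated, issues = {}, []
    for tag, value in record.items():
        target = FIELD_MAP.get(tag)
        if target is None:
            issues.append(f"no mapping for source field {tag}: {value!r}")
        else:
            migrated[target] = value
    return migrated, issues

rec, todo = migrate({"16": "Twain, Mark", "18": "An Example Title", "99": "???"})
print(rec)    # {'100': 'Twain, Mark', '245': 'An Example Title'}
print(todo)   # ["no mapping for source field 99: '???'"]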

What I'm saying is:

1) You can't simply import data from many sources without validation and expect a good-quality end product. You will get "search engine"-quality data (tons of random information that will only make sense with a continuously developed set of algorithms, possibly consuming more time and resources than standardizing the data in the first place);

2) Standardizing data is an epic work, dozens of times more epic than writing a comprehensive encyclopedia about all subjects in all languages. Institutional support will be needed, and in more comprehensive ways than just releasing data for others to play around with (i.e., with additional hands for the standardization itself).

[[Paul Otlet]] (1868-1944) tried it, in efforts that have led some to argue he was the concept designer of the Internet and hypertext. He had no success, which is very unfortunate. Will the Wikimedians achieve any level of success with it?

[[:m:User:555]]

On Fri, Dec 6, 2013 at 9:59 PM, Denny Vrandečić <vrandecic@gmail.com> wrote:
Thanks for reviving this thread, Luiz. I also wanted to ask whether we should be updating parts of DNB and similar data. Maybe not create new entries, but for those that we already have, add some of the available data and point to the DNB dataset? 


On Fri, Dec 6, 2013 at 3:24 PM, Luiz Augusto <lugusto@gmail.com> wrote:
Just found this thread while browsing my email archives (I'm/was inactive on Wikimedia for at least 2 years).

IMHO it will be very helpful if a central place hosting metadata from digitized works is created.

In my past experience, I've found lots of PD-old books in languages like French, Spanish, and English in repositories from Brazil and Portugal whose UI is mostly in Portuguese (i.e., with a very low probability of being found by volunteers from those languages' subdomains), for example.

I particularly love validating metadata even more than proofreading books. Perhaps a tool/place like this would open new ways to contribute to Wikisource and help with user retention (based on some Wikipedians who have fun writing good articles but also love to simply make trivial changes in their spare time)?

I know that the thread was focused on general metadata for all kinds and ages of books, but I had this idea while reading it.

[[:m:User:555]]


On Mon, Aug 26, 2013 at 10:42 AM, Thomas Douillard <thomas.douillard@gmail.com> wrote:
I know, I started a discussion about porting the bot to Wikidata in the scientific journals WikiProject. One answer I got: the bot owner had other things to do in his life than running the bot and was not around very often any more. Having everything in Wikidata already will be a lot more reliable and lazier: no tool that works one day but not the next, no effort telling the newbies that they should go to another website, no significant problem.

Maybe one objection would be that the data could be vandalized easily, but maybe we should find a way to deal with imported sourced data that has no real reason to be modified, only to be marked deprecated or updated by another import from the same source.
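
One way to picture that policy (an invented sketch, not Wikidata's actual data model; the class, function, and example values are all made up for illustration):

from dataclasses import dataclass

@dataclass
class ImportedStatement:
    value: str
    source: str                # e.g. "DNB"
    deprecated: bool = False

def supersede(old: ImportedStatement, new_value: str,
              source: str) -> ImportedStatement:
    """The only allowed change: a fresh import from the same source
    deprecates the old claim and replaces it; no hand edits."""
    if source != old.source:
        raise ValueError("updates must come from the original source")
    old.deprecated = True      # keep the old claim, just mark it
    return ImportedStatement(new_value, source)

stmt = ImportedStatement("1868-08-23", "DNB")
stmt = supersede(stmt, "1868-08-24", "DNB")   # allowed: same source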


2013/8/26 David Cuenca <dacuetu@gmail.com>
If the problem is automating bibliographic data imports, one solution is what you propose: to import everything. Another is to have an import tool that automatically fetches the data for the item that needs it. On WP they do that; there is a tool to import book/journal info by ISBN/DOI. The same can be done on WD.
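
A minimal sketch of that kind of on-demand lookup, using Open Library's public ISBN endpoint as a stand-in (this is not the WP tool mentioned above, whose backend I don't know):

import json
from urllib.request import urlopen

def fetch_by_isbn(isbn: str) -> dict:
    # Open Library redirects /isbn/{isbn}.json to the edition record.
    with urlopen(f"https://openlibrary.org/isbn/{isbn}.json") as resp:
        return json.load(resp)

book = fetch_by_isbn("9780140328721")   # example ISBN from Open Library's docs
print(book.get("title"))                # e.g. "Fantastic Mr Fox"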

Micru


On Mon, Aug 26, 2013 at 9:23 AM, Thomas Douillard <thomas.douillard@gmail.com> wrote:
If Wikidata has the ambition to be a really reliable database, we should do everything we can to make it easy for users to use any source they want. In this perspective, if we get data with guaranteed high quality, it becomes easy for Wikidatians to find and use these references. Entering a reference into the database by hand seems to me a highly tedious, boring, and easily automated task.

With that in mind, any reference that the user does not have to enter by hand is a good thing, and importing high-quality source data should pass every Wikidata community barrier easily. If the software has no problem handling that much information, I say we really have no reason not to do the imports.

Tom


--
Etiamsi omnes, ego non

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l