There are reasons for editing and reasons for not editing.
One big reason *for* editing is when data from new sources is being
imported.
We in Librarianship/Information Science make decisions on how the data
will be available to our users/customers. Take an author name, for example:
there are many ways to write the same name for the same individual. A
person can adopt dozens of pen names over a lifetime, change their last
name upon marriage, and so on. The rule chosen by a particular library may
be shared with other libraries, or be an entirely different one (based on
how the local community of users of that library searches for and wants the
data), or no rule may be chosen at all, with the data recorded "as is",
exactly as it is registered in the publication. Some libraries keep
additional authority records specially devoted to the synonyms of the same
name; some do not.
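To make the idea concrete, here is a minimal sketch of what such an authority record does: variant forms of a name map to one canonical heading chosen by the library's rule, and names with no rule stay "as is". The mapping and the pen-name entry are only an illustration, not any library's actual authority file.

```python
# Minimal sketch of an authority file: variant name forms map to one
# canonical heading chosen by the library's cataloguing rule.
# The mapping below is invented for illustration.
AUTHORITY = {
    "Clemens, Samuel Langhorne": "Twain, Mark, 1835-1910",
    "Twain, Mark": "Twain, Mark, 1835-1910",
    "Snodgrass, Quintus Curtius": "Twain, Mark, 1835-1910",  # pen name
}

def canonical_heading(name: str) -> str:
    """Return the authorized heading, or the name 'as is' when no rule exists."""
    return AUTHORITY.get(name, name)

print(canonical_heading("Clemens, Samuel Langhorne"))  # Twain, Mark, 1835-1910
print(canonical_heading("Some, Unknown"))              # recorded "as is"
```

A library with a different rule would simply ship a different mapping; merging two libraries' data means reconciling two such mappings.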
Google Book Search simply imported data from many libraries without making
any attempt to standardize it, resulting in the large number of duplicates
and junk records found in some searches (especially those where the
imprints had not standardized the data themselves).
Some special kinds of data about the same work can also be stored in
different sets of "fields" and "subfields" of MARC21 records across
different libraries, again because users' needs for information about the
works vary from place to place (i.e. you get data duplication in the same
record if you simply merge records from different libraries).
The MARC21 specification also has an overall design that IMHO is impossible
to reflect in the current MediaWiki schema, even with Semantic MediaWiki.
And sometimes libraries say that their data is stored in MARC21 fields when
it is actually in USMARC ones (yep, there are as many flavours of MARC as
there are flavours of Ubuntu). Or it is *based* on MARC21 fields, with
dozens of local adaptations.
I've just finished an internship in a library with 45k records that was
migrating data from REPIDISCA-*based* fields (let's call it a FreeBSD
flavour) to MARC21-*based* fields (in this comparison, an Ubuntu flavour;
and yep, *based*, with local adaptations; we need those changes). The data
is migrated in an automated fashion, but it still needs to be validated
record by record if the library wants those records in the MARC21 fields
as-is.
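The shape of that automated step can be sketched roughly as follows: a mapping table moves data from source tags to target tags, and anything the table does not cover gets flagged for a cataloguer's record-by-record review. The tag numbers and the mapping here are invented for illustration; they are not the real REPIDISCA or MARC21 tags used in that migration.

```python
# Hedged sketch of an automated field migration between two cataloguing
# schemes. FIELD_MAP and the tag numbers are invented for illustration.
FIELD_MAP = {"30": "100", "18": "245"}  # hypothetical source tag -> target tag

def migrate_record(record: dict) -> dict:
    """Map known tags to the target scheme; flag the record if anything
    could not be mapped automatically."""
    migrated = {}
    unmapped = []
    for tag, value in record.items():
        if tag in FIELD_MAP:
            migrated[FIELD_MAP[tag]] = value
        else:
            unmapped.append(tag)  # needs a cataloguer's decision
    migrated["needs_review"] = bool(unmapped)
    return migrated
```

Even in this toy version, any tag outside the mapping forces human review, which is exactly why the 45k records could not simply be bulk-converted and trusted.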
What I'm saying is:
1) You can't simply import data from many sources without validation and
expect a good-quality end product. You will get "search engine"-quality
data (tons of random information that will make sense only with a
continuously developed set of algorithms, maybe more time- and
resource-consuming than standardizing the data);
2) Data standardization is an epic work, dozens of times more epic than
writing a comprehensive encyclopedia about all subjects in all languages.
Institutional support will be needed, and in more comprehensive ways,
embracing more than just releasing their data for others to play around
with (i.e. with additional hands for the standardization).
[[Paul Otlet]] (1868-1944) tried it, in efforts such that some argue he is
the concept designer of the Internet and hypertext. Without success, which
is very unfortunate. Will the Wikimedians achieve any level of success at
it?
[[:m:User:555]]
On Fri, Dec 6, 2013 at 9:59 PM, Denny Vrandečić <vrandecic(a)gmail.com> wrote:
Thanks for reviving this thread, Luiz. I also wanted
to ask whether we
should be updating parts of DNB and similar data. Maybe not create new
entries, but for those that we already have, add some of the available data
and point to the DNB dataset?
On Fri, Dec 6, 2013 at 3:24 PM, Luiz Augusto <lugusto(a)gmail.com> wrote:
Just found this thread while browsing my email
archives (I'm/was inactive
on Wikimedia for at least 2 years)
IMHO it would be very helpful if a central place hosting metadata from
digitized works were created.
In my past experience, I've found lots of PD-old books in languages such as
French, Spanish and English in repositories from Brazil and Portugal whose
UIs are mostly in Portuguese (i.e. with a very low probability of being
found by volunteers from the subdomains of those languages), for example.
I particularly love validating metadata even more than proofreading books.
Perhaps a tool/place like this would open new ways to contribute to
Wikisource and help with user retention (based on some Wikipedians who have
fun making good articles but also sometimes love simply making trivial
changes in their spare time)?
I know that the thread was focused on general metadata for all kinds and
ages of books, but I had this idea while reading it.
[[:m:User:555]]
On Mon, Aug 26, 2013 at 10:42 AM, Thomas Douillard <
thomas.douillard(a)gmail.com> wrote:
I know, I started a discussion about porting the bot to Wikidata in the
scientific Journal WikiProject. One answer I got: the bot owner had other
things to do in his life than run the bot and was not around very often any
more. Having everything in Wikidata already will be a lot more reliable and
lazier: no tool that works one day but not the next, no effort to tell the
newbies that they should go to another website, no significant problem.
Maybe one objection would be that the data could be vandalised easily, but
maybe we should find a way to deal with imported sourced data which has no
real reason to be modified, only to be marked deprecated or updated by
another import from the same source.
2013/8/26 David Cuenca <dacuetu(a)gmail.com>
If the problem is automating bibliographic data importing, one solution is
what you propose: to import everything. Another one is to have an import
tool that automatically imports the data for the item that needs it. On WP
they do that; there is a tool to import book/journal info by ISBN/DOI. The
same can be done in WD.
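Whatever the lookup backend, an import-by-ISBN tool needs at least one cheap validation before fetching anything: the ISBN-13 check digit. This is the standard algorithm (digits weighted alternately 1 and 3, total divisible by 10), sketched here without any particular library's API:

```python
# ISBN-13 check digit validation: weight the 13 digits alternately
# 1 and 3; the weighted sum must be divisible by 10.
def isbn13_is_valid(isbn: str) -> bool:
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(isbn13_is_valid("978-0-306-40615-7"))  # True
print(isbn13_is_valid("978-0-306-40615-8"))  # False (wrong check digit)
```

Rejecting malformed identifiers up front keeps a bulk importer from filling items with lookups that could never succeed.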
Micru
On Mon, Aug 26, 2013 at 9:23 AM, Thomas Douillard <
thomas.douillard(a)gmail.com> wrote:
> If Wikidata has the ambition to be a really reliable database, we
> should do everything we can to make it easy for users to use any source
> they want. In this perspective, if we get data with guaranteed high
> quality, it becomes easy for Wikidatians to find and use these
> references. Entering a reference in the database seems to me a highly
> tedious, boring, and easily automated task.
>
> With that in mind, any reference that the user does not have to enter
> by hand is something good, and importing high-quality source data should
> pass every Wikidata community barrier easily. If there is no problem for
> the software to handle that much information, I say we really have no
> reason not to do the imports.
>
> Tom
>
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
--
Etiamsi omnes, ego non