2015-01-17 4:27 GMT-05:00 Lydia Pintscher <lydia.pintscher@wikimedia.de>:

The log is at https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2015-01-16
for anyone who couldn't make it.

Denny discusses importing all missing VIAF keys from Freebase using "multichill" (unclear what that is from the context) on the assumption that the error rate is low.  It would be worth checking assumptions like that with folks who are familiar with the Freebase data before acting on them.

Here are some things that I think are true about the VIAF keys in Freebase:

- they were assigned by a user, not by Google/Metaweb (not necessarily a bad thing since some of the biggest problems in Freebase were created by G/M and some users have contributed very high quality data)

- the keys were, I believe, assigned based heavily on existing Library of Congress identifiers that had previously been assigned by Metaweb.  Those key assignments are not as high quality as other parts of Freebase.  One easy thing to check for is people with two LC (and thus two VIAF) keys assigned.  In cases where there is more than one key and the extra keys don't represent pseudonyms, this is a clear error.

- Freebase doesn't create separate entities for pseudonyms, unlike the library cataloging world.  Depending on what decision Wikidata makes in this regard, it's something to watch out for when reusing Freebase author data (including VIAF keys).

- much Freebase author data was imported from OpenLibrary which has its own set of quality issues.  A bunch of this data was later deleted, leaving that portion of the graph somewhat thready and moth-eaten.  It's unclear whether that was a net gain or loss in overall bibliographic data quality for Freebase.

- I suspect that most VIAF keys which are in Freebase but not in Wikidata represent entities which are not in Wikidata.  That means they aren't useful anyway, since Denny wants to focus on creating new links, not new entities (a direction that I'm not sure I agree with, but that's a whole separate discussion).
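
The "two LC keys on one person" check mentioned above could be sketched roughly like this in Python.  This is only an illustration: it assumes the key assignments have already been extracted into (entity id, key) pairs by some means, and the function name, the input layout, and the ids in the sample are all made up for the example rather than taken from Freebase.

```python
from collections import defaultdict


def entities_with_multiple_keys(pairs):
    """Return {entity_id: sorted list of keys} for every entity that
    carries more than one distinct key.

    `pairs` is an iterable of (entity_id, key) tuples; this layout is an
    assumption for the sketch, not a Freebase export format.
    """
    keys_by_entity = defaultdict(set)
    for entity_id, key in pairs:
        # A set collapses exact duplicate assignments of the same key.
        keys_by_entity[entity_id].add(key)
    return {
        entity_id: sorted(keys)
        for entity_id, keys in keys_by_entity.items()
        if len(keys) > 1
    }


# Placeholder data: these ids and keys are invented, not real records.
sample = [
    ("/m/example1", "n00000001"),
    ("/m/example1", "n00000002"),  # second distinct key on the same entity
    ("/m/example2", "n00000003"),
    ("/m/example2", "n00000003"),  # exact repeat, not a conflict
]

suspects = entities_with_multiple_keys(sample)
for entity_id, keys in suspects.items():
    print(entity_id, keys)
```

Anything this flags still needs a human look, since an extra key that represents a pseudonym is legitimate rather than an error.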

One of the key inputs to judging the quality of assertions is their provenance. Fortunately, this is recorded for all assertions in Freebase and it's possible to trace a given fact back to the user, toolchain, or process that added it to the database. Unfortunately, this information is only available through the Freebase API, not the bulk data dump.  Hopefully, this will change before Google completely abandons Freebase.

If any Wikidata folk want to discuss VIAF keys in Freebase (or its author data in general), feel free to get in touch.

Tom