Hi all,
the GlobalFactSync project [1] will start tomorrow and we have established a good team to deal with many of these issues.
[1] https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE
As part of the project we will:
- work on syncing Wikidata and Wikipedia's infoboxes across languages, and also try to sync references from infoboxes:
-- http://infoboxes.net/en/Giuseppe%20Garibaldi
- we also received a suggestion to sync with MusicBrainz: https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSyncRE#Interfacing_with_Wikidata's_data_quality_issues_in_certain_areas
- Note that there will be 10 concrete sync targets so that we make
tangible progress. We hope for more suggestions.
As part of DBpedia's development, we already have a way to query
all IDs and assign a global one:
https://global.dbpedia.org/same-thing/lookup/?uri=http://dbpedia.org/resource/Giuseppe_Garibaldi
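For anyone who wants to script this, here is a minimal sketch of building the lookup URL for an arbitrary entity URI. The endpoint and the `uri` parameter come from the link above; the helper names `build_lookup_url` and `fetch_lookup` are made up for illustration, and the response format is deliberately not assumed, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example link above.
LOOKUP_ENDPOINT = "https://global.dbpedia.org/same-thing/lookup/"


def build_lookup_url(entity_uri: str) -> str:
    """Return the lookup URL for one entity URI (hypothetical helper)."""
    # safe=":/" leaves the URI unescaped, matching the form of the example link.
    return LOOKUP_ENDPOINT + "?" + urlencode({"uri": entity_uri}, safe=":/")


def fetch_lookup(entity_uri: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body. The response format is not assumed here."""
    with urlopen(build_lookup_url(entity_uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_lookup() to query the service.
    print(build_lookup_url("http://dbpedia.org/resource/Giuseppe_Garibaldi"))
```

Since this is an early prototype, the service may change or be unavailable, so anything built on `fetch_lookup` should handle errors and not rely on a fixed response structure.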
We are also devising a strategy to manage all links centrally and
give feedback to sources.
There is also an engine to compare all values:
https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf
That said, these are all early prototypes and usability is
lacking, but as proof-of-concept they are quite promising. Apache
Spark provides scalability.
All the best,
Sebastian
That's by design, since identifiers on Wikidata are not some kind of
top-down process where every single actor's responsibility is defined
from the beginning.
This doesn't preclude good things from happening, as we've seen with LOC:
https://blogs.loc.gov/thesignal/2019/05/integrating-wikidata-at-the-library-of-congress/
I know that, but it does not preclude deeper collaboration and synchronisation between Wikidata data and other resources; what I’m interested in is the status of those potential collaborations and the maturity of their workflows. Linking is useful, of course, but other things also happen on Wikidata, such as data curation and the identification of duplicate entries in the external database through its identifiers, when the same Wikidata item turns out to have two identifiers (for example, https://www.wikidata.org/wiki/Q539#P1015 has two values at the time of posting). I’d like to know whether any authority control organisation exploits this potential in its workflow, for instance through a periodic review or report that compares its data against Wikidata and lists the differences. And conversely, if we imported data from a data source, whether changes made to the original live data source are reflected in the Wikidata data.
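To make the duplicate-identifier case above concrete, here is a rough sketch of the kind of check such a periodic report could run. It assumes the entity is available as a dict in the Wikidata entity JSON layout (`claims` → property ID → statements → `mainsnak.datavalue.value`). The function name and the sample values are placeholders for illustration only, not real identifiers for Q539.

```python
from typing import Any


def properties_with_multiple_values(
    entity: dict[str, Any], property_ids: list[str]
) -> dict[str, list[str]]:
    """Return the properties that carry more than one distinct value."""
    flagged: dict[str, list[str]] = {}
    claims = entity.get("claims", {})
    for pid in property_ids:
        values = []
        for statement in claims.get(pid, []):
            # Deprecated statements are skipped; they are not current claims.
            if statement.get("rank") == "deprecated":
                continue
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if datavalue is not None:  # "no value" / "unknown value" snaks
                values.append(datavalue["value"])
        distinct = sorted(set(values))
        if len(distinct) > 1:
            flagged[pid] = distinct
    return flagged


# Hypothetical sample data: the identifier values are placeholders.
sample_entity = {
    "id": "Q539",
    "claims": {
        "P1015": [
            {"mainsnak": {"datavalue": {"value": "EXAMPLE-ID-1"}}, "rank": "normal"},
            {"mainsnak": {"datavalue": {"value": "EXAMPLE-ID-2"}}, "rank": "normal"},
        ],
        "P214": [
            {"mainsnak": {"datavalue": {"value": "EXAMPLE-ID-3"}}, "rank": "normal"},
        ],
    },
}

if __name__ == "__main__":
    print(properties_with_multiple_values(sample_entity, ["P1015", "P214"]))
```

Two values on one item do not prove a duplicate on the external side; they only mark a candidate that the external database's maintainers would need to review.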
_______________________________________________ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata