Wow, Haklae, great work, thanks!
I took the mappings that your team has provided, and was able to increase the number of mapped triples by more than a factor of two, i.e. in the relevant step I went from 8.5M mapped statements to 20.7M mapped statements. That is pretty awesome. There are almost four times as many mappings in the Samsung data release as there are in Wikidata.
I cannot say anything about the quality of the mapping, obviously, since I only had one day to play around with it. For now I will continue to use only the mapping which is actually in Wikidata itself, but either the community can decide to move faster by uploading more mapping data from your source to Wikidata, or else Tpt will certainly look more into the data once he is here.
Thank you for the dataset!
Also, is there a description of all the files in the github repo? I would prefer not to guess.
On Wed, Apr 8, 2015 at 6:16 AM Kim Haklae haklaekim@gmail.com wrote:
Hi all,
I am pleased to announce that the Freebase-Wikidata mappings are now publicly available:
http://github.com/Samsung/KnowledgeSharingPlatform
Google already provides a mapping between Freebase and Wikidata (https://developers.google.com/freebase/data); however, they might not offer an updated version. We extracted a set of identical relations from both the Freebase and Wikidata datasets using Wikipedia links; several algorithms were also tested for finding pairs of identical entities. Although this approach is limited in its ability to identify all identical entities in the two datasets, it should be a useful source for understanding the instances of both data sources. The source code for extracting this data will also be shared soon.
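To illustrate the general idea of matching entities through Wikipedia links, here is a minimal sketch. It is not the code used to produce the dataset (that code has not been released yet); the function name, the input dictionaries and the sample identifiers are all hypothetical, and it assumes each entity has already been reduced to a single Wikipedia URL.

```python
def match_by_wikipedia_link(freebase_links, wikidata_links):
    """Pair Freebase and Wikidata entities that link to the same Wikipedia page.

    Both arguments are hypothetical dicts mapping an entity ID to one
    Wikipedia URL; the real extraction pipeline may work differently.
    """
    # Index the Wikidata side by Wikipedia page so lookups are by URL.
    wikidata_by_page = {}
    for item_id, page in wikidata_links.items():
        wikidata_by_page.setdefault(page, []).append(item_id)

    pairs = []
    for mid, page in freebase_links.items():
        candidates = wikidata_by_page.get(page, [])
        # Keep only unambiguous matches; pages claimed by several
        # Wikidata items are skipped in this sketch.
        if len(candidates) == 1:
            pairs.append((mid, candidates[0]))
    return sorted(pairs)


# Made-up sample data, for illustration only.
freebase_links = {
    "m.0aaa": "https://en.wikipedia.org/wiki/Example_A",
    "m.0bbb": "https://en.wikipedia.org/wiki/Example_B",
    "m.0ccc": "https://en.wikipedia.org/wiki/Example_C",
}
wikidata_links = {
    "Q1001": "https://en.wikipedia.org/wiki/Example_A",
    "Q1002": "https://en.wikipedia.org/wiki/Example_B",
    "Q1003": "https://en.wikipedia.org/wiki/Example_B",
}
matched = match_by_wikipedia_link(freebase_links, wikidata_links)
```

Entities without a shared page, or with an ambiguous one, simply drop out, which is one reason a link-based approach like this would not find every identical pair.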
The data is serialised in the N-Triples format; the details are as follows:
Total: 4,395,258 triples (same-entity pairs)
Updated: February 13, 2015
Data Format: N-Triples RDF
License: CC0
File size: 236 MB (zip)
File size: 2.5 GB (uncompressed)
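For anyone who wants to load the pairs, a rough sketch of reading such a file follows. It assumes that every triple has IRIs in all three positions and that the linking predicate is owl:sameAs; neither is stated above, so please check against the actual dump. The sample lines and the function name are made up for illustration.

```python
import re

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

# Assumed predicate; the announcement does not say which one the dump uses.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def read_mappings(lines, predicate=SAME_AS):
    """Return a dict from subject IRI to object IRI for the given predicate."""
    mappings = {}
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # literals and blank nodes are ignored in this sketch
        subject, pred, obj = match.groups()
        if pred == predicate:
            mappings[subject] = obj
    return mappings


# Illustrative lines only, not taken from the actual dump.
sample = [
    "# comment line",
    "<http://rdf.freebase.com/ns/m.0aaa> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q1001> .",
    "<http://rdf.freebase.com/ns/m.0bbb> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q1002> .",
    "<http://rdf.freebase.com/ns/m.0ccc> <http://example.org/other> <http://www.wikidata.org/entity/Q1003> .",
]
pairs = read_mappings(sample)
```

Since `read_mappings` accepts any iterable of lines, an open file handle on the uncompressed dump can be passed in directly, so the 2.5 GB file is read line by line rather than loaded as one string.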
Feel free to ask me if you have any questions.
Cheers,
Haklae Kim
Senior Engineer
Samsung Electronics Co., Ltd. scot.kim@samsung.com / haklaekim@gmail.com
--
Dr.Dr. Haklae Kim
Semantic Web and Open Data Hacker
Open Knowledge Foundation Korea
http://thedatahub.kr
http://getthedata.kr
http://blogweb.co.kr
Tel: +82-(0)10-3201-0714
Who's Who in the World's 27th Edition - 2010
IBC 2000 Outstanding Scientists - 2010
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l