On 10 April 2015 at 10:27, Kim Haklae <haklaekim@gmail.com> wrote:

Hi Denny,

Thanks. We extracted the mapping relations using several algorithms to achieve high precision; however, we still need to check the quality of this dataset. We are now investigating how to integrate both the Freebase and Wikidata datasets using a single schema, and the mappings can be useful for that job. However, the coverage of the mappings is still very limited. Are there any plans for merging both datasets?
I will update the Git repository with descriptions.

Best,
Haklae

On Friday, April 10, 2015, Denny Vrandečić <vrandecic@google.com> wrote:
Wow, Haklae, great work, thanks!

I took the mappings that your team has provided, and was able to increase the number of mapped triples by more than a factor of two, i.e. in the relevant step I have an increase from 8.5M mapped claims to 20.7M mapped statements. That is pretty awesome. There are almost four times as many mappings in the Samsung data release as there are in Wikidata.

I cannot say anything about the quality of the mapping, obviously; I only had one day to play around with it. For now I will continue to use only the mapping which is actually in Wikidata itself, but either the community can decide to move faster by uploading more mapping data from your source to Wikidata, or else Tpt will certainly look more into the data once he is here.

Thank you for the dataset!

Also, is there a description of all the files in the github repo? I would prefer not to guess.

On Wed, Apr 8, 2015 at 6:16 AM Kim Haklae <haklaekim@gmail.com> wrote:
Hi all,

I am pleased to announce that the Freebase-Wikidata mappings have been shared publicly.

http://github.com/Samsung/KnowledgeSharingPlatform

Google already provides the mapping relation between Freebase and Wikidata (https://developers.google.com/freebase/data); however, it might not offer an updated version. We extracted a set of identical relations from both the Freebase and Wikidata datasets using Wikipedia links; several algorithms were also tested to find same-entity pairs. Although this approach cannot identify all identical entities across both datasets, it should be a useful source for understanding instances of both data sources. The source code for extracting this data will also be shared soon.

The data is serialised using the N-Triples format, with the following details:

- Total 4,395,258 triples (same entity pairs)

- Updated: February 13, 2015

- Data Format: N-Triples RDF

- License: CC0

- File size: 236 MB zip

- File size: 2.5 GB (uncompressed)
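
For anyone who wants to take a first look at the N-Triples file, a minimal Python sketch for reading same-entity pairs could look like the one below. It is only an illustration under assumptions: it assumes each line has the form `<subject> <predicate> <object> .` with IRIs in all three positions, and it does not assume which predicate the release uses or which side holds the Freebase identifier. The function names and the sample line (including the example.org IRIs) are hypothetical and not taken from the repository.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Matches one N-Triples statement whose subject, predicate and object
# are all IRIs. Assumption: the mapping file contains no literals or
# blank nodes; lines that do not fit this shape are skipped.
TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def parse_triple(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object) or None if the line does not match."""
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def iter_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (subject, object) for every line that parses as an IRI triple."""
    for line in lines:
        triple = parse_triple(line)
        if triple is not None:
            subject, _predicate, obj = triple
            yield subject, obj


# Hypothetical sample input; the IRIs are placeholders, not real data.
sample = [
    '<http://example.org/freebase/m.0abc> '
    '<http://www.w3.org/2002/07/owl#sameAs> '
    '<http://example.org/wikidata/Q1> .',
    '# a comment line',
    '',
]

pairs = list(iter_pairs(sample))
print(len(pairs), "pair(s) parsed")
```

To run this over the real data, the unzipped file can be opened in text mode and the file object passed to `iter_pairs`, which reads it line by line and so avoids loading all 2.5 GB into memory.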

Feel free to ask me if you have any questions.

Cheers,

Haklae Kim

Senior Engineer
Samsung Electronics Co., Ltd.
scot.kim@samsung.com / haklaekim@gmail.com

--
Dr.Dr. Haklae Kim
Semantic Web and Open Data Hacker
Open Knowledge Foundation Korea
http://thedatahub.kr
http://getthedata.kr
http://blogweb.co.kr
Tel: +82-(0)10-3201-0714
Who's Who in the World's 27th Edition - 2010
IBC 2000 Outstanding Scientists - 2010

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
