Hi Denny,
Thanks. We extracted the mapping relations using several algorithms to
achieve high precision; however, we still need to check the quality of this
dataset. We are now investigating how to integrate the Freebase and
Wikidata datasets using a single schema. The mappings can be useful for
that job. However, the coverage of the mappings is still very limited. Is
there any plan for merging both datasets?
I will update the git repository with descriptions.
Best,
Haklae
On Friday, April 10, 2015, Denny Vrandečić <vrandecic(a)google.com> wrote:
Wow, Haklae, great work, thanks!
I took the mappings that your team has provided, and was able to increase
the number of mapped triples by more than a factor of two, i.e. in the
relevant step I have an increase from 8.5M mapped claims to 20.7M mapped
statements. That is pretty awesome. There are almost four times as many
mappings in the Samsung data release as there are in Wikidata.
Obviously, I cannot say anything about the quality of the mapping; I only
had one day to play around with it. I will continue for now to use only the
mapping which is actually in Wikidata itself, but either the community can
decide to move faster by uploading more mapping data from your source to
Wikidata, or else Tpt will certainly look more into the data once he is
here.
Thank you for the dataset!
Also, is there a description of all the files in the github repo? I would
prefer not to guess.
On Wed, Apr 8, 2015 at 6:16 AM Kim Haklae <haklaekim(a)gmail.com> wrote:
Hi all,
I am pleased to announce that the Freebase-Wikidata mappings are now
publicly available:
http://github.com/Samsung/KnowledgeSharingPlatform
Google already provides mappings between Freebase and Wikidata
(https://developers.google.com/freebase/data); however, they might not
offer an updated version. We extracted a set of identical relations from
the Freebase and Wikidata datasets using Wikipedia links; several
algorithms were also tested to find pairs that refer to the same entity.
Although this approach cannot identify every shared entity across the two
datasets, it should be a useful source for understanding the instances in
both data sources. The source code for extracting this data will also be
shared soon.
The data is serialised in the N-Triples format; the details are as
follows:
- Total 4,395,258 triples (same entity pairs)
- Updated: February 13, 2015
- Data Format: N-Triples RDF
- License: CC0
- File size: 236 MB zip
- File size: 2.5 GB (uncompressed)
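For anyone who wants to load the file, here is a minimal Python sketch of
reading N-Triples lines into a dictionary. It assumes each line holds three
URIs followed by a dot; the example.org URIs and the owl:sameAs predicate in
the sample are placeholders, not necessarily what the release uses, and the
function name parse_mapping_lines is made up for illustration.

```python
import re

# Assumed line shape: <subject> <predicate> <object> .
# The URIs and predicate in the actual release may differ.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def parse_mapping_lines(lines):
    """Return a dict of subject URI -> object URI for URI-only triples."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and N-Triples comments.
        if not line or line.startswith('#'):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            # Not a plain URI-URI-URI triple (e.g. a literal object); ignore.
            continue
        subject, _predicate, obj = match.groups()
        mapping[subject] = obj
    return mapping


# Hypothetical sample line; the identifiers are placeholders.
sample = [
    '<http://example.org/freebase/m.0abc> '
    '<http://www.w3.org/2002/07/owl#sameAs> '
    '<http://example.org/wikidata/Q1> .',
    '# a comment line',
    '',
]
mapping = parse_mapping_lines(sample)
print(len(mapping))
```

Since the uncompressed file is 2.5 GB, passing an open file object to the
function keeps memory use down to the dictionary itself, because the lines
are read one at a time.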
Feel free to ask me if you have any questions.
Cheers,
Haklae Kim
Senior Engineer
Samsung Electronics Co., Ltd.
scot.kim(a)samsung.com / haklaekim(a)gmail.com
--
Dr.Dr. Haklae Kim
Semantic Web and Open Data Hacker
Open Knowledge Foundation Korea
http://thedatahub.kr
http://getthedata.kr
http://blogweb.co.kr
Tel: +82-(0)10-3201-0714
Who's Who in the World's 27th Edition - 2010
IBC 2000 Outstanding Scientists - 2010
_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l