Wow, Haklae, great work, thanks!
I took the mappings that your team provided and was able to increase the
number of mapped triples by more than a factor of two: in the relevant
step, I went from 8.5M mapped claims to 20.7M mapped statements. That is
pretty awesome. There are almost four times as many mappings in the
Samsung data release as there are in Wikidata.
Obviously, I cannot say anything about the quality of the mapping, as I
only had one day to play around with it. For now I will continue to use
only the mapping that is actually in Wikidata itself. Either the community
can decide to move faster by uploading more mapping data from your source
to Wikidata, or else Tpt will certainly look more into the data once he is
here.
Thank you for the dataset!
Also, is there a description of all the files in the github repo? I would
prefer not to guess.
On Wed, Apr 8, 2015 at 6:16 AM Kim Haklae <haklaekim(a)gmail.com> wrote:
Hi all,
I am pleased to announce that the Freebase-Wikidata mappings are now
publicly available:
http://github.com/Samsung/KnowledgeSharingPlatform
Google already provides a mapping between Freebase and Wikidata
(https://developers.google.com/freebase/data); however, they might not
offer an updated version. We extracted a set of identical relations from
both the Freebase and Wikidata datasets using Wikipedia links; several
algorithms were also tested to find identical entity pairs. Although this
approach cannot identify every identical entity across the two datasets,
it should be a useful source for understanding the instances in both data
sources.
The source code for extracting this data will also be shared soon.
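Until the extraction code is published, the general idea of matching
entities through shared Wikipedia links can be sketched roughly as below.
This is only an illustration under assumptions, not the actual pipeline:
the function name, the dictionary inputs, and all sample identifiers are
hypothetical, and the additional algorithms mentioned above are not shown.

```python
def match_by_wikipedia_link(freebase_links, wikidata_links):
    """Pair Freebase and Wikidata IDs that point at the same Wikipedia article.

    Both arguments are hypothetical dicts mapping an entity ID to the URL
    of the Wikipedia article it links to.
    """
    # Invert the Wikidata side: article URL -> Wikidata item ID.
    # If two items claim the same article, the first one seen is kept.
    article_to_item = {}
    for item_id, article in wikidata_links.items():
        article_to_item.setdefault(article, item_id)

    # Look up each Freebase entity's article on the Wikidata side.
    pairs = []
    for mid, article in freebase_links.items():
        item_id = article_to_item.get(article)
        if item_id is not None:
            pairs.append((mid, item_id))
    return sorted(pairs)


# Made-up sample data, for illustration only.
freebase_links = {
    "/m/0aaa": "https://en.wikipedia.org/wiki/Example_A",
    "/m/0bbb": "https://en.wikipedia.org/wiki/Example_B",
}
wikidata_links = {
    "Q111": "https://en.wikipedia.org/wiki/Example_A",
    "Q222": "https://en.wikipedia.org/wiki/Example_C",
}

pairs = match_by_wikipedia_link(freebase_links, wikidata_links)
print(pairs)
```

Entities without a shared article (here /m/0bbb and Q222) simply stay
unmatched, which is one reason a link-based approach cannot cover
everything.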
The data is serialised in the N-Triples format; the details are as
follows:
- Total 4,395,258 triples (same entity pairs)
- Updated: February 13, 2015
- Data Format: N-Triples RDF
- License: CC0
- File size: 236 MB zip
- File size: 2.5 GB (uncompressed)
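For anyone who wants to read the file without an RDF library, a minimal
sketch follows. It assumes each line is a triple of three URIs with no
literals or blank nodes; the owl:sameAs predicate and the URI shapes in
the sample are assumptions for illustration and may not match the actual
release.

```python
import re

# Matches an N-Triples line made of three URI terms:
#   <subject> <predicate> <object> .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def read_pairs(lines):
    """Return (subject, object) URI pairs from URI-only N-Triples lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            # Not a plain URI triple (e.g. it has a literal); ignore it.
            continue
        subject, _predicate, obj = match.groups()
        pairs.append((subject, obj))
    return pairs


# Hypothetical sample lines; the real identifiers and predicate may differ.
sample = [
    "<http://rdf.freebase.com/ns/m.0aaa> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://www.wikidata.org/entity/Q111> .",
    "# a comment line",
    "",
    "<http://rdf.freebase.com/ns/m.0bbb> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://www.wikidata.org/entity/Q222> .",
]

pairs = read_pairs(sample)
print(len(pairs))
```

Since the uncompressed file is 2.5 GB, passing an open file object to
read_pairs and processing the lines as a stream avoids loading the whole
file at once, although the returned list of pairs still has to fit in
memory.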
Feel free to ask me if you have any questions.
Cheers,
Haklae Kim
Senior Engineer
Samsung Electronics Co., Ltd.
scot.kim(a)samsung.com / haklaekim(a)gmail.com
--
Dr.Dr. Haklae Kim
Semantic Web and Open Data Hacker
Open Knowledge Foundation Korea
http://thedatahub.kr
http://getthedata.kr
http://blogweb.co.kr
Tel: +82-(0)10-3201-0714
Who's Who in the World's 27th Edition - 2010
IBC 2000 Outstanding Scientists - 2010