Hi all,
I am pleased to announce that the Freebase-Wikidata mappings are now publicly available.
http://github.com/Samsung/KnowledgeSharingPlatform
Google already provides a mapping between Freebase and Wikidata (https://developers.google.com/freebase/data); however, it may not release an updated version. We extracted a set of identity relations (same-entity pairs) between the Freebase and Wikidata datasets using Wikipedia links, and tested several algorithms for finding matching entity pairs. Although this approach cannot identify every entity shared by both datasets, it should be a useful resource for understanding the instances of both data sources. The source code for extracting this data will also be shared soon.
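Roughly, matching via Wikipedia links amounts to a join on article titles. The sketch below is an illustration under assumptions, not the Samsung extraction code (which has not been released yet): the two input dictionaries, the title normalization, and the choice to drop ambiguous titles are all hypothetical.

```python
from collections import defaultdict


def normalize_title(title):
    # English Wikipedia treats spaces and underscores alike and ignores the case
    # of the first letter, so "barack_Obama" and "Barack Obama" are one page.
    t = title.replace("_", " ").strip()
    return t[:1].upper() + t[1:]


def match_by_wikipedia_title(freebase_titles, wikidata_sitelinks):
    """Pair Freebase MIDs with Wikidata QIDs that link to the same Wikipedia article.

    freebase_titles:    dict of MID -> English Wikipedia title (hypothetical input)
    wikidata_sitelinks: dict of QID -> enwiki sitelink title (hypothetical input)
    Titles claimed by more than one entity on either side are skipped, because
    they cannot produce an unambiguous same-entity pair.
    """
    fb_by_title = defaultdict(list)
    for mid, title in freebase_titles.items():
        fb_by_title[normalize_title(title)].append(mid)

    wd_by_title = defaultdict(list)
    for qid, title in wikidata_sitelinks.items():
        wd_by_title[normalize_title(title)].append(qid)

    pairs = {}
    for title, mids in fb_by_title.items():
        qids = wd_by_title.get(title, [])
        if len(mids) == 1 and len(qids) == 1:
            pairs[mids[0]] = qids[0]
    return pairs


# Made-up IDs: two Freebase topics claim the "Mercury" article, so that title is dropped.
print(match_by_wikipedia_title(
    {"m.0001": "Berlin", "m.0002": "Mercury", "m.0003": "mercury"},
    {"Q1": "Berlin", "Q2": "Mercury"},
))  # prints {'m.0001': 'Q1'}
```

Dropping ambiguous titles trades recall for precision, which is one reason a link-based approach cannot find every shared entity.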
The data is serialised in N-Triples format. Details of the dataset:
- Total 4,395,258 triples (same entity pairs)
- Updated: February 13, 2015
- Data Format: N-Triples RDF
- License: CC0
- File size: 236 MB zip
- File size: 2.5 GB (uncompressed)
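For anyone wanting to load the file, here is a minimal Python sketch that streams the N-Triples lines and builds a Freebase-to-Wikidata dictionary. It assumes each line has the form `<freebase IRI> <predicate IRI> <wikidata IRI> .` (the predicate is not inspected, and either order is accepted); the `.nt` member suffix is an assumption, not a confirmed file name in the repository.

```python
import io
import re
import zipfile

# One IRI-only triple per line: <subject> <predicate> <object> .
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def local_name(iri):
    # "http://www.wikidata.org/entity/Q42" -> "Q42"
    return iri.rstrip("/").rsplit("/", 1)[-1]


def load_mapping(lines):
    """Build a dict of Freebase ID -> Wikidata QID from N-Triples lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        match = TRIPLE.match(line)
        if match is None:
            continue  # literal object or unexpected shape; skipped in this sketch
        subj, _pred, obj = match.groups()
        if "wikidata.org" in subj:
            subj, obj = obj, subj  # accept Wikidata-first triples too
        mapping[local_name(subj)] = local_name(obj)
    return mapping


def iter_zip_lines(path, suffix=".nt"):
    """Yield decoded text lines from every member of the zip ending in `suffix`."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith(suffix):
                with archive.open(name) as raw:
                    yield from io.TextIOWrapper(raw, encoding="utf-8")
```

Usage would be something like `mapping = load_mapping(iter_zip_lines("mappings.zip"))`, where `mappings.zip` is a placeholder path. Reading straight from the archive avoids unpacking the 2.5 GB file to disk.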
Feel free to ask me if you have any questions.
Cheers,
Haklae Kim
Senior Engineer
Samsung Electronics Co., Ltd. scot.kim@samsung.com / haklaekim@gmail.com
Nice work, Haklae and team!
And thanks for making this investment and sharing it publicly. This will help everyone involved as the migration progresses.
Thanks again,
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On Wed, Apr 8, 2015 at 8:03 AM, Kim Haklae haklaekim@gmail.com wrote:
-- Dr.Dr. Haklae Kim Semantic Web and Open Data Hacker Open Knowledge Foundation Korea http://thedatahub.kr http://getthedata.kr http://blogweb.co.kr Tel: +82-(0)10-3201-0714 Who's Who in the World's 27th Edition - 2010 IBC 2000 Outstanding Scientists - 2010
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Very nice work, Haklae and team!
Best, Scott
On Wed, Apr 8, 2015 at 12:52 PM, Thad Guidry thadguidry@gmail.com wrote:
Hi everyone
All this sounds really good and useful! I was wondering: is there a relation between the Samsung mappings and the Freebase/Wikidata script that Thomas Steiner recently shared? https://github.com/google/primarysources/tree/master/frontend
Cheers,
Antoine
Wow, Haklae, great work, thanks!
I took the mappings that your team has provided and was able to increase the number of mapped triples by more than a factor of two: in the relevant step, the count went from 8.5M mapped claims to 20.7M mapped statements. That is pretty awesome. There are almost four times as many mappings in the Samsung data release as there are in Wikidata.
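To see why a larger entity mapping produces more mapped statements: a Freebase triple can only be carried over when its subject, and for entity-valued triples also its object, has a known Wikidata item. The sketch below is hypothetical and much simpler than any real migration step: property mapping is left out entirely, and treating values that start with `m.` as Freebase MIDs is an assumption.

```python
def map_triples(triples, mid_to_qid):
    """Rewrite (subject, property, value) triples from Freebase MIDs to Wikidata QIDs.

    Triples whose subject, or whose entity-valued object, has no known QID are
    dropped, so every extra entry in mid_to_qid can unlock additional triples.
    Properties are passed through unchanged (property mapping is out of scope).
    """
    mapped = []
    for subj, prop, value in triples:
        subj_qid = mid_to_qid.get(subj)
        if subj_qid is None:
            continue
        if value.startswith("m."):  # assumed MID syntax for entity values
            value_qid = mid_to_qid.get(value)
            if value_qid is None:
                continue
            mapped.append((subj_qid, prop, value_qid))
        else:
            mapped.append((subj_qid, prop, value))  # literal value kept as-is
    return mapped


# Toy data (made up): adding one MID->QID pair unlocks one more triple.
triples = [("m.1", "spouse", "m.2"), ("m.1", "born", "1961"), ("m.3", "spouse", "m.1")]
print(len(map_triples(triples, {"m.1": "Q1"})))               # prints 1
print(len(map_triples(triples, {"m.1": "Q1", "m.2": "Q2"})))  # prints 2

# The reported figures: 20.7M / 8.5M is about 2.44, i.e. more than a factor of two.
print(round(20.7 / 8.5, 2))  # prints 2.44
```

Because entity-valued triples need both ends mapped, coverage gains compound: the number of mapped statements can grow faster than the number of mapped entities.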
Obviously, I cannot say anything about the quality of the mapping yet; I have only had one day to play around with it. For now I will continue to use only the mapping that is actually in Wikidata itself, but either the community can decide to move faster by uploading more mapping data from your source to Wikidata, or Tpt will certainly look more into the data once he is here.
Thank you for the dataset!
Also, is there a description of all the files in the GitHub repo? I would prefer not to guess.
Hi Denny,
Thanks. We extracted the mapping relations using several algorithms to achieve high precision; however, we still need to check the quality of the dataset. We are now investigating how to integrate the Freebase and Wikidata datasets using a single schema, and the mappings should be useful for that job. However, the coverage of the mappings is still very limited. Are there any plans for merging both datasets? I will update the GitHub repository with descriptions.
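A first quality check could compare the released pairs with the Freebase IDs already recorded in Wikidata (property P646, Freebase ID): pairs that agree, pairs that conflict, and pairs Wikidata does not have yet. This is a hypothetical sketch; building the reference dictionary from Wikidata is assumed to happen elsewhere, the two sources may write MIDs differently (e.g. `/m/0abc` vs `m.0abc`) and would need normalizing first, and agreement on the overlap is only a rough signal, not a measured precision for the new pairs.

```python
def compare_mappings(candidate, reference):
    """Split candidate MID->QID pairs by how they relate to a reference mapping.

    Returns (agree, conflict, new):
      agree    - pairs identical to the reference
      conflict - MID in both but mapped to different QIDs: {mid: (candidate, reference)}
      new      - MIDs the reference does not cover at all
    """
    agree, conflict, new = {}, {}, {}
    for mid, qid in candidate.items():
        ref_qid = reference.get(mid)
        if ref_qid is None:
            new[mid] = qid
        elif ref_qid == qid:
            agree[mid] = qid
        else:
            conflict[mid] = (qid, ref_qid)
    return agree, conflict, new


def agreement_rate(agree, conflict):
    """Share of overlapping MIDs on which both mappings agree (None if no overlap)."""
    overlap = len(agree) + len(conflict)
    return len(agree) / overlap if overlap else None


# Made-up example data.
candidate = {"m.1": "Q1", "m.2": "Q2", "m.3": "Q9"}
reference = {"m.1": "Q1", "m.2": "Q5"}
agree, conflict, new = compare_mappings(candidate, reference)
print(agree, conflict, new)             # {'m.1': 'Q1'} {'m.2': ('Q2', 'Q5')} {'m.3': 'Q9'}
print(agreement_rate(agree, conflict))  # 0.5
```

The `conflict` bucket is the natural place to start manual review, and the `new` bucket is what would eventually be proposed for upload.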
Best, Haklae
On Friday, April 10, 2015, Denny Vrandečić vrandecic@google.com wrote:
Hi,
Once you are able to show confidence in the quality of the combined data, where it is highly similar, the next step is to provide the missing data as new data for Wikidata.
In Wikidata we have "Kian", a project that takes an AI approach to the data. It may be able to give additional confidence where confidence based on existing data is lacking.
My personal position is that data that comes with sufficient confidence is worth adding as soon as possible.
Thanks,
GerardM
On 10 April 2015 at 10:27, Kim Haklae haklaekim@gmail.com wrote: