Hey everyone,
As you may already know, I am currently working on importing Freebase content into Wikidata [1] using the Primary Sources tool [2].
One of the big challenges of the migration is building a good mapping from Freebase properties to Wikidata ones. There are a few thousand properties, so the task is too big for one person. Your help with it is more than welcome on this page: https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping
Cheers,
Thomas
[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase [2] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
Hi!
Some of these look a bit challenging due to different semantics. E.g. https://www.freebase.com/award/ranking/year probably matches point in time (P585), but the former is for specific awards, while the latter is a generic property that would be applied to the award claim. So it's not a 1-1 mapping. Would it still be useful to match P585 to https://www.freebase.com/award/ranking/year and do the same in similar cases? I.e. pretty much all /year ones would be P585, but in different contexts.
Hi!
Yes, such matches are useful. The goal here is to move data from Freebase to Wikidata, so we don't need a 1-1 relation: the Wikidata-to-Freebase direction is not important. What we want is that, if we replace the Freebase property with the Wikidata one, the statement remains valid.
So P585 is a good match for https://www.freebase.com/award/ranking/year because if we have "X award/ranking/year 1966" in Freebase, then "X P585 1966" is valid too.
Cheers,
Thomas
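The validity rule in the reply above (substituting the Wikidata property must leave the statement true) can be sketched as a one-directional lookup. This is only an illustration, assuming triples are plain (subject, property, value) tuples; the `MAPPING` table and the `map_triples` helper are hypothetical names and not part of the actual migration script.

```python
# Hypothetical one-way mapping table. Several Freebase properties may
# share one Wikidata property, because only the direction
# Freebase -> Wikidata matters.
MAPPING = {
    "/award/ranking/year": "P585",  # point in time
}


def map_triples(triples, mapping=MAPPING):
    """Yield triples with the Freebase property replaced by its Wikidata one.

    Triples whose property has no entry in the mapping are skipped.
    """
    for subject, prop, value in triples:
        wikidata_prop = mapping.get(prop)
        if wikidata_prop is not None:
            yield (subject, wikidata_prop, value)


freebase = [("X", "/award/ranking/year", "1966"), ("X", "/unmapped/prop", "v")]
print(list(map_triples(freebase)))
```

Because the lookup only ever runs in one direction, nothing here requires the mapping to be invertible.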
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 16 June 2015 at 00:24, Thomas Pellissier-Tanon thomaspt@google.com wrote:
As you may already know, I am currently working on importing Freebase content into Wikidata [1] using the Primary Sources tool [2].
One of the big challenges of the migration is building a good mapping from Freebase properties to Wikidata ones. There are a few thousand properties, so the task is too big for one person. Your help with it is more than welcome on this page: https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping
https://www.freebase.com/book/book_edition/ISBN could be P212 (ISBN-13) or P957 (ISBN-10). How would you like us to map that?
ISBN is an interesting case - could we simply say "map to P212", then later do a sanity check on P212 imports to move ten-character ones to P957? (Or vice versa, depending on which is more common.) I am guessing a bot already does this for misplaced values...
Andrew.
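The sanity check suggested above could look roughly like the following. It is a minimal sketch, not the behaviour of any existing bot: the `target_property` helper is a hypothetical name, and it goes slightly beyond a pure length test by also verifying the standard ISBN-10 and ISBN-13 check digits, so that "even weirder" values end up in neither property.

```python
import re


def isbn10_is_valid(isbn):
    """Check format and check digit of an ISBN-10 (last character may be X)."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    values = [10 if c == "X" else int(c) for c in isbn]
    # Weights run from 10 down to 1; the weighted sum must be divisible by 11.
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0


def isbn13_is_valid(isbn):
    """Check format and check digit of an ISBN-13."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(isbn))
    return total % 10 == 0


def target_property(raw):
    """Return "P212" for an ISBN-13, "P957" for an ISBN-10, else None."""
    cleaned = raw.replace("-", "").replace(" ", "").upper()
    if isbn13_is_valid(cleaned):
        return "P212"
    if isbn10_is_valid(cleaned):
        return "P957"
    return None


for value in ["978-0-306-40615-7", "0-306-40615-2", "not an isbn"]:
    print(value, target_property(value))
```

Values for which the function returns None would need human review rather than an automatic move.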
Andy,
The /book/book_edition/ISBN property holds a mix of -13 and -10 values, and sometimes even weirder ones :)
It was deprecated (but still holds value) about 3 years ago by Jeff Prucher @ Google, and we moved the -13 instances into https://www.freebase.com/media_common/cataloged_instance/isbn13
However, I do not know whether the copy (done at the time by a Metaweb bot script) was complete. At a quick glance it seems to be... there are 1.6 million cataloged instances alone, from various OPACs etc., that were imported into Freebase in 2012.
I should probably jump into this and help you guys with this mapping.
Thad +ThadGuidry https://www.google.com/+ThadGuidry
So Thomas,
You want a "Type of Value" mapping as well as an "Equivalent Property in Wikidata" mapping?
And both types of mapping are useful to you and the team... is that the correct understanding?
Thad +ThadGuidry https://www.google.com/+ThadGuidry
Dear Thomas,
I am personally interested in the chemistry bits, and checked the list, but did not see the "chemistry" domain from Freebase:
https://www.freebase.com/chemistry/chemical_compound?schema=
Is that just a matter of timing, or is it left out because WP/WD already has a good coverage?
Egon
Egon,
Chemistry has some very good data for the SMILES property in Freebase. By the way, I did some of that editing work in Freebase myself long ago! :)
Also, the Chemical Classifications are pretty useful, I would say, but they would need careful review when importing into Wikidata. Here's a nice view of the Chemical Classifications: https://www.freebase.com/chemistry/chemical_classification?instances=
Thad +ThadGuidry https://www.google.com/+ThadGuidry
Thank you everyone for your answers!
About ISBN, the Freebase dump contains:
- 698735 triples with media_common/cataloged_instance/isbn13
- 692557 triples with book/book_edition/isbn
So I think we may just ignore book/book_edition/isbn and map media_common/cataloged_instance/isbn13 to P212. What do you think?
I have only added the 1000 most used properties so that important properties are not hidden among less important ones. But feel free to add properties that are not listed there. I've just added the chemistry properties to the page: https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping#Chemistr...
Cheers,
Thomas
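Per-property counts like the ones above, and a "most used 1000" shortlist, can be produced with a single pass over the dump. The sketch below is an assumption about how that might be done, not the script actually used: it assumes one triple per line with tab-separated subject, predicate and object fields, and `count_properties` is a hypothetical helper name. The sample identifiers are made up for illustration.

```python
from collections import Counter


def count_properties(lines):
    """Count how many triples use each predicate.

    Assumes each line holds one triple as tab-separated
    subject, predicate, object fields; other lines are ignored.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[1]] += 1
    return counts


# Made-up sample lines standing in for the real dump.
sample = [
    "m.1\tmedia_common/cataloged_instance/isbn13\t9780306406157\n",
    "m.2\tmedia_common/cataloged_instance/isbn13\t9780262033848\n",
    "m.1\tbook/book_edition/isbn\t0306406152\n",
]
counts = count_properties(sample)
# Shortlist of the most frequently used properties, most used first.
print(counts.most_common(1000))
```

For a real gzip-compressed dump, the same function could be fed a file object opened with `gzip.open(path, "rt", encoding="utf-8")`, so the file is streamed rather than loaded into memory.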
Thomas,
I agree and think that is the right mapping for ISBN13.
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On 16 June 2015 at 19:43, Thomas Pellissier-Tanon thomaspt@google.com wrote:
I've just added to the page properties for chemistry: https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping#Chemistr...
It seems to me that, before the data can be imported, we'll need to add several new Wikidata properties. Was that your intention?
I'm happy to help with drafting new property proposals, for chemistry and any other domain.
On Wed, Jun 17, 2015 at 4:01 PM, Andy Mabbett andy@pigsonthewing.org.uk wrote:
On 16 June 2015 at 19:43, Thomas Pellissier-Tanon thomaspt@google.com wrote:
I've just added to the page properties for chemistry: https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping#Chemistr...
It seems to me that, before the data can be imported, we'll need to add several new Wikidata properties. Was that your intention?
Yeah, I noted that too. I think this should be discussed in the Wikichemistry project, and we should also look at what Marco sent around...
Egon
@Andy
It seems to me that, before the data can be imported, we'll need to add several new Wikidata properties. Was that your intention?
Running my mapping script from Freebase to Wikidata multiple times is easy, so I think I'll run it again as soon as the mapping is substantially improved and add the new statements to the Primary Sources tool. So, yes, feel free to start the property creation process. I'm still at Google for 8 weeks, so we have time to go through the property creation process.
Depends - what is the overlap between the two sets of items described by those triples?
As Thad said, most of the ISBN-13 values in book/book_edition/ISBN have been moved into media_common/cataloged_instance/isbn13, so I believe importing only media_common/cataloged_instance/isbn13 is a good first step, even if we miss some data. I believe there are many more triples that are more important to import than these missed ISBNs. But if someone proposes a good way to import them too, I am ready to implement it.
@Marco
Thank you very much for the links to these discussions. I was not aware of them.
- Are these mappings [1, 2] coming from earlier discussions [3] already validated/integrated?
Not yet, but I plan to integrate [1] into the mapping with human validation. For [2], I currently don't do anything related to types, but it may be something interesting to do if I have time.
- As the maintainer of the DBpedia mapper bot, the procedure to add schema mappings to Wikidata classes and properties has already been discussed [4].
The bot code can be easily adapted for Freebase, let me know and I can volunteer to open a request for the new bot.
Thank you very much! It would be very nice to do so. Yes, please open such a request.
About these mappings: I have made property proposals for a "subproperty" [4] and a "subproperty of" [5] property, in order to have something less strict than "equivalent property" for mappings.
Thomas
[4] https://www.wikidata.org/wiki/Wikidata:Property_proposal/Property_metadata#s... [5] https://www.wikidata.org/wiki/Wikidata:Property_proposal/Property_metadata#s...
On 17 June 2015 at 20:17, Thomas Pellissier-Tanon thomaspt@google.com wrote:
@Andy
It seems to me that, before the data can be imported, we'll need to add several new Wikidata properties. Was that your intention?
feel free to start the property creation process. I'm still at Google for 8 weeks, so we have time to go through the property creation process.
I'm happy to do that; but I may need some help with knowing what is meant by each property. For example:
https://www.freebase.com/chemistry/chemical_classification/chemicals_of_this...
seems to be saying that Amylopectin is (in Wikidata terms) an "instance of" (P31) a starch, but that doesn't seem right.
Also, properties which require data-types that have not yet been set up, such as measurements, will have to wait for those data types to become available. That won't occur within eight weeks.
Original message From: Andy Mabbett Sent: Thursday, 18 June 2015 14:52 To: Discussion list for the Wikidata project. Reply to: Discussion list for the Wikidata project. Subject: Re: [Wikidata] Help needed for Freebase to Wikidata migration
Andy,
Yes, your understanding of the /chemicals_of_this_type is correct. It is used to classify "instance of this type" ... and not "part of" parent-child relationships. Here is the actual definition used in Freebase:
Chemical compounds that are included in this classification
However, there are a few assumption errors in the data (objects/values) that I agree probably need to be corrected later.
I'm not sure how chemists classify a component of a Thing; the following statements need to be analyzed and understood by a chemist. But there are also layman concepts that should probably be held in a new Wikidata property, if the community finds it useful for explaining chemicals:
" It is one of the two components of starch, the other being amylose. " " Starch branching enzyme https://en.wikipedia.org/wiki/Starch_branching_enzyme introduces 1,6-alpha glycosidic bonds between these chains, creating the branched amylopectin. "
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On 17 June 2015 at 18:15, Egon Willighagen egon.willighagen@gmail.com wrote:
It seems to me that, before the data can be imported, we'll need to add several new Wikidata properties. Was that your intention?
Yeah, I noted that too. I think this should be discussed in the Wikichemistry project, and we should also look at what Marco sent around...
Maybe, but this doesn't just apply to chemistry; we'll need new properties in several domains.
Hello Tom,
Thank you very much for your big review!
What I intended to do with this page is to get a mapping for some of the most important properties in order to be able to test "in real world conditions" the importation workflow. Not to do a complete mapping of all properties.
About which properties should be mapped and which should not: Denny told me that ignoring some ranges of properties, even deprecated ones, may not be a good idea, because they may still provide valuable data. So I assumed that I should not ignore big sets of properties just because they are deprecated or for some other reason.
Most of the stuff in the /base/* domains should probably be ignored for a first pass and some, like /base/schemastaging should be ignored permanently unless they've also got an alias in the commons namespace due to being promoted.
Yes, you are right. Denny told me there may be some valuable data in /base/ and /user/, but for a first step, yes, they should probably be ignored. It was a mistake to add some of them to this first mapping page.
I don't see anything from the /authority namespace which is arguably the most important part of Freebase
I haven't added these because I haven't yet taken the time to see how they are structured. But I will do it soon.
There's a bunch of internal bookkeeping cruft included in that list that should be excluded, e.g.:
Thank you very much. I have removed them from the list.
Overall my impression is that there's still a significant amount of very, very basic groundwork to be completed before it's reasonable to ask people to contribute their effort to doing/reviewing the mappings.
Yes, I wanted to move forward quickly. Maybe too quickly. When we move on to a bigger mapping, I'll do things more carefully.
Has any effort been made to do an initial automated mapping that humans could then review?
Not yet, but I plan to work on it if I have time.
Thank you again,
Thomas
On 16 June 2015 at 19:43, Thomas Pellissier-Tanon thomaspt@google.com wrote:
- 698735 triples with media_common/cataloged_instance/isbn13
- 692557 triples with book/book_edition/isbn
So, I think we may just ignore book/book_edition/isbn and map media_common/cataloged_instance/isbn13 to P212. What do you think about it?
Depends - what is the overlap between the two sets of items described by those triples?
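One way to answer the overlap question is to compare the sets of subjects that carry each property. The following is a minimal sketch under the assumption that the triples are available as (subject, property, value) tuples; `overlap_report` is a hypothetical helper and the sample identifiers are made up.

```python
def overlap_report(triples, prop_a, prop_b):
    """Compare which subjects carry prop_a, prop_b, or both."""
    with_a, with_b = set(), set()
    for subject, prop, _value in triples:
        if prop == prop_a:
            with_a.add(subject)
        if prop == prop_b:
            with_b.add(subject)
    return {
        "both": len(with_a & with_b),
        "only_a": len(with_a - with_b),
        "only_b": len(with_b - with_a),
    }


# Made-up sample data standing in for the real dump.
sample = [
    ("m.1", "media_common/cataloged_instance/isbn13", "9780306406157"),
    ("m.1", "book/book_edition/isbn", "0306406152"),
    ("m.2", "media_common/cataloged_instance/isbn13", "9780262033848"),
    ("m.3", "book/book_edition/isbn", "080442957X"),
]
report = overlap_report(
    sample,
    "media_common/cataloged_instance/isbn13",
    "book/book_edition/isbn",
)
print(report)
```

The "only_b" count is the number of items whose ISBN data would be missed if only the isbn13 property were imported.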
On Tue, Jun 16, 2015 at 2:43 PM, Thomas Pellissier-Tanon thomaspt@google.com wrote:
I have only added the 1000 most used properties so that important properties are not hidden among less important ones. But feel free to add properties that are not listed there.
Most frequent isn't the same as most important. Has this been reviewed by anyone who's familiar with the Freebase schema? Google knows all the stuff I've listed below and could save a lot of wasted effort by people who aren't familiar with the schema.
Most of the stuff in the /base/* domains should probably be ignored for a first pass, and some, like /base/schemastaging, should be ignored permanently unless they've also got an alias in the commons namespace due to being promoted. Ditto for the /user/* domains. I don't see anything from the /authority namespace, which is arguably the most important part of Freebase -- all its reconciled strong identifiers for IMDB, Library of Congress, New York Times, etc.
Some of the properties which are included have been replaced by keys in the /authority namespace tree, e.g.
https://www.freebase.com/book/author/openlibrary_id https://www.freebase.com/user/narphorium/people/nndb_person/nndb_id
These can be identified programmatically by looking at the schema, where the /type/property/enumeration property will point at the namespace where the identifier is stored (/authority/openlibrary/author & /authority/nndb, respectively). Note that, for historical reasons, some of the earlier key namespaces have aliases outside of the tree rooted at /authority. For example, the property https://www.freebase.com/biology/organism_classification/itis_tsn enumerates its identifiers in a namespace which is aliased as both /biology/itis and /authority/itis. https://www.freebase.com/biology/itis?keys=
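As a rough sketch of that schema check, assume the schema has already been loaded into a dictionary from property id to the namespace named by its /type/property/enumeration value (or None when there is none). The dictionary layout, the `key_backed_properties` helper, the alias table and the `/example/...` property are all assumptions made for illustration; only the three real property/namespace pairs come from the description above.

```python
# Hypothetical in-memory view of the schema:
# property id -> namespace given by /type/property/enumeration, or None.
SCHEMA = {
    "/book/author/openlibrary_id": "/authority/openlibrary/author",
    "/user/narphorium/people/nndb_person/nndb_id": "/authority/nndb",
    "/biology/organism_classification/itis_tsn": "/biology/itis",
    "/example/type/plain_property": None,  # made-up property without keys
}

# Older key namespaces that also have an alias under /authority.
ALIASES = {"/biology/itis": "/authority/itis"}


def key_backed_properties(schema, aliases=None):
    """Return {property: namespace} for properties that enumerate a namespace.

    Namespaces with a known alias are reported under the alias.
    """
    aliases = aliases or {}
    result = {}
    for prop, namespace in schema.items():
        if namespace is None:
            continue
        result[prop] = aliases.get(namespace, namespace)
    return result


for prop, namespace in key_backed_properties(SCHEMA, ALIASES).items():
    print(prop, "->", namespace)
```

Properties returned by this check are candidates for being replaced by the keys in their namespace rather than mapped directly.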
All hidden properties should probably be ignored. Most (all?) deprecated properties should probably be ignored. There was a discussion about ISBN, but these can be identified by introspecting the schema: https://www.freebase.com/book/book_edition/ISBN
There's a bunch of internal bookkeeping cruft included in that list that should be excluded, e.g.:
https://www.freebase.com/dataworld/gardening_hint/split_to
https://www.freebase.com/dataworld/mass_data_operation/authority
https://www.freebase.com/dataworld/mass_data_operation/ended_operation
https://www.freebase.com/dataworld/mass_data_operation/estimated_primitive_c...
https://www.freebase.com/dataworld/mass_data_operation/operator
https://www.freebase.com/dataworld/mass_data_operation/software_tool_used
https://www.freebase.com/dataworld/mass_data_operation/started_operation
https://www.freebase.com/dataworld/mass_data_operation/using_account
https://www.freebase.com/dataworld/provenance/data_operation
https://www.freebase.com/dataworld/provenance/tool
https://www.freebase.com/dataworld/software_tool/provenances
https://www.freebase.com/freebase/acre_doc/based_on
https://www.freebase.com/freebase/acre_doc/handler
https://www.freebase.com/freebase/domain_profile/expert_group
https://www.freebase.com/freebase/domain_profile/featured_views
https://www.freebase.com/freebase/domain_profile/hidden
https://www.freebase.com/freebase/domain_profile/show_commons
https://www.freebase.com/freebase/flag_judgment/flag
https://www.freebase.com/freebase/flag_judgment/item
https://www.freebase.com/freebase/flag_judgment/vote
https://www.freebase.com/freebase/flag_kind/flags
https://www.freebase.com/freebase/flag_vote/judgments
https://www.freebase.com/freebase/review_flag/item
https://www.freebase.com/freebase/review_flag/judgments
https://www.freebase.com/freebase/review_flag/kind
https://www.freebase.com/freebase/type_profile/instance_count
https://www.freebase.com/freebase/user_activity/primitives_live
https://www.freebase.com/freebase/user_activity/primitives_written
https://www.freebase.com/freebase/user_activity/topics_live
https://www.freebase.com/freebase/user_activity/types_live
https://www.freebase.com/freebase/user_activity/user
https://www.freebase.com/pipeline/delete_task/delete_guid
https://www.freebase.com/pipeline/task/status
https://www.freebase.com/pipeline/task/votes
https://www.freebase.com/pipeline/vote/vote_value
The properties below are for text and images which were uploaded and are of questionable provenance/rights status, so can probably be ignored (and aren't made available by Google in current data dumps):
https://www.freebase.com/type/content/blob_id
https://www.freebase.com/type/content_import/content
https://www.freebase.com/type/content_import/header_blob_id
https://www.freebase.com/type/content_import/uri
https://www.freebase.com/type/content/language - language of work (or name) (P407); see also original language of work (P364)
https://www.freebase.com/type/content/length
https://www.freebase.com/type/content/media_type
https://www.freebase.com/type/content/source
https://www.freebase.com/type/content/text_encoding
https://www.freebase.com/type/content/uploaded_by
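A first-pass filter for the exclusions described above could be a simple prefix test on the property id (the URL path after www.freebase.com). This is a sketch under stated assumptions: `EXCLUDED_PREFIXES` and `keep_for_first_pass` are hypothetical names, and treating the whole /dataworld/, /freebase/ and /pipeline/ domains as excluded is a generalisation from the specific properties listed. It also does not handle the exception for /base/ properties that have been promoted to the commons namespace.

```python
# Assumed exclusion list, generalised from the examples in this message.
EXCLUDED_PREFIXES = (
    "/base/",
    "/user/",
    "/dataworld/",
    "/freebase/",
    "/pipeline/",
    "/type/content",  # also covers /type/content_import/...
)


def keep_for_first_pass(prop, excluded=EXCLUDED_PREFIXES):
    """Return True if the property id is not under an excluded prefix."""
    # str.startswith accepts a tuple and matches any of its entries.
    return not prop.startswith(excluded)


candidates = [
    "/award/ranking/year",
    "/dataworld/provenance/tool",
    "/type/content_import/uri",
    "/freebase/flag_vote/judgments",
]
print([p for p in candidates if keep_for_first_pass(p)])
```

Hidden and deprecated properties cannot be detected from the id alone, so they would still need a separate schema lookup.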
Overall my impression is that there's still a significant amount of very, very basic groundwork to be completed before it's reasonable to ask people to contribute their effort to doing/reviewing the mappings. Have you asked Google staffers familiar with the Freebase schema to review?
I'll second Marco's questions about the status of the previous attempts at automated mappings and automated mappings in general. Has any effort been made to do an initial automated mapping that humans could then review?
Tom