Hello,
I wanted to make you aware of our new paper "Doctoral Advisor or Medical Condition: Towards Entity-specific Rankings of Knowledge Base Properties", which deals with the problem of determining the interestingness of Wikidata properties for individual entities.
In the paper we develop a dataset of 350 random (entity, property1, property2) records, and use human judgments to determine the more interesting property in each record. We then show that state-of-the-art techniques (Wikidata Property Suggestor, Google search) achieve 61% precision in predicting the winner in high-agreement records, which can be lifted to 74% by using linguistic similarity, but this still remains significantly below human performance (87.5% precision).
Paper: http://www.simonrazniewski.com/2017_ADMA.pdf (to appear at ADMA 2017). Dataset: https://www.kaggle.com/srazniewski/wikidatapropertyranking
Best wishes, Simon Razniewski
Hi Simon,
This is amazing. Congratulations and kudos to your team.
I just liked your Kaggle Dataset and would love to experiment with it by developing a new kernel.
Please let me know if I can be of any help.
Have a nice day.
Regards, Amit Kumar Jaiswal
Amit Kumar Jaiswal Mozilla Representative http://reps.mozilla.org/u/amitkumarj441 | LinkedIn http://in.linkedin.com/in/amitkumarjaiswal1 | Portfolio http://amitkumarj441.github.io New Delhi, India M : +91-8081187743 | T : @AMIT_GKP | PGP : EBE7 39F0 0427 4A2C
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi Amit,
I look forward to your kernel! The things we tried for approximating human judgment are so far all somewhat basic (property frequency, regression trained on entities that have one but not the other property, topic similarity between entities and properties), so any ideas that can improve the approximations would be great contributions.
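As an illustration only, the simplest of these baselines (property frequency) could be sketched roughly as follows. This is not the code used in the paper: the record layout, the function names, and the toy data are all assumptions made up for the example, and the rule "the more frequent property wins" is just one possible reading of a frequency baseline.

```python
from collections import Counter

def property_frequencies(entity_properties):
    """Count how many entities carry each property."""
    counts = Counter()
    for props in entity_properties.values():
        counts.update(set(props))
    return counts

def predict_winner(freq, prop1, prop2):
    """Guess the more frequent property as the more interesting one (ties go to prop1)."""
    return prop1 if freq[prop1] >= freq[prop2] else prop2

def precision(records, freq):
    """Share of records where the guess matches the human-chosen winner."""
    hits = sum(predict_winner(freq, p1, p2) == winner
               for _, p1, p2, winner in records)
    return hits / len(records)

# Toy, invented data (hypothetical; not taken from the Kaggle dataset).
entity_properties = {
    "politician_a": ["member of political party", "position held"],
    "politician_b": ["member of political party", "instrument"],
    "scientist_c": ["doctoral advisor", "member of political party"],
}
# Each record: (entity, property1, property2, human-chosen winner).
records = [
    ("politician_b", "member of political party", "instrument",
     "member of political party"),
    ("scientist_c", "doctoral advisor", "member of political party",
     "doctoral advisor"),
]

freq = property_frequencies(entity_properties)
print(precision(records, freq))
```

In the toy data the second record is deliberately one the baseline gets wrong: a globally frequent property loses to an entity-specific one, which is the kind of case a better approximation would need to handle.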
Let me know if you have technical questions about the dataset.
Cheers, Simon
Hi Simon,
Thanks for the paper, interesting findings!
Let me play the devil's advocate a bit here with your example: "For a politician, for instance, the political party is generally much more important than music instruments played."
Now let's compare the following two did-you-know facts:
- Did you know that Bill Clinton is a famous politician from the Democratic Party?
- Did you know that Bill Clinton is a famous politician who is also a saxophonist?
To me, the second is more interesting :)
But here, the interestingness is related to the degree of being unusual. Any thoughts on this?
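One way to make "degree of being unusual" concrete is to score a property by how rarely it occurs among entities of the same class. The sketch below is only an assumption about how that could be measured (self-information of the smoothed share of class members that have the property); the function name, the smoothing, and the toy data are invented for illustration and do not come from the paper.

```python
import math

def unusualness(prop, class_members, entity_properties):
    """Return -log2 of the smoothed share of class members that have prop.

    Higher values mean the property is rarer within the class.
    """
    having = sum(prop in entity_properties[e] for e in class_members)
    # Add-one smoothing keeps the share strictly between 0 and 1.
    share = (having + 1) / (len(class_members) + 2)
    return -math.log2(share)

# Toy, invented data (hypothetical): four politicians, one plays an instrument.
entity_properties = {
    "politician_1": {"member of political party"},
    "politician_2": {"member of political party"},
    "politician_3": {"member of political party"},
    "politician_4": {"member of political party", "instrument"},
}
politicians = list(entity_properties)

party_score = unusualness("member of political party", politicians, entity_properties)
instrument_score = unusualness("instrument", politicians, entity_properties)
print(instrument_score > party_score)
```

Under this measure the instrument scores higher than the party for a politician, matching the intuition in the example, though it says nothing about whether human judges would weigh unusualness over importance.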
Regards, Fariz