New research paper + dataset on property ranking

Simon Razniewski

26 Aug 2017

12:48 p.m.

Hello,

I wanted to make you aware of our new paper "Doctoral Advisor or Medical Condition: Towards Entity-specific Rankings of Knowledge Base Properties", which deals with the problem of determining the interestingness of Wikidata properties for individual entities.

In the paper we develop a dataset of 350 random (entity, property1, property2) records, and use human judgments to determine the more interesting property in each record. We then show that state-of-the-art techniques (Wikidata Property Suggester, Google search) achieve 61% precision in predicting the winner in high-agreement records. This can be lifted to 74% by using linguistic similarity, but still remains significantly below human performance (87.5% precision).
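For readers who want to experiment, the evaluation described above (precision in predicting the human-preferred property, restricted to high-agreement records) could be sketched roughly as follows. The `Record` fields, the vote counts, and the `min_agreement` threshold are assumptions made for illustration; they are not taken from the paper or from the actual column layout of the Kaggle dataset.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    # Hypothetical layout; the real dataset columns may differ.
    entity: str
    property1: str
    property2: str
    votes_property1: int  # annotators who found property1 more interesting
    votes_property2: int  # annotators who found property2 more interesting


def human_winner(r: Record) -> str:
    """Property preferred by the majority of annotators (ties go to property1)."""
    return r.property1 if r.votes_property1 >= r.votes_property2 else r.property2


def agreement(r: Record) -> float:
    """Share of annotators who voted for the majority choice."""
    total = r.votes_property1 + r.votes_property2
    if total == 0:
        return 0.0
    return max(r.votes_property1, r.votes_property2) / total


def precision_on_high_agreement(
    records: List[Record],
    predict: Callable[[Record], str],
    min_agreement: float = 0.8,  # assumed threshold, not the paper's
) -> float:
    """Fraction of high-agreement records where `predict` picks the human winner."""
    selected = [r for r in records if agreement(r) >= min_agreement]
    if not selected:
        raise ValueError("no record reaches the agreement threshold")
    correct = sum(1 for r in selected if predict(r) == human_winner(r))
    return correct / len(selected)


if __name__ == "__main__":
    # Made-up toy records, only to show the call shape.
    toy = [
        Record("politician A", "political party", "instrument", 9, 1),
        Record("scientist B", "doctoral advisor", "medical condition", 5, 5),
        Record("athlete C", "hair color", "sport", 1, 9),
    ]
    always_first = lambda r: r.property1  # trivial stand-in predictor
    print(precision_on_high_agreement(toy, always_first))
```

Any predictor that maps a record to one of its two properties can be plugged in as `predict`, which makes it easy to compare different approximations on the same subset of records.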

Paper: http://www.simonrazniewski.com/2017_ADMA.pdf (to appear at ADMA 2017). Dataset: https://www.kaggle.com/srazniewski/wikidatapropertyranking

Best wishes, Simon Razniewski

AMIT KUMAR JAISWAL

26 Aug 2017

1:37 p.m.

Hi Simon,

This is amazing. Congratulations and kudos to your team.

I just liked your Kaggle Dataset and would love to experiment with it by developing a new kernel.

Please let me know if I can be of any help.

Have a nice day.

Regards, Amit Kumar Jaiswal

Amit Kumar Jaiswal Mozilla Representative http://reps.mozilla.org/u/amitkumarj441 | LinkedIn http://in.linkedin.com/in/amitkumarjaiswal1 | Portfolio http://amitkumarj441.github.io New Delhi, India M : +91-8081187743 | T : @AMIT_GKP | PGP : EBE7 39F0 0427 4A2C

Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata

Simon Razniewski

6:26 p.m.

Hi Amit,

I look forward to your kernel! The things we tried for approximating human judgment are so far all somewhat basic (property frequency, regression trained on entities that have one but not the other property, topic similarity between entities and properties), so any ideas that can improve the approximations would be great contributions.
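As a starting point for a kernel, the simplest of the baselines mentioned above, property frequency, might look something like the sketch below. It is an illustration under assumptions, not the code used for the paper: the input mapping from entities to their property sets and the function names are hypothetical, and ties are broken arbitrarily in favor of the first property.

```python
from collections import Counter
from typing import Dict, Set


def property_frequencies(entity_properties: Dict[str, Set[str]]) -> Counter:
    """Count, for every property, how many entities have it."""
    counts: Counter = Counter()
    for props in entity_properties.values():
        counts.update(props)  # each entity contributes at most once per property
    return counts


def predict_by_frequency(counts: Counter, property1: str, property2: str) -> str:
    """Guess that the globally more frequent property is the more interesting one.

    Ties are resolved in favor of property1 (an arbitrary choice).
    """
    return property1 if counts[property1] >= counts[property2] else property2


if __name__ == "__main__":
    # Made-up miniature knowledge base, for illustration only.
    kb = {
        "entity1": {"political party", "instrument"},
        "entity2": {"political party"},
        "entity3": {"political party", "doctoral advisor"},
    }
    counts = property_frequencies(kb)
    print(predict_by_frequency(counts, "instrument", "political party"))
```

Note that this baseline ignores the entity entirely, so it always gives the same answer for a given property pair; the entity-specific signals listed above would have to be added on top of it.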

Let me know if you have technical questions about the dataset.

Cheers, Simon

Fariz Darari

3 Sep 2017

4:53 a.m.

Hi Simon,

Thanks for the paper, interesting findings!

Let me play the devil's advocate a bit here with your example: "For a politician, for instance, the political party is generally much more important than music instruments played."

Now let's compare the following two did-you-know facts:

- Did you know that Bill Clinton is a famous politician from the Democratic Party?
- Did you know that Bill Clinton is a famous politician who is also a saxophonist?

To me, the second is more interesting :)

But here, the interestingness is related to the degree of being unusual. Any thoughts on this?
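To make the idea concrete, one possible way to quantify "unusual" is the surprisal of a property among comparable entities: the rarer a property is among, say, politicians, the higher its score. The sketch below is only one assumed formalization, with a hypothetical `peers` mapping and add-one smoothing chosen for illustration; it is not something proposed in the paper.

```python
import math
from typing import Dict, Set


def unusualness(peers: Dict[str, Set[str]], prop: str) -> float:
    """Surprisal (in bits) of seeing `prop` on an entity drawn from `peers`.

    Add-one smoothing keeps the score finite when no peer, or every peer,
    has the property.
    """
    having = sum(1 for props in peers.values() if prop in props)
    probability = (having + 1) / (len(peers) + 2)
    return -math.log2(probability)


if __name__ == "__main__":
    # Made-up peer group of politicians, for illustration only.
    politicians = {
        "politician1": {"political party", "instrument"},
        "politician2": {"political party"},
        "politician3": {"political party"},
        "politician4": {"political party"},
    }
    for prop in ("political party", "instrument"):
        print(prop, round(unusualness(politicians, prop), 3))
```

Under this measure the instrument scores higher than the party, which is the opposite of what a plain frequency ranking would give.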

Regards, Fariz
