Hello All,
It struck me that one interesting way to see if subclasses are useful was to test this hypothesis.
Let QID_a and QID_b be two Wikidata items.
Conjecture: if QID_b is a subclass of QID_a, then count_sitelinks(QID_b) <= count_sitelinks(QID_a).
Has anyone investigated this problem, or can think of an efficient way to test it? Or can tell me why it ought not to be true?
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
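A minimal sketch of one efficient test. It assumes the direct "subclass of" pairs are available as (child QID, parent QID) tuples and the sitelink counts as a dict keyed by QID, both obtained separately; the function name `find_violations` is hypothetical. A single pass over the edges (linear in the number of subclass statements) reports every pair where the subclass has more sitelinks than its superclass:

```python
from typing import Dict, Iterable, List, Tuple

def find_violations(
    subclass_edges: Iterable[Tuple[str, str]],
    sitelinks: Dict[str, int],
) -> List[Tuple[str, str, int, int]]:
    """Return (child, parent, child_count, parent_count) for every direct
    'subclass of' edge where count_sitelinks(child) <= count_sitelinks(parent)
    fails, i.e. where the subclass has MORE sitelinks than its superclass."""
    violations = []
    for child, parent in subclass_edges:
        # Skip edges with an unknown count instead of guessing a value.
        if child not in sitelinks or parent not in sitelinks:
            continue
        child_count, parent_count = sitelinks[child], sitelinks[parent]
        if child_count > parent_count:
            violations.append((child, parent, child_count, parent_count))
    return violations


# The Beer / Alcoholic beverage pair and counts quoted in this thread:
edges = [("Q44", "Q154")]
counts = {"Q44": 142, "Q154": 73}
print(find_violations(edges, counts))  # [('Q44', 'Q154', 142, 73)]
```

Only direct subclass pairs are checked here; testing the transitive version (b subclass of c subclass of a) would need a graph traversal on top of this.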
Not sure if I understood it correctly, but this could be a counterexample:
Beer (https://www.wikidata.org/wiki/Q44) is a subclass of Alcoholic beverage (https://www.wikidata.org/wiki/Q154). Beer: 142 sitelinks; Alcoholic beverage: 73 sitelinks.
On Tue, Sep 24, 2013 at 1:12 AM, Klein,Max kleinm@oclc.org wrote:
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
What about alcohol-free beer?
I would be surprised if that theory held true. I expect that both very abstract (fruit) and extremely specific (golden delicious) items would have a lower sitelink count than the "golden layer of most useful terms" (apple) in the hierarchy (I am reminded of the theory of word length and term frequency in linguistics).
But I would assume that the subclass hierarchy Wikidata will eventually exhibit would indeed have such a "golden layer" (and that these terms are not randomly distributed over the hierarchy).
Would be fun to examine :)
Cheers, Denny
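One way to look for the "golden layer", sketched under the assumption that a "layer" is an item's shortest distance from a root of the hierarchy (an item that appears only as a parent): compute each item's depth with a breadth-first search, then average the sitelink counts per depth. Items with several parents get their shallowest depth, and items reachable only through a cycle are left out. The input format and function names are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def depth_from_roots(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Shortest distance of each item from a root. edges are (child, parent)
    pairs; a root is an item that occurs as a parent but never as a child."""
    edges = list(edges)
    children: Dict[str, List[str]] = defaultdict(list)
    has_parent = set()
    for child, parent in edges:
        children[parent].append(child)
        has_parent.add(child)
    roots = {parent for _, parent in edges if parent not in has_parent}
    # Multi-source breadth-first search: the first time an item is reached
    # is via a shortest path, and the visited check also stops cycles.
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def mean_sitelinks_by_depth(
    edges: Iterable[Tuple[str, str]], sitelinks: Dict[str, int]
) -> Dict[int, float]:
    """Average sitelink count per depth, skipping items without a count."""
    per_depth: Dict[int, List[int]] = defaultdict(list)
    for item, d in depth_from_roots(edges).items():
        if item in sitelinks:
            per_depth[d].append(sitelinks[item])
    return {d: mean(counts) for d, counts in sorted(per_depth.items())}
```

A golden layer would show up as a peak at intermediate depths rather than a steady decline from the roots downwards.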
will give you all the items that have the "subclass of" property, and the respective item they are a subclass of. Enough to make a subclass tree for all of Wikidata.
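A minimal sketch of turning those (item, parent) pairs into a browsable hierarchy. Because an item can carry more than one "subclass of" (P279) statement, the result is in general a graph with shared descendants, and possibly cycles, rather than a strict tree; the sketch therefore also reports items with several parents and guards against cycles when printing. The input format and all function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def build_hierarchy(pairs: Iterable[Tuple[str, str]]):
    """Turn (item, parent) 'subclass of' pairs into a parent -> children map.

    Returns (children, roots, multi_parent): roots occur only as parents,
    multi_parent lists items that have more than one parent."""
    children: Dict[str, List[str]] = defaultdict(list)
    parents: Dict[str, Set[str]] = defaultdict(set)
    for item, parent in pairs:
        children[parent].append(item)
        parents[item].add(parent)
    roots = sorted(p for p in children if p not in parents)
    multi_parent = sorted(i for i, ps in parents.items() if len(ps) > 1)
    return dict(children), roots, multi_parent


def print_tree(children: Dict[str, List[str]], node: str,
               depth: int = 0, ancestors: frozenset = frozenset()) -> None:
    """Print everything below node, indented by depth; marks cycles
    instead of recursing into them."""
    print("  " * depth + node)
    seen = ancestors | {node}
    for child in children.get(node, []):
        if child in seen:
            print("  " * (depth + 1) + child + " (cycle)")
        else:
            print_tree(children, child, depth + 1, seen)
```

Typical use would be `children, roots, multi = build_hierarchy(pairs)` followed by `print_tree(children, root)` for each root. For the whole of Wikidata, an iterative traversal could replace the recursion to stay clear of Python's recursion limit on very deep branches.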
You'll have to get the labels and page counts yourself ;-)
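For the labels and page counts, one option is the Wikidata API's `wbgetentities` module with `props=labels|sitelinks`. The sketch below assumes that the number of entries in an item's `sitelinks` object is what `count_sitelinks` should measure (it may include sister projects, not only Wikipedias), and that requests should be batched (50 ids per call is, to my knowledge, the limit for non-bot clients). `parse_entities` is kept separate from the network call so it can be checked offline; the helper names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable, List, Tuple

API = "https://www.wikidata.org/w/api.php"
BATCH = 50  # assumed per-request limit on ids for ordinary (non-bot) clients

def build_url(qids: List[str]) -> str:
    """wbgetentities URL asking for English labels and all sitelinks."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "labels|sitelinks",
        "languages": "en",
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_entities(response: dict) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Extract sitelink counts and English labels from one API response."""
    counts: Dict[str, int] = {}
    labels: Dict[str, str] = {}
    for qid, entity in response.get("entities", {}).items():
        if "missing" in entity:  # nonexistent or deleted item
            continue
        counts[qid] = len(entity.get("sitelinks", {}))
        en = entity.get("labels", {}).get("en")
        if en:
            labels[qid] = en["value"]
    return counts, labels


def fetch(qids: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Query the live API in batches and merge the parsed results."""
    qids = list(qids)
    counts: Dict[str, int] = {}
    labels: Dict[str, str] = {}
    for i in range(0, len(qids), BATCH):
        req = urllib.request.Request(
            build_url(qids[i:i + BATCH]),
            headers={"User-Agent": "subclass-sitelink-sketch/0.1"},
        )
        with urllib.request.urlopen(req) as resp:
            batch_counts, batch_labels = parse_entities(json.load(resp))
        counts.update(batch_counts)
        labels.update(batch_labels)
    return counts, labels
```

The counts returned by `fetch` can be passed straight to a conjecture check or a per-depth average; note that sitelink counts change over time, so figures quoted in this thread will not match a fresh query.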
On Tue, Sep 24, 2013 at 4:27 PM, Denny Vrandečić <denny.vrandecic@wikimedia.de> wrote:
-- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht (local court) Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit by the Finanzamt für Körperschaften I Berlin (tax office for corporations), tax number 27/681/51985.
This is really useful, thanks, Magnus; otherwise I thought I was going to have to put Wikidata in RAM myself.
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
From: wikidata-l-bounces@lists.wikimedia.org on behalf of Magnus Manske <magnusmanske@googlemail.com>
Sent: Tuesday, September 24, 2013 8:45 AM
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata-l] Counting sitelinks of subclasses.