On 08.09.2014 19:02, Benjamin Good wrote:
How are "related properties" calculated?
Let me start with the second question:
Is the definition of a Class "something that has a subclass relationship?" Or?
Basically yes: a class is something that participates in a "subclass of" relation, or that is used as a value for "instance of". Moreover, for the display, I filter out all the classes that only occur as a subclass in "subclass of" (no own instances or subclasses) to reduce data size a bit.
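As a rough illustration of that definition (a minimal sketch, not the code behind the tool): the function name and the pair-based input format below are assumptions made for this example, and statements are assumed to be already reduced to (subject, value) pairs for "instance of" and "subclass of".

```python
def find_classes(instance_of, subclass_of):
    """Return (all_classes, displayed_classes).

    instance_of: iterable of (item, class) pairs   -- hypothetical input format
    subclass_of: iterable of (subclass, superclass) pairs
    """
    instance_of = list(instance_of)
    subclass_of = list(subclass_of)

    # Classes used as a value of "instance of", i.e. classes with own instances.
    with_instances = {cls for _, cls in instance_of}
    # Classes that appear as the superclass, i.e. classes with own subclasses.
    with_subclasses = {sup for _, sup in subclass_of}
    # Classes that appear as the subclass side of "subclass of".
    as_subclass = {sub for sub, _ in subclass_of}

    all_classes = with_instances | with_subclasses | as_subclass
    # For display, drop classes that only ever occur as a subclass.
    displayed = with_instances | with_subclasses
    return all_classes, displayed


all_classes, displayed = find_classes(
    instance_of=[("Q1", "Q5")],
    subclass_of=[("Q5", "Q10"), ("Q7", "Q10")],
)
print(sorted(all_classes))  # Q7 is a class, but only as a subclass
print(sorted(displayed))    # so Q7 is filtered out for display
```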
I calculate related properties for properties and classes. For classes, I look at the items that are "instance of" the class (direct instances only; "subclass of" is ignored). For properties, I look at the items that have a statement with this property.
For each of the items I look at, I count how often other properties (potentially "related properties") occur in their statements. From this I can compute what fraction of the items (in a class or with a property) have some other property. If this fraction is notably higher than the fraction of all items that use the property, then I consider it as related.
The idea is to find properties that are "typical": they should be notably more likely to occur with an item in this class than they would be in general. This also helps to filter out properties that occur everywhere (boring ones, like "image" or "freebase identifier"): they are the most frequent but not what you most want to know about when you look at a specific class. I have some custom scoring function in the code that I tweaked until it seemed right, but there is no deeper principle behind this.
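To make the comparison of ratios concrete, here is a minimal sketch. It is not the custom scoring function mentioned above: it swaps in a plain ratio-of-ratios ("lift") with an arbitrary threshold, and the function name, the `min_lift` and `exclude` parameters, and the dict-of-sets input are all assumptions made for this example.

```python
from collections import Counter


def related_properties(items, selected, min_lift=2.0, exclude=()):
    """Rank properties that are over-represented among the selected items.

    items:    dict mapping item id -> set of property ids used in its statements
    selected: item ids to look at (direct instances of a class, or the items
              that have a statement with some property)
    min_lift: hypothetical threshold for "notably higher" (an assumption)
    exclude:  properties to skip, e.g. the property that defined the selection
    """
    selected = [i for i in selected if i in items]
    if not selected or not items:
        return []

    # How many items overall, and how many selected items, use each property.
    overall = Counter(p for props in items.values() for p in props)
    local = Counter(p for i in selected for p in items[i])

    scored = []
    for prop, count in local.items():
        if prop in exclude:
            continue
        local_ratio = count / len(selected)
        overall_ratio = overall[prop] / len(items)
        lift = local_ratio / overall_ratio
        if lift >= min_lift:
            scored.append((prop, lift))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


items = {
    "a": {"image", "population"},
    "b": {"image", "population"},
    "c": {"image"},
    "d": {"image"},
}
# "image" occurs everywhere, so it is not typical of the selection {a, b};
# "population" is twice as frequent there as overall.
print(related_properties(items, {"a", "b"}))
```

With a plain lift like this, very rare properties can score high on just a few items, which is one reason a hand-tuned scoring function may behave better in practice.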
Very cool...
Thanks :-)
Markus
-Ben
On Mon, Sep 8, 2014 at 9:24 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 08.09.2014 14:53, Markus Krötzsch wrote:
...
> http://tools.wmflabs.org/wikidata-exports/miga/#_item=1204
That first shows "population". When then clicking on the link, you see the data type is quantity, not string.

Yes, I think this is a bug in how we use IRIs and labels for datatypes. The main id now is the label, but the properties all use the IRI to refer to the datatype. Seems that this does not work properly in Miga. I will try to make a version with the labels used everywhere.

Ok, fixed in the code and on the Web (requires reload). The only bug we still have is that the recent Monolingual Text datatype is not known yet. It appears as "Unknown" in properties. Will be fixed when we have it in the parser.

Markus
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l