Hi,
I just updated the data for the Wikidata classes and properties browser [1] -- was about time -- and added some improvements on the way:
(1) Classes and properties are now always ordered by usage (most used first), which was not possible to do before. Examples:
** properties related to humans (or anything else with sex or gender) ordered by usage:
http://tools.wmflabs.org/wikidata-exports/miga/#_cat=Properties/Related%20pr...
(for properties, usage includes the use in qualifiers and references)
** Most used months:
http://tools.wmflabs.org/wikidata-exports/miga/#_cat=Classes/All%20superclas...
Seems that May is most popular so far. Thanks to whoever added the pretty pictures :-) You can replace the word "month" with other things, such as "band", "building", or "mythical character" to see what kinds of these things we have. In fact, the individual pages for the classes will also show the same list at their bottom, but without any pictures.
(2) Classes with the same English label are no longer confused. This fixes ambiguities and wrong links for many things.
The data is from 1st September.
Cheers,
Markus
[1] http://tools.wmflabs.org/wikidata-exports/miga/ Reload the page (CTRL+R) to get the new data.
Hey,
\o/
Where are the source code and issue tracker for this? Probably good if those were linked from the tool.
If you load this in Firefox, it spends several seconds loading, after which one gets the "use another browser" error. Would be nice if this was shown before the rest was loaded. Of course it'd be much nicer if the biggest free browser could also be supported.
http://tools.wmflabs.org/wikidata-exports/miga/#_item=1204
That first shows "population". When then clicking on the link, you see the data type is quantity, not string.
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
On 08.09.2014 14:27, Jeroen De Dauw wrote:
> Hey,
> \o/
> Where are the source code and issue tracker for this? Probably good if those were linked from the tool.
True, but it's not quite in our master branch yet: the code is part of the extended WDTK examples module, see
https://github.com/Wikidata/Wikidata-Toolkit/tree/cleaner-examples/wdtk-exam...
This currently depends on the yet-to-be-completed branch of WDTK that has the support for the new JSON dumps and format:
https://github.com/Wikidata/Wikidata-Toolkit/pull/91
Right now, this still needs more testing before it can be merged. Because of the change in the XML dump format, the master branch of WDTK is currently unable to process any of the recent dumps, hence the example would not work there.
Anyway, you could use the Wikidata Toolkit issue tracker already.
> If you load this in Firefox, it spends several seconds loading, after which one gets the "use another browser" error. Would be nice if this was shown before the rest was loaded. Of course it'd be much nicer if the biggest free browser could also be supported.
Yes, this is because of the tool we use (Miga), which is not part of our code. If you were asking for this above, you could have a look at http://migadv.com/.
> http://tools.wmflabs.org/wikidata-exports/miga/#_item=1204
> That first shows "population". When then clicking on the link, you see the data type is quantity, not string.
Yes, I think this is a bug in how we use IRIs and labels for datatypes. The main id now is the label, but the properties all use the IRI to refer to the datatype. Seems that this does not work properly in Miga. I will try to make a version with the labels used everywhere.
Markus
On 08.09.2014 14:53, Markus Krötzsch wrote: ...
>> http://tools.wmflabs.org/wikidata-exports/miga/#_item=1204
>> That first shows "population". When then clicking on the link, you see the data type is quantity, not string.
> Yes, I think this is a bug in how we use IRIs and labels for datatypes. The main id now is the label, but the properties all use the IRI to refer to the datatype. Seems that this does not work properly in Miga. I will try to make a version with the labels used everywhere.
Ok, fixed in the code and on the Web (requires reload). The only bug we still have is that the recent Monolingual Text datatype is not known yet. It appears as "Unknown" in properties. Will be fixed when we have it in the parser.
Markus
How are "related properties" calculated?
Is the definition of a Class "something that has a subclass relationship?" Or?
Very cool...
-Ben
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On 08.09.2014 19:02, Benjamin Good wrote:
> How are "related properties" calculated?
Let me start with the second question:
> Is the definition of a Class "something that has a subclass relationship?" Or?
Basically yes: a class is something that participates in a "subclass of" relation, or that is used as a value for "instance of". Moreover, for the display, I filter out all the classes that only occur as a subclass in "subclass of" (no own instances or subclasses) to reduce data size a bit.
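The class definition above can be sketched roughly as follows. This is only an illustration, not the actual WDTK example code: the function name `find_classes`, the triple-based input format, and the plain-string property names are all assumptions made for the sketch.

```python
from collections import defaultdict


def find_classes(statements):
    """Collect classes from (subject, property, value) triples.

    Hypothetical input format: only "instance of" and "subclass of"
    statements are relevant; everything else is ignored.
    """
    instance_count = defaultdict(int)  # class -> number of direct instances
    subclass_count = defaultdict(int)  # class -> number of direct subclasses
    classes = set()

    for subject, prop, value in statements:
        if prop == "instance of":
            # Anything used as a value of "instance of" is a class.
            classes.add(value)
            instance_count[value] += 1
        elif prop == "subclass of":
            # Both sides of "subclass of" are classes.
            classes.add(subject)
            classes.add(value)
            subclass_count[value] += 1

    # Display filter: drop classes that have neither own instances
    # nor own subclasses (they only occur on the subclass side).
    displayed = {
        c for c in classes
        if instance_count[c] > 0 or subclass_count[c] > 0
    }
    return classes, displayed
```

With statements such as `("Q1", "instance of", "human")`, `("human", "subclass of", "person")` and `("leaf", "subclass of", "person")`, all three of "human", "person" and "leaf" count as classes, but "leaf" would be dropped from the display.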
I calculate related properties for properties and classes. For classes, I look at the items that are "instance of" the class (direct instances only; "subclass of" is ignored). For properties, I look at the items that have a statement with this property.
For each of the items I look at, I count how often other properties (potentially "related properties") occur in their statements. From this I can compute what fraction of the items (in a class or with a property) have some other property. If this fraction is notably higher than the fraction of all items using the property, then I consider it as related.
The idea is to find properties that are "typical": they should be notably more likely to occur with an item in this class than they would be in general. This also helps to filter out properties that occur everywhere (boring ones, like "image" or "freebase identifier"): they are most frequent but not what you want to know most about when you look at a specific class. I have some custom scoring function in the code that I tweaked to adjust this until it seemed right, but there is no deeper principle behind this.
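The ratio comparison described above could look roughly like this. It is a minimal sketch under stated assumptions, not the scoring function used in the tool (which is custom and not described in detail here): the function name `related_properties`, the input format, and the `min_lift` threshold of 2.0 are all made up for illustration.

```python
from collections import Counter


def related_properties(items, selected, exclude=(), min_lift=2.0):
    """Rank properties that are over-represented in a group of items.

    items:    dict mapping item id -> set of property ids used in its statements
    selected: ids of the items in the group (direct instances of a class,
              or items that have a statement with a given property)
    exclude:  properties to skip, e.g. the property that defines the group
    min_lift: hypothetical threshold for "notably higher"
    """
    total = len(items)

    # How many items overall use each property.
    overall = Counter()
    for props in items.values():
        overall.update(props)

    # How many items in the group use each property.
    in_group = Counter()
    for item_id in selected:
        in_group.update(items[item_id])

    scored = {}
    for prop, count in in_group.items():
        if prop in exclude:
            continue
        group_ratio = count / len(selected)
        overall_ratio = overall[prop] / total
        lift = group_ratio / overall_ratio
        if lift >= min_lift:
            scored[prop] = lift

    # Most over-represented properties first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

A property that every item has (such as "image" in the example above) gets a ratio close to 1 and is filtered out, while a property used mostly within the group scores high.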
> Very cool...
Thanks :-)
Markus