On 01/07/14 22:33, David Cuenca wrote:
Markus, could your algorithm work together with human direction? For
example, if we entered which properties are common for a class, and a
user then created an instance of that class, would the algorithm be able
to sort those properties by how often they appear in the database?
My algorithm is all about *detecting* which properties are common for a
class. If you want humans to enter this instead, that's fine too, but
then you don't need an algorithm. Sorting a list of properties by how
often they appear in the database is easy to do. My algorithm does not
do this, though, because the most frequently used property is usually
not the most interesting one. For instance, many classes are associated
with Freebase IDs, but you don't want that to be the first suggestion
you get. I want the things that are "special" for the instances of a
class as compared to the rest of the data, not the things that are most
common overall.
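To illustrate the difference between "most common" and "special for the
class", here is a minimal sketch. It is not the algorithm discussed in
this thread; it assumes a simple ratio score (share of class instances
using a property divided by the share of all items using it). The toy
data, the identifiers, and the function name rank_related_properties are
all made up for the example.

```python
from collections import Counter

# Hypothetical toy data: item id -> (classes of the item, properties used on it).
ITEMS = {
    "b1": ({"listed_building"}, {"freebase_id", "architect", "heritage_number"}),
    "b2": ({"listed_building"}, {"freebase_id", "heritage_number"}),
    "p1": ({"person"}, {"freebase_id", "date_of_birth"}),
    "p2": ({"person"}, {"freebase_id", "date_of_birth"}),
}


def rank_related_properties(items, class_id):
    """Rank properties by how much more often they occur on instances of
    class_id than on items in general (an assumed ratio score)."""
    total = len(items)
    overall = Counter()   # property -> number of items using it
    in_class = Counter()  # property -> number of class instances using it
    class_size = 0
    for classes, props in items.values():
        overall.update(props)
        if class_id in classes:
            class_size += 1
            in_class.update(props)
    if class_size == 0:
        return []
    scores = {
        prop: (count / class_size) / (overall[prop] / total)
        for prop, count in in_class.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda prop: (-scores[prop], prop))


ranking = rank_related_properties(ITEMS, "listed_building")
print(ranking)
```

In this toy data every item has a Freebase ID, so its ratio is 1 and it
ends up last, even though a plain frequency sort would put it at or near
the top.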
Cheers,
Markus
Thanks,
Micru
On Tue, Jul 1, 2014 at 10:23 PM, Markus Krötzsch
<markus@semantic-mediawiki.org> wrote:
On 01/07/14 22:14, Markus Krötzsch wrote:
...
(2) "Grade I listed building"
http://tools.wmflabs.org/wikidata-exports/miga/?classes#_cat=Classes/Id=Q15700818
Related properties: English Heritage list number, masts, Minor Planet
Center observatory code, home port, coordinate location, OS grid
reference, mother house, architect, manager/director, Emporis ID,
MusicBrainz place ID, country, architectural style, visitors per year,
Commons category, Structurae ID (structure), officially opened by,
floors above ground, inspired by, religious order, number of platforms,
street, owned by, diocese
These are computed fully automatically from the data, with no manual
filtering or user input. But don't get me wrong -- great work! Brilliant
to have such a thing integrated into the UI. In any case, my algorithm
for computing the related properties is certainly very different from
theirs; I am sure it also has its glitches.
P.S. One weakness of my algorithm you can already see: it has
trouble estimating the relevance of very rare properties, such as
"Minor Planet Center observatory code" above. A single wrong
annotation may then lead to wrong suggestions. Also, it seems from
my list under (2) that some Grade I listed buildings are ships. This
seems to be an error that is amplified by the fact that property
"masts" is used only 11 times in the dataset I evaluated (last
week's data). I guess the new property suggester rather errs on the
other side, being tricked into suggesting very frequent properties
even in places that don't need them.
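The effect of a single wrong annotation on a rare property can be shown
with a little arithmetic. The sketch below again assumes a simple ratio
score, which is not necessarily what the algorithm above uses. Only the
figure of 11 uses of "masts" comes from this message; the class size,
the dataset size, the count of one in-class use, and the numbers for the
frequent property are hypothetical.

```python
def lift(in_class, class_size, overall, total):
    """Assumed score: in-class usage rate divided by overall usage rate."""
    return (in_class / class_size) / (overall / total)


def swing_from_one_statement(in_class, class_size, overall, total):
    """How much the score drops if one in-class statement is removed."""
    with_it = lift(in_class, class_size, overall, total)
    without_it = lift(in_class - 1, class_size, overall - 1, total)
    return with_it - without_it


# Hypothetical sizes: 1,000 instances of the class in a dataset of 1,000,000 items.
CLASS_SIZE = 1_000
TOTAL = 1_000_000

# Rare property: 11 uses overall (as for "masts"), assumed 1 use inside the class.
rare_score = lift(1, CLASS_SIZE, 11, TOTAL)
rare_swing = swing_from_one_statement(1, CLASS_SIZE, 11, TOTAL)

# Frequent property: assumed 50,000 uses overall, 400 of them inside the class.
frequent_score = lift(400, CLASS_SIZE, 50_000, TOTAL)
frequent_swing = swing_from_one_statement(400, CLASS_SIZE, 50_000, TOTAL)

print(f"rare:     score {rare_score:.2f}, swing {rare_swing:.2f}")
print(f"frequent: score {frequent_score:.2f}, swing {frequent_swing:.4f}")
```

Under these assumptions the one statement on the rare property accounts
for its entire score and lifts it above the frequent property, while
removing one statement from the frequent property barely changes its
score.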
-- Markus
_________________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
--
Etiamsi omnes, ego non