Hi,
On Fri, Dec 21, 2012 at 6:14 PM, Denny Vrandečić denny.vrandecic@wikimedia.de wrote:
Friedrich, the term "query answering" simply means the ability to answer queries against the database in Phase 3, e.g. the list of cities located in Ghana with a population over 25,000 ordered by population.
A query system that deals well with intervals -- I would need a pointer for that. For now I was always assuming we would use a single value internally to answer such queries. If the value is 90+-20 then the query >100? would not contain that result. Sucks, but I don't know of any better system.
So, taking the current data model and our Eiffel Tower example: we have the entity "Eiffel Tower" and want to represent "The Eiffel Tower is 324 meters high", so we would create the statement below. (Assumptions: Wikipedia pages serve as ids; I understood the Wikidata object notation correctly; Number(upperlimit lowerlimit unit) is the object notation for numbers. I omitted the quantity field because I think it's redundant, at least in this example, and the confidence could be added as an additional PropertySnak(?).)

Statement('
  http://en.wikipedia.org/wiki/Eiffel_Tower
  PropertyValueSnak('
    http://en.wikipedia.org/wiki/Height
    Number(324 324 http://en.wikipedia.org/wiki/Metre)
  ')
  {reference and rank omitted}
')
(Note: I thought I read somewhere that it was decided that all statements on Wikidata should have at least one reference, but in the object notation definition the {references} seems to imply this is an optional argument. Also, the visual has a 0..* relation. Did I miss something?)
Assuming a system of tables, one per unit and multiple (e.g. m0 for meters), the table could look like this:
Table m0:
propertyid | property | max | min | other information
1234       | Height   | 324 | 324 | ... (or 325 | 323)
An index could be put over min, max and property, and the query for all buildings higher than 300 meters could start with:
SELECT propertyid FROM m0 WHERE property = 'Height' AND (min > 300 OR max > 300);
This would allow a query for things with a "Height" greater than 300, and it would even include things defined as 290+-20, since the max value would be over 300. The much harder part, imho, is the "located in Ghana" part of the query, but I think there are such things as spatial queries for the big databases (e.g. http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html ).
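To make the height query concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the Eiffel Tower row comes from the example above; the other rows, their ids and the helper name possibly_greater_than are made up for illustration, and SQLite merely stands in for whatever database would actually be used. The min/max test is grouped in parentheses so that the property filter applies to both comparisons.

```python
import sqlite3

# In-memory database standing in for the real store (assumption).
conn = sqlite3.connect(":memory:")

# "max" and "min" are quoted because they share their names with SQL functions.
conn.execute(
    'CREATE TABLE m0 (propertyid INTEGER, property TEXT, "max" REAL, "min" REAL)'
)
conn.executemany(
    "INSERT INTO m0 VALUES (?, ?, ?, ?)",
    [
        (1234, "Height", 324, 324),  # Eiffel Tower, exact value
        (1235, "Height", 310, 270),  # hypothetical entry stored as 290+-20
        (1236, "Height", 200, 200),  # hypothetical entry below the threshold
        (1237, "Width", 500, 500),   # hypothetical entry for another property
    ],
)
conn.execute('CREATE INDEX m0_idx ON m0 (property, "max", "min")')


def possibly_greater_than(conn, prop, threshold):
    """Return ids whose interval reaches above the threshold.

    As long as min <= max holds, the test on "min" is implied by the
    test on "max"; it is kept here to mirror the query in the mail.
    """
    cur = conn.execute(
        'SELECT propertyid FROM m0 '
        'WHERE property = ? AND ("min" > ? OR "max" > ?) '
        'ORDER BY propertyid',
        (prop, threshold, threshold),
    )
    return [row[0] for row in cur]


print(possibly_greater_than(conn, "Height", 300))
```

The 290+-20 entry is returned because its max of 310 lies above 300, while the "Width" row is left out even though its max is larger.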
This would also work for queries on dates, with tables that have mindate and maxdate columns. (Apropos dates: here is a short but interesting discussion on how they might be stored in databases if arbitrary dates are needed: http://stackoverflow.com/a/2487792 )
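A rough sketch of the same idea for dates, again in Python with sqlite3. Everything here is an assumption for illustration: the table name dates, the property name "Built", the rows, and the simplification of storing plain year numbers as integers instead of full dates. It also shows that an interval allows two readings of "before 1900": possibly before (mindate is earlier) and certainly before (maxdate is earlier).

```python
import sqlite3

date_conn = sqlite3.connect(":memory:")
date_conn.execute(
    "CREATE TABLE dates ("
    "propertyid INTEGER, property TEXT, mindate INTEGER, maxdate INTEGER)"
)
date_conn.executemany(
    "INSERT INTO dates VALUES (?, ?, ?, ?)",
    [
        (1, "Built", 1889, 1889),  # exact year (hypothetical row)
        (2, "Built", 1850, 1950),  # hypothetical entry known only as 1900+-50
        (3, "Built", 1920, 1930),  # hypothetical entry, clearly after 1900
    ],
)


def built_before(conn, year, certain):
    """Return ids dated before `year`.

    certain=False: the interval starts before `year` (possibly before).
    certain=True:  the whole interval ends before `year` (certainly before).
    """
    # The column name is chosen from two fixed strings, never from user input.
    column = "maxdate" if certain else "mindate"
    sql = (
        "SELECT propertyid FROM dates "
        f"WHERE property = 'Built' AND {column} < ? ORDER BY propertyid"
    )
    return [row[0] for row in conn.execute(sql, (year,))]


print(built_before(date_conn, 1900, certain=False))
print(built_before(date_conn, 1900, certain=True))
```

Which of the two readings a query should use is a design decision; the height query earlier in this mail corresponds to the "possibly" reading.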
Representation of "temporal knowledge" (?) seems to be a huge research topic anyway (e.g. http://www.math.unipd.it/~kvenable/RT/corso2009/Allen.pdf or http://www.cs.ox.ac.uk/boris.motik/pubs/nm03fuzzy.pdf, ...), where the problems of time intervals vs. time points, uncertainty and "vagueness", and their representation are discussed.
hope this helps,
Friedrich